
After reading this answer from @Ingo on how to set up dynamic network failover, I have a question. First of all, thanks a lot to @Ingo for posting this guide to setting up bonding/dynamic network failover. It was very easy to set up and works fine for me if my ethernet and wlan network IP addresses are the same!

My goal is to have the bond switch over to the wlan interface (which connects to an LTE/cellular router) if the internet connection on the ethernet interface is down.

Currently the bonding failover only works if I fully disconnect the ethernet (by unplugging the cable). However, I would like the ethernet interface to set the MII status to down if the internet is not working anymore.

Does anybody have a tip on how to approach this? Thanks a lot in advance!

Darksta
  • "Currently the bonding failover only works if I fully disconnect the ethernet (by unplugging the cable)." -> Are you sure it won't happen if you manually ip link set eth0 down? It may remain "up" regardless of the connection state (because the carrier is still there). If so, the easiest solution might be to just ping something at intervals, then set it down when that fails. – goldilocks Oct 04 '19 at 13:30
  • Thanks for your reply! It does happen when I do "ip link set eth0 down". However, in the more likely scenario where the internet is down, the network connection (LAN) is still up. I am assuming it does not detect/notice that the internet is down? I already have a program running that pings/connects to the internet continuously. This program then fails/gets HTTP error messages. Unfortunately, that did not set the MII status of the ethernet connection to down, so the bond did not switch. – Darksta Oct 04 '19 at 13:55
  • "I am assuming it does not detect/notice the internet is down?" No. There's no way to do that passively (i.e., without sending packets out), and no definitive universal way to do it actively by noting a failure to send, since that is very context dependent (send what where? E.g., does a failure to connect to google.com:80 indicate the internet is down? And detecting a failure to connect to anything would take a very, very long time). The router can tell when the uplink is gone, but it will not turn off the LAN because of that, even if there's only one thing connected. – goldilocks Oct 04 '19 at 14:03
  • Is there a way to implement local logic that, for example, pings Google every 2 minutes and fails over if it is not reachable? Would that be the way to go to achieve automatic failover in case the internet on eth0 is down? – Darksta Oct 04 '19 at 18:40
  • In the past I've pinged an (external) DNS server as set by DHCP negotiation; conventionally they're added to /etc/resolv.conf, but the dynamic failover config from Ingo's post may mess with that (investigate if you want to use that method). The premise is that if you can't communicate with your DNS server, problems will soon follow. But anything you consider reliable (e.g., Google DNS: 8.8.8.8, 8.8.4.4) should be fine. It'd be great if there were a simple way to query the router about this, and in any case I'm sure there are more fine-tuned ways, but this one is simple and may suffice. – goldilocks Oct 04 '19 at 19:03
  • Thanks for your reply @goldilocks I think so too and will give it a try to see if it is enough. – Darksta Oct 06 '19 at 08:49

1 Answer


@goldilocks put it in a nutshell in his comment so I will quote it here:

There's no way to do that passively (ie., without sending packets out), and no definitive universal way to do it actively by noting a failure to send, since that is very context dependent (send what where? Eg., Does a failure to connect to google.com:80 indicate the internet is down? And detecting a failure to connect to anything would take a very, very long time). The router can tell when the uplink is gone, but it will not turn off the LAN because of that even if there's only one thing connected.

But you asked (in a reply to this) whether local logic could be implemented that, for example, pings Google every 2 minutes and fails over if it is not reachable.

This is not an easy task, and we have to reach deep into the networking bag of tricks. There are two main issues:
1. The eth0 interface must always stay up. Otherwise we can't ping through it to see whether the connection has come back. All we can control is whether eth0 is a slave of the bond or not.
2. It is the nature of bonding that you always have a connection, whether eth0 is connected or not. So you can't simply ping to test eth0: you would always get a response, because the default route goes through bond0. Instead we use source routing to check whether a connection through eth0 (which is not a slave of the bond at that moment) succeeds. This is done with policy routing; its details are out of scope here, so just take it as given.
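To make the source-routing idea a little more concrete, here is a minimal dry-run sketch that only prints the iproute2 commands roughly corresponding to the policy routing described in point 2. The helper name policy_route_cmds is hypothetical, and the addresses, table ID and priority are the example values used in this answer; the answer itself configures the same thing through systemd-networkd rather than by hand.

```shell
#!/bin/bash
# Dry-run sketch: print (do not execute) the commands for source-based routing.
# policy_route_cmds is a hypothetical helper; all values are examples.
policy_route_cmds() {
    # $1 = source address, $2 = gateway, $3 = device, $4 = table, $5 = priority
    # Rule: packets sent from the source address consult the extra table.
    echo "ip rule add from $1 lookup $4 priority $5"
    # Route: the extra table contains only a default route out of the device.
    echo "ip route add default via $2 dev $3 src $1 table $4"
}

policy_route_cmds 192.168.50.60 192.168.50.1 eth0 200 16384
```

Because the rule only matches packets whose source address is the static eth0 address, normal traffic keeps using the main table and therefore bond0.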

Here is a tested example of how I would do it. To simplify things I have given eth0 a static IP address. I assume you already have a setup as shown in Howto migrate from networking to systemd-networkd with dynamic failover.

First create a new routing table pingtest with ID 200:

rpi ~$ sudo bash -c 'echo 200 pingtest >> /etc/iproute2/rt_tables'
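Running the command above twice would append the entry twice. As an optional variation, the following sketch appends the line only if the table name is not already listed. The function name add_rt_table is hypothetical; the file path, ID and name are the ones from this answer.

```shell
#!/bin/bash
# add_rt_table FILE ID NAME: append "ID NAME" unless NAME is already listed.
# add_rt_table is a hypothetical helper name, not part of iproute2.
add_rt_table() {
    # Look for NAME as the second field of any non-comment line.
    if awk -v n="$3" '$1 !~ /^#/ && $2 == n { found = 1 } END { exit !found }' "$1"; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s\n' "$2" "$3" >> "$1"
}

# Usage (as root):
#   add_rt_table /etc/iproute2/rt_tables 200 pingtest
```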

Then ensure that /etc/systemd/network/12-bond0-add-eth.network looks like this:

[Match]
Name=e*

[Network]
Bond=bond0
PrimarySlave=yes
Address=192.168.50.60/24

[RoutingPolicyRule]
Table=200
Priority=16384
From=192.168.50.60

[Route]
# This makes a default route
Table=200
Destination=0.0.0.0/0
Protocol=static
Gateway=192.168.50.1
PreferredSource=192.168.50.60

Then create a bash script pingtest.sh:

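#!/bin/bash
# Poll internet reachability and add/remove $IF as a slave of $BOND.
IF="eth0"
BOND="bond0"
PING_SOURCE="192.168.50.60"   # static address of eth0, routed via table pingtest
PING_DEST="google.com"
POLL="120"   # polling time in seconds

while true; do
    if [[ $(/bin/ip -br link show dev "$IF") == *",SLAVE,UP,"* ]]; then
        #echo DEBUG: "$IF" is slave
        # $IF is the primary slave and carries the bond's traffic,
        # so a ping over the default route (bond0) tests its uplink.
        if ! /bin/ping -Bnq -c3 -w3 "$PING_DEST" &>/dev/null; then
            # No internet via $IF: take it out of the bond but keep it up,
            # so that it can still be used for the test pings.
            /bin/ip link set "$IF" nomaster
            /bin/ip link set "$IF" up
            echo "$IF removed from $BOND, ping to $PING_DEST failed"
        fi
    else
        #echo DEBUG: "$IF" is no slave
        # $IF is not in the bond: ping from its static address, which the
        # policy rule sends out through $IF instead of bond0.
        if /bin/ping -Bnq -c3 -w3 -I "$PING_SOURCE" "$PING_DEST" &>/dev/null; then
            # Internet via $IF is back: link down, attach to the bond, link up.
            /bin/ip link set "$IF" down
            /bin/ip link set "$IF" master "$BOND"
            /bin/ip link set "$IF" up
            echo "$IF added to $BOND, ping to $PING_DEST succeeded"
        fi
    fi
    sleep "$POLL"
done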

For testing, uncomment the two echo DEBUG: lines and run the script from the command line with sudo. Don't forget to make it executable with chmod +x pingtest.sh. Once the script runs as a service (set up below), its messages about bond changes appear in the journal and in the output of systemctl status pingtest.service.
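A single destination can be unreachable for reasons that have nothing to do with your uplink; in the comments below, a second destination (amazon.com) was added for that reason. The following is only a sketch of how such a check could be structured, not part of the tested script. The function names any_reachable and ping_from_eth0 are hypothetical, and the address is the static eth0 address assumed in this answer.

```shell
#!/bin/bash
# any_reachable PROBE DEST...: succeed as soon as one destination answers.
# PROBE is the name of a command or function that tests one destination.
any_reachable() {
    probe="$1"
    shift
    for dest in "$@"; do
        if "$probe" "$dest"; then
            return 0
        fi
    done
    return 1    # no destination answered (or none was given)
}

# Probe that pings from the static eth0 address, with the same ping
# options as the script above.
ping_from_eth0() {
    /bin/ping -Bnq -c3 -w3 -I 192.168.50.60 "$1" >/dev/null 2>&1
}

# Usage:
#   any_reachable ping_from_eth0 google.com amazon.com
```

Passing the probe as an argument keeps the loop independent of ping, so the same function could be reused for the slave branch with a probe that omits -I.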

To run the script create a service with:

rpi ~$ sudo systemctl --force --full edit pingtest.service

In the empty editor insert these statements, save them and quit the editor:

[Unit]
Description=Pingtest if destination is up
Wants=network.target
After=network.target

[Service]
ExecStart=/home/pi/pingtest.sh

[Install]
WantedBy=network.target

Enable the new service with

rpi ~$ sudo systemctl enable pingtest.service

Reboot.

The routing of this setup looks like this:

rpi ~$ ip route show table main
default via 192.168.50.1 dev bond0 proto dhcp src 192.168.50.205 metric 1024 
192.168.50.0/24 dev bond0 proto kernel scope link src 192.168.50.205 
192.168.50.0/24 dev eth0 proto kernel scope link src 192.168.50.60 
192.168.50.1 dev bond0 proto dhcp scope link src 192.168.50.205 metric 1024

rpi ~$ ip route show table pingtest
default via 192.168.50.1 dev eth0 proto static src 192.168.50.60

rpi ~$ ip rule ls
0:      from all lookup local 
16384:  from 192.168.50.60 lookup pingtest 
32766:  from all lookup main 
32767:  from all lookup default
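To check which interface the kernel would pick for a given source address, ip route get can be combined with a small filter. This is a sketch under the assumptions of this answer (addresses as above); route_dev is a hypothetical helper that simply prints the word following "dev" in whatever ip route get returns.

```shell
#!/bin/bash
# route_dev: read `ip route get` output on stdin and print the egress device,
# i.e. the word that follows "dev" on the first line containing it.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on the Pi:
#   ip route get 8.8.8.8 from 192.168.50.60 | route_dev
#   ip route get 8.8.8.8 | route_dev
```

If the policy rule is active, the first call should name eth0 and the second bond0, matching the tables and rules listed above.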
Ingo
  • Thank you very much @Ingo ! Your help is much appreciated. That is exactly what I was looking for, great!! I will give it a try. – Darksta Oct 06 '19 at 08:48
  • It works like a charm. I have included a second ping check to amazon.com in case google.com should not be reachable and then used pm2 (node package) to do pm2 start pingtest.sh to run it as a job (as I was using pm2 already for other stuff). Thanks again for your help! :) – Darksta Oct 06 '19 at 11:37
  • I was a bit fast with my previous comment @Ingo The script you have shown is executing the ping on the default network interface I think. Meaning as soon as the bond switched to wlan0, the ping will work again, it will then switch back to eth0, even though that network is still offline. Then it will switch back and forth. I have tried using the -I command to specify the source interface for the ping like that: /bin/ping -I eth0 -q -c3 -w3 google.com &>/dev/null

    However, unfortunately that doesn't work for me. It either says "network unavailable" or 100% packet loss even though it's up.

    – Darksta Oct 06 '19 at 13:39
  • @Darksta You are right. The idea was too simple and I made an error in my reasoning. You must ping from eth0 to check its route, but you can't because it doesn't have an IP address. You also can't when it is down. To run the check, eth0 must never go down; otherwise you can't see whether the connection is back again. This seems not to be an easy task, and right now I have no idea how to solve it. Maybe there is a way to dynamically switch the primary interface on the bonding interface and have two (virtual) interfaces on eth0, one for bonding and one with an IP address, hm... I will look at it. Please don't mark it as the solution. – Ingo Oct 06 '19 at 14:49
  • Okay, I understand. Thats unfortunate. I really liked the solution! Could this maybe be an approach for a new solution? https://unix.stackexchange.com/questions/504726/what-can-i-do-to-enable-automatic-switching-to-a-backup-network-when-there-is-pa I did not fully grasp it yet though :-/ – Darksta Oct 07 '19 at 17:53
  • @Darksta The link you have given is about ARP testing. With ARP testing you can only check for local devices, not for remote sites like google.com. But I have found a solution with source routing and have rewritten the answer. For reference to the comments, here is the old wrong answer: https://raspberrypi.stackexchange.com/revisions/104267/2. – Ingo Oct 07 '19 at 21:00
  • Hi @Ingo , thanks a lot for your reply and update solution! Much appreciated! I have set it up as per your description. The script successfully removes the eth0 interface when there is no internet. Then wlan0 is the active interface. so far so good. However, it seems that the eth0 is still active though. Internet is not working but I can still access my eth0 networks router for example via ip. wlan0 is the up and active interface. Calling a website gives dns error. Strange? The routing tables etc. look similar to yours! Do you have an idea what the issue could be? – Darksta Oct 12 '19 at 14:04
  • @Darksta What you describe is all as expected, as far as I can see. It is what I mentioned in points 1 and 2 of the answer. bond0 (using wlan0) is always up, so you can still access the local network with the router. If that connection (bond0 using wlan0) provides internet access then you have it, otherwise not. eth0 must always be up. How else would you ping over that connection (using eth0) to see whether google.com is available again? – Ingo Oct 12 '19 at 16:55
  • Ah, I understand! Would this setup still work if the wlan0 router is not bridged to the network but is a separate network? It seems the pi is even with the bond0 on wlan0 still trying to communicate over eth0. Maybe I just have to set a fixed address and dns so the wlan0 doesn't use 192.168.2.1 as the dns server (which in this setup points still to the eth0 router with no internet)? – Darksta Oct 12 '19 at 17:43
  • @Darksta You can use a complete different connection for wlan0, another router on another network, another provider, what you want. It is a redundant connection. On the main routing table I have shown, there is the default route set to default via 192.168.50.1 dev bond0 proto dhcp src 192.168.50.205 metric 1024 so this route using bond0 and source ip address 192.168.50.205 is used by default. ping google.com should use this route and should be the same as ping -I 192.168.50.205 google.com. – Ingo Oct 12 '19 at 18:02
  • @Darksta Only if the source ip address is 192.168.50.60 then the default route from routing table pingtest is used with default via 192.168.50.1 dev eth0 proto static src 192.168.50.60 from eth0. ping -I 192.168.50.60 google.com will use eth0 (and its connection) to ping. – Ingo Oct 12 '19 at 18:03
  • @Darksta I wrote that you can use another router on another network for wlan0. That's not completely true. The second router must be on the same local network (broadcast domain), but you can have two (or more) routers on the network. To get a better idea of how it works, you can change PING_DEST="google.com" to PING_DEST="google.xyz" in the script. Then you should have an internet connection with bond0 (using wlan0), but eth0 isn't a slave of bond0 because its ping fails. – Ingo Oct 12 '19 at 18:35
  • You are awesome! It is working great now! I had not set up the fixed IP address for the eth0 interface correctly. After doing that (by repeating the setup step for eth0 with a static IP from your original manual), it works like a charm so far. Thanks a lot, that is great!! – Darksta Oct 13 '19 at 10:00