
I've had this problem with all my Raspberry Pis: after some time, they simply stop responding to SSH, VNC and other things. When this happens, I go to my router's configuration page and I can see the Raspberry Pi is connected. When I restart the Pi it works again, of course, but I know the problem will come back after a few hours.

This always happens to me. I really wish my Raspberry Pi stayed connected and able to serve me things 100% of the time.

Ghanima
Guelando
  • Not much to go on here. Which Pi model do you use? Can you tell us whether you use a wifi-dongle or an Ethernet cable? Is your power supply able to deliver enough power? – Werner Kvalem Vesterås Mar 27 '15 at 14:52
  • Power supply: 1A. Pi model: B+. Wifi dongle. – Guelando Mar 27 '15 at 15:24
  • It could be a power issue. Can you try connecting it with an Ethernet cable and see if the problem persists? (A USB WiFi dongle may require more current than the Pi is able to provide.) Also, the Pi appearing in the router config does not mean it is "connected"; it just means it was assigned an IP address. It could be hung and you would probably still see it there. – jotadepicas Mar 27 '15 at 17:53
  • WiFi problem! Try LAN for a day and it won't happen, or try using wicd-curses. It usually works well for me for WiFi reconnects, but you need a good power supply (a regulated one, not just a cheap USB power supply) to completely solve the issue. Cheap USB supplies have a lot of noise on the DC line, and this messes up WiFi a lot of the time! The rest of the time it's just not enough amps to keep it running stably. – Piotr Kula Jul 26 '15 at 18:21
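Since several comments suspect the power supply, one way to check on the Pi itself is the firmware command vcgencmd get_throttled, which prints a bit field such as throttled=0x50005. The sketch below decodes only the two under-voltage bits (bit 0: under-voltage right now; bit 16: under-voltage has occurred since boot). It assumes the firmware is recent enough to support get_throttled, which an older install may not; the function name decode_throttled is just an illustration.

```shell
#!/bin/bash
# Decode the under-voltage bits from `vcgencmd get_throttled` output.
# Assumption: firmware supports get_throttled; other bits are ignored here.
decode_throttled() {
    local hex="${1#throttled=}"        # e.g. "0x50005"
    local flags=$(( 16#${hex#0x} ))    # parse the hex digits as base 16
    local found=0
    if (( flags & 0x1 )); then         # bit 0
        echo "under-voltage detected now"
        found=1
    fi
    if (( flags & 0x10000 )); then     # bit 16
        echo "under-voltage has occurred since boot"
        found=1
    fi
    (( found )) || echo "no under-voltage reported"
}
```

On the Pi, run it as decode_throttled "$(vcgencmd get_throttled)". A result of "under-voltage has occurred since boot" after one of the hangs would support the power supply theory.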

3 Answers


Stuff like this can be really tedious to pin down: unless you have a keyboard and monitor you can plug in, there's no way to check what's gone wrong on the live system if ssh doesn't work.

Here's a simple starting point:

#!/bin/bash

# Set these to whatever you want.
router_ip=192.168.0.1
log_file=/tmp/mystery.log

# Make sure we can write to the log.
if ! touch "$log_file"; then
    echo "Cannot use $log_file."
    exit 1
fi

# Redirect output: discard stdout, append stderr to the log.
exec 1> /dev/null
exec 2>> "$log_file"

# A function for logging, prefixing each message with the date and time.
print2log () {
    echo "$(date +"%D %R") $*" >> "$log_file"
}

# Loop forever.
while true; do
    sleep 900 # 15 minutes
    # Ping the router once.
    if ping -c 1 "$router_ip"; then
        print2log "Ping OK."
    else
        print2log "Ping $router_ip failed."
    fi
    # Check sshd: log the PIDs of all running sshd processes.
    print2log "sshd PIDs:" $(ps -o pid= -C sshd)
done

Save this as check.sh (or whatever you want), run chmod 755 check.sh to make it executable, and start it from within an ssh login:

setsid ./check.sh &

It does not need to be run with sudo. You can now log out and it should keep running. Every 15 minutes it will print something like this to /tmp/mystery.log:

03/27/15 10:59 Ping OK.
03/27/15 10:59 sshd PIDs: 4261 14262

The first line indicates there is a working network connection and the second one indicates sshd is running. Regarding those PIDs: there should be at least one, and while the exact numbers don't matter, they should be reasonably consistent (i.e., not change every 15 minutes).

If there are no PIDs at a certain point, you have at least confirmed that sshd has died for some reason. Running

grep sshd /var/log/syslog

should help you find the reason.
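Once the script has been running for a day or two, the log gets long. A small sketch for reading it, assuming the exact line format that print2log writes (date, time, message): it prints ping failures, checks where sshd had no PIDs at all, and checks where the sshd PID list changed since the previous non-empty one. The function name scan_log is just an illustration.

```shell
#!/bin/bash
# Summarize a log written by check.sh: ping failures, checks with no sshd
# PIDs, and changes in the sshd PID list. Assumes "MM/DD/YY HH:MM message".
scan_log() {
    awk '
        / Ping .* failed\.$/ { print "ping failed at " $1 " " $2 }
        / sshd PIDs:/ {
            pids = $0
            sub(/^.*sshd PIDs:[ ]*/, "", pids)   # keep only the PID list
            if (pids == "") { print "no sshd at " $1 " " $2; next }
            if (last != "" && pids != last)
                print "sshd PIDs changed at " $1 " " $2 ": " last " -> " pids
            last = pids                          # remember last non-empty list
        }
    ' "$1"
}
```

Run it as scan_log /tmp/mystery.log; no output means no failures or PID changes were logged.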

goldilocks
  • Thx for this script. I'm running into the same issue with an up-to-date Raspberry Pi 4. Unfortunately, the script output still contains Ping OK and some sshd PID during times when I cannot reach the Raspi anymore. So the Raspi thinks he is still accessible, while not being accessible by other components in the network.

    Do you have any other idea @goldilocks?

    – mu88 Nov 05 '23 at 11:28
  • "So the Raspi thinks he is still accessible, while not being accessible by other components in the network." -> It probably really is connected to WLAN and able to ping the router then. It's not impossible for that to be the case yet its IP address is not useful to anything else; I have seen this happen occasionally on systems with a monitor and keyboard attached which makes it easy to tell the internet is accessible. I'm no expert on link layer networking, but I am pretty sure one cause of this can be ARP failure... – goldilocks Nov 05 '23 at 14:32
  • ....which might be related to a stale DHCP lease halfway working (the system remains valid in terms of WPA encryption, and the router forwards packets back and forth, but traffic from within the WLAN doesn't work). That bit is really a semi-educated guess; in any case it tends to happen with things that are online constantly for a long time period (days, weeks) vs. e.g. your laptop, which probably reconnects frequently coming out of suspend, across reboots, etc. You might try scheduling it to disconnect and reconnect once or twice a day. – goldilocks Nov 05 '23 at 14:32
  • Meaning a full reboot? Or what do you mean by disconnect and reconnect in detail? – mu88 Nov 05 '23 at 16:01
  • If that's convenient, a reboot would work (presuming reconnecting will solve the problem) -- but what I meant was disconnect from the WLAN then reconnect 30 seconds later. When I had this kind of problem but still had access via a keyboard, I think I would disconnect, delete any cached DHCP leases (where those are depends on what networking/dhcp service you are using, possibly somewhere in /var/lib) and reconnect. But try without deleting the leases first. – goldilocks Nov 05 '23 at 17:14
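The suggestion of disconnecting from the WLAN and reconnecting 30 seconds later could be scheduled with something like the sketch below. It assumes wpa_supplicant manages the interface (as on Raspbian/Raspberry Pi OS before Bookworm; NetworkManager-based systems would use nmcli instead), that the interface is named wlan0, and that it runs as root. The wpa_cli disconnect and reconnect commands are real; the function name wlan_bounce and the IFACE, WAIT_SECONDS and WPA_CLI overrides are hypothetical. Whether a reconnect alone clears a stale lease depends on your DHCP client, so treat this as an experiment rather than a guaranteed fix.

```shell
#!/bin/bash
# Hypothetical helper: drop the WLAN association, wait, then reconnect.
# Assumes wpa_supplicant manages the interface and this runs as root.
iface="${IFACE:-wlan0}"
wait_seconds="${WAIT_SECONDS:-30}"
wpa="${WPA_CLI:-wpa_cli}"    # overridable, e.g. for a dry run

wlan_bounce() {
    "$wpa" -i "$iface" disconnect || return 1
    sleep "$wait_seconds"
    "$wpa" -i "$iface" reconnect
}
```

For example, after saving it as /usr/local/bin/wlan-bounce.sh (a hypothetical path), a root crontab entry such as

0 4,16 * * * bash -c '. /usr/local/bin/wlan-bounce.sh && wlan_bounce'

would run it at 04:00 and 16:00.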

I once had the same problem: no connection over WiFi anymore. In my case, the WiFi dongle was put to sleep after a certain period of inactivity and (obviously) couldn't be woken up remotely. The solution was to disable power management for the WiFi dongle.

EDIT:

I was using an Edimax WiFi dongle on Arch Linux, and to disable power management, I needed to add the following line to /etc/modules:

options 8192cu rtw_power_mgnt=0
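To see whether power saving is actually active on the interface (before and after a change like this), iw can query it on drivers that support the request; older out-of-tree drivers such as 8192cu may not. A sketch, assuming iw is installed and the interface is wlan0; the function name power_save_state and the IW_CMD override are hypothetical.

```shell
#!/bin/bash
# Print "on" or "off" for Wi-Fi power saving on an interface (default wlan0).
# Relies on `iw dev <iface> get power_save` printing "Power save: on|off".
iw_cmd="${IW_CMD:-iw}"    # overridable, e.g. for testing

power_save_state() {
    "$iw_cmd" dev "${1:-wlan0}" get power_save | awk -F': ' '/Power save/ { print $2 }'
}

# To turn it off for the current session only (as root):
#   iw dev wlan0 set power_save off
```

A persistent setting still needs something applied at boot, such as the module option above.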
walderich
  • Adding how you disabled the power management (and which dongle) would help other users – Wilf Dec 23 '15 at 13:30

I usually have this problem on remote hosts, but take a look at ClientAliveInterval and ClientAliveCountMax in sshd_config: http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man5/sshd_config.5?query=sshd%5fconfig&sec=5
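Per the sshd_config man page, sshd sends a keepalive every ClientAliveInterval seconds and disconnects a client after ClientAliveCountMax unanswered ones, so the timeout is roughly their product (ClientAliveInterval defaults to 0, meaning disabled; ClientAliveCountMax defaults to 3). A sketch that reads a config file and prints that product, assuming plain "Keyword value" lines with values in seconds; it ignores Match blocks and does not follow Include lines. Like sshd, it treats keywords case-insensitively and keeps the first value found. The function name client_alive_timeout is hypothetical; for the authoritative effective values, run sshd -T as root.

```shell
#!/bin/bash
# Print the approximate keepalive timeout (seconds) from an sshd_config-style
# file, or "disabled" when ClientAliveInterval is 0 or unset.
client_alive_timeout() {
    local interval="" count="" key value
    while read -r key value _; do
        case "${key,,}" in                                   # case-insensitive keywords
            clientaliveinterval) : "${interval:=$value}" ;;  # first value wins
            clientalivecountmax) : "${count:=$value}" ;;
        esac
    done < <(grep -v '^[[:space:]]*#' "$1")                  # skip comment lines
    interval="${interval:-0}"    # OpenSSH default: 0 (disabled)
    count="${count:-3}"          # OpenSSH default: 3
    if (( interval == 0 )); then
        echo "disabled"
    else
        echo $(( interval * count ))
    fi
}
```

For example, a file containing only ClientAliveInterval 15 gives 45, matching the man page's own example of roughly 45 seconds.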