NinjaKitty Posted October 17, 2017
So, I've been struggling to figure out the cause of this problem. I have 4 Le Potatoes, all connected to a 100 Mbps switch. Each of them has a 32 GB Samsung EVO SD card.

Initially, the problem was that my SSH session would hang at random, usually while I was downloading something. I can't SSH in again unless I reboot the Pi (I didn't bother waiting longer than 5 min).

At first I thought the SD cards were bad, so I moved them around and did reinstalls. I'm using SD Formatter to format them and Etcher to write the images. Still, the freezes showed no consistent pattern with respect to either the board or the SD card.

For power, I'm using an Anker 40W 4-Port USB Wall Charger, which should (I think) be enough. I've checked all the cables using my phone.

I plugged HDMI into them to see whether they were still running after those SSH crashes (why didn't I do this earlier?), and I SSH'd in and watched locally what happened when the "crash" occurred. After SSH hangs, I check the Pi, and it is still running as usual. I reboot it, start some downloads, and now I see locally the same network hang I get over SSH. To me it looks like the Ethernet driver is crashing or something. I'm not sure what else to try for debugging or what the problem is. Send help.

Version: ARMBIAN 5.34.171017 nightly Ubuntu 16.04.3 LTS 4.13.7-meson64
Using the Le Potato 2GB version.
Tido Posted October 17, 2017
To help you, look in the documentation section to see what these things can report.
/sent from mobile phone/
Igor Posted October 17, 2017
@TonyMac32 This is fixed on the C2, but here it looks like it isn't (if I am looking at the correct file). Related?
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts?id=f7bcd4b6f6983d668b057dc166799716690423a4
NinjaKitty (Author) Posted October 17, 2017
29 minutes ago, Tido said:
to help u look in the documentation section, what these thingy can report.
I think you forgot to link something.
TonyMac32 Posted October 17, 2017
I'm not sure this carries over from the S905 to the S905X, as the S905X can only officially go to 1.5 GHz in the first place. If he has time, @Neil Armstrong may be able to answer better; I don't know all of the differences between the C2 and K2 (GXBB) and the Le Potato (GXL).

I haven't seen this issue on my board, but of course I haven't done many long-term tests. @NinjaKitty, are you using the latest kernel image? File download was the simplest way for me to trip the memory fault associated with …

The board didn't typically die immediately after the hang; it took some other activity first. If you can give me the output of armbianmonitor -u, maybe the fault left some evidence. My Le Potato ran for 24 hours without incident last night. I will repeat the test, but give it some activity this time to see if it fails.
TonyMac32 Posted October 17, 2017
14 minutes ago, NinjaKitty said:
I think you forgot to link something.
https://docs.armbian.com/
There you'll find all sorts of good info.
NinjaKitty (Author) Posted October 17, 2017
@TonyMac32 I'm on kernel 4.13, which I downloaded yesterday. From your post, I'm assuming there's a 4.15 version? I'll try to reproduce it (it happened again while I was downloading the python2.7 Docker container) and see.
TonyMac32 Posted October 17, 2017
No, 4.15 doesn't exist yet; I pulled a patch that is scheduled to be included in a future kernel. I apologize for any confusion. I've moved multi-GB files with this image, which is the only reason I'm curious. The armbianmonitor link will provide basic info about the machine, plus dmesg output if you include it.
NinjaKitty (Author) Posted October 17, 2017
http://sprunge.us/ORAj
Also, I've noticed that with this particular version, static IPs aren't working on boot, and I have to do ifdown/ifup to make it work. It worked fine with previous versions. I'm not sure what's wrong there either.
Igor Posted October 17, 2017
5 minutes ago, NinjaKitty said:
It used to work fine with the previous versions.
Which previous versions? Which kernel?
Another possibly related problem, though I'm not sure it manifests here:
https://github.com/armbian/build/commit/71ac70f93e4cf2af99f9ead6297f6b79f0c0529c
NinjaKitty (Author) Posted October 17, 2017
Sorry, I should've checked before posting.
Previous: ARMBIAN 5.33.171011 nightly Ubuntu 16.04.3 LTS 4.13.5-meson64
Currently, I'm on 4.13.7-meson64.
TonyMac32 Posted October 17, 2017
### Group membership of dan : dan dialout sudo audio video plugdev systemd-journal netdev bluetooth docker
Was Docker installed at the same time the issues appeared?
[   21.620963] docker_gwbridge: port 1(vethe7075d6) entered forwarding state
[   21.621367] docker_gwbridge: port 1(vethe7075d6) entered disabled state
[   21.797479] eth1: renamed from vethc6ce16b
I have not tried Docker on this image. @Igor, could this be related?
Igor Posted October 17, 2017
15 minutes ago, TonyMac32 said:
I have not tried Docker on this image, @Igor could this be related?
Neither have I. It could be a Docker (dependency) related issue. Unfortunately, I am not deeply familiar with the possible problems there.
NinjaKitty (Author) Posted October 17, 2017
I've run into this problem a few times without Docker, for things like sudo apt-get update, but most of the time it involved Docker: either running the get.docker script, installing Docker, or downloading a container (e.g. python2.7-slim). I can test later today and see whether other workloads cause it.
V10lator Posted October 19, 2017
Did you compile your own kernel? Just asking because I had similar issues when compiling a custom kernel with any CPU frequency governor other than Performance. This is how it should be:
$ gunzip -c /proc/config.gz | grep CPU_FREQ_GOV
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_GOV_CONSERVATIVE is not set
# CONFIG_CPU_FREQ_GOV_SCHEDUTIL is not set
NinjaKitty (Author) Posted October 20, 2017
@V10lator Nope, I just downloaded it from Armbian's site. Mine looks like this:
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=m
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
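To compare two kernel configs like the ones above without eyeballing them, here is a minimal Python sketch that classifies each CONFIG_CPU_FREQ_GOV_* entry as built in (=y), a loadable module (=m), or not set. The function names are hypothetical, and reading /proc/config.gz assumes the kernel was built with CONFIG_IKCONFIG_PROC, as the gunzip command above implies. ATTR_SET and COMMON appear to be helper symbols selected by other governors rather than governors you would choose directly.

```python
import gzip
import re

# "CONFIG_CPU_FREQ_GOV_<NAME>=y" (built in) or "=m" (loadable module)
_SET = re.compile(r"^CONFIG_CPU_FREQ_GOV_(\w+)=([ym])$")
# "# CONFIG_CPU_FREQ_GOV_<NAME> is not set"
_UNSET = re.compile(r"^# CONFIG_CPU_FREQ_GOV_(\w+) is not set$")


def classify_governors(config_text):
    """Map each CPU_FREQ_GOV option suffix to 'built-in', 'module' or 'not set'."""
    result = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            result[match.group(1)] = "built-in" if match.group(2) == "y" else "module"
            continue
        match = _UNSET.match(line)
        if match:
            result[match.group(1)] = "not set"
    # CONFIG_CPU_FREQ_DEFAULT_GOV_* lines are ignored: they do not match either pattern.
    return result


def running_kernel_governors(path="/proc/config.gz"):
    """Classify the governors of the running kernel (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return classify_governors(f.read())
```

On the board, print(running_kernel_governors()) lists each option with its state, so two boards (or a stock and a custom kernel) can be diffed directly.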
NinjaKitty (Author) Posted October 20, 2017
I'm going to uninstall docker-ce, use the boards without it, and see if I still crash.

For my other question about not being able to access the internet on boot, am I doing something wrong here?
/etc/network/interfaces
# Wired adapter #1
allow-hotplug eth0
no-auto-down eth0
iface eth0 inet static
    address 192.168.5.200
    netmask 255.255.255.0
    gateway 192.168.5.6
    dns-nameservers 8.8.8.8 192.168.5.6

For more context, I have it set up with a static IP, but every time it boots I have to run sudo ifdown eth0; sudo ifup eth0 to get internet access. I can SSH into it just fine, and I can ping 8.8.8.8, but I can't resolve any hostnames (e.g. nslookup google.com fails).

Edit: I uninstalled docker-ce and so far it hasn't crashed. I've done some stress tests and iperf runs, and it seems to be working fine... Will keep checking. Maybe something is wrong with Docker on Amlogic?
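The symptom described here (pinging 8.8.8.8 works, but nslookup google.com fails) points at the resolver configuration rather than the link itself. Below is a minimal sketch for checking whether the dns-nameservers entries from /etc/network/interfaces actually reached /etc/resolv.conf. On Debian and Ubuntu, ifupdown applies dns-nameservers only through the resolvconf package's hook, so if that package is absent, or another tool rewrites resolv.conf, the entries may be missing. The helper names are hypothetical, and the default expected list simply mirrors the interfaces file above.

```python
def nameservers(resolv_text):
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field never equals "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def missing_nameservers(resolv_text, expected):
    """Return the expected servers that do not appear in the resolv.conf text."""
    present = set(nameservers(resolv_text))
    return [server for server in expected if server not in present]


def check_resolv_conf(expected=("8.8.8.8", "192.168.5.6"), path="/etc/resolv.conf"):
    """Read resolv.conf from disk and report which expected servers are absent."""
    with open(path) as f:
        return missing_nameservers(f.read(), expected)
```

Running check_resolv_conf() once right after boot and again after ifdown/ifup shows whether the manual restart is what finally writes the servers; an empty list means both are present.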
V10lator Posted October 20, 2017
4 hours ago, NinjaKitty said:
Mine looks like this
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=m
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
That's weird and might be why you see these errors. Could you check the output of
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
and, if it says anything other than performance, run
echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
(please note that this setting won't survive a reboot) and then redo your testing?
BTW: Is wget https://git.kernel.org/torvalds/t/linux-4.14-rc5.tar.gz a good way to reproduce the issue for you?

1 hour ago, NinjaKitty said:
For my other question of not being able to access the internet on boot, am I doing something wrong here?
What do ifconfig and cat /etc/resolv.conf show before and after you do ifdown/ifup?
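A caution when applying this: sudo echo performance > /sys/.../scaling_governor fails with "Permission denied", because the redirection is performed by the calling unprivileged shell, not by sudo; piping into sudo tee avoids that. Below is a minimal Python sketch that reads scaling_governor for every CPU exposing cpufreq, so you can confirm the change took effect on all cores, plus an optional writer that must run as root and, like the shell command, does not persist across reboots. The function names are hypothetical; the sysfs paths are the standard cpufreq locations used in the post.

```python
import glob
import os

CPU_ROOT = "/sys/devices/system/cpu"


def read_governors(root=CPU_ROOT):
    """Map 'cpuN' to its current scaling_governor for every CPU with cpufreq."""
    governors = {}
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


def set_governor(governor, cpu="cpu0", root=CPU_ROOT):
    """Write a governor name for one CPU (needs root on a real system; lost on reboot)."""
    path = os.path.join(root, cpu, "cpufreq", "scaling_governor")
    with open(path, "w") as f:
        f.write(governor + "\n")
```

For example, run sudo python3 and call set_governor("performance") followed by read_governors() to see which cores now report performance.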
NinjaKitty (Author) Posted October 20, 2017
9 hours ago, V10lator said:
That's weird and might be why you see these errors. Could you check the output of cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor and, if it says anything other than performance, run echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor (please note that this setting won't survive a reboot) and then redo your testing? BTW: Is wget https://git.kernel.org/torvalds/t/linux-4.14-rc5.tar.gz a good way to reproduce the issue for you?
1) cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor returns ondemand.
2) I'll do some testing when I get back home from work.
3) I just ran the wget command, and it works fine (without changing scaling_governor).
4) I'll try changing scaling_governor to performance, then reinstall docker-ce and redo the same things I was doing before.
NinjaKitty (Author) Posted October 26, 2017
On 10/19/2017 at 11:33 PM, V10lator said:
What's ifconfig and cat /etc/resolv.conf telling before and after you do ifdown/ifup?

Before:
eth0      Link encap:Ethernet  HWaddr 8e:fc:0c:bb:94:16
          inet addr:192.168.5.200  Bcast:192.168.5.255  Mask:255.255.255.0
          inet6 addr: fe80::8cfc:cff:febb:9416/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:86915 errors:0 dropped:42 overruns:0 frame:0
          TX packets:658 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13818114 (13.8 MB)  TX bytes:88602 (88.6 KB)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:116878 errors:0 dropped:0 overruns:0 frame:0
          TX packets:116878 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9303074 (9.3 MB)  TX bytes:9303074 (9.3 MB)

After:
eth0      Link encap:Ethernet  HWaddr 8e:fc:0c:bb:94:16
          inet addr:192.168.5.200  Bcast:192.168.5.255  Mask:255.255.255.0
          inet6 addr: fe80::8cfc:cff:febb:9416/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:87076 errors:0 dropped:42 overruns:0 frame:0
          TX packets:731 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13833694 (13.8 MB)  TX bytes:96799 (96.7 KB)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:116962 errors:0 dropped:0 overruns:0 frame:0
          TX packets:116962 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9309598 (9.3 MB)  TX bytes:9309598 (9.3 MB)
NinjaKitty (Author) Posted October 26, 2017
I changed /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor to performance and ran curl https://get.docker.com | sh:
dan@lepotato1:~$ curl https://get.docker.com | sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11070  100 11070    0     0  13792      0 --:--:-- --:--:-- --:--:-- 13785
# Executing docker install script, commit: 49ee7c1
+ sudo -E sh -c apt-get update -qq >/dev/null
+ sudo -E sh -c apt-get install -y -qq apt-transport-https ca-certificates curl software-properties-common >/dev/null
+ sudo -E sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" | apt-key add -qq - >/dev/null
+ sudo -E sh -c echo "deb [arch=arm64] https://download.docker.com/linux/ubuntu xenial edge" > /etc/apt/sources.list.d/docker.list
+ [ ubuntu = debian ]
+ sudo -E sh -c apt-get update -qq >/dev/null
+ sudo -E sh -c apt-get install -y -qq --no-install-recommends docker-ce >/dev/null
E: Failed to fetch https://download.docker.com/linux/ubuntu/dists/xenial/pool/edge/arm64/docker-ce_17.10.0~ce-0~ubuntu_arm64.deb  Operation too slow. Less than 10 bytes/sec transferred the last 120 seconds
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

Immediately after this, SSH hangs and becomes inaccessible. I went to the board directly, reset the Ethernet, and ran armbianmonitor -u (Ethernet was dead and I couldn't even run apt-get update):
http://sprunge.us/EFfg
Andro Posted November 4, 2017
I have the Libre Le Potato, 2GB model, exactly as the OP has above, running the Armbian Ubuntu image [Ubuntu desktop, mainline kernel] from the download page. I see the same issue: Ethernet connectivity reliably just stops after about 15 minutes of use, while running only turbovncserver with one remote session, nothing too strenuous. Apart from that, it's a wonderful little board and Armbian on it is really excellent. So, just adding more weight to what others have observed.
TonyMac32 Posted November 4, 2017
Thanks @Andro for the report, and @Da Xue for the information. I've now observed the same thing; however, in my case it takes up to a day to appear, although admittedly I'm not doing server activities. (I set the device up to stream an entire YouTube channel, which it did successfully overnight; setting it up so that I would ping it periodically eventually *seems* to have knocked it out.) It seems as though incoming requests may be to blame, rather than outgoing traffic? I'll set up a server on it and see how differently that behaves.
Andro Posted November 5, 2017
Hi @TonyMac32, I can now confirm that a constant stream of inbound traffic causes the Ethernet to lock up after anywhere from 15 minutes to an hour. This test is repeatable and reliable. As for how to provide logs or any further detail, I don't really know.
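One low-effort way to collect evidence of the lock-up is to log the interface's packet counters from the local console (SSH dies along with the link) and flag any window in which the board kept transmitting but received nothing. A minimal sketch, assuming the interface is eth0 and using the standard /sys/class/net/<iface>/statistics counters; the function names and the 10-second interval are arbitrary, hypothetical choices.

```python
import os
import time

COUNTERS = ("rx_packets", "tx_packets", "rx_dropped", "rx_errors")


def read_counters(iface="eth0", root="/sys/class/net"):
    """Read selected packet counters for one interface from sysfs."""
    stats_dir = os.path.join(root, iface, "statistics")
    values = {}
    for name in COUNTERS:
        with open(os.path.join(stats_dir, name)) as f:
            values[name] = int(f.read())  # int() tolerates the trailing newline
    return values


def rx_stalled(before, after):
    """True if packets went out during the window but none came in."""
    return (after["tx_packets"] > before["tx_packets"]
            and after["rx_packets"] == before["rx_packets"])


def watch(iface="eth0", interval=10.0):
    """Print one timestamped line per interval; runs until interrupted."""
    previous = read_counters(iface)
    while True:
        time.sleep(interval)
        current = read_counters(iface)
        flag = "RX STALLED" if rx_stalled(previous, current) else ""
        print(time.strftime("%Y-%m-%d %H:%M:%S"), current, flag, flush=True)
        previous = current
```

Redirecting the output of watch() to a file while the inbound-traffic test runs gives a timeline that can be lined up with the kernel log and attached alongside armbianmonitor -u output.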
TonyMac32 Posted November 5, 2017
Alright, Da Xue says Amlogic is looking into it. I'll build another image and see if I can reproduce your results. (I'm not doubting you; I'm just curious why I'm seeing different behavior.)
Andro Posted November 13, 2017
I built the latest 5.34 Armbian image for the Le Potato, and it still locks up after about an hour of Ethernet traffic (using TurboVNC), rendering this very nice little board unusable. I see others confirming the problem, but is anyone able to run this board successfully? A bit of a puzzle.
Andro Posted November 14, 2017
On 11/5/2017 at 12:49 AM, Da Xue said:
This is being investigated at Amlogic.
Hi Da Xue, any progress?
Da Xue Posted November 20, 2017
From 91c030a615bc1bcc500cfd63d19ea5a61179f5e1 Mon Sep 17 00:00:00 2001
From: Yizhou Jiang <yizhou.jiang@amlogic.com>
Date: Tue, 27 Jun 2017 14:05:12 +0800
Subject: [PATCH] PD#146205: eth: fix eth stop

Change-Id: I7f8ad51dacd207a804377e71340fe15f547bbae0
Signed-off-by: Yizhou Jiang <yizhou.jiang@amlogic.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 0fe9ed86aa3f..4d58ed6d44fa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3390,6 +3390,15 @@ static void moniter_tx_handler(struct work_struct *work)
 	priv = netdev_priv(c_phy_dev->attached_dev);
 	if (priv) {
 		if (c_phy_dev->link) {
+			if (priv->dev->stats.tx_packets > 100) {
+				if (priv->dev->stats.rx_packets == 0) {
+					pr_info("rx stop, recover eth\n");
+					stmmac_release(priv->dev);
+					stmmac_open(priv->dev);
+				}
+			}
+			priv->dev->stats.tx_packets++;
+			priv->dev->stats.tx_packets++;
 			if (priv->dirty_tx != priv->cur_tx &&
 					check_tx == 0) {
 				pr_info("tx queueing\n");
-- 
2.13.6

This is a patch to stmmac as a workaround for the issue for now. Hopefully a better fix is coming. @Andro
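Read on its own, the added block is a watchdog: inside the periodic moniter_tx_handler() work function, when the PHY reports link, the interface has sent more than 100 packets, and its receive counter is still zero, it logs "rx stop, recover eth" and restarts the interface with stmmac_release() followed by stmmac_open(). The small Python model below covers only that trigger condition, to make the thresholds easy to check. It is not kernel code, and the name should_recover is hypothetical.

```python
def should_recover(link_up, tx_packets, rx_packets):
    """Model the patch's trigger: link up, more than 100 packets sent, none received."""
    return link_up and tx_packets > 100 and rx_packets == 0


# A few cases around the thresholds used in the patch.
for case in [(True, 101, 0), (True, 100, 0), (True, 5000, 3), (False, 5000, 0)]:
    print(case, should_recover(*case))
```

Because the condition compares absolute counter values rather than the change between two runs of the handler, it fires only while rx_packets is exactly zero. Whether the counters are reset elsewhere, so that a link which received traffic earlier can still trigger it, is not visible in the patch as posted.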
TonyMac32 Posted November 20, 2017
Thanks @Da Xue. If no one gets to it before I do, I'll test it this evening (Eastern US).