Tinkerboard S 5.4.28-rockchip drops in network randomly

Shen En Chen · March 29, 2020

I have a Tinker Board S and have been using both Tinker OS and Armbian for a while. I recently upgraded to 5.4.28-rockchip. I noticed that video freezes frequently and at random when I watch cnn.com or other media. At first I thought Firefox or Chromium had crashed, but I later realized it was a network issue. Other devices could still reach the internet. ifconfig still showed the network devices as properly connected, yet I could not ping any host. Disconnecting and reconnecting the network did not fix the issue; the only fix was a reboot.

How can I troubleshoot this issue? I tried to run armbianmonitor, but at that point there was no network at all.

http://ix.io/2fVR is the armbianmonitor output taken after a reboot, with no problem showing, as a baseline reference.

Myy · March 30, 2020

In armbianmonitor, both eth0 and wlan0 appear to be down, so how do you connect to the Internet ? Ethernet cable or wireless ?

5: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 0c:9d:92:0c:b0:ce brd ff:ff:ff:ff:ff:ff
6: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether d0:c5:d3:5a:af:db brd ff:ff:ff:ff:ff:ff

When the problem occurs, if you do (replace eth0 by wlan0 if you're connected through WiFi) :

ip link set eth0 down

ip link set eth0 up

Can you ping 1.1.1.1 ?

Also, what's the temperature of the CPU when the problem occurs ?
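The interface bounce and ping test above can be wrapped in one function so it is quick to run while the problem is happening. This is only a sketch: the function name bounce_and_ping and the 10-second wait are assumptions, not something from the thread, and it has to run as root.

```shell
# Bounce a network interface, then test raw IP connectivity (no DNS involved).
# Arguments: interface (default eth0), ping target (default 1.1.1.1).
bounce_and_ping() {
    iface="${1:-eth0}"
    target="${2:-1.1.1.1}"

    ip link set "$iface" down || return 1
    ip link set "$iface" up || return 1

    # Assumption: 10 seconds is enough for the link to renegotiate and for
    # the address to be restored; adjust as needed.
    sleep 10

    ip -brief addr show "$iface"   # shows link state and assigned addresses
    ping -c 3 -W 2 "$target"       # exit status 0 means the target replied
}
```

Usage would be bounce_and_ping eth0, or bounce_and_ping wlan0 when connected through WiFi. Pinging an IP address rather than a hostname keeps DNS out of the picture.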

Shen En Chen · March 30, 2020

An Ethernet cable was used; wireless is not set up. When I uploaded the armbianmonitor output, the network was up over Ethernet. I don't know why eth0 and wlan0 show as down in armbianmonitor.

When the network was down, ping 1.1.1.1 reported the host as unreachable.

At this time, I do not know how to check the CPU temperature in Armbian.

ifdown and ifup did not work.

The next time the network goes down, I will try the following:

ip link set eth0 down

ip link set eth0 up

Thanks!

Shen En Chen · March 31, 2020

Just had another crash. I was unable to upload with armbianmonitor -u, so I have attached the output of armbianmonitor -U here.

I found the /sys/class/thermal zone for temperature. While I monitored it, the temperature was highest at 69.x C in zone0, and 65.x C in zone0.
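For reference, each thermal zone appears as a directory under /sys/class/thermal, and its temp file holds the reading in millidegrees Celsius. A minimal sketch for printing all zones at once follows; the function name print_zone_temps is hypothetical, and which zone maps to the CPU or GPU depends on the board.

```shell
# Print every thermal zone reading in degrees Celsius.
# Optional argument: base directory (default /sys/class/thermal).
print_zone_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones without a readable temp file (also covers "no zones found")
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        # Convert millidegrees to degrees with one decimal place
        printf '%s: %d.%d C\n' "${zone##*/}" \
            "$((milli / 1000))" "$(( (milli % 1000) / 100 ))"
    done
}
```

Running watch -n 10 on a script that calls print_zone_temps would give a rolling view while a video plays.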

Running ip link set eth0 down followed by ip link set eth0 up brought the network back.

However, the system then became very slow and nearly unusable. After a few minutes it crashed completely, and I had to power it off and on again to get back to normal.

After the reboot, the temperature is 42 C. I am not sure whether the high temperature is causing the issue. Is there any other way to troubleshoot?

Thanks

armbianmonitor.txt

Myy · March 31, 2020

Hmm, it seems that panfrost might be the culprit. These "Purging ... bytes" come from the panfrost driver...

$ grep "Purging " * -r
drivers/s390/char/vmlogrdr.c:    * Purging has to be done as separate step, because recording
drivers/gpu/drm/msm/msm_gem_shrinker.c:         pr_info_ratelimited("Purging %lu bytes\n", freed << PAGE_SHIFT);
drivers/gpu/drm/msm/msm_gem_shrinker.c:         pr_info_ratelimited("Purging %u vmaps\n", unmapped);
drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:           pr_info("Purging GPU memory, %lu pages freed, "
drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c:               pr_info_ratelimited("Purging %lu bytes\n", freed << PAGE_SHIFT);
drivers/scsi/ibmvscsi/ibmvfc.c: ibmvfc_dbg(vhost, "Purging all requests\n");
drivers/scsi/qla2xxx/qla_mbx.c:             "Chip Reset in progress. Purging Mbox cmd=0x%x.\n",
fs/ocfs2/dlm/dlmthread.c:       mlog(0, "%s: Purging res %.*s, master %d\n", dlm->name,
fs/coda/cache.c:/* Purging dentries and children */
tools/perf/builtin-buildid-cache.c:     pr_debug("Purging %s: %s\n", pathname, err ? "FAIL" : "Ok");

Maybe the GPU, or the GPU driver starts acting up and crashes the whole machine after a while...

However I have no clear idea on how to test that... the best way would be to load a WebGL benchmark and let it go, while unplugging the network, to be sure that it's the real culprit.

Try running glmark2-es2 benchmark for an hour maybe ? or some other OpenGL ES benchmark and see if the slow down and crash happen ?

Two of the first results for "WebGL benchmark" on Google :

http://luic.github.io/WebGL-Performance-Benchmark/benchmark/cubes.html

https://crypt-webgl.unigine.com/game.html

Maybe the panfrost spam is just a red herring, I don't know.

Myy · March 31, 2020

By looking at the logs again, two things are sure :

- The RAM got eaten by some process, same thing for the swap.

- The CPU is overheating.

Try running a system monitor while watching a video and keep an eye on the RAM consumed. If it climbs steadily until everything slows to a crawl and then crashes, we know it's due to a memory leak :/ .
Given how Panfrost is invoking memory "shrinking" like crazy, I don't know if it's the real culprit, or if it's just panicking.

Shen En Chen · April 1, 2020

Running glmark2-es2 and http://luic.github.io/WebGL-Performance-Benchmark/benchmark/cubes.html was not enough to push the temperature beyond 55 C.

I have noticed that if VNC Server is running in the background while I watch streaming video, such as cnn.com or high-quality YouTube, the chances of a crash are high.

Is there any way to see what is eating the memory?
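One way to approach this question is to list the processes with the largest resident memory (RSS) and repeat the listing over time. The sketch below assumes the procps version of ps that ships with Debian-based systems; the function name top_rss is hypothetical. Memory held by the kernel or by GPU driver buffers may not show up in any process's RSS, so /proc/meminfo is worth checking alongside it.

```shell
# Print the N processes with the largest resident set size (RSS, in KiB).
# Argument: number of processes to show (default 10).
top_rss() {
    count="${1:-10}"
    # The trailing "=" on each column suppresses the header line,
    # so the output can be sorted numerically on the first field.
    ps -eo rss=,pid=,comm= | sort -rn | head -n "$count"
}
```

If one process climbs in this list from sample to sample, it is a likely candidate for the leak.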

Thanks!

Shen En Chen · April 2, 2020

Today I logged the temperature and RAM usage every 10 seconds while watching a YouTube video at 1080p. After a while, the network dropped as before. The highest temperature during this time was 68 C. Both armbianmonitor and ramtempck-20200401 are attached.

while :; do file=ramtempck-20200401; touch $file; date >> $file ; temp >> $file ; free -h >> $file; sleep 10; done
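The temp command in the loop above is not a standard utility, so it is presumably a local alias or script. A self-contained variant might look like the sketch below, which reads the raw sysfs values (millidegrees Celsius) directly and also records whether a ping succeeds, so the moment of the network drop can be lined up with the RAM and temperature samples. The function name log_sample and the ping target are assumptions.

```shell
# Append one timestamped sample to a log file.
# Arguments: log file, ping target (default 1.1.1.1).
log_sample() {
    logfile="$1"
    target="${2:-1.1.1.1}"
    {
        date
        # Raw sysfs readings, in millidegrees Celsius
        for zone in /sys/class/thermal/thermal_zone*; do
            [ -r "$zone/temp" ] && echo "${zone##*/}: $(cat "$zone/temp")"
        done
        free -h
        if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
            echo "network: up"
        else
            echo "network: down"
        fi
    } >> "$logfile"
    sync   # flush to disk so the last samples survive a hard crash
}
```

It could be driven the same way as the original loop: while :; do log_sample ramtempck-20200401; sleep 10; done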

Can we tell whether the temperature or a RAM leak is causing the issue?

armbianmonitor.20200401.txt ramtempck-20200401

Myy · April 2, 2020

Hmm, there's no RAM leak visible in the logs... However, the panfrost driver is still spamming like crazy.

Once the network drop happens, if you bring the interface down and up again, does the system become unstable again ?

Anyway, there seem to be three issues :

* Panfrost spamming

* Network drops

* Potential memory leak leading to system crashes.

I'll try to put together a network test and see if that also happens with 5.6 kernels.
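To put a number on the Panfrost spamming, the "Purging ... bytes" lines in the kernel log can be counted before and after a video session. This is a rough sketch: the function name count_purging is hypothetical, and it assumes every matching line on this board comes from the panfrost shrinker, as the grep through the kernel sources earlier in the thread suggests.

```shell
# Count "Purging <n> bytes" messages read from standard input.
count_purging() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so force a success status to keep callers simple.
    grep -c 'Purging [0-9][0-9]* bytes' || true
}
```

Typical use would be dmesg | count_purging, run once at the start and again after the network drops. Because the driver prints these messages through a rate-limited call, the count is a lower bound rather than an exact figure.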

Shen En Chen · April 4, 2020

@Myy yes, after bringing the interface down and up I have network again, but the system is still unstable and the network still drops randomly.
