
Tinkerboard S 5.4.28-rockchip drops in network randomly


Shen En Chen


I have a Tinker Board S and have been using both Tinker OS and Armbian for a while. Recently I upgraded to 5.4.28-rockchip. Since then, video frequently freezes at random when I watch cnn.com or other media. Initially I thought Firefox or Chromium had crashed, but later I realized it was a network issue. Other devices were still able to connect to the internet. ifconfig still showed the network devices as properly connected, but I could not ping any host. Disconnecting and reconnecting the network did not fix the issue. The only fix was a reboot.

 

How can I troubleshoot this issue? I tried to run armbianmonitor, but at that time there was no network at all.

 

http://ix.io/2fVR This is the armbianmonitor output after a reboot, with no problem at all, for baseline reference.

Link to comment
Share on other sites

In the armbianmonitor output, both eth0 and wlan0 appear to be down, so how do you connect to the Internet? Ethernet cable or wireless?

 

5: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether 0c:9d:92:0c:b0:ce brd ff:ff:ff:ff:ff:ff

6: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether d0:c5:d3:5a:af:db brd ff:ff:ff:ff:ff:ff

 

When the problem occurs, run the following (replace eth0 with wlan0 if you're connected through WiFi):

 

ip link set eth0 down

ip link set eth0 up

 

Can you then ping 1.1.1.1?

Also, what's the temperature of the CPU when the problem occurs?
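
If it helps, the down/up/ping sequence above can be wrapped in one small function so it is quick to run when the network drops. This is only a sketch: the function name, the `IP_CMD`/`PING_CMD`/`BOUNCE_DELAY` overrides and the 5-second default delay are my own assumptions (the overrides exist so it can be dry-run), and it needs root to touch the interface. A DHCP lease may take longer than the delay to come back.

```shell
# Hypothetical helper: bounce an interface, then test raw IP connectivity.
# Pinging an IP address (not a hostname) keeps DNS out of the picture.
bounce_and_ping() {
    iface="${1:-eth0}"
    target="${2:-1.1.1.1}"
    "${IP_CMD:-ip}" link set "$iface" down || return 1
    sleep "${BOUNCE_DELAY:-5}"
    "${IP_CMD:-ip}" link set "$iface" up || return 1
    sleep "${BOUNCE_DELAY:-5}"          # give the link time to come back
    if "${PING_CMD:-ping}" -c 3 -W 2 "$target" >/dev/null 2>&1; then
        echo "$iface: $target reachable"
    else
        echo "$iface: $target unreachable"
        return 2
    fi
}
```

Run it as root with `bounce_and_ping eth0` (or `bounce_and_ping wlan0` on WiFi).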

Link to comment
Share on other sites

An Ethernet cable was used; wireless is not set up. At the time I uploaded armbianmonitor, the network was up over Ethernet. I don't know why eth0 and wlan0 show as down in armbianmonitor.

When the network was down, ping 1.1.1.1 reported unreachable.

At this time, I do not know how to check the CPU temperature in Armbian.

 

ifdown and ifup did not work.

 

Will try the following

  ip link set eth0 down

  ip link set eth0 up

the next time the network goes down.

 

Thanks!

Link to comment
Share on other sites

Just had another crash. I was unable to upload with armbianmonitor -u, so I have attached the output of armbianmonitor -U here.

 

I found the /sys/class/thermal zone for temperature. While I monitored it, the temperature was highest at 69.x C in zone0, and 65.x C in zone0. 
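
For anyone else looking for the temperature, here is a rough sketch of reading every zone under /sys/class/thermal at once. The kernel reports these values in millidegrees Celsius. The function name and the optional base-directory argument are my own (the argument only exists so it can be pointed at a test directory), and which zone maps to the CPU or GPU depends on the board, so check the printed type.

```shell
# Print every thermal zone with its type and temperature in degrees Celsius.
# /sys/class/thermal/thermal_zone*/temp holds millidegrees Celsius.
show_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        name=$(cat "$zone/type" 2>/dev/null || echo unknown)
        # Integer part, then one decimal digit (truncated, not rounded).
        printf '%s (%s): %d.%d C\n' "${zone##*/}" "$name" \
            "$((milli / 1000))" "$(((milli % 1000) / 100))"
    done
}
```

Calling `show_temps` with no argument reads the real sysfs tree.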

 

Running ip link set eth0 down followed by ip link set eth0 up brought the network back.

However, the system then became very slow and nearly unusable. After a few minutes, it crashed completely. I had to power it off and on again to get back to normal.

After the reboot, the temperature is 42 C. I am not sure whether the high temperature is causing the issue. Is there any other way to troubleshoot?

 

Thanks

 

armbianmonitor.txt

Link to comment
Share on other sites

Hmm, it seems that panfrost might be the culprit. These "Purging ... bytes" come from the panfrost driver...

$ grep "Purging " * -r
drivers/s390/char/vmlogrdr.c:    * Purging has to be done as separate step, because recording
drivers/gpu/drm/msm/msm_gem_shrinker.c:         pr_info_ratelimited("Purging %lu bytes\n", freed << PAGE_SHIFT);
drivers/gpu/drm/msm/msm_gem_shrinker.c:         pr_info_ratelimited("Purging %u vmaps\n", unmapped);
drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:           pr_info("Purging GPU memory, %lu pages freed, "
drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c:               pr_info_ratelimited("Purging %lu bytes\n", freed << PAGE_SHIFT);
drivers/scsi/ibmvscsi/ibmvfc.c: ibmvfc_dbg(vhost, "Purging all requests\n");
drivers/scsi/qla2xxx/qla_mbx.c:             "Chip Reset in progress. Purging Mbox cmd=0x%x.\n",
fs/ocfs2/dlm/dlmthread.c:       mlog(0, "%s: Purging res %.*s, master %d\n", dlm->name,
fs/coda/cache.c:/* Purging dentries and children */
tools/perf/builtin-buildid-cache.c:     pr_debug("Purging %s: %s\n", pathname, err ? "FAIL" : "Ok");

Maybe the GPU, or the GPU driver starts acting up and crashes the whole machine after a while...

 

However I have no clear idea on how to test that... the best way would be to load a WebGL benchmark and let it go, while unplugging the network, to be sure that it's the real culprit.

 

Maybe try running the glmark2-es2 benchmark for an hour, or some other OpenGL ES benchmark, and see if the slowdown and crash happen?

 

Two of the first results for "WebGL benchmark" on Google :

http://luic.github.io/WebGL-Performance-Benchmark/benchmark/cubes.html

https://crypt-webgl.unigine.com/game.html

 

Maybe the panfrost spam is just a red herring, I don't know.
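
One way to judge how heavy the spam is would be to count the "Purging ... bytes" lines in a saved kernel log. A minimal sketch, assuming the log lines contain the text exactly as in the pr_info_ratelimited format string above; the function name is made up, and since the message is rate-limited the count is a lower bound on how often the shrinker actually ran.

```shell
# Count "Purging <n> bytes" lines and sum the bytes they report.
# Reads the file(s) given as arguments, or stdin when none are given.
purge_stats() {
    awk '/Purging [0-9]+ bytes/ {
             for (i = 1; i < NF; i++)
                 if ($i == "Purging") { n++; total += $(i + 1); break }
         }
         END { printf "%d messages, %.0f bytes\n", n, total }' "$@"
}
```

For example `dmesg | purge_stats`, or `purge_stats armbianmonitor.txt` on an attached log. Other drivers (msm) print the same message, but they should not be loaded on this board.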

Link to comment
Share on other sites

Looking at the logs again, two things are certain:

- The RAM got eaten by some process, and so did the swap.

- The CPU is overheating.

 

Try running a system monitor while watching a video, and keep an eye on the RAM consumed. If it climbs steadily until everything slows to a crawl and then crashes, we know it's due to a memory leak :/ .
Given how Panfrost is invoking memory "shrinking" like crazy, I don't know whether it's the real culprit or whether it's just panicking.
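
If a graphical monitor is too heavy, a rough command-line alternative is to list the processes with the largest resident memory. This is a sketch with made-up function names; it only shows userspace RSS (in KiB, as reported by ps), so memory held by a kernel driver such as a GPU driver would not show up here.

```shell
# Format "rss_kib pid command" lines from stdin, largest RSS first.
rank_rss() {
    sort -rn | head -n "${1:-5}" |
        awk '{ printf "%.1f MiB  pid %s  %s\n", $1 / 1024, $2, $3 }'
}

# Top N processes by resident set size (default 5).
top_rss() {
    ps -eo rss=,pid=,comm= | rank_rss "${1:-5}"
}
```

Running `top_rss 10` every minute or so while the video plays should show whether one process keeps growing.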

Link to comment
Share on other sites

On 3/31/2020 at 1:27 AM, Myy said:

Hmm, it seems that panfrost might be the culprit. These "Purging ... bytes" come from the panfrost driver...



 

Running glmark2-es2 and http://luic.github.io/WebGL-Performance-Benchmark/benchmark/cubes.html was not enough to push the temperature beyond 55 C.

I realized that if I turn on the VNC server in the background and watch streaming video, such as cnn.com or high-quality YouTube, the chances of a crash are high.

 

Any way to see what is eating the memory?

 

Thanks!

Link to comment
Share on other sites

Today I monitored the temperature and RAM usage every 10 seconds while watching a YouTube video at 1080p. After a while, the network dropped like before. During this time, the highest temperature was 68 C. Both armbianmonitor and ramtempck-20200401 are attached.

 

while :; do file=ramtempck-20200401; touch $file; date >> $file ; temp >> $file ; free -h >> $file; sleep 10; done
 

Can we tell whether the temperature or a RAM leak is causing the issue?
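
Since the system ends up needing a hard power-off, a variant of the logging loop that flushes each sample to storage may keep the last few readings from being lost. A sketch only: the function name, the one-line log format and the default thermal_zone0 path are my assumptions, and the zone and meminfo paths are parameters so the function can be tried against test files.

```shell
# Append one timestamped sample (temperature and available memory) to a log.
log_sample() {
    file="$1"
    zone="${2:-/sys/class/thermal/thermal_zone0/temp}"
    meminfo="${3:-/proc/meminfo}"
    temp_milli=$(cat "$zone" 2>/dev/null || echo NA)
    avail_kb=$(awk '$1 == "MemAvailable:" { print $2 }' "$meminfo")
    printf '%s temp_milliC=%s mem_available_kB=%s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" "$temp_milli" "$avail_kb" >> "$file"
    sync   # flush to storage so recent samples survive a hard power-off
}
```

Used as `while :; do log_sample ramtempck.log; sleep 10; done`, it produces one line per sample, which is easier to scan for a steady drop in available memory than repeated free -h tables.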

armbianmonitor.20200401.txt ramtempck-20200401

Link to comment
Share on other sites

Hmm, there's no RAM leak visible in the logs... However, the panfrost driver is still spamming like crazy.

Once the network drop happens, if you bring the interface down and up again, does the system become unstable again?

Anyway, there seem to be three issues:

* Panfrost spamming

* Network drops

* Potential memory leak leading to system crashes.

I'll try to set up a network test and see if that also happens with 5.6 kernels.

Link to comment
Share on other sites
