Superkoning

Heavy load NanoPi NEO2 => lockup


Hi,

 

When I put a heavy load on my NanoPi NEO2, it locks up. The Ethernet LED keeps flashing, so is something still alive? After a power cycle, the NEO2 works again.

 

I've already put a fan on the CPU. The CPU temperature is 25 degrees Celsius at rest and goes up to 37 degrees Celsius under the heavy load.

The heavy load is a "make -j4" of a C source code project.

 

Any tips on how to solve this?

 

 

sander@nanopineo2:~$ while true; do date ; uptime ;  cat /etc/armbianmonitor/datasources/soctemp; sleep 2; done | tee  mijn-log.txt
Tue Aug 22 11:46:37 UTC 2017
 11:46:37 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28342
Tue Aug 22 11:46:39 UTC 2017
 11:46:39 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28584
Tue Aug 22 11:46:41 UTC 2017
 11:46:41 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28463
Tue Aug 22 11:46:43 UTC 2017
 11:46:43 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28705
Tue Aug 22 11:46:45 UTC 2017
 11:46:45 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28827
Tue Aug 22 11:46:47 UTC 2017
 11:46:47 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28463
Tue Aug 22 11:46:49 UTC 2017
 11:46:49 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
30281
Tue Aug 22 11:46:51 UTC 2017
 11:46:51 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28584
Tue Aug 22 11:46:53 UTC 2017
 11:46:53 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
30402
Tue Aug 22 11:46:55 UTC 2017
 11:46:55 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28827
Tue Aug 22 11:46:57 UTC 2017
 11:46:57 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28705
Tue Aug 22 11:46:59 UTC 2017
 11:46:59 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28705
Tue Aug 22 11:47:01 UTC 2017
 11:47:01 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28463
Tue Aug 22 11:47:03 UTC 2017
 11:47:03 up  5:24,  2 users,  load average: 0.00, 0.00, 0.00
28342
Tue Aug 22 11:47:05 UTC 2017
 11:47:05 up  5:25,  2 users,  load average: 0.00, 0.00, 0.00
29917
Tue Aug 22 11:47:07 UTC 2017
 11:47:07 up  5:25,  2 users,  load average: 0.08, 0.02, 0.01
29069
Tue Aug 22 11:47:09 UTC 2017
 11:47:09 up  5:25,  2 users,  load average: 0.08, 0.02, 0.01
33431
Tue Aug 22 11:47:11 UTC 2017
 11:47:11 up  5:25,  2 users,  load average: 0.47, 0.10, 0.03
36945
Tue Aug 22 11:47:13 UTC 2017
 11:47:14 up  5:25,  2 users,  load average: 0.47, 0.10, 0.03
37187
Tue Aug 22 11:47:25 UTC 2017

 

... and then nothing more
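
The third line of each sample in the log above is the raw value from /etc/armbianmonitor/datasources/soctemp. The readings look like millidegrees Celsius (28342 would be about 28.3 degrees), which fits the 25 to 37 degree range mentioned in the post. A minimal sketch of a helper that makes the log easier to read, assuming that unit; the name `millideg_to_c` is made up for this example:

```shell
#!/bin/sh
# Convert a raw temperature reading to degrees Celsius with one decimal.
# Assumption: the soctemp datasource reports millidegrees Celsius.
# LC_ALL=C forces a decimal point regardless of the system locale.
millideg_to_c() {
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Possible use inside the monitoring loop (left commented out here):
# while true; do
#     date; uptime
#     millideg_to_c "$(cat /etc/armbianmonitor/datasources/soctemp)"
#     sleep 2
# done | tee mijn-log.txt
```

Since the log goes through a pipe to tee, the last sample or two before a hang may never reach the file, so the final logged temperature is not necessarily the peak.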
 

 

 

4 hours ago, Naguissa said:

Maybe power issue. Can you check with other PSU?

 

I tried that and re-tested. It looked a bit better, but the problem seems to be different from what I described: the NEO2 is apparently still alive. Instead of printing a sample every two seconds, it now prints only once every several minutes.

'uptime' shows a load average (i.e., a queue of waiting processes) of 32.

The NEO2 still responds to ping (1-2 ms), but a new ssh connection times out.

So it is extremely overloaded, but still alive.

 

So not a PSU problem, I would say?

 

di 22 aug 2017 17:43:42 UTC
 17:43:47 up 24 min,  2 users,  load average: 6,10, 5,01, 2,51
30523
di 22 aug 2017 17:43:56 UTC
 17:44:01 up 25 min,  2 users,  load average: 6,14, 5,07, 2,58
30281
di 22 aug 2017 17:44:07 UTC
 17:44:12 up 25 min,  2 users,  load average: 6,12, 5,10, 2,61
30402
di 22 aug 2017 17:44:22 UTC


 17:53:42 up 35 min,  2 users,  load average: 15,02, 12,76, 7,83
33310
di 22 aug 2017 18:03:55 UTC
 18:08:01 up 49 min,  2 users,  load average: 20,89, 19,22, 14,30
31129
di 22 aug 2017 18:08:09 UTC
 18:08:20 up 49 min,  2 users,  load average: 20,04, 19,20, 14,51
31250
di 22 aug 2017 18:10:22 UTC
 18:20:03 up  1:01,  2 users,  load average: 24,44, 23,30, 19,40
31129
di 22 aug 2017 18:27:27 UTC
 18:31:13 up  1:12,  2 users,  load average: 26,97, 26,43, 22,89
33067
di 22 aug 2017 18:40:02 UTC
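
The log above shows the load average climbing past 20 on a four-core board. On Linux the load average counts not only runnable tasks but also tasks in uninterruptible sleep (state D), which usually means waiting on disk I/O. So a high load with little CPU use can point at storage or swap rather than the CPU. A small sketch for counting D-state processes; the function name `count_d_state` is made up for this example:

```shell
#!/bin/sh
# Count lines whose process state starts with D (uninterruptible sleep).
# Reads one state per line on stdin, as printed by `ps -eo stat=`.
# grep -c prints 0 and exits non-zero when nothing matches, hence `|| true`.
count_d_state() {
    grep -c '^ *D' || true
}

# Possible use on the board (left commented out here):
# ps -eo stat= | count_d_state
```

If this number stays high while the load average rises, the waiting tasks are probably blocked on I/O and not competing for CPU time.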

 


Update from my side:

 

With the "stress" command starting a lot of CPU processes, I can NOT get a lockup; CPU load very high, but it keeps working. So not a CPU problem after all?

 

So my next guess:  a disk problem. As a first test, I disabled my swap space. A new "make -j4" did NOT result in a lock-up, but in "virtual memory exhausted: Cannot allocate memory".  Maybe that's better than a lockup ... :-)

 

To be continued.


Maybe so many tasks hitting the SD card at once are stalling it. You could try putting the swap on a USB drive instead.



I have exactly the same problem with my OrangePi PC2 (Allwinner H5).

The swap is just too slow, so the kswapd0 kernel process takes all the CPU power and stalls everything in user space.

I don't know whether it is just because the swap medium (SD or USB in my case) is too slow, or whether there is a bug (or a wrong IRQ affinity).
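
One way to check whether the system is actually swapping heavily during the stall is to sample the kernel's swap counters. A minimal sketch, assuming the `pswpin` and `pswpout` fields of /proc/vmstat (cumulative pages swapped in and out since boot); the function name `swap_counters` is made up for this example:

```shell
#!/bin/sh
# Print "pswpin pswpout" from a vmstat-style file.
# Defaults to /proc/vmstat; a different path can be given for testing.
swap_counters() {
    awk '$1 == "pswpin"  { i = $2 }
         $1 == "pswpout" { o = $2 }
         END { print i + 0, o + 0 }' "${1:-/proc/vmstat}"
}

# Possible use during a build (left commented out here):
# swap_counters; sleep 10; swap_counters
```

If the two samples barely differ while kswapd0 is busy, the time is probably going into reclaiming page cache rather than into moving pages to the swap medium.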


I've put both my swap file and my git project on the USB drive. I deactivated the swapfile on the SD card. And still my system stalls with a "make -j4".

 

I can't see anything special in the monitoring output:

- only 14 MB of swap is in use

- the CPU summary line reports 98% idle (!), even though the cc1plus processes each use 70-90% of a core

- kswapd0 is at 26% CPU usage, which might be high

za 26 aug 2017  7:21:12 UTC
 07:21:12 up 1 day, 18:45,  2 users,  load average: 2,09, 1,50, 1,02
37672
              total        used        free      shared  buff/cache   available
Mem:            483         426           6           1          50          33
Swap:          1023          14        1009
top - 07:21:12 up 1 day, 18:45,  2 users,  load average: 2,09, 1,50, 1,02
Tasks: 125 total,   5 running, 120 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,1 us,  0,2 sy,  0,0 ni, 98,9 id,  0,7 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem :   495440 total,     7208 free,   450112 used,    38120 buff/cache
KiB Swap:  1048572 total,  1033972 free,    14600 used.    20772 avail Mem 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30720 sander    20   0  135260 116048  10588 R  90,0 23,4   0:06.45 cc1plus
30721 sander    20   0  133224 115304  10764 R  90,0 23,3   0:06.44 cc1plus
30722 sander    20   0  131196 112388  10748 R  80,0 22,7   0:06.19 cc1plus
30719 sander    20   0  133228 115736  10720 R  75,0 23,4   0:06.41 cc1plus
   94 root      20   0       0      0      0 S  25,0  0,0  14:22.82 kswapd0
30763 sander    20   0    7352   3108   2648 R  10,0  0,6   0:00.04 top
    8 root      20   0       0      0      0 S   5,0  0,0   0:02.97 rcu_sched
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   94 root      20   0       0      0      0 R  26,3  0,0  14:22.87 kswapd0
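
In the top output above, each of the four cc1plus processes already holds roughly 23% of the board's 483 MB of RAM a few seconds into compilation, so four parallel jobs alone can fill memory. One possible workaround, not tested on this board, is to cap the `-j` value by available memory. In the sketch below, the function name `jobs_for_memory` and the 150 MB per-job figure are assumptions made up for illustration:

```shell
#!/bin/sh
# jobs_for_memory CORES AVAIL_MB PER_JOB_MB
# Print how many parallel jobs fit in memory, capped at the core count
# and never less than 1.
jobs_for_memory() {
    cores=$1
    avail_mb=$2
    per_job_mb=$3
    jobs=$(( avail_mb / per_job_mb ))
    if [ "$jobs" -gt "$cores" ]; then jobs=$cores; fi
    if [ "$jobs" -lt 1 ]; then jobs=1; fi
    echo "$jobs"
}

# Possible use (left commented out here; 150 is a guessed per-job peak in MB):
# avail=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo)
# make -j"$(jobs_for_memory "$(nproc)" "$avail" 150)"
```

With about 400 MB available and 150 MB per compiler process, this would suggest "make -j2" instead of "make -j4".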

 
