sfx2000

Placemaker - H5 crashing under SMP load


Just wanted to note this: the target is a NanoPi NEO2, and the task at hand is Byte-UnixBench...

 

https://github.com/sfx2000/byte-unixbench

 

more to follow...

 


[ 2708.074105] Unable to handle kernel paging request at virtual address f7ff8000358ddbf0
[ 2708.082298] Mem abort info:
[ 2708.085153]   ESR = 0x96000004
[ 2708.088834]   Exception class = DABT (current EL), IL = 32 bits
[ 2708.094826]   SET = 0, FnV = 0
[ 2708.097899]   EA = 0, S1PTW = 0
[ 2708.101086] Data abort info:
[ 2708.104010]   ISV = 0, ISS = 0x00000004
[ 2708.107886]   CM = 0, WnR = 0
[ 2708.110879] [f7ff8000358ddbf0] address between user and kernel address ranges
[ 2708.118043] Internal error: Oops: 96000004 [#1] SMP
[ 2708.122918] Modules linked in: zram sun8i_codec_analog snd_soc_simple_card sun8i_adda_pr_regmap snd_soc_simple_card_utils sun4i_i2s snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer snd soundcore lima gpu_sched sun4i_gpadc_iio industrialio cpufreq_dt usb_f_acm u_serial g_serial libcomposite realtek dwmac_sun8i i2c_mv64xxx mdio_mux
[ 2708.152127] CPU: 3 PID: 28677 Comm: sort Not tainted 5.3.9-sunxi64 #19.11.3
[ 2708.159080] Hardware name: FriendlyARM NanoPi NEO 2 (DT)
[ 2708.164389] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 2708.169187] pc : unlink_file_vma+0x1c/0x58
[ 2708.173283] lr : free_pgtables+0xe4/0x138
[ 2708.177288] sp : ffff000017e4bc60
[ 2708.180599] x29: ffff000017e4bc60 x28: ffff800035c50d00
[ 2708.185908] x27: 0000000000000000 x26: 0000000000000000
[ 2708.191216] x25: 0000000056000000 x24: 0000000000000000
[ 2708.196524] x23: 0000000000000000 x22: ffff000017e4bd08
[ 2708.201832] x21: 0000ffff98ac4000 x20: f7ff8000358ddb00
[ 2708.207140] x19: ffff8000354503e8 x18: 0000000000000000
[ 2708.212448] x17: 0000000000000000 x16: 0000000000000000
[ 2708.217756] x15: 0000000000000000 x14: 0000000000000000
[ 2708.223064] x13: 0000000000000000 x12: ffff800037ddb840
[ 2708.228372] x11: 0000000000000000 x10: ffff000010e761ce
[ 2708.233679] x9 : 0000000000000000 x8 : ffff800035ae1440
[ 2708.238988] x7 : 0000ffffb615c000 x6 : 00000000002f6749
[ 2708.244295] x5 : 0000000000000000 x4 : 0000000000000000
[ 2708.249603] x3 : 0000000000000001 x2 : ffffffffffffffff
[ 2708.254911] x1 : 0000000000000002 x0 : ffff8000354503e8
[ 2708.260219] Call trace:
[ 2708.262666]  unlink_file_vma+0x1c/0x58
[ 2708.266413]  free_pgtables+0xe4/0x138
[ 2708.270074]  exit_mmap+0xd4/0x160
[ 2708.273390]  mmput+0x60/0x150
[ 2708.276356]  do_exit+0x330/0xa88
[ 2708.279581]  do_group_exit+0x34/0xd0
[ 2708.283153]  __arm64_sys_exit_group+0x14/0x18
[ 2708.287511]  el0_svc_common.constprop.0+0x88/0x150
[ 2708.292300]  el0_svc_handler+0x20/0x80
[ 2708.296048]  el0_svc+0x8/0xc
[ 2708.298932] Code: f9405014 b40001d4 a9025bf5 aa0003f3 (f9407a96)
[ 2708.305022] ---[ end trace 57dc40848a0a1b21 ]---
[ 2708.309668] Fixing recursive fault but reboot is needed!
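For anyone wanting to read the oops above, here is a minimal Python sketch that unpacks the ESR value (field positions follow the Arm ESR_ELx layout for a data abort) and the faulting instruction shown in parentheses on the `Code:` line. The helper names are hypothetical. The "expected" pointer `0xffff8000358ddb00` is purely an assumption, chosen because it looks like the other linear-map addresses in the dump (`x19`, `x28`). If that guess is right, `x20` differs from it by a single bit, which would be consistent with (though not proof of) marginal stability at 1.3GHz/1.3v.

```python
"""Decode fields of the arm64 oops posted above (a sketch, not a kernel tool)."""


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value for a data abort into its named fields."""
    iss = esr & 0x1FFFFFF  # ISS occupies bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,  # exception class; 0x25 = data abort, same EL
        "IL": (esr >> 25) & 0x1,   # 1 = 32-bit instruction
        "ISV": (iss >> 24) & 0x1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 0x1,
        "EA": (iss >> 9) & 0x1,
        "CM": (iss >> 8) & 0x1,
        "S1PTW": (iss >> 7) & 0x1,
        "WnR": (iss >> 6) & 0x1,   # 0 = the fault happened on a read
        "DFSC": iss & 0x3F,        # 0x04 = translation fault, level 0
    }


def decode_ldr_uimm64(insn: int):
    """Decode LDR Xt, [Xn, #imm] (64-bit, unsigned offset); None if not that form."""
    if (insn >> 22) != 0b1111100101:
        return None
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 8  # immediate is scaled by the 8-byte access size


def flipped_bits(a: int, b: int) -> list:
    """Bit positions where two 64-bit values differ."""
    diff = (a ^ b) & 0xFFFFFFFFFFFFFFFF
    return [i for i in range(64) if (diff >> i) & 1]


if __name__ == "__main__":
    print({k: hex(v) for k, v in decode_esr(0x96000004).items()})

    # The faulting instruction is the one in parentheses on the "Code:" line.
    rt, rn, off = decode_ldr_uimm64(0xF9407A96)
    print(f"ldr x{rt}, [x{rn}, #{off:#x}]")  # ldr x22, [x20, #0xf0]

    x20 = 0xF7FF8000358DDB00
    print(hex(x20 + off))  # 0xf7ff8000358ddbf0, the reported fault address

    # ASSUMPTION: the intended pointer was a linear-map address like x19/x28.
    guess = 0xFFFF8000358DDB00
    print(flipped_bits(x20, guess))  # [59]
```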

 


Hi @sfx2000, if this is reproducible, please try removing "cpu-clock-1.3GHz-1.3v" from your armbianEnv.txt "overlays=" line, and see if the problem still occurs.  If it does not, then pushing to 1.3GHz at 1.3v may be too much for your board...

 

I've attached a new test overlay to this post that enables 1.2GHz at 1.3v; let me know if it fixes the problem (assuming that the 1.3GHz clock is the problem).  If this works, then I can add it to the mainline.

sun50i-h5-cpu-clock-1.2GHz-1.3v.dtbo

Edited by 5kft
(attached 1.2GHz overlay)


Been on biz travel over the last few days; jobby job stuff.

 

Rolled back the overlay that was in place, and Neo2 is again stable.

 

I'll check out the update for the overlay...

 

I'm in recovery mode: 3 cities in 4 days across the USA, more time in planes than in meetings.
13 hours ago, sfx2000 said:

Been on biz travel over the last few days; jobby job stuff.

 

I'm in recovery mode: 3 cities in 4 days across the USA, more time in planes than in meetings.

 

...ugh...I can relate...  But it's so rewarding though, right??  ;)

 

13 hours ago, sfx2000 said:

Rolled back the overlay that was in place, and Neo2 is again stable.

 

I'll check out the update for the overlay...

 

Ah...then the overlay should help - keep me posted.  Thanks!

13 hours ago, 5kft said:

Ah...then the overlay should help - keep me posted.  Thanks!

 

Installed the overlay...

 


sfx@nano2:~$ cat /boot/armbianEnv.txt
verbosity=1
console=both
overlay_prefix=sun50i-h5
overlays=gpio-regulator-1.3v i2c0 usbhost1 usbhost2 cpu-clock-1.2GHz-1.3v
rootdev=UUID=ee092379-cddf-4aff-beb3-7cc62d0fe9bd
rootfstype=ext4
extraargs=net.ifnames=0
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u

 

 

sfx@nano2:~$ ls -l /sys/class/leds
total 0
lrwxrwxrwx 1 root root 0 Jan  1  1970 nanopi:green:status -> ../../devices/platform/leds/leds/nanopi:green:status
lrwxrwxrwx 1 root root 0 Jan  1  1970 nanopi:red:pwr -> ../../devices/platform/leds/leds/nanopi:red:pwr

 

 

sfx@nano2:~$ cat /etc/default/cpufrequtils
# WARNING: this file will be replaced on board support package (linux-root-...) upgrade
ENABLE=true
MIN_SPEED=480000
#MIN_SPEED=120000
#MAX_SPEED=1000000
MAX_SPEED=1200000
GOVERNOR=schedutil

 

sfx@nano2:~$ cpufreq-info -c1
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 1:
  driver: cpufreq-dt
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 0 1 2 3
  maximum transition latency: 5.44 ms.
  hardware limits: 480 MHz - 1.20 GHz
  available frequency steps: 480 MHz, 648 MHz, 816 MHz, 960 MHz, 1.01 GHz, 1.10 GHz, 1.20 GHz
  available cpufreq governors: conservative, userspace, powersave, ondemand, performance, schedutil
  current policy: frequency should be within 480 MHz and 1.20 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency is 816 MHz (asserted by call to hardware).
  cpufreq stats: 480 MHz:46.69%, 648 MHz:11.69%, 816 MHz:25.12%, 960 MHz:4.96%, 1.01 GHz:0.80%, 1.10 GHz:1.19%, 1.20 GHz:9.55%  (3952)
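The "cpufreq stats" percentages above come from the kernel's cpufreq statistics. As a minimal sketch (assuming the standard cpufreq-stats sysfs file under `cpu0`, which shares its policy with CPUs 0-3 on this board), the same residency figures can be recomputed directly; the time unit in that file cancels out when converting to percentages.

```python
"""Recompute cpufreq residency percentages from sysfs (a sketch)."""
from pathlib import Path

# ASSUMPTION: standard cpufreq-stats location; cpu0's policy covers CPUs 0-3 here.
TIME_IN_STATE = Path("/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state")


def residency(text: str) -> dict:
    """Map frequency (kHz) -> percent of time, from 'freq time' lines."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counts[int(parts[0])] = int(parts[1])
    total = sum(counts.values())
    if total == 0:
        return {freq: 0.0 for freq in counts}
    return {freq: 100.0 * t / total for freq, t in counts.items()}


if __name__ == "__main__":
    for khz, pct in sorted(residency(TIME_IN_STATE.read_text()).items()):
        print(f"{khz // 1000:>5} MHz: {pct:6.2f}%")
```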

 

The outcome is not good: with the new overlay installed, stressing the board with

openssl speed -multi 4

still crashes it with a hard hang, and I have to pull power.
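Because a hard hang wipes anything not yet on disk, it can help to log the SoC temperature to a file while the stress test runs, so you can tell a thermal problem apart from a voltage/clock one. This is a minimal sketch; it assumes `thermal_zone0` is the CPU sensor (check `/sys/class/thermal/thermal_zone0/type` on your board), and `stress_and_log` is a hypothetical helper, not an Armbian tool.

```python
"""Run the openssl SMP stress test while logging SoC temperature (a sketch)."""
import os
import signal
import subprocess
import time

# ASSUMPTION: thermal_zone0 is the CPU sensor on this board; check its "type" file.
THERMAL = "/sys/class/thermal/thermal_zone0/temp"


def millideg_to_c(raw: str) -> float:
    """The sysfs thermal 'temp' attribute reports millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def stress_and_log(log_path: str, seconds: int = 600, interval: float = 5.0) -> int:
    """Run 'openssl speed -multi 4' for up to `seconds`, sampling temperature."""
    # New session so the forked openssl workers can be stopped as one group.
    proc = subprocess.Popen(["openssl", "speed", "-multi", "4"],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                            start_new_session=True)
    deadline = time.monotonic() + seconds
    with open(log_path, "a") as log:
        while time.monotonic() < deadline and proc.poll() is None:
            with open(THERMAL) as f:
                temp_c = millideg_to_c(f.read())
            log.write(f"{time.time():.0f} {temp_c:.1f}C\n")
            # fsync so the last samples survive a hang that needs a power pull
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
    if proc.poll() is None:
        os.killpg(proc.pid, signal.SIGTERM)
    return proc.wait()


if __name__ == "__main__":
    stress_and_log("stress-temp.log")
```

After a crash, the last lines of the log show how hot the SoC was just before it died.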

 

I should note that this is a v1.1 board with the , which I didn't mention earlier. This is the kit with the OLED hat and the cute little aluminum case...

20 hours ago, sfx2000 said:

Outcome not good when putting the board with the new overlay when stressing it

 

OK, thanks for the info!  I ran the openssl test on a number of boards (two NEO2 v1.1s, one 512MB and one 1GB; also a modified Orange Pi Zero Plus2 H5).  I was able to get it to crash consistently at 1.3GHz on one of the NEO2s, and reducing it to 1.2GHz with the overlay fixed it (I let it run for 5-10 min on each test).  The other NEO2 worked OK at 1.3GHz.  I didn't get enough run time on the Orange Pi Zero Plus2 because it kept overheating (critical shutdown at 100C).  I ran the tests a few times and the behavior was consistent.  I need to dig up a bigger heatsink or a temporary fan for the Orange Pi to really test it...  I have a number of other boards I could test this on, but I have to install heatsinks on them...

 

Given that the 1.2GHz overlay solved the (reproducible) problem on my failing NEO2, I think I'll go ahead and add it to the mainline.  Using the overlay eliminates the need to edit /etc/default/cpufrequtils; just add the overlay to /boot/armbianEnv.txt.

 

Unfortunately, overclocking is pretty much "luck of the draw" in terms of the CPU...  If your board is crashing at 1.2GHz/1.3v, there isn't a lot we can do about it at that clock rate :(  As a test, you could try using the 1.3GHz overlay and reducing MAX_SPEED in /etc/default/cpufrequtils to 1152000 or 1104000 (the value is in kHz, as in MAX_SPEED=1200000 above) and see if those are stable.
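Since MAX_SPEED is given in kHz, an extra or missing "000" is an easy mistake. A minimal sketch of a guard: build the line from a MHz value and check it against the operating points the loaded overlay actually exposes in `scaling_available_frequencies` (a standard cpufreq sysfs attribute, also in kHz). Whether 1152 or 1104 MHz are offered depends on the overlay in use; `max_speed_line` is a hypothetical helper.

```python
"""Build a MAX_SPEED line for /etc/default/cpufrequtils (a sketch)."""
from pathlib import Path

# Operating points (kHz) offered by the currently loaded overlay / OPP table.
AVAILABLE = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies")


def max_speed_line(mhz: int, available_khz: list) -> str:
    """cpufrequtils expects kHz, e.g. 1200 MHz -> MAX_SPEED=1200000."""
    khz = mhz * 1000
    if khz not in available_khz:
        raise ValueError(f"{mhz} MHz is not an available step: {sorted(available_khz)}")
    return f"MAX_SPEED={khz}"


if __name__ == "__main__":
    steps = [int(s) for s in AVAILABLE.read_text().split()]
    print(max_speed_line(1152, steps))
```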

 

 


In case it is helpful to users, I've checked in the new 1.2GHz max overclock overlay:  https://github.com/armbian/build/commit/74c6adec7411ef4d6dfa2115d21378c84aecb488

 

Use it just like the 1.3GHz overclock overlay; no need to edit the default "/etc/default/cpufrequtils".  E.g., relevant excerpt from "/boot/armbianEnv.txt" for NEO2 v1.1:

    ...
    overlay_prefix=sun50i-h5
    overlays=usbhost1 usbhost2 gpio-regulator-1.3v cpu-clock-1.2GHz-1.3v
    ...
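If you'd rather script the switch than hand-edit the file, here is a minimal sketch that rewrites the `overlays=` line, removing some overlay names and appending others. It covers both the earlier suggestion (just drop cpu-clock-1.3GHz-1.3v) and the swap to cpu-clock-1.2GHz-1.3v. `edit_overlays` is a hypothetical helper; it prints the result rather than writing /boot/armbianEnv.txt, so you can review it first (a bad edit there can stop the board from booting).

```python
"""Edit the overlays= line of /boot/armbianEnv.txt (a sketch)."""


def edit_overlays(env_text: str, remove=(), add=()) -> str:
    """Return env_text with overlay names removed from / appended to overlays=."""
    out = []
    found = False
    for line in env_text.splitlines():
        if line.startswith("overlays="):
            found = True
            names = [n for n in line[len("overlays="):].split() if n not in remove]
            for n in add:
                if n not in names:
                    names.append(n)
            line = "overlays=" + " ".join(names)
        out.append(line)
    if not found and add:
        # No overlays= line yet: create one with the requested names.
        out.append("overlays=" + " ".join(dict.fromkeys(add)))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    with open("/boot/armbianEnv.txt") as f:
        text = f.read()
    print(edit_overlays(text,
                        remove=["cpu-clock-1.3GHz-1.3v"],
                        add=["cpu-clock-1.2GHz-1.3v"]), end="")
```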

Limiting the overclock to 1.2GHz on one of my NEO2s makes it completely stable as compared to 1.3GHz (e.g., using the "openssl speed -multi 4" SMP test).  If anyone is interested, it'd be easy enough for me to add a 1.1GHz/1.3v overlay as well, just let me know :)


Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
0