sfx2000 Posted February 15, 2020

Armbianmonitor: http://ix.io/2bL9

Just wanted to note this - target NanoPi NEO2 - task at hand is Byte-UnixBench.... https://github.com/sfx2000/byte-unixbench

more to follow...

Spoiler
[ 2708.074105] Unable to handle kernel paging request at virtual address f7ff8000358ddbf0
[ 2708.082298] Mem abort info:
[ 2708.085153] ESR = 0x96000004
[ 2708.088834] Exception class = DABT (current EL), IL = 32 bits
[ 2708.094826] SET = 0, FnV = 0
[ 2708.097899] EA = 0, S1PTW = 0
[ 2708.101086] Data abort info:
[ 2708.104010] ISV = 0, ISS = 0x00000004
[ 2708.107886] CM = 0, WnR = 0
[ 2708.110879] [f7ff8000358ddbf0] address between user and kernel address ranges
[ 2708.118043] Internal error: Oops: 96000004 [#1] SMP
[ 2708.122918] Modules linked in: zram sun8i_codec_analog snd_soc_simple_card sun8i_adda_pr_regmap snd_soc_simple_card_utils sun4i_i2s snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer snd soundcore lima gpu_sched sun4i_gpadc_iio industrialio cpufreq_dt usb_f_acm u_serial g_serial libcomposite realtek dwmac_sun8i i2c_mv64xxx mdio_mux
[ 2708.152127] CPU: 3 PID: 28677 Comm: sort Not tainted 5.3.9-sunxi64 #19.11.3
[ 2708.159080] Hardware name: FriendlyARM NanoPi NEO 2 (DT)
[ 2708.164389] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 2708.169187] pc : unlink_file_vma+0x1c/0x58
[ 2708.173283] lr : free_pgtables+0xe4/0x138
[ 2708.177288] sp : ffff000017e4bc60
[ 2708.180599] x29: ffff000017e4bc60 x28: ffff800035c50d00
[ 2708.185908] x27: 0000000000000000 x26: 0000000000000000
[ 2708.191216] x25: 0000000056000000 x24: 0000000000000000
[ 2708.196524] x23: 0000000000000000 x22: ffff000017e4bd08
[ 2708.201832] x21: 0000ffff98ac4000 x20: f7ff8000358ddb00
[ 2708.207140] x19: ffff8000354503e8 x18: 0000000000000000
[ 2708.212448] x17: 0000000000000000 x16: 0000000000000000
[ 2708.217756] x15: 0000000000000000 x14: 0000000000000000
[ 2708.223064] x13: 0000000000000000 x12: ffff800037ddb840
[ 2708.228372] x11: 0000000000000000 x10: ffff000010e761ce
[ 2708.233679] x9 : 0000000000000000 x8 : ffff800035ae1440
[ 2708.238988] x7 : 0000ffffb615c000 x6 : 00000000002f6749
[ 2708.244295] x5 : 0000000000000000 x4 : 0000000000000000
[ 2708.249603] x3 : 0000000000000001 x2 : ffffffffffffffff
[ 2708.254911] x1 : 0000000000000002 x0 : ffff8000354503e8
[ 2708.260219] Call trace:
[ 2708.262666] unlink_file_vma+0x1c/0x58
[ 2708.266413] free_pgtables+0xe4/0x138
[ 2708.270074] exit_mmap+0xd4/0x160
[ 2708.273390] mmput+0x60/0x150
[ 2708.276356] do_exit+0x330/0xa88
[ 2708.279581] do_group_exit+0x34/0xd0
[ 2708.283153] __arm64_sys_exit_group+0x14/0x18
[ 2708.287511] el0_svc_common.constprop.0+0x88/0x150
[ 2708.292300] el0_svc_handler+0x20/0x80
[ 2708.296048] el0_svc+0x8/0xc
[ 2708.298932] Code: f9405014 b40001d4 a9025bf5 aa0003f3 (f9407a96)
[ 2708.305022] ---[ end trace 57dc40848a0a1b21 ]---
[ 2708.309668] Fixing recursive fault but reboot is needed!
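For readers who want to check the "Mem abort info" lines by hand, here is a minimal sketch that splits the ESR value from the trace (0x96000004) into its fields. The bit positions are the standard Armv8-A ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0); the function name `decode_esr` is only illustrative.

```python
# Split an AArch64 ESR_ELx value into the fields the kernel prints in
# "Mem abort info". Field positions follow the Armv8-A ESR_ELx layout.

def decode_esr(esr: int) -> dict:
    iss = esr & 0x1FFFFFF           # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class: bits [31:26]
        "il": (esr >> 25) & 0x1,    # instruction length bit (1 = 32-bit)
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,    # data abort: 0 = read, 1 = write
        "dfsc": iss & 0x3F,         # data fault status code
    }

fields = decode_esr(0x96000004)
print({name: hex(value) for name, value in fields.items()})
```

The EC value 0x25 is a data abort taken without a change in exception level, which matches the kernel's "DABT (current EL)", and DFSC 0x04 is a level 0 translation fault: the kernel itself dereferenced an address that no page table covers.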
5kft Posted February 15, 2020 (edited)

Hi @sfx2000, if this is reproducible, please try removing "cpu-clock-1.3GHz-1.3v" from the "overlays=" line in your armbianEnv.txt, and see if the problem still occurs. If it does not, then pushing to 1.3GHz at 1.3v may be too much for your board... I've attached a new test overlay to this post that enables 1.2GHz at 1.3v; let me know if it fixes the problem (assuming that the 1.3GHz clock is the problem). If this works then I can add it to the mainline.

sun50i-h5-cpu-clock-1.2GHz-1.3v.dtbo

Edited February 15, 2020 by 5kft (attached 1.2GHz overlay)
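As a rough illustration of the edit suggested above, the following sketch drops one overlay name from the "overlays=" line of an armbianEnv.txt-style text and leaves every other line alone. The helper name `remove_overlay` and the sample text are hypothetical; on a real board the file is /boot/armbianEnv.txt and needs root to edit, so this only shows the string handling.

```python
# Remove a single overlay name from the "overlays=" line of an
# armbianEnv.txt-style text. Pure string handling; nothing is written to disk.

def remove_overlay(env_text: str, overlay: str) -> str:
    out = []
    for line in env_text.splitlines():
        if line.startswith("overlays="):
            names = line[len("overlays="):].split()
            kept = [name for name in names if name != overlay]
            line = "overlays=" + " ".join(kept)
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical sample modeled on the files quoted in this thread.
sample = (
    "verbosity=1\n"
    "overlay_prefix=sun50i-h5\n"
    "overlays=usbhost1 usbhost2 gpio-regulator-1.3v cpu-clock-1.3GHz-1.3v\n"
)
print(remove_overlay(sample, "cpu-clock-1.3GHz-1.3v"))
```

A reboot is needed afterwards, since the overlays are applied at boot time.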
sfx2000 Posted February 20, 2020

been on biz travel over the last few days - jobby job stuff

Rolled back the overlay that was in place, and Neo2 is again stable. I'll check out the update for the overlay...

I'm in recovery mode - 3 cities in 4 days across the USA - more time in planes than in meetings.
5kft Posted February 20, 2020

13 hours ago, sfx2000 said: been on biz travel over the last few days - jobby job stuff I'm in recovery mode - 3 cities in 4 days across the USA - more time in planes than in meetings,

...ugh...I can relate... But it's so rewarding though, right??

13 hours ago, sfx2000 said: Rolled back the overlay that was in place, and Neo2 is again stable. I'll check out the update for the overlay...

Ah...then the overlay should help - keep me posted. Thanks!
sfx2000 Posted February 21, 2020

13 hours ago, 5kft said: Ah...then the overlay should help - keep me posted. Thanks!

Installed the overlay...

Spoiler
sfx@nano2:~$ cat /boot/armbianEnv.txt
verbosity=1
console=both
overlay_prefix=sun50i-h5
overlays=gpio-regulator-1.3v i2c0 usbhost1 usbhost2 cpu-clock-1.2GHz-1.3v
rootdev=UUID=ee092379-cddf-4aff-beb3-7cc62d0fe9bd
rootfstype=ext4
extraargs=net.ifnames=0
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
sfx@nano2:~$ ls -l /sys/class/leds
total 0
lrwxrwxrwx 1 root root 0 Jan 1 1970 nanopi:green:status -> ../../devices/platform/leds/leds/nanopi:green:status
lrwxrwxrwx 1 root root 0 Jan 1 1970 nanopi:red:pwr -> ../../devices/platform/leds/leds/nanopi:red:pwr
sfx@nano2:~$ cat /etc/default/cpufrequtils
# WARNING: this file will be replaced on board support package (linux-root-...) upgrade
ENABLE=true
MIN_SPEED=480000
#MIN_SPEED=120000
#MAX_SPEED=1000000
MAX_SPEED=1200000
GOVERNOR=schedutil
sfx@nano2:~$ cpufreq-info -c1
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 1:
driver: cpufreq-dt
CPUs which run at the same hardware frequency: 0 1 2 3
CPUs which need to have their frequency coordinated by software: 0 1 2 3
maximum transition latency: 5.44 ms.
hardware limits: 480 MHz - 1.20 GHz
available frequency steps: 480 MHz, 648 MHz, 816 MHz, 960 MHz, 1.01 GHz, 1.10 GHz, 1.20 GHz
available cpufreq governors: conservative, userspace, powersave, ondemand, performance, schedutil
current policy: frequency should be within 480 MHz and 1.20 GHz.
The governor "schedutil" may decide which speed to use within this range.
current CPU frequency is 816 MHz (asserted by call to hardware).
cpufreq stats: 480 MHz:46.69%, 648 MHz:11.69%, 816 MHz:25.12%, 960 MHz:4.96%, 1.01 GHz:0.80%, 1.10 GHz:1.19%, 1.20 GHz:9.55% (3952)

Outcome not good when stressing the board with the new overlay using openssl speed -multi 4. Board crashes with a hard hangup - have to pull power.

Should note this is a v1.1 board with the , which I didn't mention earlier.. This is the kit with the OLED hat and the cute little aluminum case...
5kft Posted February 22, 2020

20 hours ago, sfx2000 said: Outcome not good when putting the board with the new overlay when stressing it

OK, thanks for the info! I ran the openssl test on a number of boards (two NEO2 v1.1s - one 512MB, one 1GB; also a modified Orange Pi Zero Plus2 H5). I was able to get it to consistently crash at 1.3GHz on one of the NEO2s, and reducing it to 1.2GHz with the overlay worked (I let it run for 5-10 min on each test). The other NEO2 worked OK at 1.3GHz; I didn't get enough run time on the Orange Pi Zero Plus2 because it would keep overheating (critical shutdown at 100C). I ran the tests a few times and the behavior was consistent. I need to dig up a bigger heatsink/temporary fan for the Orange Pi to really test it...I have a number of other boards I could test this on, but I have to install heatsinks on them...

Given that the 1.2GHz overlay solved the problem for my failing NEO2 (reproducible), I think that I'll go ahead and add it to the mainline. Using the overlay eliminates the need to edit /etc/default/cpufrequtils; just add the overlay to /boot/armbianEnv.txt.

Unfortunately overclocking is pretty much "luck of the draw" in terms of the CPU...if your board is crashing at 1.2GHz/1.3v there isn't a lot we can do about it at that clock rate. As a test you could try using the 1.3GHz overlay and reducing the MAX_SPEED in /etc/default/cpufrequtils to 1152000 or 1104000 (the file takes values in kHz) and see if those are stable?
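A small worked example of the units involved: the /etc/default/cpufrequtils file quoted earlier in the thread pairs MAX_SPEED=1200000 with a 1.20 GHz hardware limit, so its values are in kHz. Under that reading, a cap of 1152 MHz or 1104 MHz would be written as below. The helper name `max_speed_line` is hypothetical.

```python
# cpufrequtils speeds are given in kHz (1200000 corresponds to 1.20 GHz in the
# file quoted earlier), so a MHz figure is multiplied by 1000.

def max_speed_line(mhz: int) -> str:
    return f"MAX_SPEED={mhz * 1000}"

for mhz in (1152, 1104):
    print(max_speed_line(mhz))
```

Whether a given value is accepted still depends on the frequency steps the active overlay exposes, which `cpufreq-info` lists as "available frequency steps".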
5kft Posted February 22, 2020

In case it is helpful to users, I've checked in the new 1.2GHz max overclock overlay: https://github.com/armbian/build/commit/74c6adec7411ef4d6dfa2115d21378c84aecb488

Use it just like the 1.3GHz overclock overlay; no need to edit the default "/etc/default/cpufrequtils". E.g., relevant excerpt from "/boot/armbianEnv.txt" for NEO2 v1.1:

...
overlay_prefix=sun50i-h5
overlays=usbhost1 usbhost2 gpio-regulator-1.3v cpu-clock-1.2GHz-1.3v
...

Limiting the overclock to 1.2GHz on one of my NEO2s makes it completely stable as compared to 1.3GHz (e.g., using the "openssl speed -multi 4" SMP test). If anyone is interested, it'd be easy enough for me to add a 1.1GHz/1.3v overlay as well, just let me know.
sfx2000 Posted February 29, 2020 (edited)

self deleted for brevity/clarity...

Edited February 29, 2020 by sfx2000 (thread management)
sfx2000 Posted February 29, 2020

ok - so the stock FA image also crashes on the same test - it behaves differently than the Armbian image, as it kills off the threads when it tries to do a privileged memory access...

since we're working with armv8-a, we have kernel space (EL1) and user space (EL0) - hence the data abort, as the memory is marked as EL1, and an EL0 task cannot access that.

I think that overclocking the CPU exposes a bug that is latent even without the overlay, and this goes not to the device tree, but to u-boot and the DDR RAM init vectors there.

The stress test (openssl) can show the bug, but this isn't the real problem, and the overlay just enables it to happen faster - getting board temp to around 60C, which on a small board like this includes not only the SoC but also the DDR, can accelerate this issue, as some DDR can get a bit unstable at that temp.

I don't have much time right now to debug further, as I'm in the middle of sfx's North America Tour - last week Austin, TX, next week Miami, FL, Atlanta, GA, Denver, CO, and a short trip to Salt Lake City, UT - about a week of downtime in the SAN, then back to Austin for a week.

@5kft -- Gnarly problem to sort, eh? But spending time might help other AW H5 targets...

@Igor -- something to watch maybe
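One small check that fits the memory-instability theory, using only numbers from the oops in the first post: the faulting address is x20 plus 0xf0, and x20 differs from the 0xffff prefix seen on the other kernel pointers in the same register dump (x19, x28, x12, x8) in a single bit. This is a sketch under the assumption that x20 was meant to hold an ordinary 0xffff8000... pointer; a one-bit difference is consistent with corrupted memory but does not prove it.

```python
from typing import List

# Values copied from the register dump in the first post.
FAULT_ADDR = 0xF7FF8000358DDBF0
X20 = 0xF7FF8000358DDB00
KERNEL_PREFIX = 0xFFFF  # top 16 bits of the other kernel pointers in the dump

def flipped_bits(value: int, expected: int, shift: int = 0) -> List[int]:
    """Return the bit positions (offset by shift) where value and expected differ."""
    diff = value ^ expected
    return [bit + shift for bit in range(diff.bit_length()) if (diff >> bit) & 1]

offset = FAULT_ADDR - X20
# Assumption: x20 should have carried the same 0xffff prefix as its neighbours.
suspect = flipped_bits(X20 >> 48, KERNEL_PREFIX, shift=48)

print(hex(offset), suspect)
```

If the assumption holds, the pointer read back from RAM had bit 59 cleared, and the kernel faulted as soon as it used it.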
5kft Posted March 1, 2020

@sfx2000 - very nice sleuthing! I've spent some time with this, and was able to repro the crash consistently by dialing the DDR clock up a bit. I dialed the clock down and have been testing, and it is completely stable, even when overclocked - the openssl test can run to completion now, multiple times. (Note: I tested all of this on my "problematic" NEO2, which can only support a maximum overclock to 1.2GHz.)

@Igor - it seems quite clear that the default 624MHz DDR clock is too high for this board. I've bisected the rates - e.g., 576MHz works great where 624MHz would crash intermittently. Now 576MHz may not actually be low enough, but at this point I think I'm going to go ahead and lower the H5 clocks to 576MHz (for the boards I have and can test), which will be more stable than the current 624MHz - please let me know if you disagree. The real driver for this is that even without overclocking sfx2000's board crashes with this test.

Also, obviously there is variance in boards/DDR...I'm also going to do some more research and testing and see if there isn't a more scientific process that we can use to determine the best DDR clock rate to use here...
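The bisection described above can be sketched as a binary search for the highest clock that passes a stability test. Everything here is illustrative: `is_stable` stands in for "flash a u-boot built with this DDR clock and run the stress test long enough", the 24 MHz step is an assumption chosen because the rates mentioned in this thread are all multiples of 24, and the search only makes sense if stability is monotonic (every clock below a stable one is also stable).

```python
from typing import Callable, Optional, Sequence

def highest_stable(clocks: Sequence[int],
                   is_stable: Callable[[int], bool]) -> Optional[int]:
    """Binary search over ascending clocks for the highest one that passes."""
    lo, hi = 0, len(clocks) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_stable(clocks[mid]):
            best = clocks[mid]   # passes: try higher clocks
            lo = mid + 1
        else:
            hi = mid - 1         # fails: try lower clocks
    return best

# Candidate DDR clocks in MHz, 504 up to 672 in assumed 24 MHz steps.
candidates = list(range(504, 673, 24))

# Stand-in predicate mirroring the result reported above (576 OK, 624 not).
print(highest_stable(candidates, lambda mhz: mhz <= 576))
```

Because the crashes are intermittent, a single short pass is weak evidence; each probe would need a long or repeated run before it counts as stable.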
lanefu Posted March 1, 2020

32 minutes ago, 5kft said: Also, obviously there is variance in boards/DDR...I'm also going to do some more research and testing and see if there isn't a more scientific process that we can use to determine the best DDR clock rate to use here...

I'm interested in this as my Opi Prime has been bulletproof for months with the existing configs.. Let me know how I can help test..I also have a PC2 that I can use for testing
5kft Posted March 1, 2020

14 minutes ago, lanefu said: I'm interested in this as my Opi Prime has been bulletproof for months with the existing configs.. Let me know how I can help test..I also have a PC2 that i can use for testing

Great! It'd be interesting for you to try "openssl speed -multi 4" on your Opi Prime, and see if it makes it all the way through the run successfully without crashing (make sure you have sufficient cooling, and patience).

Personally I want a completely stable platform, which would mean taking the most conservative route (e.g., possibly reducing the DDR clock even lower). I just got a crash on one board at 1.2GHz/576MHz, but it works fine if it isn't overclocked. However, if the overclock is just exposing a latent issue (as per @sfx2000's comment above), then we'd need to go lower...

Any/all thoughts are appreciated regarding this...!
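For anyone repeating the test, a rough sketch of running "openssl speed -multi 4" while watching the SoC temperature, so that a thermal problem is not mistaken for a DDR problem. The sysfs path is the generic Linux thermal zone file (values in millidegrees Celsius); which zone index maps to the SoC on a given board, the 90C limit, and the function names are all assumptions.

```python
import os
import signal
import subprocess
import time
from pathlib import Path

# Assumed thermal zone; check /sys/class/thermal on the actual board.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")

def millideg_to_c(raw: str) -> float:
    """Convert a sysfs thermal reading (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0

def run_stress(limit_c: float = 90.0, poll_s: float = 5.0) -> bool:
    """Run the openssl SMP test; stop early if the temperature limit is hit."""
    proc = subprocess.Popen(
        ["openssl", "speed", "-multi", "4"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # own process group, so forked workers die too
    )
    while proc.poll() is None:
        if millideg_to_c(THERMAL_ZONE.read_text()) >= limit_c:
            os.killpg(proc.pid, signal.SIGTERM)
            proc.wait()
            return False
        time.sleep(poll_s)
    return proc.returncode == 0

# On the board: call run_stress() and note whether it returns True.
```

If the board hard-hangs as described in this thread, the script never returns at all, which is itself the result.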
5kft Posted March 1, 2020

There are a number of other threads on this (and other forums) about this subject...I thought that this one was interesting: https://groups.google.com/forum/#!topic/linux-sunxi/coQGctAipgI (Icenowy patch to take it to 504MHz to match FA BSP)

Also, this has some interesting history:
sfx2000 Posted March 1, 2020

5 hours ago, 5kft said: Personally I want a completely stable platform, which would mean take the most conservative route (e.g., possibly reduce the DDR clock even lower). I just got a crash on one board at 1.2GHz/576MHz, but it works fine if it isn't overclocked. However, if the overclock is just exposing a latent issue (as per @sfx2000's comment above), then we'd need to go lower... Any/all thoughts are appreciated regarding this...!

Looks like Sunxi-mainline-kernel-4.14 is 504 MHz for the DDR clock for H5 on Neo2

http://wiki.friendlyarm.com/wiki/images/a/af/Sunxi-mainline-kernel-4.14-features.xlsx
5kft Posted March 1, 2020

34 minutes ago, sfx2000 said: Looks like Sunxi-mainline-kernel-4.14 is 504 MHz for the DDR clock for H5 on Neo2

Yes - the posts I mention above have some history regarding this (e.g., patch to bring the mainline 672 down to the FA 504). It looks like there was an original desire to run it at 504, then at some point in late 2017 people started overclocking, and it seemed to work...

On my NEO2s, I couldn't get any failures at the default CPU clock rate (1.0GHz) at 576. I'd like to go conservative here for stability, but I'm not sure how people would feel going to 504 if 576 works fine. Can you confirm the clocks for the FA firmware that you tested with? It's worrisome if you could repro and it's at DDR 504/CPU 1.0GHz.

BTW, I hammered my NEO2 Blacks running with the default (624), and they work without any issue - default clock goes to 1.37GHz. These boards are different as they use 32-bit DDR (two parts).
sfx2000 Posted March 1, 2020

5 minutes ago, 5kft said: Yes - the posts I mention above have some history regarding this (e.g., patch to bring the mainline 672 down to the FA 504). It looks like there was an original desire to run it at 504, then at some point in late 2017 people started overclocking, and it seemed to work... On my NEO2s, I couldn't get any failures at the default clock rate (1.0GHz) at 576. I'd like to go conservative here for stability, but I'm not sure how people would feel going to 504 if 576 works fine. Can you confirm the clocks for the FA firmware that you tested with? It's worrisome if you could repro and it's at DDR 504/CPU 1.0GHz. BTW, I hammered my NEO2 Blacks running with the default (624), and they work without any issue - default clock goes to 1.37GHz. These boards are different as they use 32-bit DDR (two parts).

I did the FA image as a quick test, and then overwrote that card with the current Armbian to get a baseline - as my armbian card is a few months old (custom work for the oled hat, other things).

IIRC, FA was pushing CPU up to 1.2, but I didn't note the DDR clocks. Anyways, link to that image is in the thread.

I agree that stability is better than overall performance - stability is its own performance benchmark. I'd rather err on the side of safety.

Neo2 Black - different board, eh?
sfx2000 Posted March 1, 2020

16 minutes ago, 5kft said: Yes - the posts I mention above have some history regarding this (e.g., patch to bring the mainline 672 down to the FA 504). It looks like there was an original desire to run it at 504, then at some point in late 2017 people started overclocking, and it seemed to work...

Seems like it doesn't under certain loads...
5kft Posted March 1, 2020

2 minutes ago, sfx2000 said: I agree that stability is better than overall performance - stability is its own performance benchmark. I'd rather err on the side of safety.

Agreed. @Igor, @martinayotte, @lanefu - apologies for the spam, but am looking for your thoughts...should we drop back to the FA mainline u-boot DDR clock rate for the NEO2, NEO Plus2, etc.?

5 minutes ago, sfx2000 said: Neo2 Black - different board, eh?

Yeah, the Black is great - it's a NEO Core2 base with a better regulator, plus eMMC. I use a number of these now. I like these boards because they are so tiny, low power, and the performance is great. The little metal cases are awesome too.
5kft Posted March 1, 2020

3 minutes ago, sfx2000 said: Seems like it doesn't under certain loads...

Exactly... Thanks again for looking further into this!
lanefu Posted March 1, 2020

1 minute ago, 5kft said: thoughts...should we drop back to the FA mainline u-boot DDR clock rate for the NEO2, NEO Plus2, etc.?

If we adjusted, would it be possible to ship overlays for the "unstable" speeds?
5kft Posted March 1, 2020

2 minutes ago, lanefu said: If we adjusted would it be possible to ship overlays for the "unstable" speeds?

Unfortunately not - the DDR clocks are set in u-boot... However users could feel free to build their own u-boots that use higher DDR clocks.
sfx2000 Posted March 1, 2020

23 minutes ago, 5kft said: Unfortunately not - the DDR clocks are set in u-boot... However users could feel free to build their own u-boots that use higher DDR clocks.

In any event, the overall performance difference between the "stable" and "unstable" clocks isn't enough to justify the risks, unless one looks at specific benchmarks...

I'm not into benchmarking - I've got an interest in operational usage of the device.

just my thoughts...

sfx
lanefu Posted March 1, 2020

6 hours ago, 5kft said: Great! It'd be interesting for you to try "openssl speed -multi 4" on your Opi Prime, and see if it makes it all the way through the run successfully without crashing (make sure you have sufficient cooling, and patience ).

Well.... that seemed to take it down pretty quick
sfx2000 Posted March 2, 2020

6 minutes ago, lanefu said: Well.... that seemed to take it down pretty quick

Yep...
5kft Posted March 2, 2020

5 minutes ago, lanefu said: Well.... that seemed to take it down pretty quick

OK well that answers that... I think it's clear that the memory tests used back in 2017 weren't sufficient to determine the stability of this clock. Why don't I go ahead and set it to 504MHz as that's the FA default; then, if desired, people could look at overclocking this further.
sfx2000 Posted March 2, 2020

3 minutes ago, 5kft said: OK well that answers that... I think it's clear that the memory tests used back in 2017 weren't sufficient to determine the stability of this clock. Why don't I go ahead and set it to 504MHz as that's the FA default, then if desired people could look at overclocking this further.

Sounds good - and folks should test around the 504MHz DDR clock, along with the overlay to upclock on boards that support the 1.3V regulator... both at 1.2 and 1.3 GHz.
5kft Posted March 2, 2020

Changes checked in: https://github.com/armbian/build/commit/42201fd3fc1386c6dc8785c4f85db35289bfe2db

After building a new u-boot, you can copy it to your board, then install it to the filesystem via "dpkg -i ...":

root@nanopineo2:~/tmp# dpkg -i linux-u-boot-current-nanopineo2_20.05.0-trunk_arm64.deb
(Reading database ... 33567 files and directories currently installed.)
Preparing to unpack linux-u-boot-current-nanopineo2_20.05.0-trunk_arm64.deb ...
Unpacking linux-u-boot-nanopineo2-current (20.05.0-trunk) over (20.05.0-trunk) ...
Setting up linux-u-boot-nanopineo2-current (20.05.0-trunk) ...
root@nanopineo2:~/tmp#

Then, to install it to your SD/eMMC, run "armbian-config". In the menu, select "System and security settings", then "Install to/update boot loader", then "Install/Update the bootloader on SD/eMMC", then "Yes" at the WARNING prompt. Exit armbian-config, then reboot.
lanefu Posted March 2, 2020

3 hours ago, 5kft said: Then, to install it to your SD/eMMC, run "armbian-config". In the menu, select "System and security settings", then "Install to/update boot loader", then "Install/Update the bootloader on SD/eMMC", then "Yes" at the WARNING prompt. Exit armbian-config, then reboot.

I built u-boot and installed the package, then installed the latest armbian-config nightly. I can't find "Install to/update boot loader" in armbian-config. Did you mean nand-sata-install?
lanefu Posted March 2, 2020

Well it appears to have survived the test this time. Nice work.
sfx2000 Posted March 9, 2020

My board - still hangs up hard with the 1.2GHz overlay on the openssl stress test...

verbosity=1
console=both
overlay_prefix=sun50i-h5
overlays=usbhost1 usbhost2 gpio-regulator-1.3v i2c0 cpu-clock-1.2GHz-1.3v
rootdev=UUID=c87503e2-838a-42db-8208-a5293ae03ad5
rootfstype=ext4
extraargs=net.ifnames=0
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u

Getting better overall performance though with the lower clocks without the overlay...

Screenshots with the overlay below... hard crash on the 1.2GHz overlay...

sun50i-h5-cpu-clock-1.2GHz-1.3v.dtbo
linux-u-boot-current-nanopineo2_20.05.0-trunk_arm64.deb