Pedro Lamas Posted August 30, 2020
My NanoPi M4V2 is currently running completely headless, just to host a few Docker containers (AdGuard Home, Home Assistant, Zigbee2Mqtt, etc.), and I've been noticing more and more crashes. Sometimes they occur while I have an SSH terminal open, and I get a glimpse of a "kernel panic" type error... Here's the output from armbianconfig: http://ix.io/2vFX
Any ideas on what this is or how to fix it? Thanks in advance!
Igor Posted August 31, 2020
> On 8/30/2020 at 11:27 AM, Pedro Lamas said: and I get a glimpse of a "kernel panic" type error...
A screenshot of the kernel panic on a console might help; our default logs don't show anything suspicious. I hope your power supply is of proper quality?
Werner Posted August 31, 2020
> 35 minutes ago, Igor said: I hope your power supply is of proper quality?
To go into more detail, please have a look at our documentation to avoid common mistakes: https://docs.armbian.com/
Pedro Lamas (Author) Posted August 31, 2020
> 8 hours ago, Igor said: I hope your power supply is of proper quality?
I'm powering it with a 5V 4A PoE adapter. I'm using a 256GB Sabrent M.2 NVMe SSD as the main disk (configured via armbian-config), so the Samsung SD card is only used for the initial boot. I'll try to get a screenshot the next time I get that error and post it here.
Igor Posted August 31, 2020
> 7 minutes ago, Pedro Lamas said: I'm powering it with a
Can you temporarily use something else? Until proven otherwise, the power supply is the likely source of the problem.
Pedro Lamas (Author) Posted August 31, 2020
> 1 minute ago, Igor said: Can you temporarily use something else?
Yes, I have the standard 5V 4A power adapter that came with the board; unfortunately, that one causes the same issue too...
Igor Posted August 31, 2020
OK, then it's deeper and not so simple to fix... For a start, we need to catch it when it crashes.
Pedro Lamas (Author) Posted September 1, 2020
I looked at the logs in /var/log/kern.log and found this: https://pastebin.com/aTaSaPFM
I clearly remember the error shown when it crashed mentioning "PREEMPT SMP", which appears there too! If there's any specific log I can provide to help with this, please tell me and I will upload it.
Pedro Lamas (Author) Posted September 8, 2020
Just updated to the latest firmware; I'll keep you posted if I still experience the same issue!
pedro@nanopim4v2:~$ uname -a
Linux nanopim4v2 5.8.6-rockchip64 #20.08.2 SMP PREEMPT Fri Sep 4 20:23:22 CEST 2020 aarch64 GNU/Linux
Pedro Lamas (Author) Posted September 14, 2020
I'm still getting these daily and I'm not sure what else I can do to fix it... Any ideas or thoughts would be very welcome!
piter75 Posted September 14, 2020
> 2 hours ago, Pedro Lamas said: not sure what else can I do to fix it
At this point I do not recommend running the M4v2 with mainline Linux. The reason for these instabilities and the remedy are yet to be discovered. The board runs stably with the legacy kernel, though.
Pedro Lamas (Author) Posted September 19, 2020
Just got yet another of those errors while using Docker:
pedro@nanopim4v2:~/docker$ docker-compose down
Stopping docker_mariadb_1 ...
Stopping docker_homeassistant_1 ...
Stopping docker_nginx_1 ...
Stopping docker_zigbee2mqtt_1 ... done
Stopping docker_esphome_1 ...
Stopping docker_telegraf_1 ... done
Stopping docker_acme.sh_1 ... done
Stopping docker_mosquitto_1 ...
Stopping docker_portainer_1 ...
Stopping docker_vscode_1 ...
Stopping docker_grafana_1 ...
Stopping docker_adguardhome_1 ...
Message from syslogd@localhost at Sep 19 16:10:14 ...
kernel:[108790.564123] Kernel panic - not syncing: bad mode
The system stalled after that and I had to restart it manually.
Pedro Lamas (Author) Posted September 19, 2020
Another one just now, while I had an SSH session open:
Message from syslogd@localhost at Sep 19 16:57:04 ...
kernel:[ 2638.817744] Internal error: Oops: 96000005 [#1] PREEMPT SMP
Pedro Lamas (Author) Posted September 19, 2020
The device becomes completely unresponsive after this. Is there a way to at least make it reboot automatically on failure instead of waiting for me to do it manually?
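A common way to get this behaviour, offered here as a hedged sketch rather than something confirmed in this thread, is to let the kernel reboot itself: the kernel.panic sysctl sets how many seconds to wait before rebooting after a panic, and kernel.panic_on_oops=1 escalates an Oops (like the "Internal error: Oops" messages in this thread) into a panic. The file name /etc/sysctl.d/99-reboot-on-panic.conf and the 10-second delay are hypothetical choices.

```shell
#!/bin/sh
# Sketch: let the kernel reboot the board by itself after a panic or an Oops.
# Hypothetical choices: the file name in CONF and the 10-second DELAY.
set -eu

CONF=/etc/sysctl.d/99-reboot-on-panic.conf
DELAY=10

# Print the sysctl settings (kept separate so the output can be checked).
panic_conf() {  # $1 = seconds to wait before rebooting
    # kernel.panic: reboot N seconds after a panic (0 means never reboot)
    echo "kernel.panic = $1"
    # kernel.panic_on_oops: treat an Oops ("Internal error: Oops ...") as a panic
    echo "kernel.panic_on_oops = 1"
}

# Change the system only when asked explicitly, e.g.: sudo sh panic-reboot.sh apply
if [ "${1:-}" = "apply" ]; then
    panic_conf "$DELAY" > "$CONF"   # persists across reboots
    sysctl -p "$CONF"               # loads the values right away
fi
```

The script name panic-reboot.sh is also hypothetical. This only helps when the kernel actually panics or oopses; a hard lockup that never reaches the panic path would need a hardware watchdog instead.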
Pedro Lamas (Author) Posted September 20, 2020
A few more errors this morning:
Message from syslogd@localhost at Sep 20 10:50:09 ...
kernel:[59823.225830] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Message from syslogd@localhost at Sep 20 10:50:09 ...
kernel:[59823.252559] Code: f9401bf7 17ffff7d a9025bf5 f9001bf7 (d4210000)
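To collect these signatures from saved logs rather than catching them on an open SSH session, something like the following shell sketch could be used. The three patterns are simply the messages quoted in this thread (an assumption, not an exhaustive list), and the script name in the example is hypothetical. Note that Armbian may keep /var/log in RAM and sync it to disk only periodically, so the last lines before a hard hang can be missing from kern.log.

```shell
#!/bin/sh
# Sketch: pull the crash signatures seen in this thread out of kernel logs.
# Assumption: the three patterns are only the messages quoted above, not a
# complete list of everything an arm64 kernel can print when it dies.
set -eu

find_crashes() {  # $@ = log files to search
    # -h: omit file names; -E: extended regex.
    # "|| true" keeps "no match" or a missing file from aborting under set -e.
    grep -hE 'Kernel panic|Internal error: Oops|bad mode' "$@" || true
}

# Example: sh find-crashes.sh /var/log/kern.log /var/log/kern.log.1
if [ "$#" -gt 0 ]; then
    find_crashes "$@"
fi
```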
Nobby42 Posted September 26, 2020
Hello, I found a problem with the CPU governor:
ENABLE=true
MIN_SPEED=600000
#MAX_SPEED=2016000
MAX_SPEED=1800000
# performance i firefox preferences i 6
# GOVERNOR=conservative
# GOVERNOR=ondemand
GOVERNOR=performance
With GOVERNOR set to "performance", the NanoPi appears to run stably. In "ondemand" and "conservative" mode, the Pi crashes after a few hours. I don't have enough experience with the governor parameters to find a stable setting in userspace mode.
Werner Posted September 26, 2020
> 4 minutes ago, Nobby42 said: With GOVERNOR set to "performance", the NanoPi appears to run stably.
Interesting. So there might be an issue with clock changes (DVFS)?
Pedro Lamas (Author) Posted September 26, 2020
I'm not sure what the governor is or how to change it (granted, I haven't looked into the docs yet), but if there's any setting I can change to test, I'm willing to do just that! At the moment, the longest my NanoPi M4V2 has run without crashing is about 3 days, but on average I have to reboot it daily...
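The governor is the cpufreq policy that decides when to change the CPU clock; on Armbian its persistent settings live in /etc/default/cpufrequtils, as in the snippets in this thread. As a minimal sketch, the current state can be read through the standard Linux cpufreq sysfs interface; the CPUFREQ_DIR override is an assumption added only to make the script testable.

```shell
#!/bin/sh
# Sketch: show the governor and limits of every cpufreq policy (CPU cluster).
# Uses the standard Linux cpufreq sysfs files; CPUFREQ_DIR can be overridden,
# e.g. to point at a fake directory tree for testing.
set -eu

CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpufreq}

show_cpufreq() {
    for p in "$CPUFREQ_DIR"/policy*; do
        [ -d "$p" ] || continue   # no policies found (glob left unexpanded)
        printf '%s: cpus=%s governor=%s min=%s max=%s cur=%s\n' \
            "$(basename "$p")" \
            "$(cat "$p/affected_cpus")" \
            "$(cat "$p/scaling_governor")" \
            "$(cat "$p/scaling_min_freq")" \
            "$(cat "$p/scaling_max_freq")" \
            "$(cat "$p/scaling_cur_freq")"
    done
}

show_cpufreq
```

On an RK3399 one would expect two policies: policy0 (CPUs 0-3, the Cortex-A53 cluster) and policy4 (CPUs 4-5, the Cortex-A72 cluster). All frequencies are in kHz.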
Pedro Lamas (Author) Posted September 27, 2020
> On 9/26/2020 at 12:09 PM, Werner said: Interesting. So there might be an issue with clock changes (DVFS)?
Not sure how relevant this might be, but after looking around for previous reports of governor issues on the NanoPi M4V2, I found this post, which actually mentions that "ondemand" is causing issues... Looking further down, I see bug reports that are quite similar to my own experience. My system is currently set to "ondemand" (I never changed it, so I assume this is the default); I might try setting it to "performance" or some other value and test with that.
Werner Posted September 28, 2020
You could also try setting userspace as the governor and putting the min and max frequency at the same value. "performance" basically means setting them to the highest possible.
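The "userspace governor, min = max" test can also be tried at runtime, without editing /etc/default/cpufrequtils, by writing to the cpufreq sysfs files directly. This is a sketch under assumptions: it must run as root, the frequency should be one both clusters support (1008000 kHz is the value used later in this thread), and the changes are lost on reboot. CPUFREQ_DIR and the script name are hypothetical, for testing and illustration.

```shell
#!/bin/sh
# Sketch: pin every cpufreq policy to one fixed frequency (kHz) until reboot,
# using the userspace governor with min = max.
# Assumptions: run as root; the frequency is one the clusters support.
set -eu

CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpufreq}

pin_policy() {  # $1 = policy directory, $2 = frequency in kHz
    echo userspace > "$1/scaling_governor"
    # Open max up to the hardware limit first, so that raising min
    # below can never go above the current max.
    cat "$1/cpuinfo_max_freq" > "$1/scaling_max_freq"
    echo "$2" > "$1/scaling_min_freq"
    echo "$2" > "$1/scaling_max_freq"
}

pin_all() {  # $1 = frequency in kHz
    for p in "$CPUFREQ_DIR"/policy*; do
        [ -d "$p" ] || continue
        pin_policy "$p" "$1"
    done
}

# Example: sudo sh pin-cpufreq.sh 1008000
if [ "$#" -gt 0 ]; then
    pin_all "$1"
fi
```

The write order is there because some (older) kernels reject a min above the current max; raising max to the hardware limit first avoids that whatever the previous limits were.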
Pedro Lamas (Author) Posted September 29, 2020
> On 9/28/2020 at 4:44 AM, Werner said: You could also try setting userspace as the governor and putting the min and max frequency at the same value. "performance" basically means setting them to the highest possible.
I changed it to 1008000 yesterday, and so far all is good (though obviously a bit slower):
pedro@nanopim4v2:~$ cat /etc/default/cpufrequtils
ENABLE=true
MIN_SPEED=1008000
MAX_SPEED=1008000
GOVERNOR=userspace
hexdump Posted September 29, 2020
Maybe that helps, but running this board at only 1 GHz sounds a bit strange.
Werner Posted September 29, 2020
> 3 minutes ago, hexdump said: Maybe that helps, but running this board at only 1 GHz sounds a bit strange.
I think this is more of a test of whether it runs stably at various frequencies when they are kept constant.
piter75 Posted September 29, 2020
> 7 hours ago, Pedro Lamas said: I changed it to 1008000 yesterday, and so far all is good (though obviously a bit slower):
With one of my boards I have had good results with min set to 1008000 and max to 2016000 (ondemand governor). You could also try that range. However, the other one is still unstable with this setup but runs stably with the performance governor (meaning 2016000 all the time).
Pedro Lamas (Author) Posted September 30, 2020
> 15 hours ago, hexdump said: Maybe that helps, but running this board at only 1 GHz sounds a bit strange.
Indeed, but I'm just doing this as a test to see how things go... I want to take full advantage of this board, but something must be going terribly wrong for it to crash randomly, and maybe this is a step in the right direction toward finding out what it is and how to mitigate it!
> 10 hours ago, piter75 said: With one of my boards I have had good results with min set to 1008000 and max to 2016000 (ondemand governor). You could also try that range. However, the other one is still unstable with this setup but runs stably with the performance governor (meaning 2016000 all the time).
Those are the settings I had originally (min 1008000, max 2016000, ondemand); unfortunately, I know they make my board crash randomly!
Pedro Lamas (Author) Posted September 30, 2020
@Werner just a thought, but the NanoPi M4V2 has an RK3399 SoC, so it has a dual-core Cortex-A72 (up to 2.0GHz) and a quad-core Cortex-A53 (up to 1.5GHz)... Does the "ondemand" governor handle these two CPU clusters separately? The datasheet does mention they work with different voltages: http://opensource.rock-chips.com/images/d/d7/Rockchip_RK3399_Datasheet_V2.1-20200323.pdf
Werner Posted September 30, 2020
> 13 minutes ago, Pedro Lamas said: @Werner just a thought, but the NanoPi M4V2 has an RK3399 SoC... Does the "ondemand" governor handle these two CPU clusters separately?
No idea, sorry.
piter75 Posted September 30, 2020
> 23 minutes ago, Pedro Lamas said: Does the "ondemand" governor handle these two CPU clusters separately?
Yes. They are treated as separate groups/clusters when it comes to scaling, and they also have separate regulators assigned to them. cpufrequtils, however, cannot configure their limits separately; the same limits are applied to both clusters.
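Since cpufrequtils applies one set of limits to both clusters, per-cluster limits would have to be written to each policy's sysfs files directly. A hedged sketch, assuming policy0 is the A53 cluster and policy4 the A72 cluster on RK3399; the 1008000-1800000 range for the A72s echoes the 1.8 GHz cap discussed in this thread, while the A53 numbers are purely illustrative. CPUFREQ_DIR and the script name are hypothetical.

```shell
#!/bin/sh
# Sketch: separate governor/limits per RK3399 cluster, which
# /etc/default/cpufrequtils cannot express (it applies one setting to both).
# Assumptions: policy0 = Cortex-A53 cluster (CPUs 0-3), policy4 = Cortex-A72
# cluster (CPUs 4-5); the numbers under "apply" are only examples.
set -eu

CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpufreq}

set_limits() {  # $1 = policy name, $2 = governor, $3 = min kHz, $4 = max kHz
    p="$CPUFREQ_DIR/$1"
    echo "$2" > "$p/scaling_governor"
    # Drop min to the hardware floor first, so that every write below
    # keeps min <= max no matter what the old limits were.
    cat "$p/cpuinfo_min_freq" > "$p/scaling_min_freq"
    echo "$4" > "$p/scaling_max_freq"
    echo "$3" > "$p/scaling_min_freq"
}

# Example (root): sudo sh cluster-limits.sh apply
if [ "${1:-}" = "apply" ]; then
    set_limits policy0 ondemand 408000 1416000    # A53 cluster
    set_limits policy4 ondemand 1008000 1800000   # A72 cluster
fi
```

These writes do not survive a reboot, and the cpufrequtils service may overwrite them when it starts.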
Pedro Lamas (Author) Posted September 30, 2020
I just noticed something while reading the RK3399 datasheet: the recommended maximum frequency of the A72 is actually 1.8GHz, not 2.0GHz as stated on the FriendlyARM website and wiki! The RK3399K, however, does indicate a recommended maximum of 2.0GHz, but that is not the variant used on the NanoPi M4V2. The Rock Pi 4 uses the same RK3399 SoC, and they specifically say the frequency of the A72 is 1.8GHz. I even found a pull request in the Armbian build repository for the Helios64 (another board with the same RK3399 SoC) where the maximum is set to 1.8GHz: https://github.com/armbian/build/pull/2191
I will leave my board for a couple more days with the "userspace" governor and min and max set to 1008000; if there are no crashes, I will try the "ondemand" governor with min set to 1008000 and max to 1800000.
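The planned next step maps onto /etc/default/cpufrequtils roughly as follows; this is a sketch using the same keys as the earlier snippets in this thread. Because the file consists of plain shell variable assignments, it can be sourced and checked like a script.

```shell
# /etc/default/cpufrequtils: sketch of the next test planned above.
# cpufrequtils applies these limits to both clusters; the A53 cluster's own
# maximum is below 1.8 GHz, so MAX_SPEED should in practice only cap the A72 cores.
ENABLE=true
MIN_SPEED=1008000
MAX_SPEED=1800000
GOVERNOR=ondemand
```

It would typically take effect after a reboot or after restarting the cpufrequtils service.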
aprayoga Posted October 1, 2020
@piter75 @Pedro Lamas The Helios64 also encounters some random crashes. Yesterday we tried redefining the OPPs to just 408 MHz and 1.4/1.8 GHz, and we don't see any random crashes anymore. It seems to be a DVFS problem similar to the one discussed in this thread. Then our customer pointed us to the ODROID-N1 issue at https://forum.odroid.com/viewtopic.php?t=30303. Maybe you can give it a try on the NanoPi M4v2. We are still testing on the Helios64 (with value 40000); so far, reboots and power cycles do not trigger any kernel crash.