linuxjosef Posted January 11, 2020
Armbianmonitor: http://ix.io/27ae7
I recently updated my Orange Pi Zero Plus to the 5.3 kernel, and since then it has hung a few times. I tried to find the reason, and when connected over ssh I get this message:
kernel:[ 275.633561] thermal thermal_zone0: critical temperature reached (100 C), shutting down
After seeing this, I started a program that generates load and watched the temperature and CPU frequency in htop. The temperature goes up to 100°C, but the CPU stays at 1.3GHz. After downgrading to kernel 4.19.63 it works as expected again: the temperature rises, but then the CPU speed is limited, first to 1.01GHz, then to 960MHz, and after that the temperature always stays low enough to remain at 960MHz.
Since my Orange Pi Zero Plus is a few hundred kilometers away in my parents' house and I always have to call my dad if it hangs, I won't be able to do much testing, but I hope someone else can reproduce this.
Igor Posted January 12, 2020
20 hours ago, linuxjosef said: Since my Orange Pi Zero Plus is a few hundred kilometers away in my parents' house and I always have to call my dad if it hangs, I won't be able to do much testing, but I hope someone else can reproduce this.
Kernel 5.4.y is the first on this board where most of the critical stuff has been fixed. Still not sure if we are at the production level yet. But the update to 5.4.y hasn't been pushed out yet - you can only update from the beta repository or manually ... or wait a few more weeks, or start with a clean image which already has kernel 5.4.y. That's it.
linuxjosef Posted January 14, 2020
On 1/12/2020 at 5:41 PM, Igor said: Kernel 5.4.y is the first on this board where most of the critical stuff has been fixed. Still not sure if we are at the production level yet. But update to 5.4.y hasn't been pushed out - you can only update to beta repository or manual ... or wait few more weeks or start with a clean image which already have kernel 5.4.y That's it.
Then I'll just wait. Is there some place where information like this can be found, i.e. which kernels are good to use for which device?
Igor Posted January 14, 2020
33 minutes ago, linuxjosef said: Is there some place where information like this can be found?
Yes, usually here on the forum.
Igor Posted January 16, 2020
On 1/14/2020 at 9:31 PM, linuxjosef said: Is there some place where information like this can be found?
Also here: https://armbian.atlassian.net/ and here: https://github.com/armbian/build
linuxjosef Posted May 12, 2020
OK, I tried the same again and it still isn't working in combination with the gpio-regulator-1.3v and cpu-clock-1.3GHz-1.3v overlays. I have soldered on the MOSFET that is missing from the board and enabled both overlays. I tried with the following kernel:
5.4.28-sunxi64 #20.02.7 SMP Sat Mar 28 17:25:10 CET 2020
The issue is the same. The thermal throttling never goes below 1.10GHz, and that is not enough. Under full load, the OrangePi runs at 1.30GHz until it gets too hot. Then it is throttled, but never below 1.10GHz (the lowest it goes at 1.3V). For full load without a fan this is still too much, and soon it shuts down. Without the 1.3v overlays enabled it doesn't overheat, but it never reaches 1.1 or 1.3GHz.
With the old kernel
4.19.63-sunxi64 #5.92 SMP Fri Aug 2 00:18:27 CEST 2019
everything works as it should. The throttling goes below 1.10GHz and the CPU doesn't overheat, even with sustained full load and no fan.
I can't install kernel 5.6.8 or 5.6.5. I always get the following error:
The file /tmp/switch_kernel.log contains, for example:
linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130 linux-u-boot-orangepizeroplus-dev
Igor Posted May 12, 2020
33 minutes ago, linuxjosef said: linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130 linux-u-boot-orangepizeroplus-dev
Probably u-boot is missing, try:
apt install linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130
linuxjosef Posted May 12, 2020
20 minutes ago, Igor said: Probably u-boot is missing, try: apt install linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130
Thanks. Installing the kernel works. The result is the same: throttling only goes down to 1.10GHz, and soon:
kernel:[ 132.835701] thermal thermal_zone0: critical temperature reached (100 C), shutting down
guidol Posted May 12, 2020
@linuxjosef did you try to set the CPU governor to "conservative"?
linuxjosef Posted May 12, 2020
1 minute ago, guidol said: @linuxjosef did you try to set the cpu governor to "conservative"?
No, I only tried ondemand. If you think this might help, I'll try it.
guidol Posted May 12, 2020
7 minutes ago, linuxjosef said: No, I only tried ondemand. If you think this might help, I'll try it.
I have some boards where ondemand does not really throttle down, but with conservative they always did.
linuxjosef Posted May 13, 2020
Unfortunately, with conservative the result is exactly the same. It only throttles down to 1.10GHz and then reaches 100°C.
I also tried userspace; this seems not to be implemented, and the frequency stays constant. Powersave, as expected, always stays at the lowest frequency: no overheating, but also no performance. Performance always stays at the highest frequency and also overheats. Schedutil is similar to ondemand and conservative: it also only throttles down to 1.10GHz and overheats.
So it looks like this is a bug in the kernel and there is no configuration to fix it.
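For anyone repeating the governor comparison, here is a minimal Python sketch that reports the active governor and the frequency limits from the standard Linux cpufreq sysfs files. The function name `governor_info` is made up for illustration, and the sketch assumes that cpu0's policy is representative of all four cores.

```python
from pathlib import Path

# Standard cpufreq sysfs location for the first CPU core.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def governor_info(cpufreq_dir=CPUFREQ_DIR):
    """Return the active governor, available governors and frequency limits (MHz)."""
    d = Path(cpufreq_dir)
    return {
        "current": (d / "scaling_governor").read_text().strip(),
        "available": (d / "scaling_available_governors").read_text().split(),
        # sysfs reports frequencies in kHz.
        "min_mhz": int((d / "scaling_min_freq").read_text()) // 1000,
        "max_mhz": int((d / "scaling_max_freq").read_text()) // 1000,
    }
```

Calling `governor_info()` before and after switching governors should confirm that the change took effect; it does not show whether the thermal framework is lowering the limit, which is a separate mechanism.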
Hammy Posted May 15, 2020
I'm seeing the exact same behavior on a NanoPi NEO2. It won't throttle below 1.1GHz on the 5.4 and 5.6 kernels.
Igor Posted May 15, 2020
On 5/13/2020 at 2:01 PM, linuxjosef said: So looks like it is a bug in the kernel and there is no configuration to fix it.
Possibly in the thermal throttling configuration, yep. I also noticed that a few devices do not survive stress tests in automated testing ... but I don't have serial consoles there yet, so I don't know what actually happened. I suspect the same ...
sfx2000 Posted June 21, 2020
@5kft and I did a fair amount of testing for the recent u-boot update for the NanoPi NEO2 (H5) with the following stress test:
openssl speed -multi 4
With the schedutil governor, and keeping the NanoPi Neo to 1008MHz, it's stable.
On 5/13/2020 at 5:01 AM, linuxjosef said: So looks like it is a bug in the kernel and there is no configuration to fix it.
No, it's a hardware issue with overclocking on both CPU and DRAM timing - it was on the FriendlyArm NanoPi NEO2... The OrangePi Zero Plus has the same chipset as the NanoPi NEO2, and perhaps not the same level of quality...
Here's the NanoPi NEO2 on H5 throttling under load:
18:35:37: 1008MHz  4.00 100%   0%  99%   0%   0%   0% 69.7°C  0/4
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.
18:35:42:  960MHz  4.00 100%   0%  99%   0%   0%   0% 75.2°C  1/4
18:35:47:  960MHz  4.15 100%   0%  99%   0%   0%   0% 73.6°C  1/4
18:35:52:  816MHz  4.13 100%   0%  99%   0%   0%   0% 75.0°C  2/4
18:35:58:  816MHz  4.12 100%   0%  99%   0%   0%   0% 67.8°C  1/4
18:36:03:  960MHz  4.11 100%   0%  99%   0%   0%   0% 74.5°C  1/4
18:36:08:  960MHz  4.10 100%   0%  99%   0%   0%   0% 74.0°C  1/4
18:36:13:  816MHz  4.10 100%   0%  99%   0%   0%   0% 75.3°C  2/4
18:36:18:  960MHz  4.09 100%   0%  99%   0%   0%   0% 67.6°C  1/4
18:36:23:  960MHz  4.08 100%   0%  99%   0%   0%   0% 74.3°C  1/4
18:36:28:  960MHz  4.07 100%   0%  99%   0%   0%   0% 67.3°C  1/4
18:36:33:  960MHz  4.07 100%   0%  99%   0%   0%   0% 73.9°C  1/4
18:36:38:  960MHz  4.06 100%   0%  99%   0%   0%   0% 74.7°C  1/4
18:36:43:  960MHz  4.06 100%   0%  99%   0%   0%   0% 73.6°C  1/4
18:36:49:  960MHz  4.05 100%   0%  99%   0%   0%   0% 75.8°C  2/4
18:36:54: 1008MHz  4.05 100%   0%  99%   0%   0%   0% 73.4°C  0/4
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.
18:36:59:  960MHz  4.04 100%   0%  99%   0%   0%   0% 75.0°C  1/4
18:37:04:  960MHz  4.04 100%   0%  99%   0%   0%   0% 74.0°C  1/4
18:37:09:  960MHz  4.04 100%   0%  99%   0%   0%   0% 74.6°C  1/4
18:37:14:  960MHz  4.03 100%   0%  99%   0%   0%   0% 70.6°C  1/4
18:37:19:  816MHz  4.03 100%   0%  99%   0%   0%   0% 75.3°C  2/4
18:37:24:  816MHz  4.03 100%   0%  99%   0%   0%   0% 76.2°C  2/4
18:37:29:  960MHz  4.02 100%   0%  99%   0%   0%   0% 67.6°C  1/4
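A log like the one above is easier to compare between kernels once it is summarised. The following is a rough Python sketch, not part of armbianmonitor, that parses lines in the format shown in this post and counts how often each frequency occurs. The regular expression is an assumption derived only from the sample lines here, and the last column is presumably the cooling state.

```python
import re
from collections import Counter

# Pattern inferred from the sample output in this thread (an assumption, not a spec).
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):\s+(?P<mhz>\d+)MHz\s+(?P<load>[\d.]+)\s+"
    r"(?:\d+%\s+){6}(?P<temp>[\d.]+)°C\s+(?P<cstate>\d+)/(?P<cmax>\d+)"
)


def parse_monitor_lines(text):
    """Return (time, mhz, temp_c, cooling_state) for every data line; headers are skipped."""
    samples = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            samples.append((m["time"], int(m["mhz"]), float(m["temp"]), int(m["cstate"])))
    return samples


# A few lines copied from the log in this post.
SAMPLE = """\
18:35:37: 1008MHz 4.00 100% 0% 99% 0% 0% 0% 69.7°C 0/4
Time CPU load %cpu %sys %usr %nice %io %irq CPU C.St.
18:35:42: 960MHz 4.00 100% 0% 99% 0% 0% 0% 75.2°C 1/4
18:35:47: 960MHz 4.15 100% 0% 99% 0% 0% 0% 73.6°C 1/4
18:35:52: 816MHz 4.13 100% 0% 99% 0% 0% 0% 75.0°C 2/4
"""

samples = parse_monitor_lines(SAMPLE)
freq_counts = Counter(mhz for _, mhz, _, _ in samples)
max_temp = max(temp for _, _, temp, _ in samples)
print(dict(freq_counts), max_temp)
```

On a board that throttles correctly, most samples under sustained load should sit at the lower frequencies and the maximum temperature should stay well below the critical trip point.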
5kft Posted June 21, 2020
To bring this thread up to date - somewhere along the line the DTs dropped the correct trip list and cooling maps... I checked in fixes for this for the H3, H5, and H6 about a month ago (see https://armbian.atlassian.net/browse/AR-244 and https://armbian.atlassian.net/browse/AR-285). All seems to work well now.
linuxjosef Posted August 30, 2020
I have now tried kernel 5.7.15-sunxi64. The frequency goes down to 816MHz, but it seems the thermal throttling is still not working sufficiently... During "armbianmonitor -z" this happens:
Message from syslogd@localhost at Aug 30 11:16:24 ...
kernel:[ 324.712300] thermal thermal_zone0: critical temperature reached (105 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:30 ...
kernel:[ 330.708831] thermal thermal_zone0: critical temperature reached (106 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:31 ...
kernel:[ 331.708196] thermal thermal_zone0: critical temperature reached (106 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:32 ...
kernel:[ 332.707635] thermal thermal_zone0: critical temperature reached (105 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:34 ...
kernel:[ 334.706513] thermal thermal_zone0: critical temperature reached (105 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:36 ...
kernel:[ 335.705925] thermal thermal_zone0: critical temperature reached (105 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:39 ...
kernel:[ 339.703622] thermal thermal_zone0: critical temperature reached (106 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:40 ...
kernel:[ 341.202787] thermal thermal_zone0: critical temperature reached (107 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:43 ...
kernel:[ 342.701911] thermal thermal_zone0: critical temperature reached (105 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:46 ...
kernel:[ 346.699592] thermal thermal_zone0: critical temperature reached (106 C), shutting down
Message from syslogd@localhost at Aug 30 11:16:48 ...
kernel:[ 348.698473] thermal thermal_zone0: critical temperature reached (106 C), shutting down
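When many of these messages pile up, it can help to pull out just the uptime and temperature. This is a small Python sketch under the assumption that the messages always have the exact wording quoted in this post; `critical_events` is a hypothetical helper name.

```python
import re

# Matches the kernel message format quoted in this thread.
CRITICAL_RE = re.compile(
    r"kernel:\[\s*(?P<uptime>\d+\.\d+)\]\s+thermal\s+(?P<zone>\S+):\s+"
    r"critical temperature reached \((?P<temp>\d+) C\)"
)


def critical_events(log_text):
    """Return (uptime_seconds, zone, temp_c) for each critical-temperature message."""
    return [
        (float(m["uptime"]), m["zone"], int(m["temp"]))
        for m in CRITICAL_RE.finditer(log_text)
    ]


# Three of the messages quoted in this post.
SAMPLE = """\
kernel:[ 324.712300] thermal thermal_zone0: critical temperature reached (105 C), shutting down
kernel:[ 330.708831] thermal thermal_zone0: critical temperature reached (106 C), shutting down
kernel:[ 341.202787] thermal thermal_zone0: critical temperature reached (107 C), shutting down
"""

events = critical_events(SAMPLE)
peak = max(temp for _, _, temp in events)
span = events[-1][0] - events[0][0]
print(f"{len(events)} events over {span:.1f} s, peak {peak} C")
```

The same function can be fed saved `dmesg` or syslog text from both kernels to see whether the critical trip is reached at all on the older one.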
Werner Posted August 30, 2020
That is quite a temperature. Did you try to add a heatsink (which is recommended for any SBC that is running a load anyway)?
linuxjosef Posted August 31, 2020
It has heatsinks on the CPU and RAM but no fan. Usually there is not much load. With kernel 4.19.63 it works perfectly: after some time under load it only runs at a slower speed, but never goes over 90°C. With all newer kernels it overheats. A constant 960MHz or even 816MHz under heavy load would be fine for me. Overheating and crashing is not.
Werner Posted August 31, 2020
Oddly high temperature anyway. @5kft do you have some H5 boards?
5kft Posted August 31, 2020
@linuxjosef - could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another? It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.
Tido Posted August 31, 2020
1 hour ago, 5kft said: if throttling is indeed working
Is it possible to read the DVFS settings of his device? Could these be compared with those on his 4.19 kernel?
5kft Posted August 31, 2020
39 minutes ago, Tido said: is it possible read the DVFS Setting of his device? Could this be compared to his 4.19 kernel?
He could compare the contents of "/sys/class/thermal/thermal_zone0/" on the two kernels, as well as the output of "cpufreq-info" across both.
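To make that comparison less error-prone, the trip points can be dumped in a fixed order and diffed between kernels. Below is a minimal Python sketch that relies on the standard `trip_point_N_temp`, `trip_point_N_type` and `trip_point_N_hyst` files of the kernel thermal sysfs interface; the function name `read_trip_points` is only illustrative.

```python
from pathlib import Path


def read_trip_points(zone_dir="/sys/class/thermal/thermal_zone0"):
    """Return a sorted list of (index, type, temp_c, hyst_c) for each trip point.

    sysfs stores temperatures in millidegrees Celsius; hyst_c is None when the
    kernel does not expose a hysteresis file for that trip point.
    """
    zone = Path(zone_dir)
    trips = []
    for temp_file in zone.glob("trip_point_*_temp"):
        index = int(temp_file.name.split("_")[2])
        type_file = zone / f"trip_point_{index}_type"
        hyst_file = zone / f"trip_point_{index}_hyst"
        trip_type = type_file.read_text().strip() if type_file.exists() else "unknown"
        hyst_c = int(hyst_file.read_text()) / 1000 if hyst_file.exists() else None
        trips.append((index, trip_type, int(temp_file.read_text()) / 1000, hyst_c))
    return sorted(trips)
```

Printing the result of `read_trip_points()` on 4.19 and on the newer kernel should show directly whether passive trip points are missing or set too close to the critical one.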
usual user Posted August 31, 2020
A tmon log would be interesting to see how the thermal system performs.
linuxjosef Posted September 2, 2020
On 8/31/2020 at 3:36 PM, 5kft said: @linuxjosef - could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another? It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.
I did, and throttling works in principle. It goes down to 816MHz, so definitely below the 1.3v level. It feels like it's going back up way too fast. For example, the temperature is above 95°C and the frequency goes down to 816MHz; then the temperature drops to 90°C and the frequency jumps up to 1.3GHz. Maybe this, combined with the fact that the temperature readings aren't instantaneous, is the problem?
I am no longer at my parents' place where the device is, so I am not too eager to try anything that might crash it, because I can't reset it easily.
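The "jumps back up too fast" impression can be checked against a recorded log instead of by eye. Here is a small Python sketch that flags large upward frequency steps in a series of (frequency, temperature) samples; the helper name, the 200MHz threshold and the sample values are all hypothetical and only mirror the behaviour described in this post.

```python
def upward_jumps(samples, min_jump_mhz=200):
    """Return (prev_mhz, mhz, temp_c) for each step where the frequency rises
    by at least min_jump_mhz between consecutive samples.

    samples is a time-ordered list of (mhz, temp_c) tuples.
    """
    jumps = []
    for (prev_mhz, _), (mhz, temp_c) in zip(samples, samples[1:]):
        if mhz - prev_mhz >= min_jump_mhz:
            jumps.append((prev_mhz, mhz, temp_c))
    return jumps


# Hypothetical samples shaped like the behaviour described above (not measured data).
samples = [(1300, 96.0), (816, 95.5), (816, 90.0), (1300, 90.5), (1300, 97.0), (816, 96.0)]
for prev_mhz, mhz, temp_c in upward_jumps(samples):
    print(f"{prev_mhz}MHz -> {mhz}MHz at {temp_c}°C")
```

If real logs show jumps straight from the lowest step to the highest while the temperature is still close to the critical trip point, that would support the suspicion that the intermediate cooling steps are not being used.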
usual user Posted September 2, 2020
3 hours ago, linuxjosef said: throttling in principle works
Is the throttling triggered by thermal reasons or only by CPU load? A tmon log will tell you if it is really thermal throttling.
linuxjosef Posted September 5, 2020
On 9/2/2020 at 8:59 PM, usual user said: Is the throttling triggered by thermal reasons or only by cpu load? A tmon log will tell you if it is really thermal throttling.
Definitely thermal, the armbianmonitor benchmark was running the whole time.
usual user Posted September 5, 2020
12 minutes ago, linuxjosef said: Definitely thermal, the armbianmonitor benchmark was running the whole time.
AFAIK armbianmonitor does not log the cooling devices, hence my request for the tmon log.
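Where tmon is not available, the cooling device states can also be read directly from sysfs. This is a minimal Python sketch, not a replacement for tmon, that uses the standard `cooling_deviceN/type`, `cur_state` and `max_state` files; both function names are made up for illustration.

```python
from pathlib import Path


def read_cooling_devices(thermal_dir="/sys/class/thermal"):
    """Return {device_name: (type, cur_state, max_state)} for every cooling device."""
    devices = {}
    for dev in sorted(Path(thermal_dir).glob("cooling_device*")):
        devices[dev.name] = (
            (dev / "type").read_text().strip(),
            int((dev / "cur_state").read_text()),
            int((dev / "max_state").read_text()),
        )
    return devices


def read_zone_temp_c(zone_dir="/sys/class/thermal/thermal_zone0"):
    """Return the zone temperature in degrees Celsius (sysfs reports millidegrees)."""
    return int((Path(zone_dir) / "temp").read_text()) / 1000
```

Calling both functions once per second while the load is running would show whether `cur_state` actually rises as the temperature approaches the trip points, which is the evidence of thermal (rather than load-based) throttling.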
sfx2000 Posted September 14, 2020
On 8/31/2020 at 6:36 AM, 5kft said: could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another? It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.
Kind of like the stability testing we did for H5 and memory timing...
openssl speed -multi 4
Puts a fair load on the CPU...
linuxjosef Posted October 24, 2020
On 8/31/2020 at 3:36 PM, 5kft said: @linuxjosef - could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another? It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.
I ran some tests with the 5.9.1 dev kernel.
Here without the overlays: https://asciinema.org/a/Yf8CSjl3bJFFNQmEPuMEZVTum
Here with "gpio-regulator-1.3v cpu-clock-1.3GHz-1.3v" enabled: https://asciinema.org/a/jKNg8qBlOwAi5v3vqMQNH4IOF
As you can see, it is still not working correctly.