slimcomp, posted April 7, 2023 (edited)

After booting Armbian 23.02.2 on my Orange Pi 4 LTS, the CPU temperature climbs to 95 °C under a stress test (no heatsink or fan installed at all), and the device reboots shortly afterwards. I ran `armbianmonitor -m` during the stress test and found that some of the cores are not throttling.

I examined the default trip-point settings that ship with the image in /boot/dtb/rockchip/rk3399-orangepi-4-lts.dtb:

    trips {
        cpu_alert0 {
            temperature = <0x14c08>;
            hysteresis = <0x7d0>;
            type = "passive";
            phandle = <0x5a>;
        };
        cpu_alert1 {
            temperature = <0x17318>;
            hysteresis = <0x7d0>;
            type = "passive";
            phandle = <0x5b>;
        };
        cpu_crit {
            temperature = <0x186a0>;
            hysteresis = <0x7d0>;
            type = "critical";
            phandle = <0xf3>;
        };
    };

These temperatures do not match the vendor images' settings, so I changed them to the vendor values:

    trips {
        cpu_alert0 {
            temperature = <0x11170>;
            hysteresis = <0x7d0>;
            type = "passive";
            phandle = <0x5a>;
        };
        cpu_alert1 {
            temperature = <0x14c08>;
            hysteresis = <0x7d0>;
            type = "passive";
            phandle = <0x5b>;
        };
        cpu_crit {
            temperature = <0x1c138>;
            hysteresis = <0x7d0>;
            type = "critical";
            phandle = <0xf3>;
        };
    };

After this change I no longer get reboots caused by overheating, and `armbianmonitor -m` shows the CPU being correctly throttled at 85 °C when stress testing without a heatsink or fan.

Is this the right way to solve the issue? If so, can someone make a PR for it? Thanks.
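
The temperature and hysteresis cells in the decompiled .dtb are millidegrees Celsius written in hexadecimal. A minimal Python sketch, assuming only that unit convention (the dictionaries are illustrative and simply mirror the node names above), decodes both tables: the stock cells come out as 85, 95 and 100 °C, the vendor cells as 70, 85 and 115 °C, and the 0x7d0 hysteresis is 2 °C.

```python
# Device-tree thermal trip temperatures are expressed in millidegrees Celsius.
# Values copied from the two "trips" blocks quoted above.
MAINLINE = {"cpu_alert0": 0x14C08, "cpu_alert1": 0x17318, "cpu_crit": 0x186A0}
VENDOR = {"cpu_alert0": 0x11170, "cpu_alert1": 0x14C08, "cpu_crit": 0x1C138}
HYSTERESIS = 0x7D0  # same hysteresis cell in every trip node


def millicelsius_to_celsius(cell: int) -> float:
    """Convert a device-tree temperature cell (m°C) to °C."""
    return cell / 1000


def describe(trips: dict) -> dict:
    """Map each trip-node name to its temperature in °C."""
    return {name: millicelsius_to_celsius(cell) for name, cell in trips.items()}


if __name__ == "__main__":
    print("mainline:", describe(MAINLINE))
    print("vendor:  ", describe(VENDOR))
    print("hysteresis:", millicelsius_to_celsius(HYSTERESIS), "°C")
```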

jock, posted July 31, 2023

Thanks for noticing this, and sorry for the very late answer. I will check and fix the trip points soon. There may also be room to improve on the stock/vendor trip points by using the finer granularity that the devfreq framework offers for the GPU and the DMC (read: the GPU and DDR can be throttled as well to lower the board temperature).
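
To check which trip points the running kernel actually applies (instead of reading the .dtb), Linux exposes each thermal zone under /sys/class/thermal/thermal_zoneN/, with `temp`, `type` and `trip_point_N_temp`/`_type`/`_hyst` files in millidegrees Celsius. The helper below is a hypothetical sketch, not an Armbian tool; zone names and the number of zones on an RK3399 board may differ, and if a `trip_point_N_hyst` file is absent on a given kernel it simply reports 0.

```python
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def read_int(path: Path) -> int:
    """Read a sysfs file containing a single integer."""
    return int(path.read_text().strip())


def trip_points(zone: Path) -> list:
    """Return (type, temperature °C, hysteresis °C) for each trip point of a zone."""
    trips = []
    index = 0
    while (zone / f"trip_point_{index}_temp").exists():
        kind = (zone / f"trip_point_{index}_type").read_text().strip()
        temp = read_int(zone / f"trip_point_{index}_temp") / 1000
        hyst_file = zone / f"trip_point_{index}_hyst"
        hyst = read_int(hyst_file) / 1000 if hyst_file.exists() else 0.0
        trips.append((kind, temp, hyst))
        index += 1
    return trips


def dump(root: Path = THERMAL_ROOT) -> None:
    """Print the current temperature and trip points of every thermal zone."""
    for zone in sorted(root.glob("thermal_zone*")):
        name = (zone / "type").read_text().strip()
        now = read_int(zone / "temp") / 1000
        print(f"{zone.name} ({name}): {now:.1f} °C")
        for kind, temp, hyst in trip_points(zone):
            print(f"  {kind:<8} {temp:6.1f} °C  hyst {hyst:.1f} °C")


if __name__ == "__main__":
    dump()
```

Running it on the board while a stress test is active shows both the live temperature and the thresholds the kernel will act on, which makes it easier to tell a throttling problem from a wrong trip table.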

jock, posted August 4, 2023

I took a look, and the trip-point table comes from the mainline kernel. The trip points on mainline are:

- 85 °C: big cores are thermally throttled (even down to 600 MHz from 1.8 GHz)
- 95 °C: all cores are thermally throttled
- 100 °C: device reboots

On the vendor kernel instead:

- 70 °C: big cores are thermally throttled
- 85 °C: all cores are thermally throttled
- 115 °C: device reboots

As far as I can see, the mainline trip points look reasonable to me, and rebooting at 100 °C seems a sensible critical temperature to prevent long-term physical damage. If you reach 100 °C and beyond, you definitely need a heatsink and proper heat dissipation.

In my tests, running `openssl speed -multi 6` and `mbw -n 1000 256` concurrently to stress the CPU with crypto tests and the DRAM with memory benchmarks, the board never goes above 86 °C: thermal throttling of the big cores kicks in, and that seems sufficient to keep the SoC at a reasonable temperature even after several minutes of sustained load. Of course, my board has no enclosure of any kind.

I don't think the trip points really need changing. On your setup you probably no longer get reboots because the SoC is now allowed to reach 115 °C, but you should look for a way to remove the excess heat rather than raise the limits, or cap the core frequency to reduce power dissipation if you're in a constrained environment.

edit: I have to partially retract that: the 95 °C trip point for all cores seems way too high. My board hung at 94 °C during the RSA test with openssl, so it is probably better to change the trip points this way:

- 82 °C: big cores are thermally throttled
- 85 °C: all cores and the DMC are thermally throttled
- 90 °C: device reboots

This should also leave the board enough room to recover after a reboot.
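
If the proposed 82/85/90 °C values were applied by editing the decompiled .dtb, as in the first post, each figure has to be written back as a hexadecimal millidegree cell. A minimal sketch, assuming cpu_alert0, cpu_alert1 and cpu_crit correspond in order to the three entries of the list above (the dictionary is illustrative): it yields <0x14050>, <0x14c08> and <0x15f90>. It only produces the temperature cells; throttling the DMC as well would also need a matching cooling-map entry, which this does not cover. In a .dts source file the same values can simply be written in decimal, e.g. <82000>.

```python
# Proposed trip temperatures in °C, keyed by the trip-node names used in the .dtb.
PROPOSED_CELSIUS = {"cpu_alert0": 82, "cpu_alert1": 85, "cpu_crit": 90}


def to_dts_cell(celsius: float) -> str:
    """Encode a temperature in °C as a device-tree cell in hex millidegrees."""
    millicelsius = round(celsius * 1000)
    return f"<0x{millicelsius:x}>"


if __name__ == "__main__":
    for name, celsius in PROPOSED_CELSIUS.items():
        print(f"{name}: temperature = {to_dts_cell(celsius)};  /* {celsius} °C */")
```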

jock, posted August 4, 2023

This is the offending patch, which potentially affects all RK3399 boards.

mimosius, posted September 16 (edited)

I finally found some time to test this device, which I bought two years ago. 😁 While compiling on all 6 cores, it hangs at 82 °C and the green LED indicates a "heart attack". The device is passively cooled inside the original black aluminum case.

jock, posted September 16

Mmmh, 82 °C seems a rather low threshold to me. If you repeat the compilation, does it hang at the same temperature? Lowering the threshold to 82 °C means it should start throttling the big cores around 75 °C, throttle the rest at 78 °C and reboot at 80/81 °C.

mimosius, posted September 16

I already tried it several times before posting. It always happened at 82 °C. After lowering the thresholds to your suggested values, it didn't happen anymore.

mimosius, posted September 17 (edited)

OK, forget everything I said about temperature: now it hung at 62 °C (with active cooling), but always at the same point in the compilation.

    Sep 17 07:38:41 opi4 kernel: oom_kill_process+0x130/0x2d8
    Sep 17 07:38:41 opi4 kernel: out_of_memory+0xe0/0x59c
    Sep 17 07:38:41 opi4 kernel: __alloc_pages+0xc40/0xe48

The kernel's OOM killer was firing, so the hangs came from running out of memory, not from overheating. The default trip points are working well. Sorry for the noise 😗