
Thermal Throttling not working in 5.3 and 5.4 kernel on Orange Pi Zero Plus


linuxjosef


I recently updated my Orange Pi Zero Plus to the 5.3 kernel, and since then it has hung a few times. I tried to find out why it hangs, and when connected over ssh I get this message:


 

 kernel:[  275.633561] thermal thermal_zone0: critical temperature reached (100 C), shutting down

 

After seeing this, I started a program that generates load and watched the temperature and CPU frequency in htop. The temperature goes up to 100°C, but the CPU stays at 1.3GHz.

After downgrading to kernel 4.19.63 it works as expected again. The temperature rises, but then the CPU speed is limited, first to 1.01GHz, then to 960MHz, and after that the temperature is always low enough to stay at 960MHz.
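
For anyone who wants to reproduce the observation without htop, here is a minimal Python sketch that reads the same two values. It assumes the usual Linux sysfs files `/sys/class/thermal/thermal_zone0/temp` (millidegrees Celsius) and `/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq` (kHz); the function names are made up for illustration and the paths may differ on other boards.

```python
from pathlib import Path

# Assumed standard sysfs locations; adjust if your board uses another zone.
TEMP_FILE = Path("/sys/class/thermal/thermal_zone0/temp")
FREQ_FILE = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def read_temp_c(path=TEMP_FILE):
    """Return the zone temperature in degrees Celsius (sysfs reports millidegrees)."""
    return int(Path(path).read_text().strip()) / 1000.0


def read_freq_mhz(path=FREQ_FILE):
    """Return the current CPU frequency in MHz (sysfs reports kHz)."""
    return int(Path(path).read_text().strip()) / 1000.0


def format_sample(temp_c, freq_mhz):
    """Format one reading as a short log line."""
    return f"{freq_mhz:.0f}MHz {temp_c:.1f}°C"


if __name__ == "__main__":
    try:
        print(format_sample(read_temp_c(), read_freq_mhz()))
    except (OSError, ValueError) as err:
        # Happens on machines without these sysfs files.
        print(f"could not read sysfs: {err}")
```

Running it in a loop while the load generator is active should show whether the frequency drops before the temperature reaches the critical trip point.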

 

Since my Orange Pi Zero Plus is a few hundred kilometers away at my parents' house and I always have to call my dad if it hangs, I won't be able to do much testing, but I hope someone else can reproduce this.

 

 


20 hours ago, linuxjosef said:

Since my Orange Pi Zero Plus is a few hundred kilometers away in my parents house and I always have to call my dad if it hangs, I won't be able to do much testing but I hope someone else can reproduce this.

Kernel 5.4.y is the first on this board where most of the critical stuff has been fixed. Still not sure if we are at the production level yet.

But the update to 5.4.y hasn't been pushed out yet - you can only update from the beta repository or manually ... or wait a few more weeks, or start with a clean image which already has kernel 5.4.y.

That's it.


On 1/12/2020 at 5:41 PM, Igor said:

Kernel 5.4.y is the first on this board where most of the critical stuff has been fixed. Still not sure if we are at the production level yet.

But the update to 5.4.y hasn't been pushed out yet - you can only update from the beta repository or manually ... or wait a few more weeks, or start with a clean image which already has kernel 5.4.y.

That's it.

Then I'll just wait. Is there some place where information like this can be found, for example which kernels are good to use on which device?


OK, I tried the same again and it still isn't working in combination with the gpio-regulator-1.3v cpu-clock-1.3GHz-1.3v overlays.

I have soldered the MOSFET that is missing from the board, and enabled both overlays.

 

I tried with the following Kernel:

 

5.4.28-sunxi64 #20.02.7 SMP Sat Mar 28 17:25:10 CET 2020

 

 

The issue is the same.

The thermal throttling never goes below 1.10GHz and that is not enough.

With full load, the OrangePi runs at 1.30GHz, until it gets too hot. Then it is throttled, but never below 1.10GHz (the lowest it goes with 1.3V).

For full load without fan this is still too much, and soon it shuts down.

 

Without the 1.3v overlays enabled it doesn't overheat, but it never reaches 1.1 or 1.3GHz.

 

With the old Kernel

4.19.63-sunxi64 #5.92 SMP Fri Aug 2 00:18:27 CEST 2019

everything works as it should. The throttling goes below 1.10GHz and the CPU doesn't overheat, even with sustained full load and no fan.

 

 

 

I can't install kernel 5.6.8 or 5.6.5. I always get the following error:

[screenshot of the error message: image.png, not preserved]

 

The file /tmp/switch_kernel.log contains for example:

linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130 linux-u-boot-orangepizeroplus-dev

 


33 minutes ago, linuxjosef said:

linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130 linux-u-boot-orangepizeroplus-dev


Probably u-boot is missing, try:

apt install linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130

 


20 minutes ago, Igor said:


Probably u-boot is missing, try:


apt install linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130

 

Thanks.

 

Installing the Kernel works.

 

Result is the same, throttling only goes down to 1.10GHz and soon:

kernel:[  132.835701] thermal thermal_zone0: critical temperature reached (100 C), shutting down


Unfortunately, with conservative the result is exactly the same. It only throttles down to 1.10GHz and then reaches 100°C.

I also tried userspace; this does not seem to be implemented. The frequency stays constant.

powersave, as expected, always stays at the lowest frequency: no overheating, but also no performance.

performance always stays at the highest frequency and also overheats.

schedutil behaves like ondemand and conservative: it also only throttles down to 1.10GHz and overheats.

So it looks like a bug in the kernel, and there is no configuration to fix it.
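
To repeat the governor comparison, the governors can be switched through sysfs. The following is only a sketch: it assumes the cpufreq policy is exposed at `/sys/devices/system/cpu/cpufreq/policy0` with the standard `scaling_available_governors` and `scaling_governor` files, the function names are hypothetical, and writing the governor needs root.

```python
from pathlib import Path

# Assumed location of the first cpufreq policy; may differ per board/kernel.
POLICY_DIR = Path("/sys/devices/system/cpu/cpufreq/policy0")


def available_governors(policy_dir=POLICY_DIR):
    """Return the governors the kernel offers for this policy."""
    text = (Path(policy_dir) / "scaling_available_governors").read_text()
    return text.split()


def current_governor(policy_dir=POLICY_DIR):
    """Return the governor that is active right now."""
    return (Path(policy_dir) / "scaling_governor").read_text().strip()


def set_governor(name, policy_dir=POLICY_DIR):
    """Activate a governor (needs root); refuse names the kernel does not list."""
    if name not in available_governors(policy_dir):
        raise ValueError(f"governor {name!r} is not available")
    (Path(policy_dir) / "scaling_governor").write_text(name)
```

For example, `set_governor("conservative")` followed by a load test, then the same for the next governor. Note that with userspace the kernel keeps whatever frequency was last written to `scaling_setspeed`, so a constant frequency is expected there unless something sets it.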


On 5/13/2020 at 2:01 PM, linuxjosef said:

So looks like it is a bug in the kernel and there is no configuration to fix it.


Possibly in the thermal throttling configuration, yep. I also noticed that a few devices do not survive the stress tests in automated testing ... but I don't have serial consoles there yet, so I don't know what actually happened. I suspect the same ...


@5kft and I did a fair amount of testing for the recent uboot update for NanoPI Neo2 (H5) with the following stress test...

openssl speed -multi 4

 

With the schedutil governor, and keeping the NanoPi NEO2 capped at 1008MHz, it's stable.

 

On 5/13/2020 at 5:01 AM, linuxjosef said:

So looks like it is a bug in the kernel and there is no configuration to fix it.

 

No, it's a hardware issue with overclocking, on both CPU and DRAM timing - as it was on the FriendlyArm NanoPi NEO2...

 

OrangePi Zero Plus - same chipset as the NanoPi NEO2, and perhaps not the same level of quality...

 

Here's NanoPi NEO2 on H5 throttling under load...

 

18:35:37: 1008MHz  4.00 100%   0%  99%   0%   0%   0% 69.7°C  0/4
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.
18:35:42:  960MHz  4.00 100%   0%  99%   0%   0%   0% 75.2°C  1/4
18:35:47:  960MHz  4.15 100%   0%  99%   0%   0%   0% 73.6°C  1/4
18:35:52:  816MHz  4.13 100%   0%  99%   0%   0%   0% 75.0°C  2/4
18:35:58:  816MHz  4.12 100%   0%  99%   0%   0%   0% 67.8°C  1/4
18:36:03:  960MHz  4.11 100%   0%  99%   0%   0%   0% 74.5°C  1/4
18:36:08:  960MHz  4.10 100%   0%  99%   0%   0%   0% 74.0°C  1/4
18:36:13:  816MHz  4.10 100%   0%  99%   0%   0%   0% 75.3°C  2/4
18:36:18:  960MHz  4.09 100%   0%  99%   0%   0%   0% 67.6°C  1/4
18:36:23:  960MHz  4.08 100%   0%  99%   0%   0%   0% 74.3°C  1/4
18:36:28:  960MHz  4.07 100%   0%  99%   0%   0%   0% 67.3°C  1/4
18:36:33:  960MHz  4.07 100%   0%  99%   0%   0%   0% 73.9°C  1/4
18:36:38:  960MHz  4.06 100%   0%  99%   0%   0%   0% 74.7°C  1/4
18:36:43:  960MHz  4.06 100%   0%  99%   0%   0%   0% 73.6°C  1/4
18:36:49:  960MHz  4.05 100%   0%  99%   0%   0%   0% 75.8°C  2/4
18:36:54: 1008MHz  4.05 100%   0%  99%   0%   0%   0% 73.4°C  0/4
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.
18:36:59:  960MHz  4.04 100%   0%  99%   0%   0%   0% 75.0°C  1/4
18:37:04:  960MHz  4.04 100%   0%  99%   0%   0%   0% 74.0°C  1/4
18:37:09:  960MHz  4.04 100%   0%  99%   0%   0%   0% 74.6°C  1/4
18:37:14:  960MHz  4.03 100%   0%  99%   0%   0%   0% 70.6°C  1/4
18:37:19:  816MHz  4.03 100%   0%  99%   0%   0%   0% 75.3°C  2/4
18:37:24:  816MHz  4.03 100%   0%  99%   0%   0%   0% 76.2°C  2/4
18:37:29:  960MHz  4.02 100%   0%  99%   0%   0%   0% 67.6°C  1/4
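
A log like the one above is easier to compare between kernels when it is condensed. The sketch below parses lines in this exact layout (time, frequency, ..., temperature, cooling state) and counts how often each frequency occurs. The regular expression is written against the excerpt shown here and is an assumption about the `armbianmonitor -m` format, not something taken from its source; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "18:35:42:  960MHz  4.00 100% ... 75.2°C  1/4"
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):\s+(\d+)MHz\s.*?([\d.]+)°C\s+(\d+)/(\d+)\s*$"
)


def parse_line(line):
    """Return one sample as a dict, or None for headers and other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    time_, mhz, temp, state, _states = match.groups()
    return {
        "time": time_,
        "mhz": int(mhz),
        "temp_c": float(temp),
        "cooling_state": int(state),
    }


def summarize(lines):
    """Condense a monitor log into a few comparable numbers."""
    samples = [s for s in map(parse_line, lines) if s is not None]
    if not samples:
        return None
    return {
        "samples": len(samples),
        "max_temp_c": max(s["temp_c"] for s in samples),
        "min_mhz": min(s["mhz"] for s in samples),
        "freq_counts": dict(Counter(s["mhz"] for s in samples)),
    }
```

Feeding it the saved output from two kernels (for example via `summarize(open("monitor.log"))`, with a file name of your choosing) shows at a glance whether the lower frequency steps are ever reached.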

 


I have now tried kernel 5.7.15-sunxi64. The frequency goes down to 816MHz, but it seems the thermal throttling is still not working sufficiently...

 

During "armbianmonitor -z" this happens:


Message from syslogd@localhost at Aug 30 11:16:24 ...
 kernel:[  324.712300] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:30 ...
 kernel:[  330.708831] thermal thermal_zone0: critical temperature reached (106 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:31 ...
 kernel:[  331.708196] thermal thermal_zone0: critical temperature reached (106 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:32 ...
 kernel:[  332.707635] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:34 ...
 kernel:[  334.706513] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:36 ...
 kernel:[  335.705925] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:39 ...
 kernel:[  339.703622] thermal thermal_zone0: critical temperature reached (106 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:40 ...
 kernel:[  341.202787] thermal thermal_zone0: critical temperature reached (107 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:43 ...
 kernel:[  342.701911] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:46 ...
 kernel:[  346.699592] thermal thermal_zone0: critical temperature reached (106 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:48 ...
 kernel:[  348.698473] thermal thermal_zone0: critical temperature reached (106 C), shutting down


It has heatsinks on cpu and ram but no fan.

Usually there is not much load.

With kernel 4.19.63 it works perfectly: after some time under load it only runs at a lower speed, but it never goes over 90°C.

With all newer kernels it overheats.

 

Constant 960MHz or even 816MHz under heavy load would be fine for me.

Overheating and crashing is not.
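
If a constant 960MHz is acceptable, one possible stopgap (not something confirmed in this thread for this board) is to cap the maximum frequency through sysfs. This sketch assumes the standard cpufreq files `scaling_available_frequencies` and `scaling_max_freq` under `/sys/devices/system/cpu/cpufreq/policy0`; not every cpufreq driver exposes the first one, the function names are hypothetical, and the write needs root.

```python
from pathlib import Path

# Assumed location of the first cpufreq policy; may differ per board/kernel.
POLICY_DIR = Path("/sys/devices/system/cpu/cpufreq/policy0")


def available_freqs_khz(policy_dir=POLICY_DIR):
    """Return the frequency steps (kHz) the driver advertises, ascending."""
    text = (Path(policy_dir) / "scaling_available_frequencies").read_text()
    return sorted(int(freq) for freq in text.split())


def cap_max_freq(limit_khz, policy_dir=POLICY_DIR):
    """Set scaling_max_freq to the highest step <= limit_khz and return it."""
    candidates = [f for f in available_freqs_khz(policy_dir) if f <= limit_khz]
    if not candidates:
        raise ValueError(f"no available frequency at or below {limit_khz} kHz")
    chosen = max(candidates)
    (Path(policy_dir) / "scaling_max_freq").write_text(str(chosen))
    return chosen
```

`cap_max_freq(960000)` would then keep the governor from going above 960MHz until the next reboot; a value written this way is not persistent.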


39 minutes ago, Tido said:

Is it possible to read the DVFS settings of his device?

Could this be compared to his 4.19 kernel?

 

He could compare the contents of "/sys/class/thermal/thermal_zone0/" on the two kernels, as well as the output of "cpufreq-info" across both.
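
A rough way to do that comparison, sketched in Python: dump the readable files of the thermal zone to a dict on each kernel, save both, and diff them. It assumes the zone lives at `/sys/class/thermal/thermal_zone0` and that the interesting attributes are plain files such as `trip_point_0_temp` or `policy`; `dump_zone` and `diff_dumps` are names invented for this example.

```python
import json
from pathlib import Path

# Assumed thermal zone directory, as named in the post above.
ZONE_DIR = Path("/sys/class/thermal/thermal_zone0")


def dump_zone(zone_dir=ZONE_DIR, skip=("temp", "uevent")):
    """Collect the readable attribute files of one thermal zone into a dict.

    'temp' is skipped by default because it changes constantly.
    """
    dump = {}
    for entry in sorted(Path(zone_dir).iterdir()):
        if entry.name in skip or not entry.is_file():
            continue  # ignore volatile values and sub-directories/links
        try:
            dump[entry.name] = entry.read_text().strip()
        except (OSError, UnicodeDecodeError):
            continue  # some sysfs attributes cannot be read
    return dump


def diff_dumps(old, new):
    """Return {name: (old_value, new_value)} for every attribute that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__" and ZONE_DIR.is_dir():
    # Print the dump as JSON so it can be saved per kernel and compared later.
    print(json.dumps(dump_zone(), indent=2))
```

Saving the JSON once under 4.19 and once under the newer kernel, then passing both loaded dicts to `diff_dumps`, should show any changed trip points or hysteresis values.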


On 8/31/2020 at 3:36 PM, 5kft said:

@linuxjosef - could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another?  It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.

I did and throttling in principle works. It goes down to 816MHz, so definitely below the 1.3v level. 

It feels like it's going back up way too fast. For example, the temperature is above 95°C and the frequency goes down to 816MHz; then the temperature drops to 90°C and the frequency jumps straight back up to 1.3GHz. Maybe this, combined with the fact that the temperature readings aren't instantaneous, is the problem?

 

I am no longer at my parents' place where the device is, so I am not too eager to try anything that might crash it, because I can't reset it easily.


On 8/31/2020 at 6:36 AM, 5kft said:

could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another?  It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.

 

Kind of like the stability testing we did for H5 and memory timing...

 

openssl speed -multi 4

Puts a fair load on to the CPU...


On 8/31/2020 at 3:36 PM, 5kft said:

@linuxjosef - could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another?  It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.

I made some tests with the 5.9.1 dev kernel:

 

Here without the overlays:

https://asciinema.org/a/Yf8CSjl3bJFFNQmEPuMEZVTum

 

here with "gpio-regulator-1.3v cpu-clock-1.3GHz-1.3v" enabled:

 

https://asciinema.org/a/jKNg8qBlOwAi5v3vqMQNH4IOF

 

 

As you can see, still not working correctly.

 

