linuxjosef

Thermal throttling not working in 5.3 and 5.4 kernels on Orange Pi Zero Plus


I recently updated my Orange Pi Zero Plus to the 5.3 kernel, and it has failed a few times since then. I tried to find out why it hangs, and when connected via SSH I get this message:


 

 kernel:[  275.633561] thermal thermal_zone0: critical temperature reached (100 C), shutting down

 

After seeing this, I started a program that generates load and watched the temperature and CPU frequency in htop. The temperature goes up to 100°C, but the CPU stays at 1.3GHz.

After downgrading to kernel 4.19.63 it works as expected again. The temperature rises, but then the CPU speed is limited, first to 1.01GHz, then to 960MHz, and after that the temperature stays low enough for it to remain at 960MHz.

 

Since my Orange Pi Zero Plus is a few hundred kilometers away in my parents' house and I always have to call my dad if it hangs, I won't be able to do much testing, but I hope someone else can reproduce this.

 

 


20 hours ago, linuxjosef said:

Since my Orange Pi Zero Plus is a few hundred kilometers away in my parents' house and I always have to call my dad if it hangs, I won't be able to do much testing but I hope someone else can reproduce this.

Kernel 5.4.y is the first on this board where most of the critical stuff has been fixed. Still not sure if we are at the production level yet.

But the update to 5.4.y hasn't been pushed out yet - you can only update via the beta repository or manually ... or wait a few more weeks, or start with a clean image which already has kernel 5.4.y

That's it.

On 1/12/2020 at 5:41 PM, Igor said:

Kernel 5.4.y is the first on this board where most of the critical stuff has been fixed. Still not sure if we are at the production level yet.

But the update to 5.4.y hasn't been pushed out yet - you can only update via the beta repository or manually ... or wait a few more weeks, or start with a clean image which already has kernel 5.4.y

That's it.

Then I'll just wait. Is there some place where information like this can be found? For example, which kernels are good to use for which device?

33 minutes ago, linuxjosef said:

Is there some place where information like this can be found?


Yes, usually here on the forum. 


OK, I tried the same again and it still isn't working in combination with the gpio-regulator-1.3v and cpu-clock-1.3GHz-1.3v overlays.

I have soldered the MOSFET that is missing from the board, and enabled both overlays.

 

I tried with the following kernel:

 

5.4.28-sunxi64 #20.02.7 SMP Sat Mar 28 17:25:10 CET 2020

 

 

The issue is the same.

The thermal throttling never goes below 1.10GHz and that is not enough.

With full load, the OrangePi runs at 1.30GHz, until it gets too hot. Then it is throttled, but never below 1.10GHz (the lowest it goes with 1.3V).

Under full load without a fan this is still too much, and it soon shuts down.

 

Without the 1.3v overlays enabled, it doesn't overheat, but it never reaches 1.1 or 1.3GHz.

 

With the old kernel

4.19.63-sunxi64 #5.92 SMP Fri Aug 2 00:18:27 CEST 2019

everything works as it should. The throttling goes below 1.10GHz and the CPU doesn't overheat, even with sustained full load and no fan.

 

 

 

I can't install kernel 5.6.8 or 5.6.5. I always get the following error:

[Image: screenshot of the error message]

 

The file /tmp/switch_kernel.log contains for example:

linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130 linux-u-boot-orangepizeroplus-dev

 

33 minutes ago, linuxjosef said:

linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130 linux-u-boot-orangepizeroplus-dev


Probably u-boot is missing, try:

apt install linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130

 

20 minutes ago, Igor said:


Probably u-boot is missing, try:


apt install linux-image-dev-sunxi64=20.05.0-trunk.130 linux-dtb-dev-sunxi64=20.05.0-trunk.130

 

Thanks.

 

Installing the kernel works.

 

Result is the same, throttling only goes down to 1.10GHz and soon:

kernel:[  132.835701] thermal thermal_zone0: critical temperature reached (100 C), shutting down

1 minute ago, guidol said:

@linuxjosef did you try to set the cpu governor to "conservative"?

No, I only tried ondemand. If you think this might help, I'll try it.

7 minutes ago, linuxjosef said:

No, I only tried ondemand. If you think this might help, I'll try it.

I have some boards where ondemand doesn't really throttle down, but with conservative they always did throttle down :)


Unfortunately, with conservative the result is exactly the same. It only throttles down to 1.10GHz and then reaches 100°C.

 

I also tried userspace; this seems not to be implemented. The frequency stays constant.

 

powersave, as expected, always stays at the lowest frequency: no overheating, but also no performance.

 

performance always stays at the highest frequency and also overheats.

 

schedutil is similar to ondemand and conservative, also only throttles down to 1.10GHz and overheats.

 

So looks like it is a bug in the kernel and there is no configuration to fix it.
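
For anyone repeating the governor comparison above, here is a minimal sketch of switching governors through the cpufreq sysfs files. It assumes the policy is exposed under /sys/devices/system/cpu/cpu0/cpufreq; the CPUFREQ_DIR variable and the function names are made up for this example, and writing needs root. Note that the userspace governor simply holds whatever frequency was last requested through scaling_setspeed, so a constant frequency under that governor may be expected behaviour rather than a missing feature.

```shell
#!/bin/sh
# Sketch only: list and switch cpufreq governors via sysfs.
# CPUFREQ_DIR is an assumption; adjust it if your kernel exposes the policy elsewhere.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# Print the governors this kernel offers.
list_governors() {
    cat "$CPUFREQ_DIR/scaling_available_governors"
}

# Switch to governor $1, but only if the kernel lists it as available.
set_governor() {
    avail=" $(cat "$CPUFREQ_DIR/scaling_available_governors") "
    case "$avail" in
        *" $1 "*)
            echo "$1" > "$CPUFREQ_DIR/scaling_governor"
            ;;
        *)
            echo "governor '$1' is not available" >&2
            return 1
            ;;
    esac
}

# Example (as root):
#   list_governors
#   set_governor conservative
```

The change is not persistent; it is lost on reboot.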


I'm seeing the exact same behavior on a NanoPi NEO2. It won't throttle below 1.1GHz on the 5.4 and 5.6 kernels.

On 5/13/2020 at 2:01 PM, linuxjosef said:

So looks like it is a bug in the kernel and there is no configuration to fix it.


Possibly in the thermal throttling configuration, yep. I also noticed that a few devices do not survive stress tests in automated testing ... but I don't have serial consoles there yet, so I don't know what actually happened. I suspect the same ...


@5kft and I did a fair amount of testing for the recent u-boot update for the NanoPi NEO2 (H5) with the following stress test...

openssl speed -multi 4

 

With the schedutil governor, and keeping the NanoPi NEO at 1008MHz, it's stable.
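
Capping the clock like this can be sketched with the cpufreq sysfs interface. This is only an illustration under assumptions: it presumes the driver exposes scaling_available_frequencies under /sys/devices/system/cpu/cpu0/cpufreq, the CPUFREQ_DIR variable and cap_max_freq name are invented here, and it needs root.

```shell
#!/bin/sh
# Sketch only: cap the maximum CPU frequency via sysfs.
# CPUFREQ_DIR is an assumption; adjust it for your kernel.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# cap_max_freq LIMIT_KHZ
# Picks the highest available frequency that does not exceed LIMIT_KHZ,
# writes it to scaling_max_freq and prints the value that was chosen.
cap_max_freq() {
    limit="$1"
    best=""
    for f in $(cat "$CPUFREQ_DIR/scaling_available_frequencies"); do
        if [ "$f" -le "$limit" ]; then
            if [ -z "$best" ] || [ "$f" -gt "$best" ]; then
                best="$f"
            fi
        fi
    done
    if [ -z "$best" ]; then
        echo "no available frequency at or below $limit kHz" >&2
        return 1
    fi
    echo "$best" > "$CPUFREQ_DIR/scaling_max_freq"
    echo "$best"
}

# Example (as root): limit the CPU to at most 1008 MHz
#   cap_max_freq 1008000
```

Values are in kHz, and the cap is lost on reboot unless it is reapplied at boot.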

 

On 5/13/2020 at 5:01 AM, linuxjosef said:

So looks like it is a bug in the kernel and there is no configuration to fix it.

 

No, it's a hardware issue with overclocking on both CPU and DRAM timing - it was on the FriendlyARM NanoPi NEO2...

 

OrangePi Zero Plus - same chipset as the NanoPi NEO2, and perhaps not the same level of quality...

 

Here's NanoPi NEO2 on H5 throttling under load...

 

18:35:37: 1008MHz  4.00 100%   0%  99%   0%   0%   0% 69.7°C  0/4
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.
18:35:42:  960MHz  4.00 100%   0%  99%   0%   0%   0% 75.2°C  1/4
18:35:47:  960MHz  4.15 100%   0%  99%   0%   0%   0% 73.6°C  1/4
18:35:52:  816MHz  4.13 100%   0%  99%   0%   0%   0% 75.0°C  2/4
18:35:58:  816MHz  4.12 100%   0%  99%   0%   0%   0% 67.8°C  1/4
18:36:03:  960MHz  4.11 100%   0%  99%   0%   0%   0% 74.5°C  1/4
18:36:08:  960MHz  4.10 100%   0%  99%   0%   0%   0% 74.0°C  1/4
18:36:13:  816MHz  4.10 100%   0%  99%   0%   0%   0% 75.3°C  2/4
18:36:18:  960MHz  4.09 100%   0%  99%   0%   0%   0% 67.6°C  1/4
18:36:23:  960MHz  4.08 100%   0%  99%   0%   0%   0% 74.3°C  1/4
18:36:28:  960MHz  4.07 100%   0%  99%   0%   0%   0% 67.3°C  1/4
18:36:33:  960MHz  4.07 100%   0%  99%   0%   0%   0% 73.9°C  1/4
18:36:38:  960MHz  4.06 100%   0%  99%   0%   0%   0% 74.7°C  1/4
18:36:43:  960MHz  4.06 100%   0%  99%   0%   0%   0% 73.6°C  1/4
18:36:49:  960MHz  4.05 100%   0%  99%   0%   0%   0% 75.8°C  2/4
18:36:54: 1008MHz  4.05 100%   0%  99%   0%   0%   0% 73.4°C  0/4
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU  C.St.
18:36:59:  960MHz  4.04 100%   0%  99%   0%   0%   0% 75.0°C  1/4
18:37:04:  960MHz  4.04 100%   0%  99%   0%   0%   0% 74.0°C  1/4
18:37:09:  960MHz  4.04 100%   0%  99%   0%   0%   0% 74.6°C  1/4
18:37:14:  960MHz  4.03 100%   0%  99%   0%   0%   0% 70.6°C  1/4
18:37:19:  816MHz  4.03 100%   0%  99%   0%   0%   0% 75.3°C  2/4
18:37:24:  816MHz  4.03 100%   0%  99%   0%   0%   0% 76.2°C  2/4
18:37:29:  960MHz  4.02 100%   0%  99%   0%   0%   0% 67.6°C  1/4

 


I have now tried kernel 5.7.15-sunxi64. The frequency goes down to 816MHz, but it seems the thermal throttling is still not working sufficiently...

 

During "armbianmonitor -z" this happens:


Message from syslogd@localhost at Aug 30 11:16:24 ...
 kernel:[  324.712300] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:30 ...
 kernel:[  330.708831] thermal thermal_zone0: critical temperature reached (106 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:31 ...
 kernel:[  331.708196] thermal thermal_zone0: critical temperature reached (106 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:32 ...
 kernel:[  332.707635] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:34 ...
 kernel:[  334.706513] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:36 ...
 kernel:[  335.705925] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:39 ...
 kernel:[  339.703622] thermal thermal_zone0: critical temperature reached (106 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:40 ...
 kernel:[  341.202787] thermal thermal_zone0: critical temperature reached (107 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:43 ...
 kernel:[  342.701911] thermal thermal_zone0: critical temperature reached (105 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:46 ...
 kernel:[  346.699592] thermal thermal_zone0: critical temperature reached (106 C), shutting down

Message from syslogd@localhost at Aug 30 11:16:48 ...
 kernel:[  348.698473] thermal thermal_zone0: critical temperature reached (106 C), shutting down


That is quite a temperature. Did you try to add a heatsink (which is recommended for any SBC that is running a load anyways)?


It has heatsinks on the CPU and RAM but no fan.

Usually there is not much load.

With kernel 4.19.63 it works perfectly: after some time under load it only runs at a slower speed, but it never goes over 90°C.

With all newer kernels it overheats.

 

A constant 960MHz or even 816MHz under heavy load would be fine for me.

Overheating and crashing is not.


@linuxjosef - could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another?  It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.

1 hour ago, 5kft said:

if throttling is indeed working

Is it possible to read the DVFS settings of his device?

Could this be compared to his 4.19 kernel?

39 minutes ago, Tido said:

Is it possible to read the DVFS settings of his device?

Could this be compared to his 4.19 kernel?

 

He could compare the contents of "/sys/class/thermal/thermal_zone0/" on the two kernels, as well as the output of "cpufreq-info" across both.
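
A rough sketch of that comparison, assuming the usual sysfs thermal attribute names (type, policy, trip_point_*, cdev*_trip_point); the dump_thermal_zone name is made up for this example. It prints the zone configuration as name=value lines so the output from the two kernels can be diffed.

```shell
#!/bin/sh
# Sketch only: dump a thermal zone's configuration in a diff-friendly form.

# dump_thermal_zone [ZONE_DIR]
# Prints name=value for the zone type, policy, trip points and the
# trip point each cooling device is bound to.
dump_thermal_zone() {
    zone="${1:-/sys/class/thermal/thermal_zone0}"
    for f in "$zone"/type "$zone"/policy "$zone"/trip_point_* "$zone"/cdev*_trip_point; do
        [ -f "$f" ] || continue   # skip attributes this kernel does not provide
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}

# Example, run once on each kernel and then compare the two files:
#   dump_thermal_zone > "thermal-$(uname -r).txt"
#   cpufreq-info > "cpufreq-$(uname -r).txt"
#   diff thermal-4.19.63-sunxi64.txt thermal-5.7.15-sunxi64.txt
```

Trip point temperatures are reported in millidegrees Celsius, so 75000 means 75°C.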

On 8/31/2020 at 3:36 PM, 5kft said:

@linuxjosef - could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another?  It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.

I did and throttling in principle works. It goes down to 816MHz, so definitely below the 1.3v level. 

It feels like it's going back up way too fast. For example, the temperature is above 95°C and the frequency goes down to 816MHz; then the temperature drops to 90°C and the frequency jumps back up to 1.3GHz. Maybe this, combined with the fact that the temperature readings aren't instantaneous, is the problem?

 

I am not at my parents' place, where the device is, anymore, so I am not too eager to try anything that might crash it, because I can't reset it easily.

3 hours ago, linuxjosef said:

throttling in principle works

Is the throttling triggered by thermal reasons or only by cpu load?

A tmon log will tell you if it is really thermal throttling.

On 9/2/2020 at 8:59 PM, usual user said:

Is the throttling triggered by thermal reasons or only by cpu load?

A tmon log will tell you if it is really thermal throttling.

Definitely thermal, the armbianmonitor benchmark was running the whole time.

12 minutes ago, linuxjosef said:

Definitely thermal, the armbianmonitor benchmark was running the whole time.

AFAIK armbianmonitor does not log the cooling devices, hence my request for the tmon log.
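
If tmon is not at hand, the cooling device state can also be sampled straight from sysfs. This is a minimal sketch, assuming the standard /sys/class/thermal layout (thermal_zone0/temp, cooling_device*/cur_state and max_state); THERMAL_DIR and sample_cooling are names invented for this example.

```shell
#!/bin/sh
# Sketch only: sample the zone temperature together with cooling device states.
# THERMAL_DIR is an assumption; it is the usual sysfs location.
THERMAL_DIR="${THERMAL_DIR:-/sys/class/thermal}"

# Prints one line such as: temp=95000 cooling_device0=2/4
# (temperature in millidegrees Celsius, state as current/max).
sample_cooling() {
    line="temp=$(cat "$THERMAL_DIR/thermal_zone0/temp")"
    for d in "$THERMAL_DIR"/cooling_device*; do
        [ -f "$d/cur_state" ] || continue   # no cooling devices registered
        line="$line ${d##*/}=$(cat "$d/cur_state")/$(cat "$d/max_state")"
    done
    echo "$line"
}

# Example, while the benchmark runs in another session:
#   while sleep 1; do echo "$(date +%T) $(sample_cooling)"; done
```

If cur_state stays at 0 while the frequency drops, the drop is probably not coming from the thermal framework.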

On 8/31/2020 at 6:36 AM, 5kft said:

could you run "screen", and run "sudo armbianmonitor -m" in one session, and "top" in another?  It'd be interesting to see what is going on in terms of processes, and gauge the load and see if throttling is indeed working on your board.

 

Kind of like the stability testing we did for H5 and memory timing...

 

openssl speed -multi 4

This puts a fair load on the CPU...
