Jump to content

Tinker board thermals broken post 4.12.0


Mike C

Recommended Posts

Just started using the thinker board, and got fed up with the OEM builds which brought me here. So I would like to thank everyone for all the hard work, Im sure it can't be easy fixing things blind! Cheers mates.

 

As the Tinkerboard runs hot under load, i run a program that adjusts pwm to adjust fan speed. so I noticed right away when upgrading to 4.13.0 that both soc and gpu thermal zones had broken. I've tried 4.13.1 and 4.13.2 as well they are broken as well. 4.13.x all kick error messages into dmesg as well.

 

root@Asus:/sys/class/thermal# uname -r
4.12.0-rockchip
root@Asus:/sys/class/thermal# cat /sys/class/thermal/thermal_zone[1-2]/temp
32272
32727

root@Asus:/sys/class/thermal# uname -r
4.13.0-rockchip
root@Asus:/sys/class/thermal# cat /sys/class/thermal/thermal_zone[0-1]/temp
cat: /sys/class/thermal/thermal_zone0/temp: Invalid argument
cat: /sys/class/thermal/thermal_zone1/temp: Invalid argument

nothermal.log

Link to comment
Share on other sites

Let me take a look, probably some bindings changed.  Thanks for pointing it out!

 

rockchip,grf is still shown as an optional parameter in the 4.13 documentation.  I'll have to dig deeper, maybe there are some conflicts due to changes in the rk3288 device tree.

 

[edit] the rockchip,grf parameter is present in 4.12 Armbian dtsi, I will restore it in the 4.13 and test, then let you know.

Link to comment
Share on other sites

Cool thank you for the update mate. One thing that concerns me is the cpu throttling likely is not happening since I believe its kernel based. all thou the kernel seams to throttle at kind of low temps, anything over 70c, .Datasheet on the rk3288 states a max of 85c for operating temp, and absolute max of 125c. To start throttling 70c seams to be bit overly conservative. I think I saw a patch from the rockchip guys that lowered all throttling levels 5 degrees C, unsure why, might be a very good reason for it.

Link to comment
Share on other sites

They had the sampling rate of the tsadc set extremely low (every 5 seconds at one point).  One thing I can comment on is a seeming lack of communication between some of the programmers over there, especially with the Tinker Board team.  A lot of problems get solved via commits from 2-3 people simultaneously, and the solution is often disjointed or somewhat hackish in nature.  Then a commit comes along by one of the "main guys" I guess that cleans up things a bit.  

As far as this goes, I'll have to go over the device tree and the drivers, I didn't notice this in the Dev branch before, but it is there, I may have missed it before.  I've found where the error is being thrown in the driver, I just need to figure out what it's telling me.

Link to comment
Share on other sites

On 14.9.2017 at 4:56 PM, Mike C said:

/sys/class/thermal# cat /sys/class/thermal/thermal_zone[1-2]/temp

 

Which one is the right syfs node for CPU cores with mainline kernel? I thought the kernel you're using would follow the general rule to expose 'CPU temperature' as thermal_zone0 by default?

 

@TonyMac32 or @Mike C: do you have the Tinkerboard running with 4.12 currently and can have a quick look?

Link to comment
Share on other sites

I can boot a 4.12 momentarily, I was debugging 4.13.  I found what *I think* is the culprit, the gpu node was changed and somehow omitted the cooling-cells property.

 

 

[edit]  zone1 is cpu_thermal, zone2 is gpu_thermal 

 

The device tree has the tsadc channel 0 as a "reserve_thermal", with 1 and 2 cpu and gpu. 

 

Link to comment
Share on other sites

On my MiQi board, thermal_zone0 is disabled, thermal_zone1 is enabled and so is the 2.

 

Which gives :

root@miqi:~# cat /sys/class/thermal/thermal_zone0/mode
disabled
root@miqi:~# cat /sys/class/thermal/thermal_zone1/mode
enabled
root@miqi:~# cat /sys/class/thermal/thermal_zone2/mode
enabled
root@miqi:~# cat /sys/class/thermal/thermal_zone0/temp
cat: /sys/class/thermal/thermal_zone0/temp: Invalid argument
root@miqi:~# cat /sys/class/thermal/thermal_zone1/temp
45000
root@miqi:~# cat /sys/class/thermal/thermal_zone2/temp
44090

Since the temperature sensor used in the login message from Armbian works, I never tried to play too much with the temperature sensors. I could try to find why the first sensor is disabled though.

Link to comment
Share on other sites

19 minutes ago, Myy said:

I could try to find why the first sensor is disabled though.

 

Not really necessary. It's just that we now have exactly two exceptions from the general rule (thermal_zone0): the smelly Allwinner A20 legacy kernel and surprisingly both kernel variants for RK3288 (the two S500 boards not counted since deprecated fortunately in the meantime). But honestly I don't really mind any more now that it's just two exceptions and not like a year ago only exceptions and no generic rule.

Link to comment
Share on other sites

Is it just the order the thermal zones are created based on the device tree?  We could reorder them perhaps, of course that would break any hard-coded things that don't check the zones status or type...

 

Maybe reach out and try to get it rearranged officially.  I just swapped the order in the dtsi and it did what I expected.

 

@Igor  The issue is fixed.

Link to comment
Share on other sites

5 minutes ago, Myy said:

I'll try to ask #linux-rockchip , see if they have a clue about this reserved area.

 

And please ask them to consider being compatible to the rest of the (Linux) world using thermal_zone0. Most probably still using thermal_zone1 is for backwards compatiblity since they had it that way on their 4.4 kernel already (a tribute to badly written software expecting things at hardcoded locations). But since every software/script out there naively expects CPU temp available as thermal_zone0 it would be better to become compatible to this (even more a tribute to badly written software expecting things at hardcoded locations ;) )

Link to comment
Share on other sites

Well, after a quick test, it seems that the number after tsadc seems to determine the probe to check, but does not affect the numbering.

 

Meaning that if you put tsadc 1 on the reserve_thermal and tsadc 0 on the cpu :

- You won't be able to read the CPU temperature through thermal_zone1, even though it's enabled

- but you'll be able to the CPU temperature through thermal_zone0, even though it's disabled and named reserve_thermal.

Link to comment
Share on other sites

1 hour ago, Myy said:

but does not affect the numbering.

 

Right.  It's literally the order they are entered into the thermal-zones list.  I moved Reserve_thermal to the end of the list and it was thermal_zone2 in the kernel.  Cpu was then themal_zone0

Link to comment
Share on other sites

:D

 

Well, I'm not saying we hardcode anything, but if I just stick the reserve_thermal entry at the end, at least it makes sense.  Anyone hardcoding beware, obviously you should probe the type and status.

 

And, if everyone is following mmind's advice, it shouldn't cause any disturbance in the force.

Link to comment
Share on other sites

FYI:

The Asus guys changed this in TinkerOS 2.0.1 Beta, I'm willing to bet is was pretty hackish lol. Their git repository has the change to zone0, as I compiled from their sources and it was moved to Zone0 in them as well.  https://github.com/TinkerBoard/debian_kernel

 

Quote

TinkerOS_Debian V2.0.1 (Beta version)
TinkerOS default username is “linaro”, password is “linaro”
1. Preload the package gvfs-backends.
...
15. Change thermal zone number. (0: SoC, 1: GPU)

 

Link to comment
Share on other sites

Guest
This topic is now closed to further replies.
×
×
  • Create New...

Important Information

Terms of Use - Privacy Policy - Guidelines