Mike C Posted September 14, 2017 Posted September 14, 2017 Just started using the thinker board, and got fed up with the OEM builds which brought me here. So I would like to thank everyone for all the hard work, Im sure it can't be easy fixing things blind! Cheers mates. As the Tinkerboard runs hot under load, i run a program that adjusts pwm to adjust fan speed. so I noticed right away when upgrading to 4.13.0 that both soc and gpu thermal zones had broken. I've tried 4.13.1 and 4.13.2 as well they are broken as well. 4.13.x all kick error messages into dmesg as well. root@Asus:/sys/class/thermal# uname -r 4.12.0-rockchip root@Asus:/sys/class/thermal# cat /sys/class/thermal/thermal_zone[1-2]/temp 32272 32727 root@Asus:/sys/class/thermal# uname -r 4.13.0-rockchip root@Asus:/sys/class/thermal# cat /sys/class/thermal/thermal_zone[0-1]/temp cat: /sys/class/thermal/thermal_zone0/temp: Invalid argument cat: /sys/class/thermal/thermal_zone1/temp: Invalid argument nothermal.log
TonyMac32 Posted September 14, 2017 Posted September 14, 2017 Let me take a look, probably some bindings changed. Thanks for pointing it out! rockchip,grf is still shown as an optional parameter in the 4.13 documentation. I'll have to dig deeper, maybe there are some conflicts due to changes in the rk3288 device tree. [edit] the rockchip,grf parameter is present in 4.12 Armbian dtsi, I will restore it in the 4.13 and test, then let you know.
TonyMac32 Posted September 15, 2017 Posted September 15, 2017 I will have to take a closer look, adding the grf got rid of the specific grf complaints, but did not solve the more serious errorsca. Hopefully I can get this straightened out quickly.
Mike C Posted September 15, 2017 Author Posted September 15, 2017 Cool thank you for the update mate. One thing that concerns me is the cpu throttling likely is not happening since I believe its kernel based. all thou the kernel seams to throttle at kind of low temps, anything over 70c, .Datasheet on the rk3288 states a max of 85c for operating temp, and absolute max of 125c. To start throttling 70c seams to be bit overly conservative. I think I saw a patch from the rockchip guys that lowered all throttling levels 5 degrees C, unsure why, might be a very good reason for it.
TonyMac32 Posted September 15, 2017 Posted September 15, 2017 They had the sampling rate of the tsadc set extremely low (every 5 seconds at one point). One thing I can comment on is a seeming lack of communication between some of the programmers over there, especially with the Tinker Board team. A lot of problems get solved via commits from 2-3 people simultaneously, and the solution is often disjointed or somewhat hackish in nature. Then a commit comes along by one of the "main guys" I guess that cleans up things a bit. As far as this goes, I'll have to go over the device tree and the drivers, I didn't notice this in the Dev branch before, but it is there, I may have missed it before. I've found where the error is being thrown in the driver, I just need to figure out what it's telling me.
tkaiser Posted September 16, 2017 Posted September 16, 2017 On 14.9.2017 at 4:56 PM, Mike C said: /sys/class/thermal# cat /sys/class/thermal/thermal_zone[1-2]/temp Which one is the right syfs node for CPU cores with mainline kernel? I thought the kernel you're using would follow the general rule to expose 'CPU temperature' as thermal_zone0 by default? @TonyMac32 or @Mike C: do you have the Tinkerboard running with 4.12 currently and can have a quick look?
TonyMac32 Posted September 16, 2017 Posted September 16, 2017 I can boot a 4.12 momentarily, I was debugging 4.13. I found what *I think* is the culprit, the gpu node was changed and somehow omitted the cooling-cells property. [edit] zone1 is cpu_thermal, zone2 is gpu_thermal The device tree has the tsadc channel 0 as a "reserve_thermal", with 1 and 2 cpu and gpu. 1
Myy Posted September 16, 2017 Posted September 16, 2017 On my MiQi board, thermal_zone0 is disabled, thermal_zone1 is enabled and so is the 2. Which gives : root@miqi:~# cat /sys/class/thermal/thermal_zone0/mode disabled root@miqi:~# cat /sys/class/thermal/thermal_zone1/mode enabled root@miqi:~# cat /sys/class/thermal/thermal_zone2/mode enabled root@miqi:~# cat /sys/class/thermal/thermal_zone0/temp cat: /sys/class/thermal/thermal_zone0/temp: Invalid argument root@miqi:~# cat /sys/class/thermal/thermal_zone1/temp 45000 root@miqi:~# cat /sys/class/thermal/thermal_zone2/temp 44090 Since the temperature sensor used in the login message from Armbian works, I never tried to play too much with the temperature sensors. I could try to find why the first sensor is disabled though.
TonyMac32 Posted September 16, 2017 Posted September 16, 2017 well, I honestly don't know what "reserve_thermal" is, so even if enabled.... what does it measure? So adding the cooling cells and power model fixed it, let me get everything uploaded and pushed to the repo.
tkaiser Posted September 16, 2017 Posted September 16, 2017 19 minutes ago, Myy said: I could try to find why the first sensor is disabled though. Not really necessary. It's just that we now have exactly two exceptions from the general rule (thermal_zone0): the smelly Allwinner A20 legacy kernel and surprisingly both kernel variants for RK3288 (the two S500 boards not counted since deprecated fortunately in the meantime). But honestly I don't really mind any more now that it's just two exceptions and not like a year ago only exceptions and no generic rule.
TonyMac32 Posted September 16, 2017 Posted September 16, 2017 Is it just the order the thermal zones are created based on the device tree? We could reorder them perhaps, of course that would break any hard-coded things that don't check the zones status or type... Maybe reach out and try to get it rearranged officially. I just swapped the order in the dtsi and it did what I expected. @Igor The issue is fixed.
Myy Posted September 16, 2017 Posted September 16, 2017 I don't know if that's the order, or the number associated with the tsadc entry ( thermal-sensors = <&tsadc 0>; ). I'll try to ask #linux-rockchip , see if they have a clue about this reserved area.
tkaiser Posted September 16, 2017 Posted September 16, 2017 5 minutes ago, Myy said: I'll try to ask #linux-rockchip , see if they have a clue about this reserved area. And please ask them to consider being compatible to the rest of the (Linux) world using thermal_zone0. Most probably still using thermal_zone1 is for backwards compatiblity since they had it that way on their 4.4 kernel already (a tribute to badly written software expecting things at hardcoded locations). But since every software/script out there naively expects CPU temp available as thermal_zone0 it would be better to become compatible to this (even more a tribute to badly written software expecting things at hardcoded locations ) 1
Myy Posted September 16, 2017 Posted September 16, 2017 Well, after a quick test, it seems that the number after tsadc seems to determine the probe to check, but does not affect the numbering. Meaning that if you put tsadc 1 on the reserve_thermal and tsadc 0 on the cpu : - You won't be able to read the CPU temperature through thermal_zone1, even though it's enabled - but you'll be able to the CPU temperature through thermal_zone0, even though it's disabled and named reserve_thermal.
TonyMac32 Posted September 16, 2017 Posted September 16, 2017 1 hour ago, Myy said: but does not affect the numbering. Right. It's literally the order they are entered into the thermal-zones list. I moved Reserve_thermal to the end of the list and it was thermal_zone2 in the kernel. Cpu was then themal_zone0
Myy Posted September 16, 2017 Posted September 16, 2017 I asked the question, I got informed that I should not hardcore these numbers, but didn't get anything on a fool proof method to get the CPU temperature. I'll reiterate another day, with less typos and a better grammar. https://irclog.whitequark.org/linux-rockchip/2017-09-16 The discussion then diverted on potential future Vulkan support.
TonyMac32 Posted September 16, 2017 Posted September 16, 2017 Well, I'm not saying we hardcode anything, but if I just stick the reserve_thermal entry at the end, at least it makes sense. Anyone hardcoding beware, obviously you should probe the type and status. And, if everyone is following mmind's advice, it shouldn't cause any disturbance in the force.
zador.blood.stained Posted September 16, 2017 Posted September 16, 2017 2 hours ago, tkaiser said: And please ask them to consider being compatible to the rest of the (Linux) world using thermal_zone0. It may be better to iterate through all /sys/class/thermal/thermal_zoneX properties and check properties ("type", "device" symlink) to find the correct thermal sensor.
Mike C Posted September 17, 2017 Author Posted September 17, 2017 FYI: The Asus guys changed this in TinkerOS 2.0.1 Beta, I'm willing to bet is was pretty hackish lol. Their git repository has the change to zone0, as I compiled from their sources and it was moved to Zone0 in them as well. https://github.com/TinkerBoard/debian_kernel Quote TinkerOS_Debian V2.0.1 (Beta version) TinkerOS default username is “linaro”, password is “linaro” 1. Preload the package gvfs-backends. ...15. Change thermal zone number. (0: SoC, 1: GPU)
TonyMac32 Posted September 17, 2017 Posted September 17, 2017 Good catch. I'll go ahead and swap the device tree entries There's nothing hackish about this one, they get numbered in the order they are declared.
TonyMac32 Posted September 18, 2017 Posted September 18, 2017 Just pushed re-organized thermals to repo, cpu_thermal is zone0, gpu_thermal is zone1. This is for dev, next, and default. All I did was move the "reserve_thermal" entry to the end of the list of three. @tkaiser 2
Igor Posted September 21, 2017 Posted September 21, 2017 Images at download section are updated with 4.13.3 and 4.4.88 2
Recommended Posts