Orange Pi Zero LTS Incorrect Temps Reported



I do not need the HDMI either, but it is nice to have and simply a bit easier to plug in than three UART connectors :P

 

For the difference between H2+ and H3: https://linux-sunxi.org/H3

 

Quote

LTS states: "low running temperature and low power consumption."

Xunlong also states they have proper software support, which they do not have, so I do not rely on whatever they claim about anything.

 

Quote

no broken by-design WiFi chip -> OPi One does not have WiFi imho

Yes, the OPi1 does not have WiFi, though I do not consider this a disadvantage in comparison to the OPi Zero, as lots of people have issues with it.

 

Both have Ethernet, which I will always prefer over WiFi, especially if you need reliability for your project, like running a PiHole DNS in your LAN.


@Craig St George - it is confusing because both the LTS and non-LTS boards have the same screenprinting on the PCB (Orange Pi Zero 1.4 or 1.5). The temperature read-out is correct for the non-LTS boards, but broken on the LTS boards. Looking at /etc/armbianmonitor/datasources/soctemp, it seems to be an offset problem. The LTS boards under-read by somewhere between 30 and 40 degrees C.

 

I'm not sure what the correct channel to report this is, so the Armbian devs can look into it. Not sure if posting in this forum gets this issue in front of the right eyes or not.


I spent some time investigating this, and I think it is a hardware/production issue. There is a calibration value in the eFuse, set during chip production. That value is currently not used in mainline, but on the affected hardware it does not correct the offset anyway. On my good boards, it only causes a small change that looks reasonable. But the problematic boards are off by 20-30 degrees.

I wrote a small python3 script that accesses the hardware directly. It prints a dump of the THS registers, converts the raw value into a temperature using each of the formulas that have been published at one time or another, and applies the eFuse factory-calibration value. To see the effect, wait a few seconds and run it again.

import os
import mmap

dev_mem_rw = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)

# Send a reset pulse to the THS subsystem (disabled by default; change False to True to enable)
if False:
	ccu_base = 0x01C20000
	mmap_base = ccu_base // mmap.PAGESIZE * mmap.PAGESIZE
	ccu_registers = mmap.mmap(dev_mem_rw, mmap.PAGESIZE, mmap.MAP_SHARED, mmap.PROT_READ | mmap.PROT_WRITE, offset=mmap_base)
	ccu_offset = ccu_base - mmap_base
	rst_reg3 = ccu_offset + 0x2D0 # bus soft rst reg3
	current_state = ccu_registers[rst_reg3+1]
	ccu_registers[rst_reg3+1] = current_state & 0xFE
	ccu_registers[rst_reg3+1] = current_state
	print("Reset THS peripheral")


ths_base = 0x01C25000
mmap_base = ths_base // mmap.PAGESIZE * mmap.PAGESIZE
ths_offset = ths_base - mmap_base
ths_ctrl_reg0 = ths_offset + 0x00  # ctrl reg0
ths_ctrl_reg1 = ths_offset + 0x04  # ctrl reg1
ths_cdat_reg = ths_offset + 0x14  # ADC cal data
ths_ctrl_reg2 = ths_offset + 0x40  # ctrl reg2
ths_int_ctrl = ths_offset + 0x44  # interrupt ctrl
ths_stat_reg = ths_offset + 0x48  # status reg
ths_alarm_ctrl = ths_offset + 0x50  # alarm ctrl
ths_shdn_ctrl = ths_offset + 0x60  # shutdown ctrl
ths_filter_ctrl = ths_offset + 0x70  # filter reg
ths_cdata_reg = ths_offset + 0x74  # calibration data register
ths_data_reg = ths_offset + 0x80  # thermal sensor data register

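# The registers are both read and written below, so map the page with read and write access
ths_registers = mmap.mmap(dev_mem_rw, mmap.PAGESIZE, mmap.MAP_SHARED, mmap.PROT_READ | mmap.PROT_WRITE, offset=mmap_base)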

regs = { 
	"ctrl0": ths_ctrl_reg0, 
	"ctrl1": ths_ctrl_reg1, 
	"cdat": ths_cdat_reg, 
	"ctrl2": ths_ctrl_reg2, 
	"int_ctrl": ths_int_ctrl,
	"stat": ths_stat_reg, 
	"alarm": ths_alarm_ctrl, 
	"shdn": ths_shdn_ctrl, 
	"filter": ths_filter_ctrl, 
	"cdata": ths_cdata_reg, 
	"data": ths_data_reg }
	
def dmp():
	for name, r in regs.items():
		b = ths_registers[r:r+4]
		x = int.from_bytes(b, byteorder='little')
		print(f"{name:10}{b[3]:02X} {b[2]:02X} {b[1]:02X} {b[0]:02X}\t{x}")
				
adc_data = ths_registers[ths_data_reg:ths_data_reg+2]
adc_value = int.from_bytes(adc_data, byteorder='little')
print(f"ADC value: {adc_value}")

t_ds = (adc_value-2794)/(-14.882)
print(f"H2/H3 datasheet formula: {t_ds:.1f}C")

t_h5 = -0.1191 * adc_value + 223
print(f"H5 datasheet formula: {t_h5:.1f}C")

t_bsp1 = -0.1180 * adc_value + 256
print(f"BSP v4.9 formula: {t_bsp1:.1f}C")

t_bsp2 = -0.1211 * adc_value + 217
print(f"mainline & xunlong kernel: {t_bsp2:.1f}C")

with open("/sys/devices/virtual/thermal/thermal_zone0/temp", "r") as f:
	t_kernel = int(f.read()) / 1000.0
	print(f"Running Kernel: {t_kernel:.1f}C")
	

print("\nTHS Registers:")
dmp()

cal_value = int.from_bytes(ths_registers[ths_cdata_reg:ths_cdata_reg+4], "little")
print(f"\nCurrent THS calibration value: 0x{cal_value:3x}")
with open("/sys/bus/nvmem/devices/sunxi-sid0/nvmem", "rb") as f:
	f.seek(0x34)
	fuse_cal_value = int.from_bytes(f.read(4), "little")
	print(f"EFuse THS calibration value: 0x{fuse_cal_value:3x}")
	if cal_value != fuse_cal_value:
		ths_registers[ths_cdata_reg:ths_cdata_reg+4] = fuse_cal_value.to_bytes(4, "little")
		delta = (fuse_cal_value-cal_value)*0.1211
		print(f"Applied calibration value from EFuse, this causes a change of {delta:.1f} deg C")


#ths_registers[ths_ctrl_reg0:ths_ctrl_reg0+4] = (0xFF).to_bytes(4, "little")
# 16x averaging
#ths_registers[ths_filter_ctrl:ths_filter_ctrl+4] = (0x07).to_bytes(4, "little")
# ADC calibration, seems to do nothing?
#ths_registers[ths_ctrl_reg1:ths_ctrl_reg1+4] = ((1 << 17) | (0 << 20)| (0 << 21)).to_bytes(4, "little")

# stop
#ths_registers[ths_ctrl_reg2:ths_ctrl_reg2+4] = (4128769-1).to_bytes(4, "little")
# start
#ths_registers[ths_ctrl_reg2:ths_ctrl_reg2+4] = (4128769).to_bytes(4, "little")

You'll have to run it as root because it accesses the memory directly.
H2+ datasheet: http://wiki.friendlyarm.com/wiki/images/0/08/Allwinner_H2%2B_Datasheet_V1.2.pdf (THS is on page 255)
I left some code in the script to control the hardware, in case anyone wants to play with it without messing with the kernel.


Good board, SoC temperature measured with thermocouple at 46°C: 

ADC value: 1414
H2/H3 datasheet formula: 92.7C
H5 datasheet formula: 54.6C
BSP v4.9 formula: 89.1C
mainline & xunlong kernel: 45.8C
Running Kernel: 45.6C

THS Registers:
ctrl0     00 00 00 FF   255
ctrl1     00 00 00 00   0
cdat      00 00 00 00   0
ctrl2     00 3F 00 01   4128769
int_ctrl  00 00 71 00   28928
stat      00 00 00 00   0
alarm     05 A0 06 84   94373508
shdn      04 E9 00 00   82378752
filter    00 00 00 06   6
cdata     00 00 08 00   2048
data      00 00 05 86   1414

Current THS calibration value: 0x800
EFuse THS calibration value: 0x817
Applied calibration value from EFuse, this causes a change of 2.8 deg C

Affected LTS board, SoC temperature on top: 47°C:

ADC value: 1626
H2/H3 datasheet formula: 78.5C
H5 datasheet formula: 29.3C
BSP v4.9 formula: 64.1C
mainline & xunlong kernel: 20.1C
Running Kernel: 20.0C

THS Registers:
ctrl0     00 00 00 FF   255
ctrl1     00 00 00 00   0
cdat      00 00 00 00   0
ctrl2     00 3F 00 01   4128769
int_ctrl  00 00 71 00   28928
stat      00 00 00 00   0
alarm     05 A0 06 84   94373508
shdn      04 E9 00 00   82378752
filter    00 00 00 06   6
cdata     00 00 08 00   2048
data      00 00 06 5A   1626

Current THS calibration value: 0x800
EFuse THS calibration value: 0x7fb
Applied calibration value from EFuse, this causes a change of -0.6 deg C
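
To make the size of the error explicit, here is a minimal sketch that re-applies the mainline formula from the script to the two ADC values in the dumps above and compares the result with the thermocouple readings. The helper names (mainline_temp_c, reading_error_c) are made up for illustration, and the thermocouple values are only the surface measurements quoted above, so treat the result as a rough estimate rather than a calibration.

```python
# Slope and intercept of the "mainline & xunlong kernel" formula from the script.
MAINLINE_SLOPE = -0.1211
MAINLINE_INTERCEPT = 217.0


def mainline_temp_c(adc_value):
    """Convert a raw THS ADC value to degrees C with the mainline formula."""
    return MAINLINE_SLOPE * adc_value + MAINLINE_INTERCEPT


def reading_error_c(adc_value, reference_c):
    """Reference minus computed temperature (positive means the sensor under-reads)."""
    return reference_c - mainline_temp_c(adc_value)


# (ADC value, thermocouple reading in degrees C) taken from the two dumps.
boards = {
    "good board": (1414, 46.0),
    "LTS board": (1626, 47.0),
}

for name, (adc, reference) in boards.items():
    print(f"{name}: computed {mainline_temp_c(adc):.1f}C, "
          f"measured {reference:.1f}C, "
          f"error {reading_error_c(adc, reference):+.1f}C")
```

With these inputs the good board comes out about 0.2C low and the LTS board about 26.9C low, both relative to the thermocouple.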

 


@yoq - this is extremely interesting - thanks for your work on this.

 

It seems that there is an offset of about 27 degrees C that needs to be applied on the LTS boards. Of course, I guess we don't know whether the slope is right either, but from my own experiments it seems that the error is simply an offset.

 

The next problem is that I'm not sure how we can detect that the OPi Zero board we're running on is an LTS board, and adjust the offset parameter accordingly. All the boards return a HW string of 'Xunlong Orange Pi Zero' whether they are LTS or not. Do you know how/if it's possible to detect this in software, and then adjust the returned offset? I suppose this would need to be implemented in sun8i_thermal.c.

 

-Adrian


Interesting. Now what would be nice is if we could set an offset of 27 degrees C at runtime.

 

I note that there is: /sys/devices/virtual/thermal/thermal_zone0/offset

 

However something like this seems to have no effect: echo 27000 > /sys/devices/virtual/thermal/thermal_zone0/offset; cat /sys/devices/virtual/thermal/thermal_zone0/temp

 

Any ideas what the correct syntax is and/or if writing to offset has any effect?

 

-Adrian
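
Until the kernel side is sorted out, one possible userspace workaround is to read the sysfs value and add the correction yourself. This is only a sketch: the 27 degree figure is the rough estimate from this thread, not a calibrated value, and the function names are invented for illustration. It does not change what the kernel reports, so thermal throttling still acts on the uncorrected value.

```python
THERMAL_ZONE_TEMP = "/sys/devices/virtual/thermal/thermal_zone0/temp"

# Assumption: a constant correction of roughly 27 C for affected LTS boards,
# as estimated in this thread. It is not a calibrated value.
LTS_OFFSET_C = 27.0


def parse_millidegrees(text):
    """Convert the sysfs string (millidegrees C, e.g. '20000') to degrees C."""
    return int(text.strip()) / 1000.0


def corrected_temp_c(text, offset_c=LTS_OFFSET_C):
    """Parse a sysfs temperature string and add a fixed offset in degrees C."""
    return parse_millidegrees(text) + offset_c


def read_corrected_temp_c(path=THERMAL_ZONE_TEMP, offset_c=LTS_OFFSET_C):
    """Read the kernel's temperature from sysfs and return the corrected value."""
    with open(path, "r") as f:
        return corrected_temp_c(f.read(), offset_c)
```

Calling read_corrected_temp_c() on an affected board returns the kernel value plus 27 degrees. On a non-LTS board, pass offset_c=0.0.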


What's the correct channel to report this, so it gets into the queue to be worked around in Armbian? I understand that the "root cause" is probably on the board side, but we do have a number that increases monotonically with temperature and should be usable once we figure out the slope and offset.
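
As a sketch of how slope and offset could be worked out: two reference measurements at clearly different temperatures are enough to fit a straight line through the ADC values. The first point below is the LTS dump posted earlier in this thread (ADC 1626 at 47 degrees C). The second point is a made-up placeholder, not a measurement, and has to be replaced with a real one. The function name fit_line is also just for illustration.

```python
def fit_line(point_a, point_b):
    """Fit T = slope * adc + offset through two (adc, temperature_c) points."""
    (adc_a, temp_a), (adc_b, temp_b) = point_a, point_b
    if adc_a == adc_b:
        raise ValueError("the two calibration points need different ADC values")
    slope = (temp_b - temp_a) / (adc_b - adc_a)
    offset = temp_a - slope * adc_a
    return slope, offset


# Measured: LTS board dump from this thread (ADC 1626 at 47 C on the thermocouple).
measured = (1626, 47.0)
# HYPOTHETICAL placeholder: replace with a real reading taken under load.
placeholder = (1500, 62.0)

slope, offset = fit_line(measured, placeholder)
print(f"T = {slope:.4f} * adc + {offset:.1f}")
```

If the fitted slope comes out close to the mainline value of -0.1211, that would support the idea that a plain offset is enough.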

 

It is great to have everyone's feedback on this issue here, but it isn't clear if this forum gets the issue in front of the right eyes to be addressed in a future version of Armbian. Does anyone know how this process works?

On 2/4/2020 at 7:11 PM, alexisfrjp said:

An offset isn't enough.

 

100% load on the 4 threads:

$ cat /sys/devices/virtual/thermal/thermal_zone0/temp
73618

 

Alexis - this looks like a "normal" (non-LTS) board. I think your CPU temperature really is around 73 degrees, which sounds normal to me for 100% load on 4 cores.

 

For me, on the LTS board, in the same scenario, I get a reading around 40000 (i.e. 40 degrees C). Unless I've misunderstood, I think an offset is (at least roughly) enough.

On 12/30/2019 at 9:40 PM, yoq said:

SoC temperature measured with thermocouple

Measuring SoC temperature with a thermocouple gives a general idea of the temperature. However the silicon temperature inside the SoC's plastic casing will be slightly higher and that's our measurement goal.

The temperature sensor is incorporated inside the chip, so if it works for non-LTS boards, correctly setting the formulas in the firmware is the right way to go! Factory calibration is less prone to be wrong, so I wouldn't lose too much time on it.

2 hours ago, jimg said:

I installed Orange Pi's version of Ubuntu Bionic server with kernel 5.34.27 mentioned in the tweet referred to by @gounthar above on an Orange Pi Zero LTS V1.5 board, and it has indeed fixed the temperature reporting problem.  The onboard wifi is working flawlessly, too.

I guess they rebranded an armbian image which includes alternative drivers for xradio and added the temperature fix....

On 7/14/2020 at 9:20 AM, Werner said:

I guess they rebranded an armbian image which includes alternative drivers for xradio and added the temperature fix....

It looks that way to me.  It has 'orangepi-config' and 'orangepimonitor' commands that look and behave exactly like the Armbian equivalents.  I'd feel a little more comfortable downloading the images from Armbian's torrents, though, than a server in China....


I'm having this issue too. Is there any fix for the latest Armbian?

 

My Armbian version:

 

BOARD=orangepizero
BOARD_NAME="Orange Pi Zero"
BOARDFAMILY=sun8i
BUILD_REPOSITORY_URL=https://github.com/armbian/build
BUILD_REPOSITORY_COMMIT=869a89d6-dirty
DISTRIBUTION_CODENAME=focal
DISTRIBUTION_STATUS=supported
VERSION=20.05.4
LINUXFAMILY=sunxi
BRANCH=current
ARCH=arm
IMAGE_TYPE=stable
BOARD_TYPE=conf
INITRD_ARCH=arm
KERNEL_IMAGE_TYPE=Image
 


I decompiled sun8i_thermal.ko, not sure what I was expecting...
This is the entirety of their "fix" compared to stock armbian: https://gist.github.com/dbeinder/6c4dac8df91fb4b1b1537bafa8136065/revisions
Yep, that's just a +30C offset shoved straight into the function, couldn't even be bothered with putting it into the struct for H2/H3, never mind chip detection...
> "BREAKING: Our engineer has solved the incorrect temperature Report about Zero LTS boards"

Good job, Xunlong, nice one

