halfa, posted May 27, 2021

Posting here, as recommended on Twitter.

After updating my Helios64 earlier this week and rebooting into the new kernel, I noticed it was suspiciously silent. A quick look at the sensor temperature readings and a physical check confirmed that the fans were not spinning. After a quick read of the wiki, I checked fancontrol, which was indeed failing:

```
root@helios64:~ # systemctl status fancontrol.service
● fancontrol.service - fan speed regulator
     Loaded: loaded (/lib/systemd/system/fancontrol.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/fancontrol.service.d
             └─pid.conf
     Active: failed (Result: exit-code) since Fri 2021-05-28 00:08:13 CEST; 1min 42s ago
       Docs: man:fancontrol(8)
             man:pwmconfig(8)
    Process: 2495 ExecStartPre=/usr/sbin/fancontrol --check (code=exited, status=0/SUCCESS)
    Process: 2876 ExecStart=/usr/sbin/fancontrol (code=exited, status=1/FAILURE)
   Main PID: 2876 (code=exited, status=1/FAILURE)

May 28 00:08:13 helios64 fancontrol[2876]: MINPWM=0
May 28 00:08:13 helios64 fancontrol[2876]: MAXPWM=255
May 28 00:08:13 helios64 fancontrol[2876]: AVERAGE=1
May 28 00:08:13 helios64 fancontrol[2876]: Error: file /dev/thermal-cpu/temp1_input doesn't exist
May 28 00:08:13 helios64 fancontrol[2876]: Error: file /dev/thermal-cpu/temp1_input doesn't exist
May 28 00:08:13 helios64 fancontrol[2876]: At least one referenced file is missing. Either some required kernel
May 28 00:08:13 helios64 fancontrol[2876]: modules haven't been loaded, or your configuration file is outdated.
May 28 00:08:13 helios64 fancontrol[2876]: In the latter case, you should run pwmconfig again.
May 28 00:08:13 helios64 systemd[1]: fancontrol.service: Main process exited, code=exited, status=1/FAILURE
May 28 00:08:13 helios64 systemd[1]: fancontrol.service: Failed with result 'exit-code'.
```

Basically, fancontrol expects a device file under /dev to read the sensor value from, and that file is missing.
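To see at a glance which paths in a fancontrol config are missing (the same check fancontrol does before giving up), a small shell function can help. This is a minimal sketch, not part of Armbian or lm-sensors: `check_fctemps` is a hypothetical name, and it assumes the single-line `FCTEMPS=pwm=temp pwm=temp` format used by the Helios64 config in this thread.

```shell
# Hypothetical helper (not an existing tool): report files referenced by FCTEMPS
# in a fancontrol config that do not exist. Assumes the one-line
# "FCTEMPS=pwm=temp pwm=temp ..." format. Usage: check_fctemps [CONFIG]
check_fctemps() {
    config="${1:-/etc/fancontrol}"
    # First uncommented FCTEMPS line, with the "FCTEMPS=" prefix removed.
    pairs=$(grep '^FCTEMPS=' "$config" | head -n 1 | cut -d= -f2-)
    missing=0
    for pair in $pairs; do
        pwm=${pair%%=*}    # left side: PWM control file
        temp=${pair#*=}    # right side: temperature input file
        for f in "$pwm" "$temp"; do
            if [ ! -e "$f" ]; then
                echo "missing: $f"
                missing=1
            fi
        done
    done
    # Non-zero exit status if anything was missing.
    return "$missing"
}
```

Running `check_fctemps /etc/fancontrol` on the broken setup above would be expected to list `/dev/thermal-cpu/temp1_input` (once per fan that references it).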
After a bit of poking around and learning about udev, I managed to work around the issue by recreating the device symlink manually:

```
/usr/bin/mkdir /dev/thermal-cpu/
ln -s /sys/devices/virtual/thermal/thermal_zone0/temp /dev/thermal-cpu/temp1_input
systemctl restart fancontrol.service
systemctl status fancontrol.service
```

Digging further, this happens because, for some reason, udev is not creating the symlink as it should. After reading the rule in /etc/udev/rules.d/90-helios64-hwmon-legacy.rules and a bit of the udev documentation, I found out how to test it:

```
root@helios64:~ # udevadm test /sys/devices/virtual/thermal/thermal_zone0
[...]
Reading rules file: /etc/udev/rules.d/90-helios64-hwmon-legacy.rules
Reading rules file: /etc/udev/rules.d/90-helios64-ups.rules
[...]
DEVPATH=/devices/virtual/thermal/thermal_zone0
ACTION=add
SUBSYSTEM=thermal
IS_HELIOS64_HWMON=1
HWMON_PATH=/sys/devices/virtual/thermal/thermal_zone0
USEC_INITIALIZED=7544717
run: '/bin/ln -sf /sys/devices/virtual/thermal/thermal_zone0 '   <-- something is wrong here, there is no target
Unload module index
Unloaded link configuration context.
```

After spending a bit more time reading the udev rule, I realized that the second argument was empty because the device does not match the ATTR{type}=="soc-thermal" condition.
We can look up the types like this:

```
root@helios64:~ # find /sys/ -name type | grep thermal
/sys/devices/virtual/thermal/cooling_device1/type
/sys/devices/virtual/thermal/thermal_zone0/type
/sys/devices/virtual/thermal/cooling_device4/type
/sys/devices/virtual/thermal/cooling_device2/type
/sys/devices/virtual/thermal/thermal_zone1/type
/sys/devices/virtual/thermal/cooling_device0/type
/sys/devices/virtual/thermal/cooling_device3/type
/sys/firmware/devicetree/base/thermal-zones/gpu/trips/gpu_alert0/type
/sys/firmware/devicetree/base/thermal-zones/gpu/trips/gpu_crit/type
/sys/firmware/devicetree/base/thermal-zones/cpu/trips/cpu_crit/type
/sys/firmware/devicetree/base/thermal-zones/cpu/trips/cpu_alert0/type
/sys/firmware/devicetree/base/thermal-zones/cpu/trips/cpu_alert1/type
root@helios64:~ # cat /sys/devices/virtual/thermal/thermal_zone0/type
cpu   <-- we were expecting soc-thermal!
```

and after rewriting the line with the new type, udev is happy again:

```
# Edit /etc/udev/rules.d/90-helios64-hwmon-legacy.rules and add the following line after the original one
ATTR{type}=="cpu", ENV{HWMON_PATH}="/sys%p/temp", ENV{HELIOS64_SYMLINK}="/dev/thermal-cpu/temp1_input", RUN+="/usr/bin/mkdir /dev/thermal-cpu/"

root@helios64:~ # udevadm control --reload
root@helios64:~ # udevadm test /sys/devices/virtual/thermal/thermal_zone0
[...]
DEVPATH=/devices/virtual/thermal/thermal_zone0
ACTION=add
SUBSYSTEM=thermal
IS_HELIOS64_HWMON=1
HWMON_PATH=/sys/devices/virtual/thermal/thermal_zone0/temp
HELIOS64_SYMLINK=/dev/thermal-cpu/temp1_input
USEC_INITIALIZED=7544717
run: '/usr/bin/mkdir /dev/thermal-cpu/'
run: '/bin/ln -sf /sys/devices/virtual/thermal/thermal_zone0/temp /dev/thermal-cpu/temp1_input'
Unload module index
Unloaded link configuration context.
```

Apparently, for some reason, the device tree changed upstream and the thermal zone type changed from soc-thermal to cpu?
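Since the legacy rule only fills in the symlink target when `ATTR{type}=="soc-thermal"`, it can help to print every thermal zone's type in one go instead of `cat`-ing files one by one. The helper below is a hypothetical sketch (`list_thermal_types` is not an existing tool); it reads the `type` attribute under `/sys/class/thermal` by default, and the optional argument exists only so it can be pointed at another directory.

```shell
# Hypothetical helper: print "zone<TAB>type" for each thermal zone.
# The type is the attribute the Helios64 udev rules match with ATTR{type}.
# Usage: list_thermal_types [ROOT]   (ROOT defaults to /sys/class/thermal)
list_thermal_types() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        # Skip the unexpanded glob (no zones) or zones without a readable type.
        [ -r "$zone/type" ] || continue
        printf '%s\t%s\n' "${zone##*/}" "$(cat "$zone/type")"
    done
}
```

On halfa's board, the line for `thermal_zone0` would show `cpu`, which is why the `soc-thermal` match in the legacy rule never fires there.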
halfa (author), posted May 31, 2021

For anybody passing by: the issue is that, for some reason, the armbian-bsp-cli-helios64 package for 21.05.2 (EDIT, to clarify: 21.05.1 is fine, as seen below) was built with the old udev rule (meant for 4.4 kernels):

```
$ ls armbian-bsp-cli-helios64_21.05.1_arm64.deb\data.tar\.\etc\udev\rules.d\
10-wifi-disable-powermanagement.rules
50-mali.rules
50-rk3399-vpu.rules
50-usb-realtek-net.rules
70-keep-usb-lan-as-eth1.rules
90-helios64-hwmon.rules
90-helios64-ups.rules

$ ls armbian-bsp-cli-helios64_21.05.2_arm64.deb\data.tar\.\etc\udev\rules.d\
10-wifi-disable-powermanagement.rules
50-mali.rules
50-rk3399-vpu.rules
50-usb-realtek-net.rules
70-keep-usb-lan-as-eth1.rules
90-helios64-hwmon-legacy.rules
90-helios64-ups.rules
```

The content of 90-helios64-hwmon.rules is indeed correct and matches the 5.10.x kernel device tree: https://github.com/armbian/build/blob/master/packages/bsp/helios64/90-helios64-hwmon.rules

I tried reverse-engineering the build system to find out why the old file was used instead of the new one, but the best I could find is this (lines 395-401):

```
# in config/sources/families/include/rockchip64_common.inc
### Fancontrol tweaks
# copy hwmon rules to fix device mapping
if [[ $BRANCH == legacy ]]; then
	install -m 644 $SRC/packages/bsp/helios64/90-helios64-hwmon-legacy.rules $destination/etc/udev/rules.d/
else
	install -m 644 $SRC/packages/bsp/helios64/90-helios64-hwmon.rules $destination/etc/udev/rules.d/
fi
```
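To check which hwmon rule a BSP package actually ships without unpacking it by hand, its file listing can be searched: `dpkg-deb -c` lists the contents of a downloaded .deb, and `dpkg -L` lists the files of an installed package. The `which_hwmon_rule` helper below is a hypothetical sketch that classifies such a listing read from stdin; it only knows the two rule file names seen in this thread.

```shell
# Hypothetical helper: classify a package file listing read from stdin.
# Examples:
#   dpkg -L armbian-bsp-cli-helios64 | which_hwmon_rule
#   dpkg-deb -c armbian-bsp-cli-helios64_21.05.2_arm64.deb | which_hwmon_rule
which_hwmon_rule() {
    listing=$(cat)
    if printf '%s\n' "$listing" | grep -q '90-helios64-hwmon-legacy\.rules'; then
        echo legacy     # rule written for the 4.4 (legacy) kernel
    elif printf '%s\n' "$listing" | grep -q '90-helios64-hwmon\.rules'; then
        echo current    # rule matching the 5.10.x device tree
    else
        echo none       # neither of the known Helios64 hwmon rules
    fi
}
```

Per the listings above, the 21.05.1 package would be expected to report `current` and the 21.05.2 package `legacy`.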
Heisath, posted June 1, 2021

Just to clarify, was this a problem with 21.05.1 or IS it still a problem with 21.05.2? If it is still an active problem, we should open a pull request to fix it properly in the build system.

EDIT: It seems to be correct in the build system: https://github.com/armbian/build/blob/3b3d85e25c2ecde30df7b5274fc6f1b9c0299ea2/config/sources/families/include/rockchip64_common.inc#L395-L401
Heisath, posted June 2, 2021

To confirm, I checked https://armbian.systemonachip.net/apt/pool/focal-utils/a/armbian-bsp-cli-helios64/, which really does show the wrong file (/etc/udev/rules.d/90-helios64-hwmon...) in 21.05.2. Interestingly, the nightly build from beta.armbian.com has it right...

@Igor pinging you here, as I am not familiar with the new packaging. Was this only an issue in one version that will be fixed automatically with the next release/minor version? Or do we have to fix some packaging somewhere...?
Igor, posted June 2, 2021

> Heisath said: pinging you here, as I am not familiar with the new packaging.

The random behaviour is because there is no longer a per-branch BSP. The branch decisions have to be refactored to happen at runtime. I keep forgetting we still have legacy stuff here... so this will complicate things a bit. https://armbian.atlassian.net/browse/AR-779
halfa (author), posted June 7, 2021

> Just to clarify, was this a problem with 21.05.1 or IS it still a problem with 21.05.2?

It is STILL an issue in the 21.05.2 package in the repo (https://armbian.hosthatch.com/apt/pool/focal-utils/a/armbian-bsp-cli-helios64/armbian-bsp-cli-helios64_21.05.2_arm64.deb), so anybody upgrading to 21.05.2 will get the old udev rule and the fancontrol issue.
halfa (author), posted June 7, 2021

One solution would be to merge the old and new rules into the same file (as I ended up doing above), but I would strongly suggest packaging a new version of the BSP with the correct rule as 21.05.3, to avoid issues with non-spinning fans.

Let me know if I can assist in any way.
snakekick, posted June 17, 2021

Hi, a new version of armbian-bsp-cli-helios64 (21.05.4) was released today, but I still have the same error. ;((
Igor, posted June 17, 2021

> snakekick said: Hi, a new version of armbian-bsp-cli-helios64 (21.05.4) was released today, but I still have the same error. ;((

Do you support the project at least this way? https://forum.armbian.com/subscriptions/ That way, asking for support doesn't just add to our expenses.

Software development and support / bug fixing takes time. It is also very expensive, since people need a lot of knowledge, which is highly paid and very desirable on the market. Here you expect this service for free. Well, then you have to wait with a partially broken system without complaining... or you can fix it on your own. Or hire someone to fix this for all of us. Why should this come out of our private pockets??? There are "1000 bugs and 1000 people" before this one, and this update fixed some other bugs. We made a few people happy, but it is not possible to make everyone happy. The bug was recorded in our system and is waiting for a free time slot. For our donation to you. A week, a month or years. Up to you.
Zac, posted June 17, 2021

While you are right, of course, for professional support there are quite a lot of other alternatives that are a better fit. Donations to the project are essential, so that things can be done.

But where I disagree a little is that this issue causes the fans to stop. I find that to be a serious issue; it can cause hardware damage. I'm sure there is thermal throttling and auto-shutdown if the temperature reaches some thresholds, but it's never good to go into that area. So I don't agree with the "1000 bugs" before this one.

Now to avoid any kind of unnecessary pressure, my suggestion to all users is: revert back to 21.05.01, until a solution is deployed in the latest release.
Igor, posted June 17, 2021

> Zac said: I find that to be a serious issue

How about this: "Sadly, we ran out of money to fix things for this year. In reality, already on the second day of the year." But hey, this is open source. Anyone can fix things.

> Zac said: I'm sure there is thermal throttling and auto-shutdown if the temperature reaches some thresholds

There is.

> Zac said: Now to avoid any kind of unnecessary pressure, my suggestion to all users is: revert back to 21.05.01, until a solution is deployed in the latest release.

That would be a workaround, but sadly we have no way to communicate such a message effectively.
Heisath, posted June 18, 2021

Another temporary solution was already provided in the first post, by the way. Anyone struggling with this issue who just wants the fan to work again can do this:

> On 5/28/2021 at 1:19 AM, halfa said:
> and after rewriting the line with the new type, udev is happy again

```
# Edit /etc/udev/rules.d/90-helios64-hwmon-legacy.rules and add the following line after the original one
ATTR{type}=="cpu", ENV{HWMON_PATH}="/sys%p/temp", ENV{HELIOS64_SYMLINK}="/dev/thermal-cpu/temp1_input", RUN+="/usr/bin/mkdir /dev/thermal-cpu/"

root@helios64:~ # udevadm control --reload
root@helios64:~ # udevadm test /sys/devices/virtual/thermal/thermal_zone0
[...]
DEVPATH=/devices/virtual/thermal/thermal_zone0
ACTION=add
SUBSYSTEM=thermal
IS_HELIOS64_HWMON=1
HWMON_PATH=/sys/devices/virtual/thermal/thermal_zone0/temp
HELIOS64_SYMLINK=/dev/thermal-cpu/temp1_input
USEC_INITIALIZED=7544717
run: '/usr/bin/mkdir /dev/thermal-cpu/'
run: '/bin/ln -sf /sys/devices/virtual/thermal/thermal_zone0/temp /dev/thermal-cpu/temp1_input'
Unload module index
Unloaded link configuration context.
```
Igor, posted June 18, 2021

> Zac said: Donations to the project are essential, so that things can be done.

All three customers complaining here: start with an Angel monthly subscription, and it will quickly be enough to cover the expenses of fixing this bug. Donations are reserved for acts of free will, which by the way don't even cover electricity costs. I wouldn't call that essential.
Heisath, posted June 20, 2021

A fix for the problem, combining the hwmon rules into one file and installing it regardless of kernel branch: https://github.com/armbian/build/tree/helios64-udev-hwmon-fix

Please review the changes and _test_ them. I do not have a Helios64, so this was done quickly. @halfa @gprovost
gershwin, posted June 24, 2021

I'm having the same issue with a slightly different presentation. The fans don't run, which caused a couple of thermal shutdowns before I realised.

```
PRETTY_NAME="Armbian 21.05.4 Buster"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```

The output from systemctl status fancontrol.service is similar to yours, @halfa, but not identical. The odd thing is that I don't have /etc/udev/rules.d/90-helios64-hwmon-legacy.rules at all: /etc/udev/rules.d/ has some rules, but not that one.

I've currently got a dead NAS (kernel panic every other boot) without fans, but I can't reinstall, because the latest image is still broken. Are any historic images available? I couldn't see anywhere to get them.
Heisath, posted June 24, 2021

Historic images are available here: https://archive.armbian.com/helios64/archive/ or https://mirrors.dotsrc.org/armbian-dl/helios64/archive/

You could also build an image yourself (maybe from this branch: https://github.com/armbian/build/tree/helios64-udev-hwmon-fix) and report whether it works.
halfa (author), posted June 24, 2021

The 21.05.6 release fixed the regression in the package build process (I couldn't find a related commit; maybe it happened during the build target rework?), removing the "legacy" udev rule and adding the correct one.

```
root@helios64:~ # apt info armbian-bsp-cli-helios64
Package: armbian-bsp-cli-helios64
Version: 21.05.6
```

Given that this is fixed, I don't know if there is a need to patch upstream, Heisath. I'm willing to test your version of the merged udev rule, but I don't have a legacy environment to properly test the other one.
Heisath, posted June 25, 2021

The question is: is it really fixed? Or is it now just including the current udev rule by chance? And is legacy maybe also using the current file now? I'd think so...

I will leave my branch open for now. Maybe someone has the time and hardware to test it fully.
gershwin, posted June 27, 2021

I've reinstalled with Armbian 21.02.3 Buster with Linux 5.10.21-rockchip64. I didn't see any mention of a fix in the release notes, and sadly, working from home with a newborn, I don't have time to reinstall a version when it's uncertain whether the fix is in it.
Igor, posted June 27, 2021

> gershwin said: with a newborn, I don't have time to reinstall a version

Congratulations.

You won't believe it: I have two small kids, a wife, a cat (OK, this one is low maintenance), a full-time job and a full-time project to maintain. I wouldn't have time to test this, even though we are providing this software to you, and even if I had this hardware. The alternative is to hire help, but since you are not interested, this is not going to happen.
BipBip1981, posted December 21, 2021

Hello,

Here is my workaround or patch. Maybe it's bad, sad or unclear, but it's KISS and seems to work!

Bye

```
root@helios64:~# cat /etc/fancontrol
# Helios64 PWM Fan Control Configuration
# Temp source : /dev/thermal-cpu
INTERVAL=10
#FCTEMPS=/dev/fan-p6/pwm1=/dev/thermal-cpu/temp1_input /dev/fan-p7/pwm1=/dev/thermal-cpu/temp1_input
FCTEMPS=/dev/fan-p6/pwm1=/sys/class/thermal/thermal_zone0/temp /dev/fan-p7/pwm1=/sys/class/thermal/thermal_zone0/temp
MINTEMP=/dev/fan-p6/pwm1=40 /dev/fan-p7/pwm1=40
MAXTEMP=/dev/fan-p6/pwm1=110 /dev/fan-p7/pwm1=110
MINSTART=/dev/fan-p6/pwm1=60 /dev/fan-p7/pwm1=60
MINSTOP=/dev/fan-p6/pwm1=40 /dev/fan-p7/pwm1=40
MINPWM=20
```
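The edit above amounts to one substitution in the FCTEMPS line, so it can also be applied with sed. This is a hedged sketch rather than an official fix: `use_sysfs_temp` is a hypothetical helper name, `-i.bak` keeps a backup copy of the original file, and it assumes thermal_zone0 is the CPU zone, as it was on the boards in this thread.

```shell
# Hypothetical helper: point FCTEMPS at the sysfs CPU temperature directly,
# bypassing the missing /dev/thermal-cpu/temp1_input symlink.
# Only uncommented FCTEMPS lines are changed; CONFIG.bak keeps the original.
# Usage: use_sysfs_temp [CONFIG]   (defaults to /etc/fancontrol)
use_sysfs_temp() {
    config="${1:-/etc/fancontrol}"
    sed -i.bak \
        '/^FCTEMPS=/s|/dev/thermal-cpu/temp1_input|/sys/class/thermal/thermal_zone0/temp|g' \
        "$config"
}
```

Afterwards, restart the service with `systemctl restart fancontrol.service`. This only replaces the temperature source; the `/dev/fan-p6/pwm1` and `/dev/fan-p7/pwm1` links on the PWM side still have to exist.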
0xdnL, posted October 7

Thanks a lot, @BipBip1981! Changing the FCTEMPS values from `/dev/thermal-cpu/temp1_input` to `/sys/class/thermal/thermal_zone0/temp` fixed it.

My dumb ass then updated the system via OMV, and after a reboot the fan started spinning non-stop again, this time with:

```
root@helios64:~# systemctl status fancontrol.service
● fancontrol.service - fan speed regulator
     Loaded: loaded (/lib/systemd/system/fancontrol.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2024-10-06 10:49:39 UTC; 6s ago
       Docs: man:fancontrol(8)
             man:pwmconfig(8)
    Process: 1653 ExecStartPre=/usr/sbin/fancontrol --check (code=exited, status=0/SUCCESS)
    Process: 1773 ExecStart=/usr/sbin/fancontrol (code=exited, status=1/FAILURE)
   Main PID: 1773 (code=exited, status=1/FAILURE)

Oct 06 10:49:39 helios64 fancontrol[1773]: MINSTOP=40
Oct 06 10:49:39 helios64 fancontrol[1773]: MINPWM=0
Oct 06 10:49:39 helios64 fancontrol[1773]: MAXPWM=255
Oct 06 10:49:39 helios64 fancontrol[1773]: Error: file /dev/fan-p6/pwm1 doesn't exist
Oct 06 10:49:39 helios64 fancontrol[1773]: Error: file /dev/fan-p7/pwm1 doesn't exist
```

I somehow pieced the new values together from:

https://community.clearlinux.org/t/fancontrol-and-pwmconfig-hwmon3-pwm1-and-hwmon3-fan1-input-not-found/7516/5
https://bbs.archlinux.org/viewtopic.php?id=243169
https://www.reddit.com/r/linuxquestions/comments/n7lwga/fancontrol_error_file_doesnt_exist_or_isnt/

The fan is quiet now, and fancontrol.service is active (running) with:

```
root@helios64:~# cat fancontrol/fancontrol
INTERVAL=10
FCTEMPS=/sys/devices/platform/p6-fan/hwmon/hwmon5/pwm1=/sys/class/thermal/thermal_zone0/temp /sys/devices/platform/p7-fan/hwmon/hwmon4/pwm1=/sys/class/thermal/thermal_zone1/temp
MINTEMP=/sys/devices/platform/p6-fan/hwmon/hwmon5/pwm1=40 /sys/devices/platform/p7-fan/hwmon/hwmon4/pwm1=40
MAXTEMP=/sys/devices/platform/p6-fan/hwmon/hwmon5/pwm1=110 /sys/devices/platform/p7-fan/hwmon/hwmon4/pwm1=110
MINSTART=/sys/devices/platform/p6-fan/hwmon/hwmon5/pwm1=60 /sys/devices/platform/p7-fan/hwmon/hwmon4/pwm1=60
MINSTOP=/sys/devices/platform/p6-fan/hwmon/hwmon5/pwm1=40 /sys/devices/platform/p7-fan/hwmon/hwmon4/pwm1=40
MINPWM=20
```

This is working on:

```
root@helios64:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
root@helios64:~# uname -r
6.6.47-current-rockchip64
```
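The hwmonN numbers in that config are allocated when the drivers probe, so they are not guaranteed to stay the same across reboots or kernel upgrades. If fancontrol starts failing again with "doesn't exist" errors, one way to find the current paths is to glob for them instead of guessing. The helper below is a hypothetical sketch; it assumes the `p6-fan` and `p7-fan` platform device names seen in 0xdnL's config, and the optional argument only exists so it can be pointed at another directory.

```shell
# Hypothetical helper: print "<fan> <current pwm1 path>" for each Helios64 fan,
# globbing the hwmonN index instead of hardcoding hwmon4/hwmon5.
# Usage: find_fan_pwm [ROOT]   (ROOT defaults to /sys/devices/platform)
find_fan_pwm() {
    root="${1:-/sys/devices/platform}"
    for fan in p6-fan p7-fan; do
        for pwm in "$root/$fan"/hwmon/hwmon*/pwm1; do
            # The glob stays literal if nothing matches, so check it exists.
            if [ -e "$pwm" ]; then
                printf '%s %s\n' "$fan" "$pwm"
            fi
        done
    done
}
```

Compare the printed paths with the hwmonN entries in the fancontrol config and update the config if they differ.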