halfa, posted May 27:

Posting here following what was recommended on Twitter. After updating my Helios64 earlier this week and rebooting to get the new kernel, I realized it was suspiciously silent. A quick look at the sensor temperature readings and a physical check made me realize the fans were not spinning. After a quick read of the wiki, I checked fancontrol, which was indeed failing:

    root@helios64:~ # systemctl status fancontrol.service
    ● fancontrol.service - fan speed regulator
       Loaded: loaded (/lib/systemd/system/fancontrol.service; enabled; vendor preset: enabled)
      Drop-In: /etc/systemd/system/fancontrol.service.d
               └─pid.conf
       Active: failed (Result: exit-code) since Fri 2021-05-28 00:08:13 CEST; 1min 42s ago
         Docs: man:fancontrol(8)
               man:pwmconfig(8)
      Process: 2495 ExecStartPre=/usr/sbin/fancontrol --check (code=exited, status=0/SUCCESS)
      Process: 2876 ExecStart=/usr/sbin/fancontrol (code=exited, status=1/FAILURE)
     Main PID: 2876 (code=exited, status=1/FAILURE)

    May 28 00:08:13 helios64 fancontrol[2876]: MINPWM=0
    May 28 00:08:13 helios64 fancontrol[2876]: MAXPWM=255
    May 28 00:08:13 helios64 fancontrol[2876]: AVERAGE=1
    May 28 00:08:13 helios64 fancontrol[2876]: Error: file /dev/thermal-cpu/temp1_input doesn't exist
    May 28 00:08:13 helios64 fancontrol[2876]: Error: file /dev/thermal-cpu/temp1_input doesn't exist
    May 28 00:08:13 helios64 fancontrol[2876]: At least one referenced file is missing. Either some required kernel
    May 28 00:08:13 helios64 fancontrol[2876]: modules haven't been loaded, or your configuration file is outdated.
    May 28 00:08:13 helios64 fancontrol[2876]: In the latter case, you should run pwmconfig again.
    May 28 00:08:13 helios64 systemd[1]: fancontrol.service: Main process exited, code=exited, status=1/FAILURE
    May 28 00:08:13 helios64 systemd[1]: fancontrol.service: Failed with result 'exit-code'.

Basically, fancontrol expects a device in /dev to read the sensor values from, and that device seems to be missing.
After a bit of poking around and learning about udev, I managed to work around the issue by recreating the device symlink manually:

    /usr/bin/mkdir /dev/thermal-cpu/
    ln -s /sys/devices/virtual/thermal/thermal_zone0/temp /dev/thermal-cpu/temp1_input
    systemctl restart fancontrol.service
    systemctl status fancontrol.service

Digging further, the issue happens because udev is not creating the symlink as it should. After reading the rule in /etc/udev/rules.d/90-helios64-hwmon-legacy.rules and a bit of udev documentation, I found out how to test it:

    root@helios64:~ # udevadm test /sys/devices/virtual/thermal/thermal_zone0
    [...]
    Reading rules file: /etc/udev/rules.d/90-helios64-hwmon-legacy.rules
    Reading rules file: /etc/udev/rules.d/90-helios64-ups.rules
    [...]
    DEVPATH=/devices/virtual/thermal/thermal_zone0
    ACTION=add
    SUBSYSTEM=thermal
    IS_HELIOS64_HWMON=1
    HWMON_PATH=/sys/devices/virtual/thermal/thermal_zone0
    USEC_INITIALIZED=7544717
    run: '/bin/ln -sf /sys/devices/virtual/thermal/thermal_zone0 '    <-- something is wrong here: the second argument (the link path) is missing
    Unload module index
    Unloaded link configuration context.

After spending a bit more time reading the udev rule, I realized that the second argument was empty because the device does not match the ATTR{type}=="soc-thermal" condition.
We can look up the types like this:

    root@helios64:~ # find /sys/ -name type | grep thermal
    /sys/devices/virtual/thermal/cooling_device1/type
    /sys/devices/virtual/thermal/thermal_zone0/type
    /sys/devices/virtual/thermal/cooling_device4/type
    /sys/devices/virtual/thermal/cooling_device2/type
    /sys/devices/virtual/thermal/thermal_zone1/type
    /sys/devices/virtual/thermal/cooling_device0/type
    /sys/devices/virtual/thermal/cooling_device3/type
    /sys/firmware/devicetree/base/thermal-zones/gpu/trips/gpu_alert0/type
    /sys/firmware/devicetree/base/thermal-zones/gpu/trips/gpu_crit/type
    /sys/firmware/devicetree/base/thermal-zones/cpu/trips/cpu_crit/type
    /sys/firmware/devicetree/base/thermal-zones/cpu/trips/cpu_alert0/type
    /sys/firmware/devicetree/base/thermal-zones/cpu/trips/cpu_alert1/type
    root@helios64:~ # cat /sys/devices/virtual/thermal/thermal_zone0/type
    cpu    <-- we were expecting soc-thermal!

After adding a rule line for the new type, udev is happy again:

    # Edit /etc/udev/rules.d/90-helios64-hwmon-legacy.rules and add the following line after the original one
    ATTR{type}=="cpu", ENV{HWMON_PATH}="/sys%p/temp", ENV{HELIOS64_SYMLINK}="/dev/thermal-cpu/temp1_input", RUN+="/usr/bin/mkdir /dev/thermal-cpu/"

    root@helios64:~ # udevadm control --reload
    root@helios64:~ # udevadm test /sys/devices/virtual/thermal/thermal_zone0
    [...]
    DEVPATH=/devices/virtual/thermal/thermal_zone0
    ACTION=add
    SUBSYSTEM=thermal
    IS_HELIOS64_HWMON=1
    HWMON_PATH=/sys/devices/virtual/thermal/thermal_zone0/temp
    HELIOS64_SYMLINK=/dev/thermal-cpu/temp1_input
    USEC_INITIALIZED=7544717
    run: '/usr/bin/mkdir /dev/thermal-cpu/'
    run: '/bin/ln -sf /sys/devices/virtual/thermal/thermal_zone0/temp /dev/thermal-cpu/temp1_input'
    Unload module index
    Unloaded link configuration context.

Apparently the device tree changed upstream and the thermal zone type changed from soc-thermal to cpu?
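The type lookup described above can be scripted. The following is only a minimal sketch, not part of the original post: the function names `list_thermal_zones` and `find_zone_by_type` are hypothetical, and the optional sysfs root argument is an assumption added so the helpers can be tried against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread) to inspect thermal zone types.

# Print "<zone-dir> <type>" for every thermal zone under a sysfs root.
# $1: sysfs root, defaults to /sys (overridable only for experimentation).
list_thermal_zones() {
    sysfs_root="${1:-/sys}"
    for type_file in "$sysfs_root"/devices/virtual/thermal/thermal_zone*/type; do
        # Skip the unexpanded glob pattern when no zone exists.
        [ -r "$type_file" ] || continue
        printf '%s %s\n' "$(dirname "$type_file")" "$(cat "$type_file")"
    done
}

# Print the directory of the first zone whose type equals $1.
# Returns non-zero when no zone has that type.
# $2: sysfs root, defaults to /sys.
find_zone_by_type() {
    wanted_type="$1"
    list_thermal_zones "${2:-/sys}" | while read -r zone_dir zone_type; do
        if [ "$zone_type" = "$wanted_type" ]; then
            printf '%s\n' "$zone_dir"
            break
        fi
    done | grep .   # grep sets the exit status: success only if a line was printed
}
```

On the system described in this post, `find_zone_by_type cpu` would be expected to print the thermal_zone0 directory, while `find_zone_by_type soc-thermal` would fail, which is the mismatch that leaves the udev rule without a symlink path.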
halfa (Author), posted May 31:

For anybody passing by, the issue is due to the fact that, for some reason, the armbian-bsp-cli-helios64 package for 21.05.2 (EDIT: to clarify, 21.05.1 is fine, as seen below) was built with the old udev rule (for 4.4 kernels):

    $ ls armbian-bsp-cli-helios64_21.05.1_arm64.deb\data.tar\.\etc\udev\rules.d\
    10-wifi-disable-powermanagement.rules
    50-mali.rules
    50-rk3399-vpu.rules
    50-usb-realtek-net.rules
    70-keep-usb-lan-as-eth1.rules
    90-helios64-hwmon.rules
    90-helios64-ups.rules
    $ ls armbian-bsp-cli-helios64_21.05.2_arm64.deb\data.tar\.\etc\udev\rules.d\
    10-wifi-disable-powermanagement.rules
    50-mali.rules
    50-rk3399-vpu.rules
    50-usb-realtek-net.rules
    70-keep-usb-lan-as-eth1.rules
    90-helios64-hwmon-legacy.rules
    90-helios64-ups.rules

The content of 90-helios64-hwmon.rules is indeed correct and matches the 5.10.x kernel device tree: https://github.com/armbian/build/blob/master/packages/bsp/helios64/90-helios64-hwmon.rules

I tried tracing through the build system to find out why the old file was used instead of the other one, but the best I could find is this:

    # in config/sources/families/include/rockchip64_common.inc (around lines 395-401)
    ### Fancontrol tweaks
    # copy hwmon rules to fix device mapping
    if [[ $BRANCH == legacy ]]; then
        install -m 644 $SRC/packages/bsp/helios64/90-helios64-hwmon-legacy.rules $destination/etc/udev/rules.d/
    else
        install -m 644 $SRC/packages/bsp/helios64/90-helios64-hwmon.rules $destination/etc/udev/rules.d/
    fi
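The package comparison above can also be done from a shell with `dpkg-deb --contents`, which lists the files inside a .deb. The sketch below is an illustration under assumptions: the function name `classify_hwmon_rule` is hypothetical, and it only looks for the two rule file names mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read a package file listing on
# stdin and report which Helios64 hwmon udev rule it ships.
# Prints one of: legacy, current, both, none.
classify_hwmon_rule() {
    listing=$(cat)
    has_legacy=no
    has_current=no
    case "$listing" in
        *etc/udev/rules.d/90-helios64-hwmon-legacy.rules*) has_legacy=yes ;;
    esac
    case "$listing" in
        *etc/udev/rules.d/90-helios64-hwmon.rules*) has_current=yes ;;
    esac
    case "$has_legacy$has_current" in
        yesyes) echo both ;;
        yesno)  echo legacy ;;
        noyes)  echo current ;;
        *)      echo none ;;
    esac
}
```

A possible invocation, assuming the package file has been downloaded to the current directory:

```shell
dpkg-deb --contents armbian-bsp-cli-helios64_21.05.2_arm64.deb | classify_hwmon_rule
```

Based on the listings in this post, the 21.05.2 package would be expected to report `legacy` and the 21.05.1 package `current`.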
Heisath, posted June 1:

Just to clarify, was this a problem with 21.05.1 or IS it still a problem with 21.05.2? If it is still an active problem, we should open a pull request to fully fix it in the build system.

EDIT: It seems to be correct in the build system: https://github.com/armbian/build/blob/3b3d85e25c2ecde30df7b5274fc6f1b9c0299ea2/config/sources/families/include/rockchip64_common.inc#L395-L401
Heisath, posted June 2:

To confirm, I checked https://armbian.systemonachip.net/apt/pool/focal-utils/a/armbian-bsp-cli-helios64/, which indeed shows that 21.05.2 ships the wrong file (/etc/udev/rules.d/90-helios64-hwmon...). Interestingly, the nightly build from beta.armbian.com does have it right...

@Igor pinging you here, as I am not familiar with the new packaging. Was this only an issue in one version, and will it be fixed automatically with the next release/minor version? Or do we have to fix some packaging somewhere...?
Igor, posted June 2:

4 hours ago, Heisath said:
    pinging you here, as I am not familiar with the new packaging.

The random behaviour is because there is no longer a per-branch BSP. The decisions have to be refactored to happen at runtime. I keep forgetting we still have legacy stuff here ... so this will complicate things a bit: https://armbian.atlassian.net/browse/AR-779
halfa (Author), posted June 7:

Quote:
    Just to clarify, was this a problem with 21.05.1 or IS it still a problem with 21.05.2?

It is STILL an issue in the 21.05.2 package in the repo (https://armbian.hosthatch.com/apt/pool/focal-utils/a/armbian-bsp-cli-helios64/armbian-bsp-cli-helios64_21.05.2_arm64.deb), so anybody upgrading to 21.05.2 will get the old udev rule and the fancontrol issue.
halfa (Author), posted June 7:

One solution would be to merge the old and the new rule into the same file (as I ended up doing above), but I would strongly suggest packaging a new version of the BSP with the correct rule as 21.05.3, to avoid issues with non-spinning fans. Let me know if I can assist in any way.
snakekick, posted June 17:

Hi, a new version of armbian-bsp-cli-helios64 (21.05.4) was released today, but it still has the same error. ;((
Igor, posted June 17:

34 minutes ago, snakekick said:
    Hi, a new version armbian-bsp-cli-helios64 (21.05.4) released today but still have the same error. ;((

Do you support the project at least this way? https://forum.armbian.com/subscriptions/ So you don't make additional expenses when asking for support you are far away from.

Software development and support / bug fixing takes time. It is also very expensive, since people need to have a lot of knowledge, which is highly paid and very desirable on the market. Here you expect this service for free. Well, then you have to wait with a partially broken system without complaining ... you can also fix it on your own. Or hire someone to fix this for all of us. Why would this go on our private expense???

There are "1000 bugs and 1000 people" before this one, and this update fixed some other bugs. We made a few people happy, but it is not possible to make everyone happy. The bug was recorded in our system and it's waiting for a free time slot. For our donation to you. A week, a month or years. Up to you.
Zac, posted June 17:

While you are right, of course, for a professional kind of support there are quite a lot of other alternatives which suit better. Donations to the project are essential, so that things can be done.

But where I disagree a little is that this issue causes the fans to stop. I find that to be a serious issue; it can cause hardware damage. I'm sure there is thermal throttling and auto-shutdown if the temperature reaches some thresholds, but it's never good to go into that area. So I don't agree with the "1000 bugs" before this one.

Now, to avoid any kind of unnecessary pressure, my suggestion to all users is: revert back to 21.05.1 until a solution is deployed in the latest release.
Igor, posted June 17:

24 minutes ago, Zac said:
    I find that to be a serious issue

How about this way - "Sadly we ran out of money to fix things for this year. In reality, already on the second day of the year." But hey, this is open source. Anyone can fix things.

26 minutes ago, Zac said:
    I'm sure there is thermal throttling and auto-shutdown if the temperature reaches some thresholds

There is.

26 minutes ago, Zac said:
    Now to avoid any kind of unnecessary pressure, my suggestion to all users is: revert back to 21.05.1, until a solution is deployed in the latest release.

That would be a workaround, but sadly we have no way to effectively communicate such a message.
Heisath, posted June 18:

Another temporary solution is already provided in the first post, by the way. Anyone struggling with this issue who only wants the fan to work again can just do the following:

On 5/28/2021 at 1:19 AM, halfa said:
    and after rewriting the line with the new type, udev is happy again

    # Edit in /etc/udev/rules.d/90-helios64-hwmon-legacy.rules and add the following line after the original one
    ATTR{type}=="cpu", ENV{HWMON_PATH}="/sys%p/temp", ENV{HELIOS64_SYMLINK}="/dev/thermal-cpu/temp1_input", RUN+="/usr/bin/mkdir /dev/thermal-cpu/"

    root@helios64:~ # udevadm control --reload
    root@helios64:~ # udevadm test /sys/devices/virtual/thermal/thermal_zone0
    [...]
    DEVPATH=/devices/virtual/thermal/thermal_zone0
    ACTION=add
    SUBSYSTEM=thermal
    IS_HELIOS64_HWMON=1
    HWMON_PATH=/sys/devices/virtual/thermal/thermal_zone0/temp
    HELIOS64_SYMLINK=/dev/thermal-cpu/temp1_input
    USEC_INITIALIZED=7544717
    run: '/usr/bin/mkdir /dev/thermal-cpu/'
    run: '/bin/ln -sf /sys/devices/virtual/thermal/thermal_zone0/temp /dev/thermal-cpu/temp1_input'
    Unload module index
    Unloaded link configuration context.
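After applying the workaround, it may help to confirm that every file fancontrol refers to actually exists before restarting the service. The sketch below is an assumption-laden illustration: the function name `missing_fancontrol_files` is hypothetical, and it assumes the `FCTEMPS` line in the fancontrol configuration holds unquoted, space-separated `pwm_path=temp_path` pairs with absolute paths (relative hwmon paths are not handled).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report files referenced by the
# FCTEMPS line of a fancontrol configuration that do not exist.
# $1: configuration file, defaults to /etc/fancontrol.
# Returns 0 when nothing is missing, 1 otherwise.
missing_fancontrol_files() {
    conf_file="${1:-/etc/fancontrol}"
    # Extract the value of the FCTEMPS line; empty if the line is absent.
    pairs=$(sed -n 's/^FCTEMPS=//p' "$conf_file")
    status=0
    for pair in $pairs; do
        pwm_path=${pair%%=*}    # text before the first "="
        temp_path=${pair#*=}    # text after the first "="
        for path in "$pwm_path" "$temp_path"; do
            if [ ! -e "$path" ]; then
                printf 'missing: %s\n' "$path"
                status=1
            fi
        done
    done
    return "$status"
}
```

Run as `missing_fancontrol_files` on the affected system; if the symlink has not been created, the output would be expected to name /dev/thermal-cpu/temp1_input, the same file fancontrol complains about in the first post.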
Igor, posted June 18:

15 hours ago, Zac said:
    Donations to the project are essential, so that things can be done.

All three customers who are complaining here: start an Angel monthly subscription and it will quickly be enough to cover the expenses of fixing this bug. Donations are reserved for acts of free will, which by the way don't even cover electricity costs. I wouldn't call them essential.
Heisath, posted June 20:

Here is a fix for the problem that combines the hwmon rules into one file and installs it regardless of kernel branch: https://github.com/armbian/build/tree/helios64-udev-hwmon-fix

Please review the changes and _test_ them. I do not have a Helios64, so this was just done quickly. @halfa @gprovost
gershwin, posted June 24:

I'm having the same issue with a slightly different presentation. The fans don't run, which caused a couple of thermal shutdowns before I realised.

    PRETTY_NAME="Armbian 21.05.4 Buster"
    NAME="Debian GNU/Linux"
    VERSION_ID="10"
    VERSION="10 (buster)"
    VERSION_CODENAME=buster
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"

The output from systemctl status fancontrol.service is similar to yours, @halfa, but not identical. The odd thing is that I don't have any /etc/udev/rules.d/90-helios64-hwmon-legacy.rules at all. The path /etc/udev/rules.d/ has some rules, but no 90-helios64-hwmon-legacy.rules.

I've currently got a dead NAS (kernel panic every other boot) without fans, but I can't reinstall, because the latest image is still broken. Are any historic images available? I couldn't see anywhere to get these.
Heisath, posted June 24:

Historic images are available here: https://archive.armbian.com/helios64/archive/ or https://mirrors.dotsrc.org/armbian-dl/helios64/archive/

You could also build an image yourself (maybe from this branch: https://github.com/armbian/build/tree/helios64-udev-hwmon-fix) and report whether it works.
halfa (Author), posted June 24:

The 21.05.6 release fixed the regression in the package build process (I couldn't find a related commit; maybe it happened during the build targets rework?), removing the "legacy" udev rule and adding the correct one.

    root@helios64:~ # apt info armbian-bsp-cli-helios64
    Package: armbian-bsp-cli-helios64
    Version: 21.05.6

Given that this is fixed, I don't know if there is a need to patch upstream, Heisath. I'm willing to test your version of the merged udev rule, but I don't have a legacy environment to properly test the other one.
Heisath, posted June 25:

The question is, is it really fixed? Or is it just by chance now always including the current udev rule? And is legacy maybe now using the current file as well? I'd think so... I will leave my branch open for now. Maybe someone has the time and hardware to fully test it.
gershwin, posted June 27:

I've reinstalled with Armbian 21.02.3 Buster with Linux 5.10.21-rockchip64. I didn't see any mention of a fix in the release notes, and sadly, working from home and with a newborn, I don't have time to reinstall a version when it's uncertain whether the fix is in it.
Igor, posted June 27:

7 hours ago, gershwin said:
    with a newborn, I don't have time to reinstall a version

Congratulations. You won't believe it: I have two small kids, a wife, a cat (OK, this one is low maintenance), a full-time job and a full-time project to maintain. I don't have time to test this, even though we are providing this software for you and I had this hardware. The alternative is to hire help, but since you are not interested, this is not going to happen.