prahal

Members
  • Posts

    162
Everything posted by prahal

  1. @ebin-dev 6.2 should have it fixed, though that requires testing. I built and installed the kernel from Armbian edge 6.1.11-rockchip64 and added the few missing regulator core patches from the stable 6.2 repository (https://github.com/torvalds/linux/commits/master/drivers/regulator/core.c), that is:
     cb3543cff90a regulator: core: fix deadlock on regulator enable
     0debed5b117d regulator: core: Fix resolve supply lookup issue
     8f3cbcd6b440 regulator: core: Use different devices for resource allocation and DT lookup
     ba62319a42c5 regulator: core: fix resource leak in regulator_register()
     da46ee19cbd8 regulator: core: fix module refcount leak in set_supply()
     0591b14ce039 regulator: core: fix use_count leakage when handling boot-on
     dc8d006d15b6 regulator: core: use kfree_const() to free space conditionally
     f2b41b748c19 regulator: core: fix unbalanced of node refcount in regulator_dev_lookup()
     fd1845069711 regulator: devres: Add devm_regulator_bulk_get_exclusive()
     Probably not all of them are required, but I have not yet sorted out which ones matter (and it will become moot as soon as 6.2 enters Armbian, unless a new commit breaks hs400es on rk3399 again). The fix is likely 0debed5b117d regulator: core: Fix resolve supply lookup issue. The test was a simple one: mount the eMMC with this kernel and hs400es re-enabled in the dts, write a file, unmount, remount, read the file. No error in dmesg.
  2. @quokka note that thermal-board is different from thermal-cpu. Both should exist, and they report different values (well, I do not get thermal-board unless I explicitly load the lm75 module, but that is another matter, maybe due to me running the edge kernel). As the board is usually 15 degrees below the CPU, it is not safe for the CPU to bind the fan to the board temperature (the fan thresholds would likely never trigger). OMV has a tendency to remove packages on upgrade (it forces the upgrades however critical the packages are that the conflict resolution wants to remove). You want to check that you still have armbian-bsp-cli-helios64 installed. If not, install it and reboot.
  3. OK, I take back part of my statement. The U-Boot reset command does increase the Power-Off_Retract_Count S.M.A.R.T. attribute, whether it happens after a Linux kernel power down (`reboot: Power down`) or after a plain Linux kernel reboot (`reboot: Restarting system`), only not in all cases.
     With the odroid shutdown script, the U-Boot reset command increases the Power-Off_Retract_Count in all cases (but never for the initial U-Boot boot, which never makes noise). After a power off, an initial U-Boot boot and 3 U-Boot resets, I get an increase of 3 for the Retract Count. After a reboot, the initial U-Boot boot and 3 U-Boot resets also make an increase of 4 of the Retract Count.
     Without the odroid shutdown script, if I power off the board and then power it on, the first reset at the U-Boot prompt makes the noise but does not increase the Power-Off_Retract_Count. If I then do more resets in a row in U-Boot, the Power-Off_Retract_Count increases by that number of extra resets. The initial U-Boot boot after power off makes no noise and does not increase the Retract Count, so for the initial U-Boot boot and 3 resets I get an increase of 2.
     Also without the odroid shutdown script, if I reboot and enter reset 3 times in U-Boot, I get an increase of 4 of the Retract Count (the initial U-Boot boot makes the HDD noise and increases the Retract Count, then the 3 U-Boot resets also make the HDD noise and increase the Retract Count by 1 each).
     So the only special case is a startup after powering off without the odroid script. That the initial U-Boot boot makes no noise and does not increase the Retract Count is logical. But I did not expect that the first U-Boot reset after a U-Boot boot from power off would make the HDD noise without increasing the Retract Count (and that was the case I initially tested, with only one reset, hence my previous analysis).
     Note that on further attempts it behaved as usual: after a poweroff without the odroid script, 1 initial U-Boot boot + 5 U-Boot resets made an increase of 5 of the Retract Count (maybe the Retract Count is not increased in 100% of the U-Boot resets, though an increase is at least the common case).
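The counting above is done by comparing the Power-Off_Retract_Count before and after each experiment. A minimal sketch of that bookkeeping, assuming the usual ten-column attribute table printed by `smartctl -A`; the helper name `retract_count` and the sample values are hypothetical, not taken from the posts:

```python
def retract_count(smartctl_output: str) -> int:
    """Return the raw Power-Off_Retract_Count from `smartctl -A` text.

    Assumes the usual column layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Power-Off_Retract_Count":
            return int(fields[9])  # RAW_VALUE is the tenth column
    raise ValueError("Power-Off_Retract_Count not found")


# Hypothetical captures taken before and after a series of U-Boot resets.
HEADER = ("ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH "
          "TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n")
before = HEADER + ("192 Power-Off_Retract_Count 0x0032   100   100   000    "
                   "Old_age   Always       -       45\n")
after = HEADER + ("192 Power-Off_Retract_Count 0x0032   100   100   000    "
                  "Old_age   Always       -       48\n")

increase = retract_count(after) - retract_count(before)
print(f"Retract Count increased by {increase}")
```

On a real system the two texts would come from something like `sudo smartctl -A /dev/sdX`, captured once before and once after the resets, for each HDD on the PCIe-attached ports.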
  4. Note that, monitoring the serial console, I can hear the HDD sound not on power off but on power on after a reboot (i.e. if I enter `poweroff` on the helios64 and then press the power button, no HDD sound ensues). Only when entering the `reboot` command, or when a crash makes the kernel restart the box, do I hear the sound, and then only when the box starts, not when it stops. I believe this is due to the U-Boot power rails management code. It probably turns off all the rails before turning on one rail, then the other. Since it turns off all the rails, the Power-Off_Retract_Count increases. Note that running the `reset` command at the U-Boot prompt does produce the click (but does not increase the Power-Off_Retract_Count S.M.A.R.T. attribute). Also, only the Power-Off_Retract_Count of the PCIe-attached HDDs increases (contrary to the Odroid issue, the USB ones are not affected, because the HDD power rails only feed the PCIe ones).
     So poweroff does not produce the HDD noise, nor does it increase the Power-Off_Retract_Count. Reboot produces the HDD noise and increases the Power-Off_Retract_Count, but on startup from the reboot. U-Boot reset produces the HDD noise but does not increase the Power-Off_Retract_Count (which I do not understand). The odroid shutdown script fixes the HDD noise on reboot and the Power-Off_Retract_Count increase (but does not help when the kernel auto-reboots due to a crash, because then the systemd shutdown scripts are not called).
     The odroid script helps, but I believe the issue should be handled in the helios64-specific U-Boot code. However, I do not know this code, nor do I know how the power rails should be handled to avoid the noise on startup from a reboot or a U-Boot reset command, nor do I know how to avoid the Power-Off_Retract_Count increase (which seems related to the HDD noise, but not in all cases). Maybe fixing one will fix both issues.
```
U-Boot TPL 2022.07-armbian (Nov 22 2022 - 02:23:55)
Channel 0: LPDDR4, 50MHz
BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
Channel 1: LPDDR4, 50MHz
BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
256B stride
lpddr4_set_rate: change freq to 400000000 mhz 0, 1
lpddr4_set_rate: change freq to 800000000 mhz 1, 0
Trying to boot from BOOTROM
Returning to boot ROM...

U-Boot SPL 2022.07-armbian (Nov 22 2022 - 02:23:55 +0000)
Trying to boot from SPI
Trying to boot from MMC1
Card did not respond to voltage select! : -110
spl: mmc init failed with error: -95
Trying to boot from MMC2
NOTICE: BL31: v2.6(release):a1f02f4f-dirty
NOTICE: BL31: Built : 02:23:49, Nov 22 2022
<HDD sound>

U-Boot 2022.07-armbian (Nov 22 2022 - 02:23:55 +0000)

SoC: Rockchip rk3399
Reset cause: RST
DRAM: 3.9 GiB
(...)
```
  5. @n3o thanks for the script. I will try it. The script is generic (it will work on any box), nothing Odroid specific. I followed the instructions at https://wiki.odroid.com/odroid-xu4/troubleshooting/shutdown_script to install it. I do not know whether it is necessary or whether, as said above, the head parks automatically with no need for the script. Still, the noise is too loud, and if I can get rid of it, I will.
  6. Good news. Without the revert, and on Armbian v6.11 with a few v6.12 patches for drivers/regulator/core.c (that are in stable/master), I can mount my eMMC partition on my rk3399 with hs400es enabled without getting the cqhci error (not tested extensively, but at least an improvement). I applied on top of Armbian edge v6.11:
     cb3543cff90a regulator: core: fix deadlock on regulator enable
     0debed5b117d regulator: core: Fix resolve supply lookup issue
     8f3cbcd6b440 regulator: core: Use different devices for resource allocation and DT lookup
     ba62319a42c5 regulator: core: fix resource leak in regulator_register()
     da46ee19cbd8 regulator: core: fix module refcount leak in set_supply()
     0591b14ce039 regulator: core: fix use_count leakage when handling boot-on
     dc8d006d15b6 regulator: core: use kfree_const() to free space conditionally
     f2b41b748c19 regulator: core: fix unbalanced of node refcount in regulator_dev_lookup()
     fd1845069711 regulator: devres: Add devm_regulator_bulk_get_exclusive()
     The fix is likely 0debed5b117d regulator: core: Fix resolve supply lookup issue, though that needs testing.
  7. This code should be fine for all boards; this is what the code did before v5.10.44. However, the developers keep patching on top of the code that caused this regression (code which looks fine on paper but in practice breaks hs400es, on rk3399 at least). I thought that 8a866d527ac0 ("regulator: core: Resolve supply name earlier to prevent double-init") would also fix the rk3399 eMMC CQHCI error, but it did not.
  8. @halfa is your missing fans detection on helios64 fixed? Do you have `pwm_fan` in the `lsmod` output?
  9. Can you send the output of `dpkg -l 'linux-image-*-rockchip64'` and `dpkg -l armbian-bsp-cli-helios64`? Could you also paste the output of `cat /etc/apt/sources.list.d/armbian.list`?
  10. /dev/thermal-cpu is there for me (OMV6, Debian bullseye, upgraded from OMV5 on Debian buster). The symlink is created by /etc/udev/rules.d/90-helios64-hwmon.rules from armbian-bsp-cli-helios64 (I have version 22.11.0-trunk of it, but I checked and 22.02.1 also has it). Do you have this package installed?
  11. Do you also get kernel error messages in the kernel log (journalctl -b -k, or dmesg output)?
  12. I bet the next step is to rebuild our kernel with KASAN turned on in the kernel config.
  13. @mortomanos Could you post the dt overlay support gave you for the helios64 back in 2021?
  14. PR https://github.com/armbian/build/pull/4480
  15. Thank you @prahal for digging into this!! @aprayoga @piter75 Could the emmc issue be solved now with this input?
      @ebin-dev @piter75 I believe the upstream fix https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/regulator/core.c?id=8a866d527ac0441c0eb14a991fa11358b476b11d will do the job (it is in 6.1-rc1, so expect it in Armbian edge if you want to try eMMC again when it lands). This is likely the same issue: before the bad commit I pointed at, the code only called set_machine_constraints once. Still, it requires testing (I saw that a lot of rk3399 boards removed hs400 in Armbian; maybe this will fix them all).
      regulator: core: Resolve supply name earlier to prevent double-init
      Previously, an unresolved regulator supply reference upon calling regulator_register on an always-on or boot-on regulator caused set_machine_constraints to be called twice. This in turn may initialize the regulator twice, leading to voltage glitches that are timing-dependent. A simple, unrelated configuration change may be enough to hide this problem, only to be surfaced by chance. One such example is the SD-Card voltage regulator in a NanoPI R4S that would not initialize reliably unless the registration flow was just complex enough to allow the regulator to properly reset between calls. Fix this by re-arranging regulator_register, trying resolve the regulator's supply early enough that set_machine_constraints does not need to be called twice.
      Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
      Link: https://lore.kernel.org/r/20220818124646.6005-1-christian@kohlschutter.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Note that this fix reintroduces the less critical bug that was fixed by the bad commit I pinpointed, namely the sysfs entries issue. That one was also fixed in 6.1-rc1, by commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/regulator/core.c?id=520fb178212d1dd545ed0ed231df09111b30ab7e "regulator: core: Fix regulator supply registration with sysfs".
  16. By adding the kernel parameter cpufreq.off=1 I managed to run it for 61 days (I ended up rebooting to test other things). You can add it to extraargs in /boot/armbianEnv.txt. My next step is to try with the vdd_cpu_b ramp-delay set to 1000 instead of 40000 in the dts.
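Editing the `extraargs` line by hand is easy to get wrong (a duplicated parameter, or a second `extraargs=` line). A minimal sketch of an idempotent edit, operating on the file's text only; the helper name `add_extraarg` and the sample file content are hypothetical, and it assumes `armbianEnv.txt` holds one `key=value` pair per line:

```python
def add_extraarg(env_text: str, arg: str) -> str:
    """Return env_text with `arg` present exactly once in the extraargs= line."""
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("extraargs="):
            args = line[len("extraargs="):].split()
            if arg not in args:
                args.append(arg)
            lines[i] = "extraargs=" + " ".join(args)
            break
    else:
        # No extraargs line yet: append one.
        lines.append("extraargs=" + arg)
    return "\n".join(lines) + "\n"


# Hypothetical file content, for illustration only.
sample = "verbosity=1\nextraargs=earlyprintk\n"
updated = add_extraarg(sample, "cpufreq.off=1")
print(updated, end="")
```

To apply it for real, one would read /boot/armbianEnv.txt, pass its content through the function, write the result back as root, and reboot. Keeping a backup copy of the file first is a sensible precaution.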
  17. This was the reply on the reddit thread for this issue, i.e. "The part is 10nf capacitor SMD 0402 (CAP-0402-X5R-10nF-25V-10%)".
  18. @yaleman if your helios64 is unstable, you might want to try adding cpufreq.off=1 to extraargs in /boot/armbianEnv.txt: `extraargs=earlyprintk cpufreq.off=1` (this option has been available since kernel 4.11). Then reboot. To test without a reboot, `echo 1 | sudo tee /sys/module/cpufreq/parameters/off` might do.
      I am still investigating and will try less severe ways soon. There were also reports of an instability issue with vdd_log, so it might be required to set it to 0.95V in U-Boot (like https://lists.denx.de/pipermail/u-boot/2019-January/353357.html ) and/or to remove its definition from the board dts, as in https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi?id=87eba0716011e528f7841026f2cc65683219d0ad in 2017, though it was reinserted in 2022 by https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi?id=e6bbf0d53ae1060ee6403bafcc4d1fd25d088e40. This can be tested by putting `regulator dev vdd_log` and `regulator value 950000` in /boot/boot.cmd, then running `mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr` and rebooting.
      But with cpufreq disabled, I am at 27 days stable with the same loads that previously crashed in less than 16 days (it was crashing even with cpufreq set to performance and a fixed frequency). I am waiting to see if I can stay stable beyond my maximum of 42 days.
      What seems to stress the board is borgbackup (which runs 23h per day, since I have a repo with too many files in it, and the board thus swaps as it only has 4G of RAM), urbackup, and mdadm raid10 (when it resyncs, which happens at boot after a HW crash, before the boot script that sets cpufreq is even run, so setting performance with a static frequency has no effect on this crasher). OMV seems to make the bug trigger more often, I think because of its sysctl tweaks from /etc/sysctl.d/99-openmediavault-nonrot.conf, which I symlink to /dev/null for now.
      I hope the issue is that the helios64 dts has the vdd_cpu_b: regulator@40 regulator-ramp-delay set to 40000 instead of the 1000 that is more common among the rk3399 boards. I found a thread, https://forum.odroid.com/viewtopic.php?t=30303, where hardkernel went from 1000 to 40000 on the odroid n1, but it might be that this is not stable on our rk3399 chips. There is also the pwm-supply instead of vin-supply in the dts for vdd_log, per https://github.com/torvalds/linux/commit/dc570e8e1a7036eaaeede71b55e14739710ea0a4.
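To give a feel for what the two ramp-delay values mean, here is a rough sketch of the arithmetic. It assumes, as I understand the regulator device tree binding, that regulator-ramp-delay is expressed in µV/µs and that the kernel waits for the voltage step divided by this value, rounded up. The 100 mV step is a made-up example, not a value from the helios64 OPP table:

```python
def ramp_time_us(old_uv: int, new_uv: int, ramp_delay_uv_per_us: int) -> int:
    """Settling time in microseconds for a voltage change, rounded up."""
    delta = abs(new_uv - old_uv)
    return -(-delta // ramp_delay_uv_per_us)  # ceiling division


# Hypothetical 100 mV step on vdd_cpu_b (0.9 V -> 1.0 V).
step_from, step_to = 900_000, 1_000_000
common = ramp_time_us(step_from, step_to, 1000)     # value used by most rk3399 boards
helios = ramp_time_us(step_from, step_to, 40000)    # value in the helios64 dts
print(f"ramp-delay 1000:  wait {common} us")
print(f"ramp-delay 40000: wait {helios} us")
```

Under these assumptions, a value of 40000 tells the kernel that the rail settles about 40 times faster than a value of 1000 does. If the hardware is in fact slower than that, the CPU could switch frequency before the voltage is stable, which would fit the instability hypothesis above, though that remains to be tested.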