

Everything posted by prahal
-
@liberodark Twice I had an HDD that was not detected (on a SATA connector where the previous drive had worked). Unplugging and replugging the hard drive several times ended up working. I read that someone (I do not have the link at hand, but likely on the Armbian forum) fixed this by cleaning the connectors of the SATA harness (on the HDD side) with alcohol. Maybe alcohol is not required and simply rubbing the connectors with a cloth is sufficient (the dirt could even be on the HDD's own connector). I did not try the alcohol myself, as simply replugging a few times did the trick. I applied pressure while plugging the HDD in. That may not be necessary, but if cleaning the connectors does not help, it might be that the SATA data socket does not fit perfectly with the HDD SATA data plug. Hard to tell without further testing. Try with care, and if it still does not work, report it here.
-
I tried a more extensive test with all Armbian patches removed, including the regulator ones above (the patches were removed for an unrelated test), and the test failed with a CQHCI error. I still have to run the extensive test with the devm patch applied, and also with the above patchset fixed so that it applies to the Armbian 6.1.12 kernel.

The read test works:

    dd if=/dev/mmcblk1 of=/dev/null bs=16M iflag=direct
    931+1 records in
    931+1 records out
    15634268160 bytes (16 GB, 15 GiB) copied, 65.3501 s, 239 MB/s

but with eMMC partition 1 mounted on /mnt, the write test fails:

    sudo dd if=/dev/sda of=/mnt/test2.dat bs=16M count=500 oflag=direct
    dd: error writing '/mnt/test2.dat': Input/output error
    1+0 records in
    0+0 records out
    0 bytes copied, 0.856469 s, 0.0 kB/s

with this error in the kernel log:

    [ 2240.820878] mmc1: running CQE recovery
    [ 2240.824858] ------------[ cut here ]------------
    [ 2240.825339] mmc1: cqhci: spurious TCN for tag 12
    (...)
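To make the pass/fail check of such a write test repeatable, the kernel log can be filtered for the two messages seen in the excerpt above. This is only a sketch: the function names `cqhci_errors` and `cqhci_clean` are made up for illustration, and the patterns are taken from this one log excerpt, so other CQHCI failures may look different.

```shell
# Print kernel log lines (read from stdin) that point at a CQHCI/CQE problem.
# The patterns are assumptions derived from the log excerpt in this post.
cqhci_errors() {
    grep -E 'mmc[0-9]+: (running CQE recovery|cqhci: )'
}

# Succeed (exit status 0) only when the log on stdin shows no such line.
cqhci_clean() {
    ! cqhci_errors >/dev/null
}

# Example (not run here): after the dd write test,
#   dmesg | cqhci_errors
```

Reading from stdin keeps the check usable both on a live `dmesg` and on a saved log file.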
-
@n3o I use ssh to check Power-Off_Retract_Count. About the Odroid shutdown script: I do not know if '--idle-unload' is needed, but the script already works. The only issue is that it does not run in U-Boot space (on U-Boot reset) or when the kernel crashes, so I believe this is only a band-aid. A proper fix should be made in the helios64 U-Boot code. About U-Boot: no, U-Boot is not updated on a kernel upgrade (U-Boot is like the BIOS). It is not even updated when you upgrade the U-Boot package linux-u-boot-helios64-current (or linux-u-boot-helios64-edge if running Armbian edge). You have to run armbian-config, choose "System", then "Install", then finally "5 Install /Update the bootloader on SD/eMMC" (or, in short, run nand-sata-install and choose "5 Install /Update the bootloader on SD/eMMC"). Even then, if you run Armbian from an SD card, this updates the U-Boot on the SD card but not the one on the eMMC. And I believe Helios64 boots from the eMMC U-Boot even if an SD card is inserted (you can add a jumper on the board to switch to SD card boot, but it is not the default), so the previous U-Boot update command will have no effect for users with an SD card root.
In that case, I think you have to run the following (valid in a Bourne-style shell such as bash; you have to replace the placeholders!):

    source /usr/lib/u-boot/platform_install.sh
    write_uboot_platform <your uboot directory> <your emmc device file>

You can find the eMMC device with:

    dmesg | grep "mmc.*: SDHCI controller on fe330000.mmc \[fe330000.mmc\] using ADMA"

If you see mmc1, then the eMMC device is /dev/mmcblk1. To get the U-Boot directory, first find your U-Boot Debian package:

    dpkg -l "linux-u-boot-helios64*"
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                       Version       Architecture Description
    +++-==========================-=============-============-=================================
    ii  linux-u-boot-helios64-edge 23.02.0-trunk arm64        Uboot loader 2022.07

In the result you see linux-u-boot-helios64-edge. Then run "dpkg -L <your u-boot debian package name>":

    dpkg -L linux-u-boot-helios64-edge | grep /usr/lib/linux-u-boot-
    /usr/lib/linux-u-boot-edge-helios64_23.02.0-trunk_arm64
    /usr/lib/linux-u-boot-edge-helios64_23.02.0-trunk_arm64/idbloader.img
    /usr/lib/linux-u-boot-edge-helios64_23.02.0-trunk_arm64/u-boot.itb

Now you know that your U-Boot directory is /usr/lib/linux-u-boot-edge-helios64_23.02.0-trunk_arm64, which gives, for an edge U-Boot on an Armbian 23.02.0 install with the eMMC on /dev/mmcblk1:

    write_uboot_platform /usr/lib/linux-u-boot-edge-helios64_23.02.0-trunk_arm64 /dev/mmcblk1

You can replace the write_uboot_platform call with the commands it runs, which you can read in /usr/lib/u-boot/platform_install.sh, but they vary. Lastly, about the U-Boot version: I believe you can only get the currently installed version from the serial console log when the box starts up. Maybe you could read the storage blocks where U-Boot is stored, but this is hard to guide through.
For the above setup, that would give:

    sudo strings /dev/mmcblk1 | grep "U-Boot"
    U-Boot TPL 2022.07-armbian (Feb 15 2023 - 11:06:06)
    U-Boot SPL 2022.07-armbian (Feb 15 2023 - 11:06:06 +0000)
    U-Boot FIT image for U-Boot with bl31 (TF-A)
    U-Boot (64-bit)
    %U-Boot
    (...)

which shows U-Boot version 2022.07-armbian (the date is recent because this U-Boot comes from a local Armbian build). Note that you should terminate the above command with Ctrl+C before it completes, otherwise it will read the whole storage, which can take a while. About the HDD noise: do you really get it on poweroff? Here it only happens on power on from reboot. Poweroff does not call into U-Boot. So maybe the issue comes from your removal of the head parking timer, or from something else specific to your setup, and nothing in code, be it kernel or U-Boot, could help with that. That is, if you prevent the kernel from parking the head before it turns the power off, then this might be a case where the Retract Count increases, and only a custom script like the Odroid one could help you work around the issue caused by your tweak. I find it really weird that "HDD noise" and "Power-Off_Retract_Count" would be independent. But I noticed it too, though only with a specific process (maybe it was not exactly the *same noise*: it was similar, but maybe the head had more time to park). As said above, I was able to reproduce it with a very specific process, but not always (as I added in an edit of the above post, I cannot always reproduce it, so maybe you missed it). Also, I believe the Retract Count increases only on power on from reboot because the kernel keeps the HDDs powered on reboot, and the helios64 U-Boot code turns them off before turning on first the rail A HDDs and then the rail B HDDs. Maybe the Retract Count only increases because the head has no time to park before the drive is powered on again. 
Though that is unlikely, as both the rail A HDDs and the rail B HDDs get their Retract Count increased, while rail B has a longer delay before being started (but maybe this delay is still too short; this requires work to find out). Also, the head should already be parked on power on from reboot, but the kernel somehow does not trigger this parking on reboot, and the helios64 U-Boot code then turns the drives' power off on boot (hence the hardware-triggered head parking on power loss). I believe relying on parking the head at shutdown time is not safe (the kernel can crash, and then the head will not be parked, whatever the script or kernel code). The parking has to be done in U-Boot before it turns the drives' power off (since the helios64 has to turn one rail on before the other, it has to turn at least one off at boot. Maybe it could turn off only one rail instead of turning both off and then turning one on and then the other, but that would still increase the Retract Count for the HDDs on the rail it turned off, which is not great). About a recording of the noise: I cannot do it right now, but I will try when time permits. I believe the noise depends only on the hardware involved, but there will always be noise. It turns out I have a mix of WD30EFRX and WD60EFZX, so the noise should be pretty close to yours. I will still provide mine if and when time permits. Could you also share a recording of the noise you get? Note that USB HDDs do not need special handling with helios64. They are already turned off when needed. The only issue is with the PCIe SATA HDDs on helios64, because U-Boot turns them off on startup. If they are already off (because the helios64 was powered off), that is not an issue, as they are already off and parked. Only on reboot are they still powered, and then the HDDs park through their firmware because U-Boot cut their power in order to keep enough power to turn the HDDs on in stages (which makes me believe a simple fix would be not to turn the rails off and on again in case of reboot, because the HDDs are already on, so there is no need to take into account that we do not have enough power to turn them all on at once by powering rail A and then rail B). I am starting to believe you made a tweak with wdidle that is wrong for the hardware (or at least not supported by the Linux kernel), because you get the noise on poweroff. So somehow you are preventing the kernel from parking the head on poweroff, and it is then logical that you get a hardware head parking triggered by the loss of power to the HDDs. Having this wdidle setting supported is not a downstream (Armbian) issue; it is an upstream feature request. Armbian could develop it if they can. It could even be rejected by upstream as an unsupported configuration. Hard to tell. But it would definitely not be a bug. You cannot mix arbitrary combinations of hardware settings. They also have to be supported by the other hardware on the platform. I believe that if you apply the same wdidle tweak to a drive in a USB enclosure, the same noise and head parking issue will arise. But this is due to the setting, not to a bug in the kernel or U-Boot, I guess. Can you reproduce the HDD noise on poweroff without any wdidle tweak?
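The lookup steps for the U-Boot update described earlier in this post (eMMC device from dmesg, U-Boot directory from `dpkg -L`, version from `strings`) can be sketched as small filters. The function names are hypothetical and not part of Armbian, and the patterns assume the exact output shapes quoted in the post (controller at fe330000.mmc, a `u-boot.itb` file in the package directory, a "U-Boot SPL" banner at the start of a line).

```shell
# Map the dmesg line of the eMMC controller (stdin) to its block device.
# Assumes, as in the post, that the eMMC host is the SDHCI controller at fe330000.mmc.
emmc_device_from_dmesg() {
    sed -n 's/.*\(mmc[0-9][0-9]*\): SDHCI controller on fe330000\.mmc.*/\1/p' |
        head -n 1 | sed 's|^mmc|/dev/mmcblk|'
}

# Pick the U-Boot directory out of `dpkg -L <package>` output (stdin):
# the directory under /usr/lib that contains u-boot.itb.
uboot_dir_from_dpkg_list() {
    sed -n 's|^\(/usr/lib/linux-u-boot-[^/]*\)/u-boot\.itb$|\1|p' | head -n 1
}

# Print the version from the first "U-Boot SPL" banner in `strings` output (stdin).
# sed quits at the first match, so the upstream `strings` does not need to
# scan the whole device before the pipeline ends.
uboot_version_from_strings() {
    sed -n '/^U-Boot SPL /{s/^U-Boot SPL \([^ ]*\).*/\1/p;q;}'
}

# Example (not run here; check both values before writing a bootloader):
#   dmesg | emmc_device_from_dmesg
#   dpkg -L linux-u-boot-helios64-edge | uboot_dir_from_dpkg_list
#   sudo strings /dev/mmcblk1 | uboot_version_from_strings
```

The functions only read text and print a value, so the results can be inspected before they are passed to `write_uboot_platform`.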
-
It turns out I only had one of the above patches applied:

    Displaying message: * [u][c] p-0001-regulator-devres-Add-devm_regulator_bulk_get_exclusi.patch info
    Displaying message: * [u][c] p-0002-regulator-core-fix-unbalanced-of-node-refcount-in-re.patch failed wrn
    Displaying message: * [u][c] p-0003-regulator-core-use-kfree_const-to-free-space-conditi.patch failed wrn
    Displaying message: * [u][c] p-0004-regulator-core-fix-use_count-leakage-when-handling-b.patch failed wrn
    Displaying message: * [u][c] p-0005-regulator-core-fix-module-refcount-leak-in-set_suppl.patch failed wrn
    Displaying message: * [u][c] p-0006-regulator-core-fix-resource-leak-in-regulator_regist.patch failed wrn
    Displaying message: * [u][c] p-0007-regulator-core-Use-different-devices-for-resource-al.patch failed wrn
    Displaying message: * [u][c] p-0008-regulator-core-Fix-resolve-supply-lookup-issue.patch failed wrn
    Displaying message: * [u][c] p-0009-regulator-core-fix-deadlock-on-regulator-enable.patch failed wrn

and one that is unrelated. So I would say that Armbian edge should do for hs400es on rk3399. Though if I remove:

    assigned-clock-rates = <150000000>;

from the dts (which then falls back to the rk3399.dtsi value of 200000000), I get the old error:
-
Made a test case that reproduces the issue:

    $ cat test-pkg_resources.py
    import pkg_resources
    print(pkg_resources.parse_version("2.3"))
    $ for i in $(seq 1 100); do python3 test-pkg_resources.py; done
    2.3
    2.3
    2.3
    double free or corruption (out)
    Aborted (core dumped)
    2.3

The kernel errors I had were from a failure to allocate memory (but, as the stack traces of those errors show, it is a failure to allocate high-order memory (big chunks) for the core dump generation).
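Since the crash is intermittent, counting the failing runs gives a number that can be compared between kernels or settings. A minimal sketch, with `count_failures` being a made-up helper name:

```shell
# Run a command N times and print how many runs ended with a non-zero
# exit status (a process killed by SIGABRT also counts as a failure).
# The body is a subshell, so its variables do not leak into the caller.
count_failures() (
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
)

# Example (not run here):
#   count_failures 100 python3 test-pkg_resources.py
```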
-
@ebin-dev 6.2 should have it fixed, though it requires testing. I built and installed the kernel from Armbian edge 6.1.11-rockchip64 and added the few missing patches for the regulator core from the stable 6.2 repository https://github.com/torvalds/linux/commits/master/drivers/regulator/core.c , that is:

    cb3543cff90a regulator: core: fix deadlock on regulator enable
    0debed5b117d regulator: core: Fix resolve supply lookup issue
    8f3cbcd6b440 regulator: core: Use different devices for resource allocation and DT lookup
    ba62319a42c5 regulator: core: fix resource leak in regulator_register()
    da46ee19cbd8 regulator: core: fix module refcount leak in set_supply()
    0591b14ce039 regulator: core: fix use_count leakage when handling boot-on
    dc8d006d15b6 regulator: core: use kfree_const() to free space conditionally
    f2b41b748c19 regulator: core: fix unbalanced of node refcount in regulator_dev_lookup()
    fd1845069711 regulator: devres: Add devm_regulator_bulk_get_exclusive()

Probably not all of them are required, but I have not yet sorted out which ones matter (and it will become moot as soon as 6.2 enters Armbian, unless a new commit breaks hs400es on rk3399 anew). The fix is likely 0debed5b117d "regulator: core: Fix resolve supply lookup issue". The test was simple: with this kernel and hs400es re-enabled in the dts, mount the eMMC, write a file, unmount, remount, read the file. No error in dmesg.
-
@quokka Note that thermal-board is different from thermal-cpu. Both should exist, and they report different values (well, I do not get thermal-board unless I explicitly load the lm75 module, but that is another matter, maybe due to me running the edge kernel). As the board is usually 15 degrees below the CPU, it is not safe for the CPU to bind the fan to the board temperature (the thresholds would likely never be reached). OMV has a tendency to remove packages on upgrade (as it forces the upgrades regardless of how critical the packages that the conflict resolution wants to remove are). You should check that armbian-bsp-cli-helios64 is still installed. If not, install it and reboot.
-
OK, I take back part of my statement. The U-Boot reset command does increase the Power-Off_Retract_Count S.M.A.R.T. attribute, whether it happens after a Linux kernel power down (`reboot: Power down`) or after a plain Linux kernel reboot (`reboot: Restarting system`), only not in all cases. With the Odroid shutdown script, the U-Boot reset command increases the Power-Off_Retract_Count in all cases (but never for the initial U-Boot boot, which never makes noise). After a power off, an initial U-Boot boot and 3 U-Boot resets, I get an increase of 3 of the Retract Count. After a reboot, which makes the initial U-Boot boot, followed by 3 U-Boot resets, I get an increase of 4 of the Retract Count. But without the Odroid shutdown script, if I power off the board and then power it on, the first reset at the U-Boot prompt makes the noise but does not increase the Power-Off_Retract_Count. Then, if I do more resets in a row in U-Boot, the Power-Off_Retract_Count increases by that number of extra resets. The initial U-Boot boot after power off makes no noise and does not increase the Retract Count, so for the initial U-Boot boot and 3 resets, I get an increase of 2. Also without the Odroid shutdown script, if I reboot and enter reset 3 times in U-Boot, I get an increase of 4 of the Retract Count (the initial U-Boot boot makes the HDD noise and increases the Retract Count, then the 3 U-Boot resets also make the HDD noise and increase the Retract Count by 1 each). So the only special case is a startup after powering off without the Odroid script. That the initial U-Boot boot makes no noise and does not increase the Retract Count is logical. But I did not expect that the first U-Boot reset after a U-Boot boot from power off would make the HDD noise without increasing the HDD Retract Count (and that was the case I initially tested, with only one reset, hence my previous analysis). 
Note that on further attempts it behaved as usual: poweroff without the Odroid script, then 1 initial U-Boot boot + 5 U-Boot resets, made an increase of 5 of the Retract Count (maybe the Retract Count is not increased in 100% of the U-Boot resets, though it is at least the common case).
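For counting experiments like these, the raw value can be read from the attribute table that `smartctl -A` prints, before and after each sequence of resets. A rough sketch, assuming the usual ten-column attribute table; `retract_count` is a made-up name, and the device path in the example is a placeholder:

```shell
# Print the raw Power-Off_Retract_Count from `smartctl -A` output on stdin.
# Assumes the standard attribute table where column 2 is the attribute
# name and column 10 is RAW_VALUE.
retract_count() {
    awk '$2 == "Power-Off_Retract_Count" { print $10; exit }'
}

# Example (not run here; /dev/sda is a placeholder for one of the HDDs):
#   before=$(sudo smartctl -A /dev/sda | retract_count)
#   ... reboot or U-Boot resets ...
#   after=$(sudo smartctl -A /dev/sda | retract_count)
#   echo "increase: $((after - before))"
```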
-
Note that, monitoring the serial console on reboot, I can hear the HDD sound not on power off but on power on from reboot (i.e. if I enter `poweroff` on the helios64 and then press the power button, no HDD sound ensues). Only when I enter the `reboot` command, or when a crash makes the kernel restart the box, do I hear the sound, and then only when the box starts, not when it stops. I believe this is due to the U-Boot power rails management code. It probably turns off all the rails before turning on one rail and then the other. It is when it turns off all the rails that the Power-Off_Retract_Count increases. Note that running the `reset` command at the U-Boot prompt does produce the click (but does not increase the Power-Off_Retract_Count S.M.A.R.T. attribute). Also, only the Power-Off_Retract_Count of the PCIe-attached HDDs increases (contrary to the Odroid issue, the USB drives are not affected, because the HDD power rails are only for the PCIe ones). So poweroff does not produce the HDD noise, nor does it increase the Power-Off_Retract_Count. Reboot produces the HDD noise and increases the Power-Off_Retract_Count, but on startup from reboot. U-Boot reset produces the HDD noise but does not increase the Power-Off_Retract_Count (which I do not understand). The Odroid shutdown script fixes the HDD noise on reboot and the Power-Off_Retract_Count increase (but does not help when the kernel reboots automatically after a crash, because then the systemd shutdown scripts are not called). The Odroid script helps, but I believe the issue should be handled in the helios64-specific U-Boot code. But I do not know this code, nor do I know how the power rails should be handled to avoid the noise on startup from reboot and on the U-Boot reset command, nor do I know how to avoid the Power-Off_Retract_Count increase (which seems related to the HDD noise, but not in all cases). Maybe fixing one will fix both issues. 
```
U-Boot TPL 2022.07-armbian (Nov 22 2022 - 02:23:55)
Channel 0: LPDDR4, 50MHz
BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
Channel 1: LPDDR4, 50MHz
BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
256B stride
lpddr4_set_rate: change freq to 400000000 mhz 0, 1
lpddr4_set_rate: change freq to 800000000 mhz 1, 0
Trying to boot from BOOTROM
Returning to boot ROM...

U-Boot SPL 2022.07-armbian (Nov 22 2022 - 02:23:55 +0000)
Trying to boot from SPI
Trying to boot from MMC1
Card did not respond to voltage select! : -110
spl: mmc init failed with error: -95
Trying to boot from MMC2
NOTICE: BL31: v2.6(release):a1f02f4f-dirty
NOTICE: BL31: Built : 02:23:49, Nov 22 2022
<HDD sound>

U-Boot 2022.07-armbian (Nov 22 2022 - 02:23:55 +0000)

SoC: Rockchip rk3399
Reset cause: RST
DRAM: 3.9 GiB
(...)
```
-
@n3o Thanks for the script. I will try it. The script is generic (it will work on any box), nothing Odroid-specific. I followed the instructions at https://wiki.odroid.com/odroid-xu4/troubleshooting/shutdown_script to install it. I do not know if it is necessary or if, as said above, the head parks automatically with no need for the script. Still, the noise is too loud, and if I can get rid of it, I will.
-
Good news. Without the revert, and on Armbian v6.11 with a few v6.12 patches for drivers/regulator/core.c (which are in stable/master), I can mount my eMMC partition on my rk3399 with hs400es enabled without getting the cqhci error (not tested extensively, but at least an improvement). I applied on top of Armbian edge v6.11:

    cb3543cff90a regulator: core: fix deadlock on regulator enable
    0debed5b117d regulator: core: Fix resolve supply lookup issue
    8f3cbcd6b440 regulator: core: Use different devices for resource allocation and DT lookup
    ba62319a42c5 regulator: core: fix resource leak in regulator_register()
    da46ee19cbd8 regulator: core: fix module refcount leak in set_supply()
    0591b14ce039 regulator: core: fix use_count leakage when handling boot-on
    dc8d006d15b6 regulator: core: use kfree_const() to free space conditionally
    f2b41b748c19 regulator: core: fix unbalanced of node refcount in regulator_dev_lookup()
    fd1845069711 regulator: devres: Add devm_regulator_bulk_get_exclusive()

The fix is likely 0debed5b117d "regulator: core: Fix resolve supply lookup issue", though that needs testing.
-
This code should be fine for all boards; this is the way the code worked before v5.10.44. However, the developers keep patching on top of the code that introduced this regression (code which looks fine on paper but in practice breaks hs400es, on rk3399 at least). I thought that 8a866d527ac0 ("regulator: core: Resolve supply name earlier to prevent double-init") would also fix the rk3399 eMMC CQHCI error, but it did not.
-
/dev/thermal-cpu is there for me (OMV6, Debian bullseye, upgraded from OMV5 on Debian buster). The symlink is created by /etc/udev/rules.d/90-helios64-hwmon.rules from armbian-bsp-cli-helios64 (I have version 22.11.0-trunk of it, but I checked and 22.02.1 also has it). Do you have this package installed?
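A quick way to see whether the symlinks are present is to test for them by name. This is a sketch only: `missing_thermal_links` is a made-up helper, and it assumes that thermal-board lives next to thermal-cpu under /dev, which the thread suggests but does not state outright.

```shell
# Print the names of the expected thermal symlinks that are missing
# from a directory (default: /dev). The directory is a parameter so the
# check can be tried on a scratch directory first.
missing_thermal_links() (
    dir=${1:-/dev}
    for name in thermal-cpu thermal-board; do
        [ -e "$dir/$name" ] || echo "$name"
    done
)

# Example (not run here):
#   missing_thermal_links
#   dpkg -s armbian-bsp-cli-helios64   # is the package providing the udev rule installed?
```

An empty output means both names exist; if one is listed, checking the package (and, for thermal-board, whether the lm75 module is loaded) is the next step.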
-
@mortomanos Could you post the dt overlay support gave you for the helios64 back in 2021?
-
PR https://github.com/armbian/build/pull/4480
-
Thank you @prahal for digging into this !! @aprayoga @piter75 Could the emmc issue be solved now with this input ?

@ebin-dev @piter75 I believe the upstream fix https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/regulator/core.c?id=8a866d527ac0441c0eb14a991fa11358b476b11d will do the job (it is in 6.1-rc1, so expect it in Armbian edge if you want to try eMMC anew when it lands). This is likely the same issue: before the bad commit I pointed at, the code only called set_machine_constraints once. Still, it requires testing (I saw that a lot of rk3399 boards removed hs400 in Armbian; maybe this will fix them all).

    regulator: core: Resolve supply name earlier to prevent double-init

    Previously, an unresolved regulator supply reference upon calling regulator_register on an always-on or boot-on regulator caused set_machine_constraints to be called twice.

    This in turn may initialize the regulator twice, leading to voltage glitches that are timing-dependent. A simple, unrelated configuration change may be enough to hide this problem, only to be surfaced by chance.

    One such example is the SD-Card voltage regulator in a NanoPI R4S that would not initialize reliably unless the registration flow was just complex enough to allow the regulator to properly reset between calls.

    Fix this by re-arranging regulator_register, trying resolve the regulator's supply early enough that set_machine_constraints does not need to be called twice.

    Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
    Link: https://lore.kernel.org/r/20220818124646.6005-1-christian@kohlschutter.com
    Signed-off-by: Mark Brown <broonie@kernel.org>

Note that this fix reintroduces the less critical bug that was fixed by the bad commit I pinpointed, namely the sysfs entries issue. That was also fixed in 6.1-rc1, by commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/regulator/core.c?id=520fb178212d1dd545ed0ed231df09111b30ab7e "regulator: core: Fix regulator supply registration with sysfs".
-
By adding the kernel parameter cpufreq.off=1, I managed to run it for 61 days (I ended up rebooting to test other things). You can add it to extraargs in /boot/armbianEnv.txt. My next step is to try with the vdd_cpu_b ramp-delay set to 1000 instead of 40000 in the dts.
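Editing the extraargs line by hand is easy to get wrong when it already holds other parameters, so here is a sketch of an idempotent helper. `add_extraarg` is a made-up name, and it assumes the file uses a single `extraargs=` line with space-separated parameters; try it on a copy of the file before touching /boot.

```shell
# Add one kernel parameter to the extraargs= line of an armbianEnv.txt
# style file. Creates the line if it is missing and does nothing if the
# parameter is already present. The body runs in a subshell.
add_extraarg() (
    file=$1
    arg=$2
    current=$(sed -n 's/^extraargs=//p' "$file" | head -n 1)
    case " $current " in
        *" $arg "*) exit 0 ;;   # already there, nothing to do
    esac
    if grep -q '^extraargs=' "$file"; then
        tmp=$(mktemp) || exit 1
        # Pass the new line through the environment so awk does not
        # interpret any special characters in it.
        NEWLINE="extraargs=${current:+$current }$arg" \
            awk '/^extraargs=/ && !done { print ENVIRON["NEWLINE"]; done = 1; next } { print }' \
            "$file" >"$tmp" && cat "$tmp" >"$file"
        rm -f "$tmp"
    else
        printf 'extraargs=%s\n' "$arg" >>"$file"
    fi
)

# Example (not run here):
#   cp /boot/armbianEnv.txt /tmp/armbianEnv.txt
#   add_extraarg /tmp/armbianEnv.txt cpufreq.off=1
```

Writing back with `cat` keeps the original file's ownership and permissions, which a plain `mv` of the temporary file would not.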
-
This was answered in the Reddit thread for this issue: "The part is 10nf capacitor SMD 0402 (CAP-0402-X5R-10nF-25V-10%)".
-
@yaleman if your helios64 is unstable, you might want to try adding cpufreq.off=1 to extraargs in /boot/armbianEnv.txt:

    extraargs=earlyprintk cpufreq.off=1

This option has been available since kernel 4.11. Then reboot. To test without rebooting,

    echo 1 | sudo tee /sys/module/cpufreq/parameters/off

might do. I am still investigating and will try less severe ways soon. Also, there were reports of an instability issue with vdd_log, so it might be necessary to set it to 0.95V in U-Boot (like https://lists.denx.de/pipermail/u-boot/2019-January/353357.html ) and/or to remove its definition from the board dts (as in https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi?id=87eba0716011e528f7841026f2cc65683219d0ad in 2017, though it was reinserted in 2022: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi?id=e6bbf0d53ae1060ee6403bafcc4d1fd25d088e40 ). This can be tested by adding:

    regulator dev vdd_log
    regulator value 950000

to /boot/boot.cmd, then running

    mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr

and rebooting. But with cpufreq disabled, I am at 27 days stable with the same loads that previously crashed in less than 16 days (it was crashing even with the cpufreq governor set to performance and a fixed frequency). I am waiting to see whether I can stay stable beyond my maximum of 42 days. What seems to stress the board is borgbackup (which runs 23h per day, since I have a repo with too many files in it, and the board thus swaps as it only has 4G of RAM), urbackup, and mdadm raid10 (when it resyncs, which happens at boot after a hardware crash, before the boot script that sets cpufreq is even run, so setting the governor to performance with a static frequency has no effect on this crasher). OMV seems to make the bug trigger more often, I think because of its sysctl tweaks in /etc/sysctl.d/99-openmediavault-nonrot.conf, which I for now symlink to /dev/null. 
I hope the issue is that the helios64 dts has regulator-ramp-delay for vdd_cpu_b (regulator@40) set to 40000 instead of 1000, which is the more common value among the rk3399 boards. I found a thread, https://forum.odroid.com/viewtopic.php?t=30303 , where Hardkernel went from 1000 to 40000 on the Odroid N1, but it might be that this is not stable on our rk3399 chips. There is also the use of pwm-supply instead of vin-supply in the dts for vdd_log, per https://github.com/torvalds/linux/commit/dc570e8e1a7036eaaeede71b55e14739710ea0a4.