prahal

Members
  • Posts

    132
  • Joined

  • Last visited

Other groups

Contributor/Maintainer

1 Follower

Profile Information

  • Gender
    Male
  • Location
    France
  • Interests
    NAS, media center, Kernel

Contact Methods

  • Mastodon
    @abws
  • IRC
    #abws
  • Github
    prahal
  • Discord
    prahal

  1. What kind of additional communication from the serial terminal? The boot parameter I am thinking of might have no way to be applied to a USB SATA adapter: it is libata.force=3.0Gbps or libata.force=1.5Gbps, but I have only used it for PCIe SATA. You might want to try a dual-power USB 3 cable to get enough current for the USB M.2 data drive. Have you tried the ebin-dev patched DTS with your USB M.2 SATA device attached (it might require the edge kernel though)? Edit: can you try with your USB adapter plugged into a powered USB 3 hub, itself plugged into the helios64?
  2. $ grep I2C_RK3X /boot/config-6.9.3-edge-rockchip64
     CONFIG_I2C_RK3X=y

     I2C is a bus that lets a microprocessor communicate with other circuits. It is enabled and cannot be disabled on Armbian without rebuilding your kernel. If I remember correctly, it is even required on the helios64 for the PMIC (the circuitry that controls power, reboot, etc). You likely get I2C errors because the board became unstable hardware-wise.
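The grep check above can also be scripted. This is only a minimal sketch: the helper name `kernel_option_state` and the sample config text are made up for illustration, and on a real system you would read the text from the /boot/config-* file of your running kernel.

```python
import re


def kernel_option_state(config_text: str, option: str) -> str:
    """Return the value of a kernel config option ('y', 'm', ...).

    Returns 'n' when the option is absent or listed as
    '# CONFIG_FOO is not set' (hypothetical helper, not an Armbian tool).
    """
    # Match only real assignments such as CONFIG_I2C_RK3X=y at line start.
    pattern = re.compile(rf"^{re.escape(option)}=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1).strip() if match else "n"


# Illustrative sample, not copied from a real config file.
sample = (
    "CONFIG_I2C=y\n"
    "CONFIG_I2C_RK3X=y\n"
    "# CONFIG_I2C_DEBUG_CORE is not set\n"
)

print(kernel_option_state(sample, "CONFIG_I2C_RK3X"))
```

A value of "y" means the driver is built into the kernel image, so it cannot be unloaded or blacklisted like a module.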
  3. Did you mean you also experienced a crash "with" the patch when you plugged in a SATA to USB3 device? (You wrote "without the patch".) Do you know the current draw of your device? (Is the device an enclosure with a non-bundled drive? Is it a 2.5 inch HDD?)
  4. The USB board is USB 3.0, thus it should be limited to 0.9A at 5V. Your SSD is rated 1.5A at 3.3V, ie 4.95W, versus the USB 3.0 maximum of 4.5W. Though I believe the SSD might not always draw the maximum. It could be that later kernels push the SSD to its maximum and thus make it draw too much. It is not a given that there is a bug in newer kernels. It could also be that this extra consumption only destabilizes the board at boot, because the board has other components that also draw more current at boot.

     There might be tests that can help sort this out. I think there might be ways to lower the current consumption of the M.2 SSD (maybe by lowering the libata link speed via a kernel boot param). We could also try to find another USB 3 device that also stretches the limit, and check whether the behavior under 5.10 and 6.6 is reproducible.

     Can you give a link to the SSD you put on the USB M.2 board? If it is not too expensive I could try to reproduce the setup. I have a USB multimeter and could check whether the USB 3 limit is really exceeded (and whether the current drawn is different for 5.10 and 6.6). But I won't be able to tell soon.

     The COM error likely comes from your serial console program. It might be related to the helios64 board crashing on the other side. Mind that a hardware hang just freezes the board; no messages are output. In the case I encountered, it meant too low a voltage for the load.

     When you say you plugged the SSD into the back USB, do you mean you plugged it in after boot, or before boot when it failed?
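The power budget comparison in this post can be checked with a few lines. This is a rough sketch using only the figures quoted above (0.9A at 5V for the port, 1.5A at 3.3V for the SSD); the names are made up for illustration, and it assumes the SSD rating is a peak value and ignores conversion losses in the USB to M.2 adapter.

```python
def power_uw(millivolts: int, milliamps: int) -> int:
    """Power in microwatts (mV * mA = uW); integers avoid float rounding."""
    return millivolts * milliamps


# Figures taken from the post; adapter losses are ignored (assumption).
USB3_BUDGET_UW = power_uw(5000, 900)   # USB 3.0 port: 5V, 0.9A
SSD_PEAK_UW = power_uw(3300, 1500)     # SSD label: 3.3V, 1.5A
EXCESS_UW = SSD_PEAK_UW - USB3_BUDGET_UW

print(f"USB 3.0 port budget: {USB3_BUDGET_UW / 1e6:.2f} W")
print(f"SSD peak draw:       {SSD_PEAK_UW / 1e6:.2f} W")
print(f"Excess at peak:      {EXCESS_UW / 1e6:.2f} W")
```

The excess only matters if the SSD actually reaches its rated peak, which is what a USB multimeter measurement would show.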
  5. My change was about restoring the eth0/end0 MAC address to its intended value (ie grabbed from OTP via SPI, as it was designed to work until it was broken at one point). It does not change the interface names. I warned about it because if one had a static DHCP lease bound to the MAC that was set for the last few years (not the OTP one), the lease would no longer apply. The change is included in 6.9 and up; the issue reported above is for 6.8 (though it is likely a user space issue, as interface renaming is done by udev in user space). So I guess @crosser upgraded from a kernel with the initial behavior I restored to an intermediate one that did not grab the MAC from the OTP. Note also that I did not apply these changes to the 6.6 kernel (-current), only to -edge. The MAC address and the interface rename issue are unrelated.
  6. Thanks for the logs. I don't know what is wrong with your boot. You said that your eMMC setup broke (what you call MMC, I guess)? It was probably the eMMC breakage that affected most rk3399 boards and requires a property to be added for eMMC hs400 to boot. Since then hs400 has been disabled. I will re-enable it after adding the property to the helios64 Armbian dts.

     If I understood correctly, you would also like to boot from SD (and you are currently booting from eMMC?). But when you plug a USB external SSD (that you use to store downloads, temp files, and anything not OS related) into the front USB socket, boot fails (you said "I use the USB header for the front panel for an internal SSD"; I guess you mean an external USB SSD, not an internal SSD). And you have 5 disks in the internal SATA slots as a RAID6.

     Mind that the bootloader will stay on eMMC, but you can move the OS to SD. This likely won't solve your boot with a USB external SSD plugged into the front USB socket... Mind that I have multiple USB external HDDs plugged into the back USB socket and the boot is working. What is the amperage required by the external USB drive you plug into the front USB socket? (This socket can output 900mA max.)

     ff3d0000.i2c is related to USB-C:
     [ 6.412784] OF: graph: no port node found in /i2c@ff3d0000/typec-portc@22
  7. It might be of interest in case it is not the same random corruption; then we would be able to fix the kernel. With the random corruption (which I believe happens at the CPU stage), most of the time we get a weird unrecognized instruction, but the issue still looks random (even if it is way more likely during a btrfs scrub or a zfs check). I really need to talk to the hardware guys from Armbian to sort out what to take note of (USB devices, power bank, PCI stress, ...). Either way, you are way better off with a USB-C to serial cable to a computer to get the logs. You can even save the output of your serial terminal application with the "script" command, maybe in a tmux/screen session.
  8. Mind that I am uneasy about pushing the voltage changes to main until I either get no more crashes for a long, long time or the issue is nailed down to its cause. I might revisit this choice. I plan to get the eMMC hs400 fix into Armbian and vanilla Linux. Probably not for the August Armbian release, but I expect for the next one.
  9. Adding more clues from your issue: this u-boot is, I guess, mainline u-boot and ATF with the rockchip DDR blob. linux-u-boot-edge-helios64_22.02.1 is likely the rockchip ATF, miniloader and DDR blob. It seems linux-u-boot-edge-helios64_22.02.1_arm64 is not available for download anymore? Either way, this is a binary blob, so it could only be used to test feature parity.

     apt policy linux-u-boot-helios64-edge
     linux-u-boot-helios64-edge:
       Installed: 24.5.1
       Candidate: 24.5.1
       Version table:
      *** 24.5.1 500
             500 http://apt.armbian.com bookworm/main arm64 Packages
             100 /var/lib/dpkg/status
          24.2.1 500
             500 http://apt.armbian.com bookworm/main arm64 Packages

     The link you provided looks like a match for this issue. I noted that
     rockchip-i2s ff8a0000.i2s: Could not register PCM
     is printed each time a USB device is plugged in (probably also when probed at boot). Also, I noticed a long time ago:

     Jun 08 16:18:53 helios64 kernel: platform ff1e0000.spi: deferred probe pending: (reason unknown)
     Jun 08 16:18:53 helios64 kernel: platform ff200000.spi: deferred probe pending: (reason unknown)
     Jun 08 16:18:53 helios64 kernel: platform ff8a0000.i2s: deferred probe pending: (reason unknown)
     Jun 08 16:18:53 helios64 kernel: amba ff6d0000.dma-controller: deferred probe pending: (reason unknown)
     Jun 08 16:18:53 helios64 kernel: amba ff6e0000.dma-controller: deferred probe pending: (reason unknown)
     Jun 08 16:18:53 helios64 kernel: platform ff1d0000.spi: deferred probe pending: (reason unknown)

     at the end of the kernel boot log (ie the kernel giving up trying to load the drivers for these). This seems related too.
  10. prahal

  11. Linux kernel ML discussion about an upstream fix WIP as of the 11th of June 2024: https://lore.kernel.org/lkml/20240326-rk-default-enable-strobe-pulldown-v1-3-f410c71605c0@folker-schwesinger.de/t/
  12. Started work on syncing the helios64 dts to upstream for 6.9: https://github.com/prahal/build/tree/helios64-6.9 .

      I removed the overclock disabling patch, as it disables an overclock that is, at least nowadays, not in the included rk3399-opp.dtsi (ie cluster0 has no opp6 and cluster1 no opp8). Removing it was not a high priority beforehand, but now that the helios64 dts starts to change, keeping it carries unnecessary work.

      The patchset applies. The helios64 dts compiles fine to dtb. The kernel built and booted (see below for the network connection, ie the ethernet MAC fixup).

      There are not many functional changes on my side (there are in the upstream dts). There are a few differences with the upstream dts I did not bring over, as I don't know if they are leftovers from the initial patch set or new fixups. But this could already have been an issue before 6.9 for most of these changes (there is at least one new upstream change I brought forward: 93b36e1d3748c352a70c69aa378715e6572e51d1 "arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64"). I also brought in the vcc3v0_sd node properties "enable-active-high;" and "gpio = <&gpio0 RK_PA1 GPIO_ACTIVE_HIGH>;".

      Beware: ethernet MAC change (now working as designed). Because I kept the aliases from upstream, the ethernet MAC of eth0 (later renamed to end0 by armbian-hardware-optimization), grabbed from OTP via SPI in u-boot, is applied again. Thus if you bound the MAC address that was generated by the kernel (instead of the one from the hardware) to a host and IP, you will have to find the new IP assigned by the DHCP server to connect to the helios64. I believe this is fine for edge even if not for current (so it should be good for 6.9?).
  13. @Trillien thanks, that confirms that I am not alone with a setup that does crash even with a 5 millisecond delay 🙂 If time permits, could you try with the TRANSITION_DELAY value in the test case code increased 10 times (to 50 milliseconds, ie 50000), then 100 times (to 500 milliseconds, ie 500000)?
  14. @BipBip1981 I agree, and I did not plan on doing it on my own. But phone repair shops have skilled technicians who can do it. Still, the need to replace a hardware component is a wild guess. At this point, I was merely saying that I was ready to test a hardware change on my board to find out if the problem was a hardware issue. In the end, I believe that if we better understand what is wrong, even if it is the hardware, we might be able to work around such a hardware shortcoming in software. I would not suggest messing with the hardware to test if it works better unless you are ready to lose the board. But mine is so unstable (probably due to my raid10 setup inherited from the helios4) that I could barely use it for years. So it is a matter of either testing if I can get it stable or buying a new NAS and sending this helios64 to the trash. I hope to be able to tell you a good governor/frequency, but I need to test more. At least the most reliable frequencies without voltage quirks for the big CPU seemed to be the lowest, 408000, and the highest, 1800000. So you might want to force the "userspace" governor and "1800000" as the frequency.
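Forcing the "userspace" governor and a fixed frequency could be scripted roughly as below. This is only a sketch: `pin_frequency` is a made-up helper, and it assumes the standard cpufreq sysfs files scaling_governor and scaling_setspeed, root privileges, a kernel with the userspace governor available, and that the big cluster of the rk3399 is exposed as policy4 (check this on your own board).

```python
from pathlib import Path


def pin_frequency(policy_dir: Path, khz: int) -> None:
    """Select the userspace governor, then request a fixed frequency in kHz.

    policy_dir is a cpufreq policy directory; the governor must be
    switched first, otherwise scaling_setspeed is not writable.
    """
    (policy_dir / "scaling_governor").write_text("userspace\n")
    (policy_dir / "scaling_setspeed").write_text(f"{khz}\n")


# Example call (as root). The policy4 path is an assumption for the
# big cluster; verify with the related_cpus file in that directory.
# pin_frequency(Path("/sys/devices/system/cpu/cpufreq/policy4"), 1800000)
```

The setting does not survive a reboot, which makes it convenient for stability tests: if the board misbehaves, a power cycle restores the default governor.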
  15. TLDR: yes, upping the voltage by 75mV helps drastically, but it is not enough, at least not for all frequencies.

      Indeed, before upping by 75mV I could not boot most of the time (only "emergency" mode boot was reliable, ie no raid10 and services off). But it seems 75mV is not enough to compensate for the issue at stake all the time. The thing is, I don't know what root issue the 75mV bump works around. It could be that 100mV is enough, but 75mV is a value based on testing, not on a theory that requires it (the proper value could be higher, or upping the voltage may only help to cope with voltage drops, making them fall less often below the value where cpub crashes).

      The datasheet for the cpub regulator requires a bigger capacitor on the voltage input than the one on the helios64. But the weird thing is that most rk3399 boards also use the same weak, below-spec capacitor value at this place. At my level (with barely any understanding of the hardware interactions), the next step would be to test whether my test case also crashes these other boards with the same too-small vin capacitor... If they crash, we could guess that the design is bad and that without a bigger capacitor the regulator cannot deliver the voltage for cpub reliably. It could be that we can work around this in software, but I am not qualified to tell, at least at this point (I read about how these components work, but I am not an expert).

      Mind also that I will test the board far less for the time being, since now that it is quite reliable I have started using it again (it had been down for months, then I extracted the motherboard to test with the least complex setup possible, in emergency mode).

      NB: upping the voltage makes the CPU hotter, so you might want to check the temperature values (with "sensors"). Mine were fine, way below the throttling temperature of 80°C for the rk3399, even with all of opp3 and above at 1.2V. The concern seems to be mostly about keeping power consumption low. But I wonder if it has a noticeable effect on the helios64 power consumption.