prahal

Members
  • Posts

    162
Everything posted by prahal

  1. Thanks, @Alex T, for your insights. Would plugging in an ATX power supply instead of the original external power supply help sort this out? What kind of ATX PSU would be required? I measured the voltage on the board with a voltmeter and got a little above 12V, which is fine, but I believe an oscilloscope is required to rule out transient voltage drops (I got a portable oscilloscope a few months ago). Also, I have a test case (attached cpufreq-switching-2.c) that always crashes the big CPUs (even when I set a 5 millisecond delay between transitions, with #define TRANSITION_DELAY 5000). I believe that if the power supply is at fault, I should be able to see small voltage drops on the 12V rail on the board with the oscilloscope? cpufreq-switching-2.c
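For readers without the attachment, the idea of a frequency-switching stress test can be sketched roughly as below. This is not the attached cpufreq-switching-2.c; it is a minimal Python sketch that assumes the standard Linux cpufreq sysfs files (`scaling_available_frequencies`, `scaling_governor`, `scaling_setspeed`) and that the `userspace` governor is available. The function names and the `rounds` parameter are hypothetical; only the 5000 microsecond delay comes from the post.

```python
import time
from pathlib import Path


def read_frequencies(policy_dir):
    """Return the frequencies (in kHz) listed by a cpufreq policy directory."""
    text = (Path(policy_dir) / "scaling_available_frequencies").read_text()
    return [int(field) for field in text.split()]


def switch_frequencies(policy_dir, delay_us=5000, rounds=1000):
    """Alternate between the lowest and highest frequency of one policy.

    delay_us mirrors the TRANSITION_DELAY value mentioned in the post;
    rounds is a hypothetical parameter added for this sketch.
    Returns the number of transitions requested.
    """
    policy = Path(policy_dir)
    freqs = read_frequencies(policy)
    low, high = min(freqs), max(freqs)
    # Assumption: the userspace governor is built into the running kernel.
    (policy / "scaling_governor").write_text("userspace")
    transitions = 0
    for _ in range(rounds):
        for target in (low, high):
            (policy / "scaling_setspeed").write_text(str(target))
            time.sleep(delay_us / 1_000_000)
            transitions += 1
    return transitions
```

On a real board this would need root and a policy directory such as `/sys/devices/system/cpu/cpufreq/policy4` (assumed here to be the big cluster on an RK3399; check which policy covers the big cores on your system).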
  2. I landed the restoration of the eMMC hs400 support and the fix for the helios64-heartbeat-led.service that I had broken in a previous commit. The raised voltages are not in. Though these raised voltages seem to help a lot of people, they are not the endgame. I.e. I tried setting all the CPU b voltages to max 1.2V, and I believe I can still crash the CPUs b with my test case... Also, I am getting started with the maintenance of another board. There I sorted out an issue with a PL2303 USB serial adapter: it did not support 1.5Mbps, only 1.2Mbps. In the meantime, I nailed down an issue on edge 6.14 that might also affect other versions, but I have not tested that yet. I fixed it locally by setting the startup time for HDD rail a and rail b so that the kernel does not start them at the same time. I.e. the upstream Linux kernel has the HDD rails defined, and it seems it restarts them when the kernel loads, thus drawing too much current on my board. I repeatedly had:
     Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT SMP
     (...)
     [ 3.890774] Call trace:
     [ 3.891012] rockchip_pcie_rd_conf+0xf4/0x1e8 (P)
     [ 3.891461] pci_bus_read_config_dword+0x7c/0xdc
     [ 3.891906] pci_bus_generic_read_dev_vendor_id+0x34/0x1b4
     [ 3.892428] pci_scan_single_device+0xa0/0x108
     [ 3.892857] pci_scan_slot+0x68/0x1c4
     [ 3.893216] pci_scan_child_bus_extend+0x44/0x2cc
     [ 3.893668] pci_scan_bridge_extend+0x320/0x60c
     [ 3.894104] pci_scan_child_bus_extend+0x1b8/0x2cc
     [ 3.894564] pci_scan_root_bus_bridge+0x64/0xe4
     [ 3.895000] pci_host_probe+0x30/0x108
     [ 3.895367] rockchip_pcie_probe+0x43c/0x5e4
     [ 3.895777] platform_probe+0x68/0xdc
     I cannot reproduce this PCIe failure after adding the 10 second delay that the Kobol devs put in u-boot between rail a and rail b startup. 10 seconds seems a lot, though, and I have no clue why the Kobol team chose such a huge value.
  3. I "believe" that a board can only ship one DTB armbian wise. I will have to study that to be confident, though. If the board is not the same hardware wise, it could be that it requires its own "board" config in armbian. It being close won't change that (except that it will make things easier for this board's maintainer, as most of the work will already have been done). I have no experience in adding a new board to armbian, but I believe cloning the armbian/build repository and creating a new config/boards/ entry is the first step, along with getting a new tag for this board on the forum. As a local hack (I believe this is not a proper fix, but I might be wrong), you might be able to create an override dtb file to remove existing nodes from, and add new ones to, an existing board dtb.
  4. So the vendor 6.1 kernel sets the vpcie3v3-supply of both the pcie3x2 ASMedia SATA controller and the pcie3x4 M.2 M-Key slot to the same `vcc3v3_pcie30` supply, which is actually the supply for the pcie ref clock. In other words, they worked around the issue by describing the supply for these pcie controllers incorrectly. This pcie ref clock power supply patch set is likely still of interest for the mainline 6.12 branch. I confirm that I was able, at least once, to reproduce the ASMedia SATA controller not being detected with the gated ref clock patch set applied to the vendor branch (even though, as I found later on, the vendor kernel already carried a workaround: it sets the oscillator supply as the supply for both pcie controllers, which does not describe the hardware correctly).
  5. Still building the 6.1 vendor kernel with the gated clock patch set. I noticed, though, that the mainline kernel has default pinctrl settings for pcie but the vendor one does not, in case someone wants to try porting these dts pinctrl pcie definitions from mainline to vendor and testing them.
  6. I will try this 6.13 patchset, but it looks like SATA drives are already stable with 6.12 (i.e. without this patchset), or at least more stable than with vendor 6.1. It could be that another fix between 6.1 and 6.12 helps.
  7. If the LED issue was helios64:green:status showing up in sysfs instead of helios64::status, it is fixed in git.
  8. The hardware issue with the 2.5Gbps port only occurs when operating in 1Gbps mode (2.5Gbps was always fine hardware wise): https://blog.kobol.io/2020/11/13/helios64-2-5g-ethernet-issue/
  9. These are not error messages but information messages, i.e. "waking up from sleep" messages when the device wakes up. Edit: I will have to investigate. "exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x6" and hard resetting the link on wake-up from HDD sleep might not be normal after all.
  10. About the helios64-heartbeat-led.service: I found the cause. When I synced the armbian dts to the upstream one, the fact that there was a "green" in the label of the upstream dts slipped through. I will revert this change back to helios64::status instead of helios64:green:status. (The issue was introduced in armbian 6.9 kernels.)
  11. @crosser thanks for the feedback. By "new edge release stable", do you mean the vanilla armbian one? That is, without copying the ebin-dev dtb over to the new edge kernel?
  12. From reading the dts, there is a helios64:green:status LED which does not show up in 6.12.1 from armbian edge but does in 6.12.9 from armbian current. Conversely, helios64::status is listed in 6.12.1 but not in 6.12.9. I guess the "green" used to be dropped from the sysfs file name but no longer is. If keeping "green" is a feature, the systemd service helios64-heartbeat-led.service will have to be modified to cope with both file names. It might be a good idea to rename this gpio line to helios64:blue:status (as the LED is blue, not green). I wonder why it was labelled as green. Maybe it was a hackish way to keep the color out of the sysfs gpio LED file name, as it seems green was not preserved in the sysfs name before around Linux 6.12.9. Or maybe labelling a system LED OK status as green is a convention. That would require research, as I know little about gpio LED conventions, and it would be better done now, before hard-coding "green" in the helios64-heartbeat-led.service systemd service.
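Coping with both file names could look roughly like the sketch below. It is only an illustration in Python, not what helios64-heartbeat-led.service actually does: the function names are hypothetical, and it assumes the service's job is to write the standard `heartbeat` trigger name into the LED's `trigger` file.

```python
from pathlib import Path

# Both names seen in the posts: the older one first, then the one with "green".
CANDIDATE_NAMES = ("helios64::status", "helios64:green:status")


def find_status_led(leds_dir="/sys/class/leds"):
    """Return the first candidate LED directory that has a trigger file, or None."""
    base = Path(leds_dir)
    for name in CANDIDATE_NAMES:
        led = base / name
        if (led / "trigger").exists():
            return led
    return None


def set_heartbeat(leds_dir="/sys/class/leds"):
    """Select the heartbeat trigger on whichever status LED name exists.

    Returns True if a status LED was found and written to, False otherwise.
    Assumption: "heartbeat" is what the real service would select.
    """
    led = find_status_led(leds_dir)
    if led is None:
        return False
    (led / "trigger").write_text("heartbeat")
    return True
```

Probing a list of candidate names keeps the logic independent of the kernel version, at the cost of having to extend the list if the LED is ever renamed again (for example to helios64:blue:status).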
  13. Can you reproduce the helios64 crash without the dtb patch? Could you provide a listing of /sys/class/leds/ with and without the dtb patch? (And by dtb patch, could you confirm you mean that you replaced the full dtb file, rather than editing the dtb via dtc or armbian-config to only change the emmc/opp voltages?) I am on 6.12.1-edge-rockchip64 from linux-image-edge-rockchip64 24.11.1 with the vanilla armbian dtb, and the LED trigger file exists for helios64::status:
      phn@helios64:~$ ls /sys/class/leds/
      helios64:blue:hdd-status helios64:blue:power-status helios64:red:ata1-err helios64:red:ata3-err helios64:red:ata5-err helios64::status
      helios64:blue:net helios64:blue:usb3 helios64:red:ata2-err helios64:red:ata4-err helios64:red:fault mmc0::
      phn@helios64:~$ ls /sys/class/leds/helios64\:\:status
      brightness device invert max_brightness power subsystem trigger uevent
  14. Only the ebin-dev dtb has the fixes that are not upstreamed to armbian (the other, older comments are about making the same changes to your own dtb, or about requesting feedback from people testing these changes). As far as I know, the ebin-dev dtb has the eMMC fix and the slightly raised big CPU voltage, both in the dtb and both from me. They will both be upstreamed to armbian, but only the eMMC fix will go into upstream Linux (the voltage increase is only a hack that a pine64 engineer asked me to try, and the cause of the crash is still not known; the current voltages are supposed to be correct already).
  15. I won't be able to test a newer image for at least a week, as I am away from the board. Before leaving, I was able to confirm that I get back panel sound with the rock-5-itx_debian_bullseye_kde_b6.img.xz image (kernel 5.10.110-37-rockchip). You might want to provide the amixer output to sort out any volume level issue: `cat /proc/asound/cards`, `amixer`, and `amixer -c 3` if card 3 is `rockchipes8316` in the `cat /proc/asound/cards` output.
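Since the card number is not guaranteed to be 3, the index to pass to `amixer -c` can be looked up from `/proc/asound/cards`. The Python sketch below assumes the usual layout of that file, where each card starts with a line of the form `index [id ]: driver - name`. The sample text is illustrative only, not captured from a ROCK 5 ITX, and `find_card_index` is a hypothetical helper.

```python
import re

# Matches the first line of each card entry, e.g. " 3 [rockchipes8316 ]: ..."
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[([^\]]*)\]")

# Illustrative sample in the usual /proc/asound/cards layout (not real board output).
SAMPLE_CARDS = """\
 0 [hdmisound      ]: simple-card - hdmi-sound
                      hdmi-sound
 3 [rockchipes8316 ]: simple-card - rockchip-es8316
                      rockchip-es8316
"""


def find_card_index(cards_text, card_id):
    """Return the ALSA card index whose id equals card_id, or None if absent."""
    for line in cards_text.splitlines():
        match = CARD_LINE.match(line)
        if match and match.group(2).strip() == card_id:
            return int(match.group(1))
    return None
```

On a live system, the text would come from reading `/proc/asound/cards`, and the returned index would replace the 3 in `amixer -c 3`.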
  16. Yes, I planned them for 24.11, but I had personal issues and I also took part in the maintenance of another board. For the eMMC patch, I only need to sort out how to apply it to the current patchset. I asked for insight on how to handle reformatting an existing patch that I modify, and Werner told me that I should submit any change to a patch as a separate patch, but I am still uneasy about adding small patch files instead of fixing existing patches. I wonder if I explained my situation clearly... The voltage patch needs a little more work, as I made many variants of the voltage change on my side, so I need to disassemble the ebin-dev dtb, though that is achievable in a few hours. In short, I first tried learning the armbian processes, then had a little more work on another board (mostly learning which additional hardware was required and ordering it, then handling a few glitches like a power supply dying), then personal matters and taking part in gatherings. I now plan these commits for 25.02.
  17. I also suspect a hardware issue. If you have a voltmeter, you could check whether you have 12V on the board (taking the ground on the metal case around a connector such as ethernet). A repair shop might be able to fix it. I don't know the cost of such a repair.
  18. I don't know why you don't get a shell on the serial console. But we can get the network issue worked around. There are two ethernet interfaces: end0, the 1Gbps one, and the second one, in your case enx646266d0034d, the 2.5Gbps one. The enx646266d0034d interface has the MAC hardware address 64:62:66:d0:03:4d (i.e. the MAC address is in the interface name). The end0 address should be the same minus one, i.e. 64:62:66:d0:03:4c. What is the status of the LEDs on the front panel (blinking/solid)?
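The relationship between the interface name and the two MAC addresses can be checked with a few lines of Python. This is a small sketch with hypothetical helper names; the "minus one" rule for end0 is the assumption stated in the post, not something the code can verify.

```python
def mac_from_enx(ifname):
    """Recover the MAC address embedded in an 'enx<12 hex digits>' interface name."""
    if not ifname.startswith("enx") or len(ifname) != 15:
        raise ValueError(f"not an enx-style interface name: {ifname!r}")
    digits = ifname[3:]
    int(digits, 16)  # raises ValueError if the suffix is not hexadecimal
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def mac_minus_one(mac):
    """Return the MAC address numerically one below the given one."""
    value = int(mac.replace(":", ""), 16)
    if value == 0:
        raise ValueError("cannot go below 00:00:00:00:00:00")
    digits = f"{value - 1:012x}"
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))
```

With the name from the post, `mac_from_enx("enx646266d0034d")` gives 64:62:66:d0:03:4d, and `mac_minus_one` of that gives 64:62:66:d0:03:4c, the address expected on end0.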
  19. I believe it is supported; see the comments at https://wiki.kobol.io/helios64/hardware/
  20. You could check for the 12V on the board with a voltmeter. Schematics are here: https://wiki.kobol.io/helios64/docs/ . Do you have serial console output? Though I believe that with such an issue this is highly unlikely. From the comments at https://wiki.kobol.io/helios64/led/ , the system status LED and LAN LED are software controlled and the HDD activity LEDs are hardware controlled. LED1 (system rail power) is highly likely hardware controlled, though.
  21. Awesome. And thanks a lot for the feedback. Could you explain which side you disconnected/reconnected? The HDD side only? Or the motherboard side too? I believe the HDD side is enough, I just want a clue in case that is wrong. I believe the connectors were not clean, or were a bit oxidized out of the factory (maybe the connectors were stored in an area with an aggressive climate... I am no expert, but I guess that, with the parts being handled during the Covid mess, some unusual process happened). It is not that the hardware is bad, only "dirty". Here I had badly lost connectivity to an HDD, and extracting and reinserting it a few times seemed to clean the connectors (I also put isopropanol on them, but I don't remember if I brushed them at that time).
  22. The -19 in "ata1: link is slow to respond, please be patient (ready=-19)" is ENODEV in https://github.com/torvalds/linux/blob/6efbea77b390604a7be7364583e19cd2d6a1291b/drivers/ata/libata-core.c#L3594 but ready is reset to 0, otherwise the function would exit before this message: https://github.com/torvalds/linux/blob/6efbea77b390604a7be7364583e19cd2d6a1291b/drivers/ata/libata-core.c#L3577 As I said, I had bad contact on my SATA ports, so try removing the HDD from the SATA socket and reinserting it a few times. That might remove deposits from the contacts. On my side I also cleaned the SATA sockets with isopropanol once I had bought some (I use 99.5% isopropanol, but I don't know if a less concentrated one is OK). You might paste the complete log around the ata1 lines in the kernel log. If the issue is always with the same SATA port, you could try swapping the HDDs to see if the issue follows the HDD or stays with the link or port. But indeed, I guess the issue is hardware.
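The mapping from the negative value in the log to an errno name can be checked with Python's standard `errno` module. This is just a convenience sketch (the `describe_ready` helper is hypothetical); errno numbers are platform specific, so it should be run on Linux to match the kernel's numbering.

```python
import errno
import os


def describe_ready(ready):
    """Translate a negative kernel return value such as ready=-19 to its errno name."""
    code = -ready
    # errno.errorcode maps numeric codes to names on the platform running this script.
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"
```

Calling `describe_ready(-19)` on Linux returns a string starting with ENODEV, matching the reading of the libata source linked above.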
  23. @BipBip1981 Could you grep for "ata", or check the logs for ata errors, and paste them? Or tell us if there are no other messages with ata in the logs? Don't you have any "hard resetting link" messages in the kernel logs? On my side, I once had drives that were not detected; extracting the drive from the SATA power and data socket a few times did it (cleaning them once with isopropanol might have helped too). I believe my sockets were oxidized, though that is a wild guess. The issue is gone either way.
  24. Seeing how well the voltage hacks work on your boards, I will include them in armbian (even though I still get crashes on my own board with only this 75mV hack, albeit far fewer). But not upstream, at least until I sort out why they work. (I am close to sending the eMMC fix upstream; I only need to read the backlog there anew so that the patch is up to standard and we avoid too much back and forth.) I was told to try the voltage hacks by a board designer who told me there was a design issue with the voltage regulator, which I am not able to sort out myself. But I checked the schematics of other rk3399 armbian boards, and as far as I understand they have the same design. So either all of these boards are broken and are somewhat stable for an unknown reason (maybe less stress on the big CPUs), or I misunderstood what is wrong with the helios64 hardware. I need to talk to a hardware engineer. I am also trying to sort out a few other issues with other software and hardware. But I expect to have those in by mid October, maybe earlier.