
iav

Members
  • Posts: 60
Everything posted by iav

  1. @SymbiosisSystems, thanks for testing and reporting back.

     Before diving into the panic itself, a bit of context: the CPU stability overlay we packaged from the prahal/ebin-dev recipe is opt-in precisely because, due to the RK3399 silicon lottery, most Helios64 units don't need it. The very fact that you do need it already hints that your particular unit sits closer to the edge of its operating margin than a typical one. That's not blaming the hardware, just an observation that units with marginal stability often expose other weak spots that stay invisible on an "average" board.

     Now to the crash itself. An `Asynchronous SError` in `rockchip_pcie_rd_conf` during PCI enumeration is a known bus-level abort on the RK3399 PCIe controller, not on the CPU. Our overlay has nothing to do with it: it only touches CPU cluster voltages, raising `vdd_cpu_b` on opp00..opp06 of the A72 cluster, while PCIe sits on `vdd_logic` and separate PCIe rails, which we don't touch.

     The PCIe x2 Gen2 link from the RK3399 in Helios64 is wired directly to the soldered-on JMicron JMB585 SATA controller ([Kobol Wiki: SATA](https://wiki.kobol.io/helios64/sata/)), which provides the five SATA ports. So the JMB585 is the only endpoint on which the bus scan can currently be failing. The typical scenario goes like this: the endpoint replies with CRS (Configuration Request Retry Status) or doesn't respond at all within the completion timeout window → the root complex raises a completion abort → AArch64 sees it as an asynchronous SError.

     This class of bugs on the RK3399 is well documented:

     - [\[BUG\] rk3399-rockpro64 pcie synchronous external abort](https://lore.kernel.org/linux-pci/CAMdYzYrYHtiEXwiKxwWcKSV7Re6dG4zTvkKtZxvso+fLBRYbPQ@mail.gmail.com/T/): the same class of bus-level abort on the same SoC.
     - [RFC patch: configurable bus scan delay](https://www.spinics.net/lists/linux-pci/msg103350.html): the general nature of CRS/timeout problems (the specific HBA delays from that thread don't directly apply to Helios64, since your endpoint is a fixed JMB585, not a swappable card).
     - [rockchip-linux/kernel#116](https://github.com/rockchip-linux/kernel/issues/116) and [#118](https://github.com/rockchip-linux/kernel/issues/118): "Some PCIe devices cause Rockchip PCIe controller to crash in bus scan".
     - The [v5 series fixing the RK3399 PCIe endpoint driver](https://patchwork.ozlabs.org/project/devicetree-bindings/cover/20230418074700.1083505-1-rick.wertenbroek@gmail.com/#3135632): CRS handling and the Configuration Enable bit.
     - Armbian has known about RK3399 PCIe quirks for a long time: [commit edb45f9, "Make PCIe reset optional"](https://github.com/armbian/build/commit/edb45f9acf322fc4b27bf7efba3f14a3a432b617).

     So your stack trace falls into a family of long-known RK3399 PCIe problems and is unrelated to the voltage overlay.

     If someone, from the rockchip64 maintainers or from the community, picks up your case, here's what is usually expected to be attached to a report like this; otherwise diagnosis will inevitably hit a wall of missing data:

     1. A full serial boot log via USB-UART (Helios64 → 1500000 baud), both with the overlay and without it (just comment out `overlays=` in `armbianEnv.txt`), to see whether the picture differs or this is an independent problem.
     2. The state of the SATA bays: try booting with the bays completely empty (or with one drive at a time). If the PCIe link to the JMB585 comes up cleanly without any SATA load, the problem narrows down to one of the drives rather than the link itself.
     3. U-Boot and BL31 (ATF) versions: `dmesg | grep -iE "u-boot|psci|bl31"`, plus `strings /boot/u-boot.bin | grep -m1 "U-Boot 20"` if you have access.
     4. The full `dmesg` up to the moment of the crash, even if truncated: whatever made it out to serial.
     5. The entire contents of `/boot/armbianEnv.txt`.
     6. Try two extra experiments. The first one, adding kernel boot parameters, is the simplest and often helps with unstable RK3399 PCIe.

        **a) Disable PCIe power management** (a known workaround for the broken PM on RK3399). Open `/boot/armbianEnv.txt`:

        ```
        sudo nano /boot/armbianEnv.txt
        ```

        Find a line starting with `extraargs=`. There are two cases:

        - **If such a line exists**, append the parameters at its end, separated by a space. For example, this line:

          ```
          extraargs=console=tty1
          ```

          becomes:

          ```
          extraargs=console=tty1 pcie_aspm=off pcie_port_pm=off
          ```

        - **If there is no such line at all**, just add a new line at the very end of the file:

          ```
          extraargs=pcie_aspm=off pcie_port_pm=off
          ```

        Save (Ctrl+O, Enter, Ctrl+X), run `sudo reboot`, and try booting. To revert, remove what you appended (or delete the added line entirely) and reboot again.

        **b)** You can also try downgrading PCIe from Gen2 to Gen1 (RK3399 Gen2 link training is regularly unstable), but that requires writing your own device tree overlay; there is no ready-made one in Armbian.

     And finally, to be honest: we are not PCIe stack experts and not rockchip64
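
Step 6a can also be scripted. Below is a minimal sketch of a hypothetical helper (`append_kernel_args` is not an Armbian tool), assuming the one-`key=value`-per-line layout of `armbianEnv.txt` described above. It covers both cases from that step: it appends the parameters to an existing `extraargs=` line, or adds a new `extraargs=` line at the end of the file if there is none, and it skips parameters that are already present, so running it twice does no harm.

```sh
# Hypothetical helper, not part of Armbian: append kernel parameters to the
# extraargs= line of an armbianEnv.txt-style file (key=value, one per line).
#  - existing extraargs= line: parameters are appended, separated by spaces
#  - no extraargs= line: a new one is added at the end of the file
#  - parameters already present as whole words are skipped (idempotent)
append_kernel_args() {
    file=$1
    shift
    tmp="$file.tmp.$$"
    awk -v args="$*" '
        BEGIN { n = split(args, want, " ") }
        /^extraargs=/ {
            found = 1
            line = $0
            for (i = 1; i <= n; i++) {
                # substr(line, 11) is the value after "extraargs=" (10 chars)
                if (index(" " substr(line, 11) " ", " " want[i] " ") == 0) {
                    sep = (line == "extraargs=") ? "" : " "
                    line = line sep want[i]
                }
            }
            print line
            next
        }
        { print }
        END {
            if (!found) {
                line = "extraargs="
                for (i = 1; i <= n; i++) {
                    sep = (i > 1) ? " " : ""
                    line = line sep want[i]
                }
                print line
            }
        }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back instead of mv, so the original file keeps its owner and mode
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, after sourcing the function in a root shell: `append_kernel_args /boot/armbianEnv.txt pcie_aspm=off pcie_port_pm=off`, then reboot. Reverting is still a manual edit of that line.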
  2. Hi @ebin-dev, @SymbiosisSystems, and everyone in this thread,

     Following the recurring instability reports here and in the older topic (https://forum.armbian.com/topic/30074-helios64-armbian-2308-bookworm-issues-solved/), I've packaged your opp-microvolt workaround as an opt-in DT overlay in the Armbian build framework.

     PR: https://github.com/armbian/build/pull/9822. It adds the overlay to both Armbian kernel trees: `rockchip64-current` (6.18) and `rockchip64-edge` (7.0).

     What you get once this lands:

     - `rockchip-rk3399-helios64-cpu-stability.dtbo` ships inside the regular `linux-dtb-{current,edge}-rockchip64` package. No hand-patching of DTBs, no separate downloads; `apt upgrade` keeps the overlay in sync with whatever DTB your kernel ships.
     - **Not enabled by default**: for people whose Helios64 "just works", the mainline OPPs stay untouched. I don't want to push a tree-wide voltage bump onto every user when only some units exhibit the instability.
     - Activation is the standard Armbian way, either through armbian-config:

           armbian-config → System → Kernel → Manage device tree overlays → [*] rk3399-helios64-cpu-stability → save → reboot

       or manually, by adding the overlay name to the `overlays=` line in `/boot/armbianEnv.txt` (the `rockchip-` prefix is implicit, because `overlay_prefix=rockchip` is already set on this board):

           overlays=rk3399-helios64-cpu-stability

       Then reboot.

     The voltages are exactly the ones from your post in this thread, https://forum.armbian.com/topic/58597-helios64-armbian-trixie-with-linux-618-incl-opp-microvolt-patch/?do=findComment&comment=237456 (opp00..opp06 raised to 900 / 900 / 900 / 950 / 1025 / 1100 / 1175 mV; opp07 left at the mainline 1.20 V; `max` everywhere kept at 1.25 V). Frequencies are not touched.

     Verified end to end on my Helios64 with both kernels:

     - current / 6.18.30, Trixie SD-card image
     - edge / 7.0.7, Trixie SD-card image (locally built)

     After enabling the overlay and rebooting:

         for n in 0 1 2 3 4 5 6 7; do
             printf 'opp0%s' "$n"
             od -An -tx4 --endian=big \
                 /sys/firmware/devicetree/base/opp-table-1/opp0$n/opp-microvolt
         done

         opp00 000dbba0 000dbba0 001312d0
         opp01 000dbba0 000dbba0 001312d0
         opp02 000dbba0 000dbba0 001312d0
         opp03 000e7ef0 000e7ef0 001312d0
         opp04 000fa3e8 000fa3e8 001312d0
         opp05 0010c8e0 0010c8e0 001312d0
         opp06 0011edd8 0011edd8 001312d0
         opp07 00124f80 00124f80 001312d0

     ...which matches your table 1:1. The U-Boot log line on boot:

         Applying kernel provided DT overlay rockchip-rk3399-helios64-cpu-stability.dtbo

     confirms that U-Boot picks up the `.dtbo` from `/boot/dtb/.../rockchip/overlay/` and applies it via `fdt apply` before the kernel starts.

     Ready-to-flash **current/6.18** images, built from the PR branch by the official Armbian builder workflow: https://fi.mirror.armbian.de/incoming/iav/helios64/archive/

     - Armbian_26.5.0_Helios64_resolute_current_6.18.30_minimal.img.xz
     - Armbian_26.5.0_Helios64_resolute_current_6.18.30_xfce_desktop.img.xz
     - Armbian_26.5.0_Helios64_trixie_current_6.18.30_minimal.img.xz

     If any of you can grab one of those (or wait for a nightly after the PR is merged) and confirm that the workaround applies cleanly through the overlay path on your board, that would help the PR land. Bug reports are welcome too.

     Attribution lives in the overlay README block (look for the heading `### rk3399-helios64-cpu-stability`), pointing back to forum topics 30074 and 58597, prahal and ebin-dev:

     - https://github.com/armbian/build/blob/feat/helios64-cpu-stability-overlay/patch/kernel/archive/rockchip64-6.18/overlay/README.rockchip-overlays
     - https://github.com/armbian/build/blob/feat/helios64-cpu-stability-overlay/patch/kernel/archive/rockchip64-7.0/overlay/README.rockchip-overlays

     Thanks!
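
The `od` output above is raw hex in microvolts. A minimal sketch for turning it back into millivolts, to compare against the 900 / 950 / 1025 / 1100 / 1175 mV table; `opp_uv_to_mv` is a hypothetical name, and it assumes each `opp-microvolt` entry holds 32-bit `<target min max>` values in µV, as the output above suggests.

```sh
# Hypothetical helper: convert the big-endian 32-bit words printed by
#   od -An -tx4 --endian=big .../opp-microvolt
# from hex microvolts into decimal millivolts, keeping their order
# (<target min max> for an OPP entry).
opp_uv_to_mv() {
    out=
    for word in "$@"; do
        # POSIX shell arithmetic accepts 0x-prefixed hex; 1 mV = 1000 uV
        mvolt=$(( 0x$word / 1000 ))
        out="${out:+$out }$mvolt"
    done
    printf '%s\n' "$out"
}
```

On a board showing the values above, `opp_uv_to_mv $(od -An -tx4 --endian=big /sys/firmware/devicetree/base/opp-table-1/opp06/opp-microvolt)` would print `1175 1175 1250` (the command substitution is deliberately unquoted so each word becomes a separate argument).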
  3. Hi @ebin-dev, @SymbiosisSystems, and everyone in this thread,

     Following the recurring instability reports here and in the older topic 30074, I've packaged your opp-microvolt workaround as an opt-in DT overlay in the Armbian build framework.

     PR: **armbian/build#9822**. It adds the overlay to both Armbian kernel trees: `rockchip64-current` (6.18) and `rockchip64-edge` (7.0).

     What you get once this lands:

     - `rockchip-rk3399-helios64-cpu-stability.dtbo` ships inside the regular `linux-dtb-{current,edge}-rockchip64` package. No hand-patching of DTBs, no separate downloads; `apt upgrade` keeps the overlay in sync with whatever DTB your kernel ships.
     - **Not enabled by default**: for people whose Helios64 "just works", the mainline OPPs stay untouched. I don't want to push a tree-wide voltage bump onto every user when only some units exhibit the instability.
     - Activation is the standard Armbian way, either through armbian-config:

           armbian-config → System → Kernel → Manage device tree overlays → [*] rk3399-helios64-cpu-stability → save → reboot

       or manually, by adding the overlay name to the `overlays=` line in `/boot/armbianEnv.txt` (the `rockchip-` prefix is implicit, because `overlay_prefix=rockchip` is already set on this board):

           overlays=rk3399-helios64-cpu-stability

       Then reboot.

     The voltages are exactly the ones from post #237456 (opp00..opp06 raised to 900 / 900 / 900 / 950 / 1025 / 1100 / 1175 mV; opp07 left at the mainline 1.20 V; `max` everywhere kept at 1.25 V). Frequencies are not touched.

     Verified end to end on my Helios64 with both kernels:

     - current / 6.18.30, Trixie SD-card image
     - edge / 7.0.7, Trixie SD-card image (locally built)

     After enabling the overlay and rebooting:

         for n in 0 1 2 3 4 5 6 7; do
             printf 'opp0%s' "$n"
             od -An -tx4 --endian=big \
                 /sys/firmware/devicetree/base/opp-table-1/opp0$n/opp-microvolt
         done

         opp00 000dbba0 000dbba0 001312d0
         opp01 000dbba0 000dbba0 001312d0
         opp02 000dbba0 000dbba0 001312d0
         opp03 000e7ef0 000e7ef0 001312d0
         opp04 000fa3e8 000fa3e8 001312d0
         opp05 0010c8e0 0010c8e0 001312d0
         opp06 0011edd8 0011edd8 001312d0
         opp07 00124f80 00124f80 001312d0

     ...which matches your table 1:1. The U-Boot log line on boot:

         Applying kernel provided DT overlay rockchip-rk3399-helios64-cpu-stability.dtbo

     confirms that U-Boot picks up the `.dtbo` from `/boot/dtb/.../rockchip/overlay/` and applies it via `fdt apply` before the kernel starts.

     Ready-to-flash **current/6.18** images, built from the PR branch by the official Armbian builder workflow: https://fi.mirror.armbian.de/incoming/iav/helios64/archive/

     - Armbian_26.5.0_Helios64_resolute_current_6.18.30_minimal.img.xz
     - Armbian_26.5.0_Helios64_resolute_current_6.18.30_xfce_desktop.img.xz
     - Armbian_26.5.0_Helios64_trixie_current_6.18.30_minimal.img.xz

     If any of you can grab one of those (or wait for a nightly after the PR is merged) and confirm that the workaround applies cleanly through the overlay path on your board, that would help the PR land. Bug reports are welcome too.

     Attribution lives in `patch/kernel/archive/rockchip64-{6.18,7.0}/overlay/README.rockchip-overlays` under `### rk3399-helios64-cpu-stability`, pointing back to topics 30074 and 58597, prahal and ebin-dev.

     Thanks!
  4. Could you link to the patches you are talking about? Preferably not on Dropbox or a pastebin-like site, but at their original source: a repository or a mailing list.
  5. I wrote a PR to remedy the situation: https://github.com/armbian/build/pull/9800
  6. Thank you. That would never have occurred to me. Lord, why is it necessary to preserve the state of indicators whose whole purpose is to show the state of something?
  7. I use etckeeper on my ODroid-N2. It sends me the changes to my /etc once a day, and I regularly get this change:

         armbian-leds.conf | 2 +-
         1 file changed, 1 insertion(+), 1 deletion(-)

         diff --git a/armbian-leds.conf b/armbian-leds.conf
         index 8108230..537ea6f 100644
         --- a/armbian-leds.conf
         +++ b/armbian-leds.conf
         @@ -5,5 +5,5 @@ invert=0
          [/sys/class/leds/rtw88-1-1.1:1.0]
          trigger=phy0tpt
         -brightness=1
         +brightness=0

     What is this mess? What is it for?
  8. I added the patch from https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux.git/patch/?id=79482f3791c4760b9b0d8d9bfde9f1053ea3dd5e to my build (`userpatches/kernel/archive/meson64-6.18`), and success!

         7-Zip 23.01 (arm64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
         64-bit arm_v:8 locale=en_GB.UTF-8 Threads:6 OPEN_MAX:1024

         Compiler: 13.2.0 GCC 13.2.0
         Linux : 6.18.0-edge-meson64 : #1 SMP PREEMPT Sun Nov 30 22:42:10 UTC 2025 : aarch64
         PageSize:4KB THP:always hwcap:8FF:CRC32:SHA1:SHA2:AES:ASIMD
         LE

         1T CPU Freq (MHz):  1887  1899  1901  1901  1903  1903  1903
         3T CPU Freq (MHz): 299%  1904   300%  1903

         RAM size:    3773 MB,  # CPU hardware threads:   6
         RAM usage:   1334 MB,  # Benchmark threads:      6

                         Compressing     |             Decompressing
         Dict Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
              KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

         22:   7750   530   1423   7540  |     135098   544   2118  11518
         23:   7635   541   1438   7780  |     134250   551   2106  11614
         24:   7469   535   1502   8032  |     131036   551   2089  11498
         25:   7334   545   1537   8374  |     129234   552   2085  11501
         ------------------------------  | ------------------------------
         Avr:  7547   538   1475   7931  |     132404   549   2099  11533
         Tot:         544   1787   9732

     @c0rnelius, you nailed it! Thank you!!!
  9. I did a bisect:

         3381d25b77fbf1ebaaa151a9f2be66fbf1ca3a1e is the first bad commit
         commit 3381d25b77fbf1ebaaa151a9f2be66fbf1ca3a1e
         Author: Ricardo Pardini <ricardo@pardini.net>
         Date:   Sun Oct 12 19:57:32 2025 +0200

     I tried building the 6.18 kernel with the config file linux-meson64-edge.config taken from the "good" 6.17 kernel, but the result was still a "slow" kernel. I had hoped to find the "bad" change in the kernel config; now it's clear the problem isn't there. I have no more ideas 😢
  10. It seems you are using a Red Hat kernel. That means something went wrong in the Armbian meson64 kernel config after 6.16.??
  11. After updating the edge kernel to 6.18-rc, the performance of my ODroid-N2 dropped drastically. Here are the results of the 7-Zip benchmark (`7z b`) with different kernel versions for comparison. I compile the kernel myself using the Armbian build system and install the resulting .deb.

          7-Zip 23.01 (arm64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
          64-bit arm_v:8 locale=en_GB.UTF-8 Threads:6 OPEN_MAX:1024

          Compiler: 13.2.0 GCC 13.2.0
          Linux : 6.16.6-edge-meson64 : #1 SMP PREEMPT Tue Sep 9 17:02:41 UTC 2025 : aarch64
          PageSize:4KB THP:always hwcap:8FF:CRC32:SHA1:SHA2:AES:ASIMD
          LE

          1T CPU Freq (MHz):  1901  1897  1904  1904  1904  1904  1903
          3T CPU Freq (MHz): 298%  1894   298%  1892

          RAM size:    3769 MB,  # CPU hardware threads:   6
          RAM usage:   1334 MB,  # Benchmark threads:      6

                          Compressing     |             Decompressing
          Dict Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
               KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

          22:   7244   525   1343   7048  |     136762   553   2109  11660
          23:   6896   533   1317   7026  |     128222   535   2071  11092
          24:   6626   539   1323   7124  |     114197   478   2096  10021
          25:   6170   542   1301   7045  |     121553   536   2020  10818
          ------------------------------  | ------------------------------
          Avr:  6734   535   1321   7061  |     125184   526   2074  10898
          Tot:         530   1698   8979

          -----------------------------------------------------------------

          7-Zip 23.01 (arm64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
          64-bit arm_v:8 locale=en_GB.UTF-8 Threads:6 OPEN_MAX:1024

          Compiler: 13.2.0 GCC 13.2.0
          Linux : 6.18.0-rc7-edge-meson64 : #1 SMP PREEMPT Sun Nov 23 22:53:16 UTC 2025 : aarch64
          PageSize:4KB THP:always hwcap:8FF:CRC32:SHA1:SHA2:AES:ASIMD
          LE

          1T CPU Freq (MHz):  1966  1985  1732  1940  1979  1946  1486
          3T CPU Freq (MHz):  64%   424    97%   641

          RAM size:    3773 MB,  # CPU hardware threads:   6
          RAM usage:   1334 MB,  # Benchmark threads:      6

                          Compressing     |             Decompressing
          Dict Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
               KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

          22:   1045    85   1191   1018  |      20252    91   1903   1727
          23:   1078    90   1218   1099  |      18815    86   1889   1628
          24:   1100    93   1274   1183  |      19644    92   1880   1724
          25:    914    78   1332   1044  |      19697    94   1873   1753
          ------------------------------  | ------------------------------
          Avr:  1034    87   1253   1086  |      19602    91   1886   1708
          Tot:          89   1570   1397

      I have done similar tests on an ODroid-M1 (Rockchip), but nothing similar happens there. What could be the reason?
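
To compare runs like these (or to mark each step of a bisect as good or bad), the single summary number is the overall rating on the `Tot:` line: 8979 for 6.16.6 versus 1397 for 6.18.0-rc7 above. A minimal sketch that extracts it from `7z b` output; `total_rating` and `rating_verdict` are hypothetical names, and the threshold is whatever cut-off you pick between a known-good and a known-bad run.

```sh
# Hypothetical helpers for comparing `7z b` runs.

# Read `7z b` output on stdin and print the last field of the "Tot:" line,
# i.e. the overall rating in MIPS. Prints nothing if there is no such line.
total_rating() {
    awk '$1 == "Tot:" { r = $NF } END { if (r != "") print r }'
}

# rating_verdict RATING THRESHOLD: print "good" if RATING reaches
# THRESHOLD, otherwise "bad".
rating_verdict() {
    if [ "$1" -ge "$2" ]; then
        echo good
    else
        echo bad
    fi
}
```

For example, `rating_verdict "$(7z b | total_rating)" 5000`, where 5000 is an arbitrary cut-off chosen between the two runs above.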
  12. I built a fresh Armbian-unofficial_25.02.0-trunk_Helios4_plucky_edge_6.10.14 image from Armbian trunk (1f697206562). The image boots successfully. Why?
  13. I did a kind of manual bisect over the commits to config/boards/odroidm1.conf:

      - GOOD: commit 3215bf9f3699c2ee4bc741efe82065315e4ff46f, U-Boot 2024.04-armbian (Jan 29 2025 - 02:13:41 +0000), package linux-u-boot-odroidm1-edge_24.8.0-trunk_arm64__2024.04-S2504-P3b44-Haddb-V73be-B4f92-R448a.deb
      - BAD: commit 268c2e85960978c90025bc18cf7d5f8e5cfbdab6, U-Boot 2024.07-armbian-2024.07-S3f77-P7e9c-Hfb3e-V8f4f-Bb66d-R448a (Jan 29 2025 - 02:30:21 +0000), package linux-u-boot-odroidm1-edge_24.8.0-trunk_arm64__2024.07-S3f77-P7e9c-Hfb3e-V8f4f-Bb66d-R448a.deb

      Then I fixed my Odroid M1 by running

          sudo apt install ./linux-u-boot-odroidm1-edge_24.8.0-trunk_arm64__2024.04-S2504-P3b44-Haddb-V73be-B4f92-R448a.deb

      followed by `armbian-install` → "Install/Update the bootloader on MTD Flash".