OdyX Posted December 31, 2023
That was it; it rebooted without issues with this change, thanks so much @ebin-dev!

OdyX Posted December 31, 2023
(muffled rage sounds) It worked fine as long as it wasn't doing anything. Now I've added a munin-node, tor, and various mounts, and 🎉 it has started to stop at random (die, kernel panic, whatever; I can't determine what the issue is…). 😕 Any idea how to debug this? /var/log is a zram device (and the journal has never shown anything useful). Do I have to resort to using a serial console permanently to debug this?

ebin-dev (Author) Posted December 31, 2023
In my use case there have been no issues so far: I use it 24/7 as a DNS server, file server, Nextcloud server, music server, Plex server, and for home automation. I kept everything simple (i.e. ext4 file system, no NFS).
@OdyX The undocumented errors you describe are hard to believe. If your system really behaves like that, you should consider donating your board for testing and buying something else. Otherwise: flash u-boot version 21.08.9. If that does not help, you can easily change the Linux kernel: try 5.15.93. Just install linux-image, linux-headers, and linux-dtb with dpkg -i linux*, and do not forget to delete the remaining 6.6.8 links from / and from /boot. Also make sure that you have the right u-boot flashed to eMMC.
Edited January 3 by ebin-dev

OdyX Posted January 1
The only somewhat relevant thing I changed was to use `linux-cpupower` instead of `cpufrequtils`, and I had started using the "ondemand" governor for the CPUs. I see that sbc-bench set the governor for all CPUs to "performance". I've now gone back to leaving the settings as set by sbc-bench, to see if my suspicion (aroused by https://github.com/ThomasKaiser/sbc-bench/issues/62) is warranted.

ebin-dev (Author) Posted January 2
@OdyX After executing sbc-bench I switched the governor back to ondemand. I never had issues with it. You may try schedutil too. I added a download link (above in this thread) to the dtb enhanced with hs400 and L2 cache information (for Linux 6.6.8). You can safely copy it to /boot/dtb/rockchip/rk3399-kobol-helios64.dtb. It is difficult to say whether 'linux-cpupower' caused that trouble.
# cat /etc/default/cpufrequtils
ENABLE=true
MIN_SPEED=408000
MAX_SPEED=1800000
GOVERNOR=ondemand
Edited January 2 by ebin-dev

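To confirm which governor is actually active after editing /etc/default/cpufrequtils, the cpufreq sysfs files can be read directly. The following is only a minimal sketch, assuming the standard sysfs layout under /sys/devices/system/cpu; the function name show_governors is illustrative and not part of any Armbian tool.

```shell
# Print "<cpu> <governor>" for every CPU that exposes a cpufreq governor.
# The optional argument is the sysfs CPU root (an assumption for testing;
# on a real system the default path is used).
show_governors() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -r "$f" ] || continue
        cpu=${f#"$root"/}   # strip the root prefix -> cpu0/cpufreq/...
        cpu=${cpu%%/*}      # keep only the first path component -> cpu0
        echo "$cpu $(cat "$f")"
    done
}

# Usage on the board itself:
#   show_governors
```

On a Helios64 this should list six CPUs; if the governor differs from the GOVERNOR= value in the config file, another tool has probably overridden it.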
alchemist Posted January 2
I still have NFS issues and kernel panics with the updated u-boot and patched kernel 6.6.8. I will investigate; this is probably not related to the Helios64 and those patches (the vanilla kernel also crashes with NFS). No issues with kernel 6.1.70.

ebin-dev (Author) Posted January 3
14 hours ago, alchemist said: "I still have NFS issues & kernel panics with the updated u-boot and patched kernel 6.6.8"
Which u-boot version are you referring to (the one including pull requests from @prahal)?
P.S.: I included a note that kernel 6.1.70 should be used if you use NFS.
Edited January 3 by ebin-dev

OdyX Posted January 5
Well. With everything set up as you said, but with the 5.15.93 kernel, it now has a 3-day uptime. Yay.

ebin-dev (Author) Posted January 6
20 hours ago, OdyX said: "Well. With all setup as you said but with the 5.15.93 kernel, it now has a 3-days uptime. Yay."
Good news! Did you modify the dtb to support eMMC hs400 speed and L2 cache information? It would be nice if you could also try the 6.1.70 kernel.

OdyX Posted January 6
Yeah, I downloaded the dtb from your Dropbox link; thanks for that. Am now rebooting into 6.1.71, and will report back.

OdyX Posted January 8
6.1.71 only got 17 hours of uptime, then 5, so I then tried 6.6, which died after barely 1 hour. Will revert to 5.15.93.

prahal Posted January 8
@OdyX can you try this testcase with 5.15.93 (it should crash)

ebin-dev (Author) Posted January 9
13 hours ago, OdyX said: "6.1.71 only got 17 hours of uptime, then 5, so I now tried 6.6, which died after barely 1 hour. Will revert to 5.15.93."
Thank you for trying; this is also very useful for others. Are you running something demanding for the hardware? I will stay with 6.6.8 since it is 100% stable in my use case.
Edited January 9 by ebin-dev

OdyX Posted January 9
No, just some 3-4 docker images (jackett, radarr, sonarr), 2 tor clients, smb, transmission.

TDCroPower Posted January 9
Can anyone tell me how I can continue to update with a frozen kernel? I would like to leave the kernel on my Helios64 at 5.15.93 and continue to install updates. When I freeze the kernel with armbian-config under System >>> Freeze, the background color changes from blue to red. If I then go to "Firmware" below, is the kernel still updated during the update? Is the kernel also updated with "apt update && apt upgrade"?
Currently I am in an unpleasant situation, because I have installed the kernel again via armbian-config to 5.15.93-rockchip64 #23.02.2...
root@helios64:~# uname -a
Linux helios64 5.15.93-rockchip64 #23.02.2 SMP PREEMPT Fri Feb 17 23:48:36 UTC 2023 aarch64 GNU/Linux
root@helios64:~# apt update && apt list --upgradable
Hit:1 http://deb.debian.org/debian bullseye InRelease
Hit:2 http://deb.debian.org/debian bullseye-updates InRelease
Hit:3 http://deb.debian.org/debian bullseye-backports InRelease
Hit:4 https://download.docker.com/linux/debian bullseye InRelease
Hit:5 http://armbian.hosthatch.com/apt bullseye InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
3 packages can be upgraded. Run 'apt list --upgradable' to see them.
Listing... Done
armbian-bsp-cli-helios64/bullseye 23.11.1 arm64 [upgradable from: 21.08.2]
linux-dtb-current-rockchip64/bullseye 23.11.1 arm64 [upgradable from: 23.02.2]
linux-image-current-rockchip64/bullseye 23.11.1 arm64 [upgradable from: 23.02.2]
edit: I found something about this with the command "apt-mark hold". If I compare my 2 Helios64 installations with "apt-mark showhold", the following was blocked on one of them via armbian-config...
root@helios64:~# apt-mark showhold
armbian-bsp-cli-helios64
armbian-firmware
linux-dtb-current-rockchip64
linux-image-current-rockchip64
and on the 2nd one...
root@helios64:~# apt-mark showhold
armbian-bsp-cli-helios64
armbian-firmware
Why weren't the linux* packages also marked on the 2nd one?
Edited January 9 by TDCroPower

prahal Posted January 10
@TDCroPower I use "apt-mark hold <package>", and then on "apt upgrade" the held packages are reported as kept back:
The following packages have been kept back:
  linux-dtb-edge-rockchip64 linux-headers-edge-rockchip64 linux-image-edge-rockchip64
The following packages will be upgraded:
  openmediavault-compose
Still, "apt list --upgradable" lists the on-hold packages as upgradable. I mean, "apt-mark showhold" is the way to tell which packages will be kept back on upgrade.
For the second question (why your second box does not have linux-dtb-* and linux-image-* on hold), I can only guess from the armbian-config code (i.e. /usr/lib/armbian-config/jobs.sh; search for "Freeze") that armbian-config only freezes the installed packages whose name includes the BRANCH value from "grep BRANCH /etc/armbian-release". I believe that if you have linux-dtb-edge-rockchip64 installed with BRANCH=current in /etc/armbian-release, then armbian-config will fail to mark the package with "edge" instead of "current" in its name as on hold.

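If armbian-config really derives the package names from BRANCH (this is the guess made in the post above, not a confirmed description of its code), the names it would try to freeze can be sketched as follows. The function name and the default family suffix rockchip64 are illustrative assumptions; the name pattern is taken from the package names quoted in this thread.

```shell
# Print the kernel package names that match a given Armbian branch.
# $1: branch (e.g. current, edge); $2: family suffix (assumed default: rockchip64).
kernel_packages_for_branch() {
    branch=$1
    family=${2:-rockchip64}
    for kind in image dtb headers; do
        echo "linux-${kind}-${branch}-${family}"
    done
}

# Usage (not executed here): hold whatever matches the BRANCH in the release file.
#   branch=$(sed -n 's/^BRANCH=//p' /etc/armbian-release)
#   sudo apt-mark hold $(kernel_packages_for_branch "$branch")
```

Comparing this list with the output of "apt-mark showhold" would show whether the installed kernel packages belong to a different branch than the one named in /etc/armbian-release.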
prahal Posted January 10
@ebin-dev @OdyX Note that the instability I noticed is related to the big CPU cluster (the A72 cores), and that the SATA and XHCI (USB3, including r8152) interrupts are bound to these A72 big CPUs in /usr/lib/armbian/armbian-hardware-optimization (though I do not understand this code yet):
case ${BOARD_NAME} in
"Helios64")
    for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 10 > /proc/irq/$i/smp_affinity
    done
    for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
        echo 30 > /proc/irq/$i/smp_affinity
    done
    ;;
I believe a RAID resync or rebuild triggers this instability via the SATA activity (even at boot). (I kept my HDD setup from the Helios4 install instructions, i.e. RAID10.) So not using RAID could help keep the board stable, even under heavy load.

ebin-dev (Author) Posted January 10
20 hours ago, prahal said: "So not using RAID could help keep the board stable, even with a heavy load."
The r8152 driver is using xhci-hcd. The above code assigns cpu4 to those interrupts (echo 10 ...). I tried assigning cpu4 and cpu5 to the xhci interrupts (echo 30) and only cpu5 to ahci (SATA) (echo 20), but NIC offloading turned out to be more beneficial.
P.S.: I am not using RAID, to keep it simple (but I maintain at least 4 separate backups). The file system itself also has some impact: ext4 -> btrfs -> zfs.
Edited January 11 by ebin-dev

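The values written to smp_affinity are hexadecimal CPU bitmasks, which is why "10" means cpu4, "20" means cpu5, and "30" means both. A small sketch that decodes such a mask (the function name is illustrative, and masks with comma-separated groups for systems with more than 32 CPUs are not handled):

```shell
# Decode a hexadecimal smp_affinity mask into a list of CPU numbers.
mask_to_cpus() {
    mask=$(( 0x$1 ))   # interpret the argument as hex
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # If the lowest bit is set, this CPU is part of the mask.
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

# Usage:
#   mask_to_cpus 10   # prints: 4
#   mask_to_cpus 30   # prints: 4 5
```

On the RK3399, cpu0 to cpu3 are the little A53 cores and cpu4 and cpu5 are the big A72 cores, so masks 10, 20, and 30 all select big cores only.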
OdyX Posted January 11
Would it make sense to let the kernel pick where it does what? That seems like a weird optimization to have. (I'm using ext4 on cryptsetup on raid5, on 5 spinny 3.5" disks.)
Edited January 11 by OdyX

ebin-dev (Author) Posted January 12
16 hours ago, OdyX said: "Would it make sense to let the kernel pick where it does what?"
The purpose of 'armbian-hardware-optimization' is to bind the heavy tasks to one or both big cores, since the little ones would not be capable of dealing with them. For some heavy tasks the kernel may need more than one big core. So I think this would make sense. cryptsetup on a RAID 5 with 5 spinning disks is definitely a huge load. Dealing with the Ethernet traffic on the 2.5G interface is also a heavy load: 280 MByte/s have to be handled in pieces of 1500 bytes (MTU), simultaneously in both directions. So it might make some sense to modify the optimization settings depending on your use case.

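As a rough check of the packet rate these figures imply, here is a back-of-the-envelope sketch. It ignores protocol headers, assumes 1 MByte = 1,000,000 bytes, and uses an illustrative function name:

```shell
# Approximate packets per second for a given throughput and packet size.
# $1: throughput in MByte/s; $2: packet size in bytes (e.g. the MTU).
packets_per_second() {
    mbytes_per_s=$1
    packet_bytes=$2
    echo $(( mbytes_per_s * 1000 * 1000 / packet_bytes ))
}

# Usage:
#   packets_per_second 280 1500   # prints: 186666
```

That is roughly 187,000 packets per second in each direction, which gives an idea of why the interrupt load of the 2.5G interface matters on this board.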
prahal Posted January 12
@ebin-dev @OdyX I found that the Helios64-specific code in /usr/lib/armbian/armbian-hardware-optimization is not run. This Helios64-specific code is under the rk3399 BOARDFAMILY section, while in the /etc/armbian-release shipped by armbian-bsp-cli-helios64-current 23.11.1 (bookworm), BOARDFAMILY is rockchip64. I found out by checking /proc/interrupts and seeing ahci spread across little and big cores and xhci only on little cores. You might also notice that armbian-hardware-optimization applies settings to eth0, while we only have end0 and eth1.

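To see which section of the script applies on a given install, the BOARDFAMILY value can be read from /etc/armbian-release. A minimal sketch, assuming the file consists of simple KEY=value lines as quoted in this thread; the function name release_value is hypothetical:

```shell
# Print the value of a KEY=value entry from an Armbian release file.
# $1: key name; $2: file path (defaults to /etc/armbian-release).
release_value() {
    key=$1
    file=${2:-/etc/armbian-release}
    # Print the text after "KEY=", drop surrounding double quotes,
    # and keep only the first match.
    sed -n "s/^${key}=//p" "$file" | tr -d '"' | head -n 1
}

# Usage on the board itself:
#   release_value BOARDFAMILY
#   release_value BRANCH
```

If this prints rockchip64 while the Helios64 block in the script sits under the rk3399 case, the IRQ affinity settings discussed above are not being applied.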
ebin-dev (Author) Posted January 12
2 hours ago, prahal said: "I found that the Helios64-specific code in /usr/lib/armbian/armbian-hardware-optimization is not run."
That is true. In that script, eth0 should be replaced by end0 if you wish to apply the Armbian optimisation to the 1G interface. On my system, the interrupt for xhci-hcd:usb1 devices (used by the 2.5G interface) is 46 and its smp affinity is set to 10 (cpu4):
# cat /proc/irq/46/smp_affinity
10
# cat /proc/interrupts | grep xhci
 46:       1481          0        261          0     998993          0     GICv3 142 Level     xhci-hcd:usb1
 93:          0          0          0          0          0          0     GICv3 137 Level     dwc3-otg, xhci-hcd:usb5
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
Edited January 12 by ebin-dev

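Reading the per-CPU counters by eye is error-prone, so a small filter can report which CPU has handled the most interrupts on each matching line. This is only a sketch: it assumes the /proc/interrupts layout shown above (IRQ number, then one counter per CPU), and the function name and the default of 6 CPUs are illustrative.

```shell
# For each /proc/interrupts line on stdin, print the IRQ label and the CPU
# with the highest interrupt count.
# $1: number of CPUs (assumed default: 6, as on the RK3399).
busiest_cpu() {
    ncpus=${1:-6}
    awk -v n="$ncpus" '{
        best = 0; max = -1
        # Field 1 is the IRQ label ("46:"), fields 2..n+1 are per-CPU counters.
        for (i = 0; i < n; i++) {
            if ($(i + 2) + 0 > max) { max = $(i + 2) + 0; best = i }
        }
        print $1 " CPU" best
    }'
}

# Usage on the board itself:
#   grep xhci /proc/interrupts | busiest_cpu
```

For the IRQ 46 line quoted above this reports CPU4, which matches the smp_affinity value of 10.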
TDCroPower Posted January 12
@prahal Thanks for the explanation. Which packages have to be put on hold if you just don't want to update the kernel? Or is the kernel bound to the OS release?

prahal Posted January 12
@TDCroPower The kernel packages are Armbian BRANCH based (current, edge, etc.). So they are not bound to the OS release ("current" will always be the latest Armbian stable release), but to the Armbian channel. You can have multiple branches installed (in that case, if current is updated after the edge kernel debs, current will become the new default kernel loaded at boot even if edge is newer; I believe we are not supposed to install both edge and current, but nothing prevents us from doing so).
You can list your installed kernel packages with:
dpkg -l "linux-*-rockchip64" | grep "^.i"
Here I have:
hi  linux-dtb-edge-rockchip64      24.2.0-trunk  arm64  Armbian Linux edge DTBs in /boot/dtb-6.6.11-edge-rockchip64
hi  linux-headers-edge-rockchip64  24.2.0-trunk  arm64  Armbian Linux edge headers 6.6.11-edge-rockchip64
hi  linux-image-edge-rockchip64    24.2.0-trunk  arm64  Armbian Linux edge kernel image 6.6.11-edge-rockchip64
So I only have the edge branch kernel installed. To put these packages on hold (they already are, hence the leading "hi", for on hold and installed), I do:
sudo apt-mark hold linux-dtb-edge-rockchip64 linux-headers-edge-rockchip64 linux-image-edge-rockchip64
If your installed Linux kernel packages are named linux-*-current-rockchip64, you instead do:
sudo apt-mark hold linux-dtb-current-rockchip64 linux-headers-current-rockchip64 linux-image-current-rockchip64
Mind that I replaced armbian-bsp-cli-helios64-current with armbian-bsp-cli-helios64-edge and noticed that the edge version has BRANCH defined as "current" in the /etc/armbian-release that it ships. You can check the version of your armbian-bsp-cli package with:
dpkg -l "armbian-bsp-cli-helios64-*"
I believe the fact that the "edge" version defines BRANCH="current" in the /etc/armbian-release it ships is a bug, and it would explain why armbian-config fails to freeze your kernel.
Edited January 12 by prahal (fix incorrect finding about the /etc/armbian-release BRANCH value)

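The two-letter prefix in dpkg -l output encodes the desired state and the current status, so "hi" means "hold requested, installed". A small sketch of a filter that keeps only such packages (the function name is illustrative):

```shell
# Read `dpkg -l` output on stdin and print the names of packages that are
# both on hold (first flag "h") and installed (second flag "i").
held_installed() {
    awk '$1 == "hi" { print $2 }'
}

# Usage on the board itself:
#   dpkg -l "linux-*-rockchip64" | held_installed
```

Any installed kernel package that is missing from this list would still be upgraded by "apt upgrade".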
prahal Posted January 12
@ebin-dev What I meant was that the Helios64-specific code is in the rk3399 section. But I believe that nowadays (you can check the value of BOARDFAMILY in /etc/armbian-release) this rk3399 section is not read on a Helios64 install. The "rockchip64" section above the "rk3399" one is read instead (line 203 of /usr/lib/armbian/armbian-hardware-optimization for the armbian-bsp-cli-helios64-edge package). You can check the logs of the systemd service:
systemctl status armbian-hardware-optimize.service
Maybe you already tweaked end0 in the rockchip64 section, though you could then see that the Helios64 BOARD_NAME section is under rk3399, not rockchip64.
@ebin-dev You seem to have the Helios64 code applied. Can you post the content of your /etc/armbian-release, especially the BOARDFAMILY value? And the output of "dpkg -l armbian-bsp-cli-helios64-*"?
Edited January 12 by prahal (ask for more details)

ebin-dev (Author) Posted January 12
@prahal So I used the opportunity to fix the settings for the 1G interface in armbian-hardware-optimization. My /etc/armbian-release states BOARDFAMILY=rk3399 and BRANCH=current, so I edited the lines starting at line 251, and that was successful. But I do not know if there is any positive effect on the 1G interface, since I am using 2.5G only. (I will go through the settings in /sys/class/net/end1 and see if something can be tuned for the 2.5G interface.)
# cat /etc/armbian-release
BOARD=helios64
BOARD_NAME="Helios64"
BOARDFAMILY=rk3399
...
BRANCH=current
# cat /usr/lib/armbian/armbian-hardware-optimization (lines 251 ff.):
echo 8 > /proc/irq/$(awk -F":" "/end0/ {print \$1}" < /proc/interrupts | sed 's/\ //g')/smp_affinity
echo 7 > /sys/class/net/end0/queues/rx-0/rps_cpus
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
echo 32768 > /sys/class/net/end0/queues/rx-0/rps_flow_cnt
# systemctl status armbian-hardware-optimize.service
● armbian-hardware-optimize.service - Armbian hardware optimization
     Loaded: loaded (/lib/systemd/system/armbian-hardware-optimize.service; enabled; preset: enabled)
     Active: active (exited) since Fri 2024-01-12 15:49:01 CET; 33min ago
    Process: 700 ExecStart=/usr/lib/armbian/armbian-hardware-optimization start (code=exited, status=0/SUCCESS)
   Main PID: 700 (code=exited, status=0/SUCCESS)
      Tasks: 0 (limit: 4374)
     Memory: 2.1M
        CPU: 795ms
     CGroup: /system.slice/armbian-hardware-optimize.service
I do not know if the following information is correct:
# dpkg -l armbian-bsp-cli-helios64-*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                             Version                                    Architecture Description
+++-================================-=========================================-============-=====================================================
hi  armbian-bsp-cli-helios64-current 23.08.0-trunk--1-PCc73b-Vccab-H1d04-B9c45 arm64        Armbian CLI BSP for board 'helios64' branch 'current'
Edited January 12 by ebin-dev

OdyX Posted January 13
From the code history, it looks like rk3399 is the "old" family and Helios64 should have BOARDFAMILY=rockchip64 (see https://github.com/armbian/build/blob/main/config/boards/helios64.csc#L3), so it's the code in /usr/lib/armbian/armbian-hardware-optimization that's outdated. I've therefore proposed https://github.com/armbian/build/pull/6159 . Please test and comment there if you can!
Edited January 13 by OdyX

ebin-dev (Author) Posted January 14
Just to let you know: the directory 'beta/pool/main/l/linux-6.6.8/' has vanished overnight (actually the entire folder fi.mirror.armbian.de/beta/pool/main). This was the location we used for downloading Linux 6.6.x (6.1.y etc.) (kernel, dtb and headers). My link to the Linux 6.6.8 files in this forum (as downloaded on 23.12.2023) remains active; I also added links to Linux 6.1.71 and 5.15.93 (Dropbox).
Edited January 14 by ebin-dev

OdyX Posted January 14
@ebin-dev I think this is caused by https://github.com/armbian/build/commit/1f4df4c41fe33f9822ca2f42d14a2a445e27aed7 ; Rockchip64's 'edge' kernels were bumped to 6.7.

Igor Posted January 14
6 hours ago, ebin-dev said: "the directory 'beta/pool/main/l/linux-6.6.8/' has vanished over night"
This is expected behavior (unless the process breaks in the middle and the files are missing for some other reason): we only provide the latest kernels. The easier way to change this, providing more kernels in the repository, has a great impact on build time, while doing it smarter would require a person to focus only on this part of the infrastructure for some time. Alternatively, we could drop all deb packages in some location, but again someone (not me) would need to code and maintain this automation.
