
Helios64 - Armbian 23.08 Bookworm issues (solved)


Go to solution Solved by ebin-dev,


Posted

(muffled rage sounds)

 

It worked fine as long as it wasn't doing anything. Now I've added a munin-node, tor, various mounts, and 🎉 it has started to randomly stop (die, kernel panic, whatever, I can't determine what the issue is…). 😕 Any idea how to debug this? /var/log is a zram device (and journal has never shown anything useful). Do I have to resort to using a serial-console permanently to debug this?

Posted (edited)

In my use case there are no issues so far - using it 24/7 as a DNS server, file server, nextcloud server, music server, plex server, and for home automation. I kept everything simple (i.e. ext4 file system, no NFS).

 

@OdyX The undocumented errors you describe are hard to believe. If your system really behaves like that, you should consider donating your board for testing and buying something else.

 

Otherwise: flash u-boot version 21.08.9. If that does not help, you can easily change the Linux kernel: try 5.15.93. Just install linux-image, linux-headers and linux-dtb with dpkg -i linux*, and do not forget to delete the remaining 6.6.8 links from / and from /boot. Also make sure that you have the right u-boot flashed to eMMC.
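
A minimal sketch for spotting the leftover links mentioned above before deleting anything. It only lists symlinks whose target contains a given version string; the helper name stale_links is made up for illustration, and which links actually exist in / and /boot on a given install is an assumption to verify by hand.

```shell
#!/bin/sh
# List symlinks directly inside a directory whose target mentions a given
# kernel version. Nothing is deleted; review the output before removing links.
# stale_links is a hypothetical helper name, not part of Armbian.
stale_links() {
    dir=$1
    version=$2
    for link in "$dir"/*; do
        [ -L "$link" ] || continue            # skip anything that is not a symlink
        case $(readlink "$link") in
            *"$version"*) echo "$link" ;;     # target still points at that version
        esac
    done
}

stale_links / 6.6.8
stale_links /boot 6.6.8
```

Anything printed can then be removed manually with rm once you are sure the new kernel boots.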

Edited by ebin-dev
Posted

The only somewhat relevant thing I updated was to use `linux-cpupower` instead of `cpufrequtils`, and I had started using the "ondemand" governor for the CPUs. I see that sbc-bench set the governor for all CPUs at "performance". I've now returned to leaving the settings as set by sbc-bench, to see if my suspicion (aroused by https://github.com/ThomasKaiser/sbc-bench/issues/62) is warranted.

Posted (edited)

@OdyX After executing sbc-bench I switched the governor back to ondemand. I never had issues with it. You may try schedutil too.

I added a download link (above in this thread) to the dtb enhanced with hs400 and l2 cache information (for linux 6.6.8). You can safely copy it to /boot/dtb/rockchip/rk3399-kobol-helios64.dtb.

 

Difficult to say if 'linux-cpupower' caused that trouble.

 

# cat /etc/default/cpufrequtils 
ENABLE=true
MIN_SPEED=408000
MAX_SPEED=1800000
GOVERNOR=ondemand
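
To confirm which governor is actually in effect (whatever tool set it), the value can be read back from sysfs. This is a small sketch, assuming the usual cpufreq layout under /sys/devices/system/cpu; show_governors is a hypothetical helper name.

```shell
#!/bin/sh
# Print the active cpufreq governor of every CPU.
# The base directory is a parameter so the function can be pointed elsewhere.
show_governors() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue               # no cpufreq support or no match
        cpu=${f#"$base"/}                     # strip the base directory
        cpu=${cpu%%/*}                        # keep only "cpuN"
        echo "$cpu: $(cat "$f")"
    done
}

show_governors
```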

 

Edited by ebin-dev
Posted

I still have NFS issues & kernel panics with the updated u-boot and patched kernel 6.6.8. I will investigate; this is probably not related to the Helios64 or those patches (the vanilla kernel also crashes with NFS).

No issues with kernel 6.1.70

Posted (edited)
14 hours ago, alchemist said:

I still have NFS issues & kernel panics with the updated u-boot and patched kernel 6.6.8

 

Which u-boot version do you refer to (the one including pull requests from @prahal)?

 

P.S.: I included a note that in case you use NFS, kernel 6.1.70 should be used.

Edited by ebin-dev
Posted
20 hours ago, OdyX said:

Well. With all setup as you said but with the 5.15.93 kernel, it now has a 3-days uptime. Yay.

 

Good news! Did you modify the dtb to support emmc hs400 speed and l2-cache information?

 

It would be nice if you could also try the 6.1.70 kernel.

Posted

Yeah, I downloaded the dtb from your Dropbox link; thanks for that. Am now rebooting into 6.1.71, and will report back.

Posted

6.1.71 only got 17 hours of uptime, then 5, so I now tried 6.6, which died after barely 1 hour. Will revert to 5.15.93.

Posted (edited)
13 hours ago, OdyX said:

6.1.71 only got 17 hours of uptime, then 5, so I now tried 6.6, which died after barely 1 hour. Will revert to 5.15.93.

 

Thank you for trying - also very useful for others. Are you running something demanding for the hardware?

I will stay with 6.6.8 since it is 100 % stable in my use case.

Edited by ebin-dev
Posted (edited)

Can anyone tell me how I can continue to install updates with a frozen kernel?
I would like to keep the kernel on my Helios64 at 5.15.93 and continue to install updates.
When I freeze the kernel with armbian-config under System >>> Freeze, the background color changes from blue to red.
If I then go to "Firmware" below, is the kernel still updated during the update?

 

Is the kernel also updated by "apt update && apt upgrade"?

 

Currently I am in an unpleasant situation, because I have reinstalled the kernel via armbian-config, back to 5.15.93-rockchip64 #23.02.2...

root@helios64:~# uname -a
Linux helios64 5.15.93-rockchip64 #23.02.2 SMP PREEMPT Fri Feb 17 23:48:36 UTC 2023 aarch64 GNU/Linux

root@helios64:~# apt update && apt list --upgradable
Hit:1 http://deb.debian.org/debian bullseye InRelease
Hit:2 http://deb.debian.org/debian bullseye-updates InRelease
Hit:3 http://deb.debian.org/debian bullseye-backports InRelease
Hit:4 https://download.docker.com/linux/debian bullseye InRelease
Hit:5 http://armbian.hosthatch.com/apt bullseye InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
3 packages can be upgraded. Run 'apt list --upgradable' to see them.
Listing... Done
armbian-bsp-cli-helios64/bullseye 23.11.1 arm64 [upgradable from: 21.08.2]
linux-dtb-current-rockchip64/bullseye 23.11.1 arm64 [upgradable from: 23.02.2]
linux-image-current-rockchip64/bullseye 23.11.1 arm64 [upgradable from: 23.02.2]

 

edit:

I found something about this with the command "apt-mark hold".
If I compare my 2 Helios64 installations with "apt-mark showhold", the following was blocked on one of them via armbian-config...

root@helios64:~# apt-mark showhold
armbian-bsp-cli-helios64
armbian-firmware
linux-dtb-current-rockchip64
linux-image-current-rockchip64

and on the 2nd one...

root@helios64:~# apt-mark showhold
armbian-bsp-cli-helios64
armbian-firmware

 

Why weren't the linux-* packages also put on hold on the 2nd one?

Edited by TDCroPower
Posted

@TDCroPower I use "apt-mark hold <package>", and then "apt upgrade" reports the held packages as kept back:

The following packages have been kept back:
  linux-dtb-edge-rockchip64 linux-headers-edge-rockchip64 linux-image-edge-rockchip64
The following packages will be upgraded:
  openmediavault-compose

Still, "apt list --upgradable" lists the on-hold packages as upgradable.

So "apt-mark showhold" is the way to tell which packages will be kept back on upgrade.

 

For the second question (why your second box does not have linux-dtb-* and linux-image-* on hold), I can only guess from the armbian-config code (i.e. /usr/lib/armbian-config/jobs.sh, search for "Freeze") that armbian-config only freezes the installed packages whose names include the BRANCH value from "grep BRANCH /etc/armbian-release".

I believe that if you have linux-dtb-edge-rockchip64 installed with BRANCH=current in /etc/armbian-release, then armbian-config will fail to put that package on hold, because its name contains "edge" instead of "current".
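
A rough way to check for that kind of mismatch. This sketch does not reproduce the armbian-config code; it only follows the guess above that the Freeze job matches package names against BRANCH. The helper name branch_mismatch is hypothetical.

```shell
#!/bin/sh
# Read kernel package names on stdin and print those whose name does not
# contain the BRANCH value from an armbian-release file.
# branch_mismatch is a hypothetical helper, not part of armbian-config.
branch_mismatch() {
    release_file=${1:-/etc/armbian-release}
    branch=$(sed -n 's/^BRANCH=//p' "$release_file" | tr -d '"')
    while read -r pkg; do
        case $pkg in
            linux-*-"$branch"-*) ;;                                   # name matches BRANCH
            linux-image-*|linux-dtb-*|linux-headers-*) echo "$pkg" ;; # name does not
        esac
    done
}

# dpkg-query -W lists every matching package known to dpkg, which can
# include packages that were removed but not purged.
if command -v dpkg-query >/dev/null 2>&1 && [ -r /etc/armbian-release ]; then
    dpkg-query -W -f='${Package}\n' 'linux-*-rockchip64' 2>/dev/null | branch_mismatch
fi
```

Any package printed here would, if the guess is right, need a manual "apt-mark hold".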

Posted

@ebin-dev @OdyX note that the instability I noticed is related to the big CPU cluster (the a72) and that the SATA and XHCI (usb3 including r8152) are bound to these a72 big cpus in /usr/lib/armbian/armbian-hardware-optimization (though I do not understand this code yet):

                        case ${BOARD_NAME} in
                                "Helios64")
                                        for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
                                                echo 10 > /proc/irq/$i/smp_affinity
                                        done
                                        for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
                                                echo 30 > /proc/irq/$i/smp_affinity
                                        done
                                        ;;

 

I believe raid resync or rebuild triggers this instability via the SATA activity (even at boot).

(ie I kept my HDD setup from the Helios4 install instructions, ie raid10)

 

So not using RAID could help keep the board stable, even with a heavy load.

Posted (edited)
20 hours ago, prahal said:

So not using RAID could help keep the board stable, even with a heavy load.

 

The r8152 driver is using xhci-hcd. The above code assigns cpu4 to those processes (echo 10 ...).

I tried to assign cpu4 and cpu5 to xhci processes (echo 30) and to assign only cpu5 to ahci (sata) (echo 20), but nic offloading turned out to be more beneficial.
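
The values written to smp_affinity are hexadecimal CPU bitmasks, so 10, 20 and 30 are easier to read once decoded. A small sketch (mask_to_cpus is a made-up helper name, and it assumes a plain mask without the comma-separated groups used on systems with more than 32 CPUs):

```shell
#!/bin/sh
# Decode a hexadecimal IRQ affinity mask, as written to
# /proc/irq/<n>/smp_affinity, into a list of CPU numbers.
# mask_to_cpus is a hypothetical helper name.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    cpus=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            cpus="${cpus:+$cpus }$cpu"        # append this CPU number
        fi
        mask=$((mask >> 1))                   # move on to the next bit
        cpu=$((cpu + 1))
    done
    echo "$cpus"
}

for m in 10 20 30; do
    echo "$m -> CPU $(mask_to_cpus "$m")"
done
# prints:
# 10 -> CPU 4
# 20 -> CPU 5
# 30 -> CPU 4 5
```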

 

P.S.: I am not using RAID to keep it simple (but I maintain at least 4 separate backups). The file system itself also has some impact: ext4 -> btrfs -> zfs.

Edited by ebin-dev
Posted (edited)

Would it make sense to let the kernel pick where it does what? That seems like a weird optimization to have.

 

(I'm using ext4 on cryptsetup on raid5, on 5 spinny 3,5" disks)

Edited by OdyX
Posted
16 hours ago, OdyX said:

Would it make sense to let the kernel pick where it does what?

 

The purpose of the 'armbian-hardware-optimization' is to bind the heavy tasks to one or both big core(s), since the little ones would not be capable of dealing with them. For some heavy tasks the kernel may need more than one big core. So I think this would make sense.

 

Cryptsetup on a raid 5 with 5 spinning disks is definitely a huge load.

Dealing with the ethernet traffic on the 2.5G interface is also a heavy load: 280 MByte/s has to be handled in pieces of 1500 bytes (MTU), simultaneously in both directions.
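
As a back-of-the-envelope illustration of that figure, assuming 1 MByte = 10^6 bytes and ignoring protocol headers (packets_per_second is a made-up helper name):

```shell
#!/bin/sh
# Rough packet rate for a given throughput in MByte/s and packet size in
# bytes. Assumes 1 MByte = 10^6 bytes; headers and partial packets ignored.
packets_per_second() {
    echo $(( $1 * 1000000 / $2 ))
}

packets_per_second 280 1500   # prints 186666
```

That is roughly 186,000 packets per second in each direction.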

 

So it might make some sense if you modify the optimization settings depending on your use case.

Posted

@ebin-dev @OdyX I found that the Helios64-specific code in /usr/lib/armbian/armbian-hardware-optimization is not run.

This Helios64-specific code is under the rk3399 BOARDFAMILY section, whereas the /etc/armbian-release shipped by armbian-bsp-cli-helios64-current 23.11.1 (bookworm) sets BOARDFAMILY to rockchip64 ....

 

I found out by checking /proc/interrupts and seeing ahci spread across little and big cores and xhci only on little cores.
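
A sketch of that check, summarising which CPUs have serviced a given interrupt. It reads text in the /proc/interrupts layout from stdin; irq_cpus is a hypothetical helper name.

```shell
#!/bin/sh
# For every interrupt line matching a pattern, print the CPUs that have
# serviced it at least once. Expects /proc/interrupts-style text on stdin,
# with the CPU header as the first line. irq_cpus is a hypothetical helper.
irq_cpus() {
    awk -v pat="$1" '
        NR == 1 { ncpu = NF; next }           # header line: CPU0 CPU1 ...
        $0 ~ pat {
            irq = $1
            sub(/:$/, "", irq)                # "46:" -> "46"
            list = ""
            for (i = 2; i <= ncpu + 1; i++)
                if ($i > 0)
                    list = (list == "" ? "" : list " ") "CPU" (i - 2)
            print irq ": " (list == "" ? "none" : list)
        }'
}

if [ -r /proc/interrupts ]; then
    irq_cpus 'xhci|ahci' < /proc/interrupts
fi
```

On the Helios64, CPU0 to CPU3 are the little cores and CPU4 and CPU5 the big ones, so any CPU below 4 in the output means the interrupt is not confined to the big cluster.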

 

You might also notice that armbian-hardware-optimization sets settings on eth0 while we only have end0 and eth1.

 

Posted (edited)
2 hours ago, prahal said:

I found that the Helios64-specific code in /usr/lib/armbian/armbian-hardware-optimization is not run.

 

That is true - in that script eth0 should be replaced by end0 if you wish to apply the Armbian optimisation to the 1G interface.

 

The interrupt for xhci-hcd:usb1 devices is 46 and its smp affinity is set to 10 (cpu4) on my system (used by the 2.5G interface):

# cat /proc/irq/46/smp_affinity
10

# cat /proc/interrupts | grep xhci
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
 46:       1481          0        261          0     998993          0     GICv3 142 Level     xhci-hcd:usb1
 93:          0          0          0          0          0          0     GICv3 137 Level     dwc3-otg, xhci-hcd:usb5

 

Edited by ebin-dev
Posted (edited)

@TDCroPower The kernel packages are based on the Armbian BRANCH (current, edge, etc). So they are bound to the Armbian channel rather than to the OS release ("current" will always be the latest Armbian stable release).

You can have multiple branches installed. In that case, if the current kernel debs are updated after the edge ones, current becomes the default kernel loaded at boot even if edge is newer. I believe we are not supposed to install both edge and current, but nothing prevents us from doing so.

You can list your installed kernel packages with:

dpkg -l "linux-*-rockchip64" | grep "^.i"

here I have:

hi  linux-dtb-edge-rockchip64      24.2.0-trunk arm64        Armbian Linux edge DTBs in /boot/dtb-6.6.11-edge-rockchip64
hi  linux-headers-edge-rockchip64  24.2.0-trunk arm64        Armbian Linux edge headers 6.6.11-edge-rockchip64
hi  linux-image-edge-rockchip64    24.2.0-trunk arm64        Armbian Linux edge kernel image 6.6.11-edge-rockchip64

So I only have the edge branch kernel installed.

 

To put them on hold (they already are, hence the leading "hi", meaning on hold and installed), I do:

sudo apt-mark hold linux-dtb-edge-rockchip64 linux-headers-edge-rockchip64  linux-image-edge-rockchip64

 

If your installed Linux kernel packages are named linux-*-current-rockchip64, you instead do:

sudo apt-mark hold linux-dtb-current-rockchip64 linux-headers-current-rockchip64  linux-image-current-rockchip64
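
To double-check the result, a short sketch that lists kernel packages that are installed but not on hold, based on dpkg's two-letter state ("ii" installed, "hi" installed and held). The helper name unheld_kernel_pkgs is made up for illustration.

```shell
#!/bin/sh
# Read "<state> <package>" lines on stdin (for example from dpkg -l or the
# dpkg-query call below) and print kernel packages that are installed but
# not held. unheld_kernel_pkgs is a hypothetical helper name.
unheld_kernel_pkgs() {
    awk '$1 == "ii" && $2 ~ /^linux-(image|dtb|headers)-/ { print $2 }'
}

if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' 'linux-*-rockchip64' 2>/dev/null \
        | unheld_kernel_pkgs
fi
```

No output means every installed linux-image, linux-dtb and linux-headers package matching the pattern is already held.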

 

Mind that I replaced armbian-bsp-cli-helios64-current with armbian-bsp-cli-helios64-edge and noticed that the edge version has BRANCH defined as "current" in the /etc/armbian-release it ships. You can check the version of your armbian-bsp-cli package with:

dpkg -l "armbian-bsp-cli-helios64-*"

 

I believe the fact that the "edge" package defines BRANCH="current" in the /etc/armbian-release it ships is a bug, and it would explain why armbian-config fails to freeze your kernel.

Edited by prahal
fix incorrect find about the /etc/armbian-release BRANCH value
Posted (edited)

@ebin-dev What I meant was that the Helios64-specific code is in the rk3399 section. But I believe that nowadays this rk3399 section is not read on a Helios64 install (you can check the value of BOARDFAMILY in /etc/armbian-release). The "rockchip64" section above the "rk3399" one is read instead (line 203 of /usr/lib/armbian/armbian-hardware-optimization in the armbian-bsp-cli-helios64-edge package).

You can check the logs of the systemd service:

systemctl status armbian-hardware-optimize.service

 

Maybe you already tweaked end0 in the rockchip64 section, though you could then see that the Helios64 BOARD_NAME section is under rk3399, not rockchip64.

 

@ebin-dev You seem to have the Helios64 code applied:

Can you give your /etc/armbian-release content? Especially the BOARDFAMILY value?

And "dpkg -l armbian-bsp-cli-helios64-*" output?

Edited by prahal
ask for more details
Posted (edited)

@prahal So I used the opportunity to fix the settings for the 1G interface in armbian-hardware-optimization:

My /etc/armbian-release states BOARDFAMILY=rk3399 and BRANCH=current, so I edited the lines starting at line 251, and that was successful.

But I do not know if there is any positive effect on the 1G interface, since I am using 2.5G only.

(I will go through the settings in /sys/class/net/end1 and see if something can be tuned for the 2.5G interface.)

 

# cat  /etc/armbian-release
BOARD=helios64
BOARD_NAME="Helios64"
BOARDFAMILY=rk3399
...
BRANCH=current

#cat /usr/lib/armbian/armbian-hardware-optimization
lines 251 ff:
                        echo 8 > /proc/irq/$(awk -F":" "/end0/ {print \$1}" < /proc/interrupts | sed 's/\ //g')/smp_affinity
                        echo 7 > /sys/class/net/end0/queues/rx-0/rps_cpus
                        echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
                        echo 32768 > /sys/class/net/end0/queues/rx-0/rps_flow_cnt
                          
# systemctl status armbian-hardware-optimize.service
● armbian-hardware-optimize.service - Armbian hardware optimization
     Loaded: loaded (/lib/systemd/system/armbian-hardware-optimize.service; enabled; preset: enabled)
     Active: active (exited) since Fri 2024-01-12 15:49:01 CET; 33min ago
    Process: 700 ExecStart=/usr/lib/armbian/armbian-hardware-optimization start (code=exited, status=0/SUCCESS)
   Main PID: 700 (code=exited, status=0/SUCCESS)
      Tasks: 0 (limit: 4374)
     Memory: 2.1M
        CPU: 795ms
     CGroup: /system.slice/armbian-hardware-optimize.service

 

I do not know if the following information is correct:

# dpkg -l armbian-bsp-cli-helios64-*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                             Version                                   Architecture Description
+++-================================-=========================================-============-=====================================================
hi  armbian-bsp-cli-helios64-current 23.08.0-trunk--1-PCc73b-Vccab-H1d04-B9c45 arm64        Armbian CLI BSP for board 'helios64' branch 'current'

 

Edited by ebin-dev
Posted (edited)

Just to let you know: the directory 'beta/pool/main/l/linux-6.6.8/' has vanished over night (actually the entire folder fi.mirror.armbian.de/beta/pool/main). This was the location we used for downloading linux 6.6.x (6.1.y etc.): kernel, dtb and headers. My link to the linux 6.6.8 files in this forum (as downloaded on 23.12.2023) remains active; I also added links to linux 6.1.71 and 5.15.93 (dropbox).

Edited by ebin-dev
Posted
6 hours ago, ebin-dev said:

the directory 'beta/pool/main/l/linux-6.6.8/' has vanished over night

 

This is expected behavior (unless the process breaks in the middle and files are missing for some other reason): we only provide the latest kernels. The easier change, providing more kernels in the repository, would have a great impact on build time, while doing it smarter would require a person to focus only on this part of the infrastructure for some time ... Alternatively, we could drop all deb packages to some location, but again someone (not me) would need to code and maintain this automation.
