Everything posted by jbergler

  1. I pulled that from the postinst of the previous Debian package (`ar x whatever.deb`, then untar the control.tar.gz file to get it). Since you're booting without issues, I wonder if your boot.ini has some other config than mine? I have this part that feels relevant:

```
# legacy and mainline kernel diff
if ext4load mmc ${devno}:1 0x00000000 "/boot/.next" || fatload mmc ${devno}:1 0x00000000 ".next" || ext4load mmc ${devno}:1 0x00000000 ".next"; then
    echo "Found mainline kernel configuration"
    setenv uartconsole "ttyAML0,115200n8"
    setenv kernelimage "uImage"
else
    echo "Found legacy kernel configuration"
    setenv uartconsole "ttyS0,115200n8"
    setenv kernelimage "zImage"
fi
```
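The extraction step mentioned above can be sketched end to end. This builds a throwaway ar archive with the same layout as a .deb's control member (the filenames here are made up for the demo; a real .deb also contains debian-binary and data.tar.*), then pulls the postinst back out the way described:

```shell
# Build a minimal .deb-style ar archive so the extraction can be demonstrated
# without a real package on hand.
mkdir -p ctrl
printf '#!/bin/sh\necho hello from postinst\n' > ctrl/postinst
tar -czf control.tar.gz -C ctrl postinst
ar rc demo.deb control.tar.gz

# The extraction itself, as described above:
ar x demo.deb control.tar.gz
tar -xzf control.tar.gz postinst
sh postinst
```

On a real package the member may be control.tar.xz instead, in which case the `tar` flags change accordingly.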
  2. @hzyitc I wonder if this commit might be related: https://github.com/armbian/build/commit/76ce4c3a3ddb8f93686598808f6d1687232f9ddb (it removed patch/kernel/archive/meson-6.1/generate-uImage-instand-of-zImage.patch)
  3. Can confirm I'm also seeing this. I'm able to recover by installing the 23.02.2 linux-image-current-meson64 and linux-dtb-current-meson64 packages from /var/cache/apt/archives. From what I can tell it's installing the modules for 6.1.50 in /lib/modules but is somehow still booting 6.1.11, because /boot/uImage isn't getting updated:

```
$ strings /boot/uImage | grep "Linux version"
Linux version 6.1.11-meson64 (root@29682b33de96) (aarch64-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0, GNU ld (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 2.32.0.20190321) #23.02.2 SMP PREEMPT
Linux version 6.1.11-meson64 (root@29682b33de96) (aarch64-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0, GNU ld (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 2.32.0.20190321) #23.02.2 SMP PREEMPT Sat Feb 18 00:07:55 UTC 2023
```

I can confirm that running the following command manually gets the device back into a working state:

```
$ mkimage -A arm64 -O linux -T kernel -C none -a 0x1080000 -e 0x1080000 -n Linux -d /boot/vmlinuz-6.1.50-current-meson64 /boot/uImage
```
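A quick way to spot this state is to compare the version string baked into the uImage with the newest directory under /lib/modules. The sketch below runs against synthetic stand-ins (fake-uImage and fake-modules are made-up names) so it can be tried anywhere; on the device you would point IMG at /boot/uImage and MODDIR at /lib/modules:

```shell
# Synthetic stand-ins mirroring the mismatch described above
IMG=./fake-uImage
MODDIR=./fake-modules
printf 'Linux version 6.1.11-meson64 (gcc ...)\n' > "$IMG"
mkdir -p "$MODDIR/6.1.50-current-meson64"

# Compare the version embedded in the image with the newest modules directory
img_ver=$(grep -ao 'Linux version [0-9][^ ]*' "$IMG" | head -1 | awk '{print $3}')
mod_ver=$(ls "$MODDIR" | sort -V | tail -1)
if [ "$img_ver" != "$mod_ver" ]; then
    echo "MISMATCH: uImage=$img_ver modules=$mod_ver"
fi
```

`grep -a` is used instead of `strings` so the check also works on a raw binary image without binutils installed.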
  4. @aprayoga verbosity was already up, but I've added the other args. I'm not going to provoke the system since it's somewhat stable again and it's in use, but in terms of a repro, here's the setup: 2x 8TB + 3x 12TB drives.

     - tank0: 5x8TB raidz1
     - tank1: 3x4TB raidz1 (this tank isn't mounted currently)

     If I want to crash the box I can start a zfs scrub on tank0. After some time (<~6 hours) the box crashes. On boot, if a scrub was in progress, the box won't finish booting.
  5. My system was stable for a long time (~3-4 weeks) and then the other day it soft locked with a panic (trace was in ZFS). The rest of the system was still vaguely usable. Great, this has happened before, I thought, so I rebooted, and then could not get it to finish booting. Every time, one of two things would happen as the zfs pool was mounted:

     1) the system would silently lock up: no red LED, no panic on console, nothing
     2) the system would panic and the red LED started flashing

     The only way I've been able to get the system to boot is by unplugging the disks, waiting for the system to boot, and then plugging the disks back in and mounting them. Even then the system crashes again within a short period of time (maybe because ZFS is trying to scrub following the crash).

     I've upgraded to 21.02.3 / 5.10.21. I never had the vdd tweaks applied, but I've tried both with and without them. I've explicitly run the boot-loader upload steps in armbian-config (was Nov, now Mar 8).

     I'm relatively confident the issue I'm seeing relates to the others here; more often than not the panics are page faults (null pointer, address between kernel and user space, could not execute from non-execute memory), which seems plausible given the focus on voltage tuning.

     Any ideas? I can make an effort to collect boot logs if that's helpful, but given the frequency of these reports it seems like this is a relatively widespread issue.
  6. ZFS on Helios64

     The problem here is that it's not possible to compile the module on Debian because of how the kernel has been built. I reported the issue here, and while I could *fix* it, it really strikes me as something the core Armbian team needs to weigh in on. One option is to use an older GCC in the build system; the other is to disable per-task stack protections in the kernel. Neither seems like a great choice to me.
  7. I do not, unfortunately, but I haven't seen any errors in the lead-up to the crashes I've experienced that look like problems with the drives (at least not from what I can tell).
  8. Box locked up overnight, nothing on the console.
  9. I cold booted the box, and now it seems to behave just fine. Will run some load testing overnight and report back.
  10. Initial attempt with the new u-boot and with the cpufreq tweaks removed results in a new panic. And trying again.
  11. I'll defer to the Kobol folks. In the previous mega-thread the statement was made that the issues should have been fixed in a new version that ensured the hardware tweaks were correctly applied, but for me things have never been properly stable, even on just a vanilla install. The only semi-stable solution has been to reduce the clock speed, which is fine for now.
  12. I had 1 more crash and another soft lockup, but otherwise the box is much more usable. @aprayoga Definitely still something not running right, even at the lower clock speeds. My limited knowledge suggests something memory related, but that's all I've got. If you'd like me to test anything else, let me know.
  13. After about an hour of the ZFS scrub the "bad PC value" error happened again, however this time the system didn't hard lock. A decent number of processes related to ZFS are stuck in uninterruptible IO, I can't export the pool, etc. I did see the system crash like this occasionally without the cpufreq tweaks, so I'm not sure it tells us anything new. I will try again. note, the relatively high uptime is from the system sitting idle for ~5 days before I put it under load again.
  14. Out of curiosity, what is the (web?) interface in your screenshot?
  15. It's hard to say for sure; I never quite had a stable system, but I also wasn't generating the kind of load I am now back then. I had only reduced it one step; I'm trying again now with the settings you suggest.

```
root@helios64:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | uniq
performance
root@helios64:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq | uniq
816000
root@helios64:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq | uniq
1200000
```

The load I'm generating is running a zfs scrub on a 37TB pool across all five disks.
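The settings shown above can be applied in one loop over the cpufreq sysfs tree. This is only a sketch: it writes to a fake tree (./fake-cpufreq is a made-up path) so it runs unprivileged; on the box you would set CPUFREQ_ROOT=/sys/devices/system/cpu and run it as root:

```shell
# Demonstrated against a fake sysfs tree so this runs unprivileged;
# point CPUFREQ_ROOT at /sys/devices/system/cpu (as root) on the device.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-./fake-cpufreq}"
mkdir -p "$CPUFREQ_ROOT/cpu0/cpufreq" "$CPUFREQ_ROOT/cpu1/cpufreq"

# Values from the discussion above: performance governor, 816 MHz - 1.2 GHz
for cpu in "$CPUFREQ_ROOT"/cpu[0-9]*; do
    echo performance > "$cpu/cpufreq/scaling_governor"
    echo 816000      > "$cpu/cpufreq/scaling_min_freq"
    echo 1200000     > "$cpu/cpufreq/scaling_max_freq"
done

cat "$CPUFREQ_ROOT"/cpu*/cpufreq/scaling_governor | uniq
```

Note that these settings do not persist across reboots; on Armbian that is normally handled via /etc/default/cpufrequtils.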
  16. Continuing the discussion from here. On a clean install of 20.08.21 I'm able to crash the box within a few hours of it being under load. It appears as if the optimisations are being applied:

```
root@helios64:~# cat /proc/sys/net/core/rps_sock_flow_entries
32768
```

The suggestion @ShadowDance made to switch to the performance governor hasn't helped. Anecdotally, I think I remember the crashes always mentioning page faults, and early on there was some discussion about memory timing. Is it possible this continues to be that issue?
  17.
```
root@helios64:~# cat /proc/sys/net/core/rps_sock_flow_entries
32768
```

I also tried the suggestion to set a performance governor, and for shits and giggles I reduced the max cpu frequency, but that hasn't made a difference. The system still locks up within a few hours. I did finally manage to get the serial console to print something meaningful during boot, and one thing that stands out is this:

```
Loading Environment from MMC... *** Warning - bad CRC, using default environment
```

Full boot log is below.
  18. I'm still seeing regular panics, to the point where the box won't stay up for more than a few hours. To ensure it was in a clean state, I re-installed 20.08.21 focal and only added samba + zfs back.
  19. Have there been any changes to how the serial console is initialised? I previously had no issues using the serial console to interact with u-boot, but now the serial console doesn't seem to initialise until Linux boots, and even then there's a bunch of gibberish until the tty does something.

```
[  326.906283] reboot: Restarting system
7�[�A5_U�zz�=E:�{�;:���/{��_�Z{�h^�;���[�xz��ap��E^�x^��[6zp��xz{[��[���>va[/{=A5_U�zz�=E:�s�?:���/{��_�[{�h^�;���Z[xz?z��ap��E^�x^��[6zp��xz{[��;'~E){=a[{=
Armbian 20.08.21 Focal ttyS2
helios64 login:
```
  20. @aprayoga if you still need it here's a full boot log of the crash (the actual stacktrace of the crash is inconsistent for me)
  21. If you, like myself, installed on eMMC and are experiencing the crashes on 20.08.14: I booted up via a 20.08.10 sdcard and fixed the environment on eMMC.

```
# mount the emmc + get ready to chroot
mkdir /mnt/chroot
mount /dev/mmcblk1p2 /mnt/chroot/
mount /dev/mmcblk1p1 /mnt/chroot/media/mmcboot
mount --bind /mnt/chroot/media/mmcboot/boot/ /mnt/chroot/boot/
mount --bind /dev /mnt/chroot/dev/
mount --bind /proc /mnt/chroot/proc/
mount --bind /tmp /mnt/chroot/tmp/

# chroot in and downgrade to 20.08.10
chroot /mnt/chroot/ /bin/bash
apt install \
    linux-dtb-current-rockchip64=20.08.10 \
    linux-headers-current-rockchip64=20.08.10 \
    linux-image-current-rockchip64=20.08.10 \
    armbian-config=20.08.10 \
    armbian-firmware=20.08.10 \
    linux-focal-root-current-helios64=20.08.10 \
    linux-u-boot-helios64-current=20.08.10
exit

# now remove the sd card and hit reset
```

@aprayoga It's probably unrelated, but while working through the above I noticed that I ran out of space on /boot. I installed to eMMC the first version that was working, if that helps. I chose f2fs when I installed on eMMC and this is the resulting partition layout:

```
mmcblk1      179:32   0 14.6G  0 disk
├─mmcblk1p1  179:33   0   96M  0 part
└─mmcblk1p2  179:34   0 14.3G  0 part
mmcblk1boot0 179:64   0    4M  1 disk
mmcblk1boot1 179:96   0    4M  1 disk
```

Sadly I didn't grab enough info from what was in the boot partition before I nuked it and reinstalled the appropriate packages.
  22. These won't be exact instructions, since I decided to switch to focal (mostly for other reasons).

```
mkdir zfs-scratch
cd zfs-scratch
apt-get download linux-headers-current-rockchip64
git clone -b zfs-0.8.5 https://github.com/openzfs/zfs.git
docker run --rm -it -v $(pwd):/scratch ubuntu:focal

# inside the container
cd /scratch
apt update
apt install build-essential autoconf automake bison flex libtool gawk alien fakeroot dkms \
    libblkid-dev uuid-dev libudev-dev libssl-dev zlib1g-dev libaio-dev libattr1-dev \
    libelf-dev python3 python3-dev python3-setuptools python3-cffi libffi-dev
dpkg -i linux-headers-current-*.deb
cd zfs
sh autogen.sh
./configure
make -s -j$(nproc) deb
```

At that point you can exit the container (it'll vanish because of the --rm) and inside zfs-scratch/zfs you should have a bunch of Debian packages you can install.
  23. In the meantime, I built a zfs 0.8.5 module that you can use on buster (but only with the 5.8.13 kernel). First install this package with `dpkg -i kmod-zfs-5.8.13-rockchip64_0.8.5-1_arm64.deb` and then install the zfs utils with `apt install -t buster-backports zfsutils-linux`.
  24. @Brocklobsta sounds like you're setting GCC correctly, and the issue that's left is probably that the zfs-dkms module is too old (try finding 0.8.4+ per here). For the issue on buster, I think I've nailed the problem down. STACKPROTECTOR_PER_TASK is defined as:

```
config CC_HAVE_STACKPROTECTOR_SYSREG
	def_bool $(cc-option,-mstack-protector-guard=sysreg -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard-offset=0)

config STACKPROTECTOR_PER_TASK
	def_bool y
	depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_SYSREG
```

GCC in the focal build environment supports the options required for CC_HAVE_STACKPROTECTOR_SYSREG:

```
$ gcc --version
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
$ gcc -mstack-protector-guard=sysreg -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard-offset=0
gcc: fatal error: no input files
compilation terminated.
```

Whereas the gcc version on buster doesn't support them:

```
$ gcc --version
gcc (Debian 8.3.0-6) 8.3.0
$ gcc -mstack-protector-guard=sysreg -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard-offset=0
gcc: error: unrecognized command line option '-mstack-protector-guard=sysreg'; did you mean '-fstack-protector-strong'?
gcc: error: unrecognized command line option '-mstack-protector-guard-reg=sp_el0'; did you mean '-fstack-protector-all'?
gcc: error: unrecognized command line option '-mstack-protector-guard-offset=0'; did you mean '-fstack-protector-strong'?
gcc: fatal error: no input files
compilation terminated.
```

Since the kernel is built for both focal and buster, should features like this be disabled, or are there some other workarounds?
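The cc-option probe in that Kconfig fragment can be reproduced by hand: try compiling an empty translation unit with the three flags and check the exit status. This is only a sketch of the check; the verdict naturally depends on which gcc (and which target architecture, since sp_el0 is aarch64-specific) is on the machine running it:

```shell
# Mirror Kconfig's $(cc-option,...) check: compile an empty input with the
# sysreg stack-protector flags and report whether this gcc accepts them.
if gcc -mstack-protector-guard=sysreg \
       -mstack-protector-guard-reg=sp_el0 \
       -mstack-protector-guard-offset=0 \
       -x c -c /dev/null -o /dev/null 2>/dev/null; then
    echo "CC_HAVE_STACKPROTECTOR_SYSREG would be y"
else
    echo "CC_HAVE_STACKPROTECTOR_SYSREG would be n"
fi
```

Run inside the focal build environment this prints "y"; with buster's gcc 8 (or any non-aarch64 gcc) it prints "n", which matches the compiler output quoted above.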
  25. The suggestion Igor made in that thread solves the 0.8.3 problems on 5.8: certain kernel methods were flagged as GPL and the module wouldn't compile. As others in that thread pointed out, using the newer 0.8.4 package works on focal but not buster. 0.8.4 compiles correctly against the rockchip64 kernel on focal (gcc 9) and bullseye (gcc 10), but not on buster (gcc 8). 2.0.0-rc follows the same pattern. The rockchip64 kernel is built using the armbian toolchain, which is focal based and thus uses gcc 9. I haven't been able to come up with any theory other than how the compiler detects whether it should use CONFIG_STACKPROTECTOR or CONFIG_STACKPROTECTOR_PER_TASK. It's only a theory, and I don't know how to prove or disprove it, short of building the same 5.8.13 kernel with gcc 8 and trying to build the module against it (which is probably my next step).