jbergler

  1. I pulled that from the postinst of the previous Debian package: `ar x whatever.deb`, then untar the control.tar.gz inside it to get the maintainer scripts (there's a short sketch of this after the list). Since you're booting without issues, I wonder if your boot.ini has some other config than mine? I have this part that feels relevant:

       # legacy and mainline kernel diff
       if ext4load mmc ${devno}:1 0x00000000 "/boot/.next" || fatload mmc ${devno}:1 0x00000000 ".next" || ext4load mmc ${devno}:1 0x00000000 ".next"; then
           echo "Found mainline kernel configuration"
           setenv uartconsole "ttyAML0,115200n8"
           setenv kernelimage "uImage"
       else
           echo "Found legacy kernel configuration"
           setenv uartconsole "ttyS0,115200n8"
           setenv kernelimage "zImage"
       fi
  2. @hzyitc I wonder if this commit might be related: https://github.com/armbian/build/commit/76ce4c3a3ddb8f93686598808f6d1687232f9ddb
     It removed patch/kernel/archive/meson-6.1/generate-uImage-instand-of-zImage.patch
  3. Can confirm I'm also seeing this. I'm able to recover by installing the 23.02.2 linux-image-current-meson64 and linux-dtb-current-meson64 packages from /var/cache/apt/archives. From what I can tell it's installing the modules for 6.1.50 in /lib/modules but is somehow still booting 6.1.11, because /boot/uImage isn't getting updated:

       $ strings /boot/uImage | grep "Linux version"
       Linux version 6.1.11-meson64 (root@29682b33de96) (aarch64-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0, GNU ld (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 2.32.0.20190321) #23.02.2 SMP PREEMPT
       Linux version 6.1.11-meson64 (root@29682b33de96) (aarch64-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0, GNU ld (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 2.32.0.20190321) #23.02.2 SMP PREEMPT Sat Feb 18 00:07:55 UTC 2023

     I can confirm that running the following command manually gets the device back into a working state (a generalised version is sketched after the list):

       $ mkimage -A arm64 -O linux -T kernel -C none -a 0x1080000 -e 0x1080000 -n Linux -d /boot/vmlinuz-6.1.50-current-meson64 /boot/uImage
  4. @aprayoga verbosity was already up, but I've added the other args. I'm not going to provoke the system since it's somewhat stable again and it's in use, but in terms of a repro, here's the setup: 2x 8TB + 3x 12TB drives; tank0 is a 5x8TB raidz1 and tank1 is a 3x4TB raidz1 (tank1 isn't mounted currently). If I want to crash the box I can start a zfs scrub on tank0 (commands sketched after the list). After some time (under ~6 hours) the box crashes. On boot, if a scrub was in progress, the box won't finish booting.
  5. My system was stable for a long time (~3-4 weeks) and then the other day it soft locked with a panic (the trace was in ZFS). The rest of the system was still vaguely usable. Great, this has happened before, I thought, so I rebooted, and then I could not get it to finish booting. Every time, one of two things would happen as the zfs pool was mounted:
     1) the system would silently lock up: no red LED, no panic on the console, nothing;
     2) the system would panic and the red LED started flashing.
     The only way I've been able to get the system to boot is by unplugging the disks, waiting for the system to boot, and then plugging the disks back in and mounting them. Even then the system crashes again within a short period of time (maybe because ZFS is trying to scrub following the crash). I've upgraded to 21.02.3 / 5.10.21. I never had the vdd tweaks applied, but I've tried both with and without them. I've explicitly run the boot-loader upload steps in armbian-config (it was Nov, now Mar 8). I'm relatively confident the issue I'm seeing relates to the others here; more often than not the panics are page faults (null pointer, address between kernel and user space, could not execute from non-execute memory), which seems plausible given the focus on voltage tuning. Any ideas? I can make an effort to collect boot logs if that's helpful, but given the frequency of these reports it seems like this is a relatively widespread issue.
  6. ZFS on Helios64

     The problem here is that it's not possible to compile the module on Debian because of how the kernel has been built (a quick way to check the relevant config is sketched after the list). I reported the issue here, and while I could *fix* it, it really strikes me as something the core Armbian team needs to weigh in on. One option is to use an older GCC in the build system; the other is to disable per-task stack protections in the kernel. Neither seems like a great choice to me.
  7. I do not, unfortunately, but I haven't seen any errors in the lead-up to the crashes I've experienced that look like drive problems (at least not from what I can tell).
  8. Box locked up overnight, nothing on the console.
  9. I cold booted the box, and now it seems to behave just fine. Will run some load testing overnight and report back.
  10. Initial attempt with the new u-boot and with the cpufreq tweaks removed results in a new panic. Trying again.
  11. I'll defer to the Kobol folks. In the previous mega thread the statement was made that the issues should have been fixed in a new version that ensured the hardware tweaks were correctly applied, but for me things have never been properly stable, even on just a vanilla install. The only semi-stable solution has been to reduce the clock speed, which is fine for now.
  12. I had one more crash and another soft lockup, but otherwise the box is much more usable. @aprayoga there's definitely still something not running right, even at the lower clock speeds. My limited knowledge suggests something memory-related, but that's all I've got. If you'd like me to test anything else, let me know.
  13. After about an hour of the ZFS scrub the "bad PC value" error happened again; however, this time the system didn't hard lock. A decent number of ZFS-related processes are stuck in uninterruptible IO, I can't export the pool, etc. I did see the system crash like this occasionally without the cpufreq tweaks, so I'm not sure it tells us anything new. I will try again. Note: the relatively high uptime is from the system sitting idle for ~5 days before I put it under load again.
  14. Out of curiosity, what is the (web?) interface in your screenshot?
  15. It's hard to say for sure; I never quite had a stable system, but back then I also wasn't generating the kind of load I am now. I had only reduced it one step; I'm trying again now with the settings you suggest (applying them is sketched after the list):

       root@helios64:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | uniq
       performance
       root@helios64:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq | uniq
       816000
       root@helios64:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq | uniq
       1200000

     The load I'm generating is running a zfs scrub on a 37TB pool across all five disks.
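
Sketch for item 1: pulling the postinst out of a kernel .deb. The filename below is only a placeholder (as in the post), and newer packages may ship control.tar.xz or control.tar.zst rather than control.tar.gz, so list the members first to see what you actually have.

    # Unpack the .deb members into a scratch directory; substitute the real package file.
    mkdir /tmp/deb-inspect && cd /tmp/deb-inspect
    ar x /var/cache/apt/archives/whatever.deb
    ls                      # debian-binary, control.tar.*, data.tar.*
    tar -xf control.tar.gz  # the control archive holds the maintainer scripts
    less ./postinst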
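
Sketch for item 3: the same mkimage recovery, generalised to re-wrap whichever vmlinuz is newest in /boot instead of a hard-coded version. The load/entry address 0x1080000 is copied from the command in that post; verify it matches your board before relying on this.

    #!/bin/sh
    # Pick the newest installed kernel image by version sort (assumes a single flavour in /boot).
    VMLINUZ=$(ls -1 /boot/vmlinuz-* | sort -V | tail -n 1)
    # Re-wrap it as the uImage that u-boot actually loads.
    mkimage -A arm64 -O linux -T kernel -C none \
            -a 0x1080000 -e 0x1080000 -n Linux \
            -d "$VMLINUZ" /boot/uImage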
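
Sketch for item 4: the repro itself is just stock ZFS commands, listed here for completeness (pool name tank0 as in that post).

    zpool scrub tank0        # kick off the scrub that eventually triggers the crash
    zpool status tank0       # watch scrub progress and errors while it runs
    # zpool scrub -s tank0   # stop the scrub if the box needs a break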
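
Sketch for item 6: the per-task stack protector mentioned there is a kernel build option, so one quick way to see how the running kernel was configured is to grep its shipped config. The path is standard on Armbian/Debian images; which symbols appear depends on the kernel version.

    # Show the stack-protector options (and, on newer kernels, the compiler string) the kernel was built with.
    grep -E 'STACKPROTECTOR|CC_VERSION_TEXT' "/boot/config-$(uname -r)"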
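
Sketch for item 15: applying that frequency cap from the shell. The sysfs paths are the same ones read in the post and the values are just the settings shown there; they won't survive a reboot unless you also put them in /etc/default/cpufrequtils, which Armbian reads at boot.

    # Apply the governor and frequency limits shown above to every CPU (run as root).
    for CPU in /sys/devices/system/cpu/cpu[0-9]*/cpufreq; do
        echo performance > "$CPU/scaling_governor"
        echo 816000      > "$CPU/scaling_min_freq"
        echo 1200000     > "$CPU/scaling_max_freq"
    done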