Everything posted by ShadowDance

  1. I'd say it's safe to ignore and leave it enabled. In fact, for now I'd even advise doing exactly that. I've seen some rare errors during boot that could potentially be related to setting nohz=off (the result after these errors is an unresponsive system):
[ 161.210271] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 161.210847] rcu: 1-...!: (0 ticks this GP) idle=91a/1/0x4000000000000000 softirq=8639/8639 fqs=0
[ 161.211649] rcu: 4-...!: (1 GPs behind) idle=c0e/1/0x4000000000000000 softirq=12686/12687 fqs=0
[ 161.212443] rcu: 5-...!: (1 GPs behind) idle=5b2/1/0x4000000000000000 softirq=9198/9200 fqs=0
[ 161.213217] (detected by 2, t=15002 jiffies, g=13973, q=318)
I've written about nohz previously (before being aware of the above) in:
  2. That's great to hear, gprovost! I didn't save any of the traces, but I'm also having panics on 5.10.16; they always happen within a minute of booting, so I'm back on 5.9.14, which has been more stable (it only panics once or twice a month).
  3. Native ZFS encryption speed is not optimal on ARM and is limited by CPU speed on the Helios64. The optimizations that have gone in so far target amd64-based architectures and rely on CPU features not available on ARM. Another consideration is that, because those CPU features are missing, the CPU will be heavily loaded during encrypted reads and writes, meaning there are fewer resources available for other tasks. The problem isn't AES though, which is fully supported by the RK3399, it's GCM. This means that you can do full disk encryption via LUKS and run ZFS without encryption on top -- this is what I do. It's the best we can have at the moment and for the foreseeable future; to my knowledge, nobody is currently working on ARM encryption optimizations for ZFS. Edit: This may be of interest: https://github.com/openzfs/zfs/issues/10347
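For reference, a minimal sketch of the LUKS-under-ZFS layout described above; the device names, mapper names, and mirror layout are illustrative assumptions, not a recipe for your setup:
  # Encrypt each member disk with LUKS, then open it (repeat per disk):
  sudo cryptsetup luksFormat /dev/sda
  sudo cryptsetup open /dev/sda crypt-sda
  # Create the (unencrypted) ZFS pool on top of the opened mappings:
  sudo zpool create -o ashift=12 tank mirror /dev/mapper/crypt-sda /dev/mapper/crypt-sdb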
  4. Ok, good to know. And just to be clear: no bad sectors here either. However, the resets done by the system put the drives in a weird state (speculating) which behaved like "bad sectors" (but weren't, hence the quotes); a reboot would clear those up. UDMA CRC errors happen on my drives too (they never happened before I put them in the Helios64).
  5. Was this a replacement harness from Kobol or 3rd party SATA cables? I saw no improvement with a replacement harness, but 3rd party SATA cables seem to have done the trick. Either way, thanks for sharing your insights @griefman, it’s nice to see others digging into this/these issue(s) as well. Another interesting observation I made along the way is that a reboot helps “fix” the read errors without writing to the “bad sectors” (not actually bad sectors!). I.e. if 6 Gbps triggered an error, limiting to 3 Gbps and rebooting would “fix it”, so the errors seem to stem, in my case, from the system resetting the drive, which in turn seems to happen because of communication errors with the stock SATA cables.
  6. @gprovost I can only speak for my part, but I’ve pretty much gone through all libata options and combinations thereof (NCQ included), with no improvement in my situation.
  7. Hey, sorry I haven't updated this thread until now. The Kobol team sent me, as promised, a new harness and a power-only harness so that I could do some testing:
- Cutting off capacitors from my original harness did not make a difference
- The new (normal) harness had the exact same issue as the original one
- With the power-only harness and my own SATA cables, I was unable to reproduce the issue (even at 6 Gbps)
- The final test was to go to town on my original harness and cut the connector in two; this allowed me to use my own SATA cable with the original harness and there was, again, no issue (at 6 Gbps)
Judging from my initial results, it would seem that there is an issue with the SATA cables in the stock harness. But I should try to do this for a longer period of time -- the problem was I didn't have SATA cables for all disks; once I do, I'll try to do a week-long stress test. I reported my results to the Kobol team but haven't heard back yet. Even with the 3.0 Gbps limit, I still occasionally run into this issue with the original harness; it has happened 2 times since I did the experiment. If someone else is willing to repeat this experiment with a good set of SATA cables, please do contact Kobol to see if they'd be willing to ship out another set of test harnesses, or perhaps they have other plans. Here are some pics of my test setup, including the mutilated connector:
  8. I had this for the second time now in a week, captured the kernel panic this time via serial so thought I'd post it. (Note I only have governor set to performance, but have not limited CPU speed to lower.) [895638.308515] Unable to handle kernel paging request at virtual address fff70000f77d1ab8 [895638.309217] Mem abort info: [895638.309470] ESR = 0x96000004 [895638.309745] EC = 0x25: DABT (current EL), IL = 32 bits [895638.310216] SET = 0, FnV = 0 [895638.310491] EA = 0, S1PTW = 0 [895638.310773] Data abort info: [895638.311033] ISV = 0, ISS = 0x00000004 [895638.311375] CM = 0, WnR = 0 [895638.311642] [fff70000f77d1ab8] address between user and kernel address ranges [895638.312271] Internal error: Oops: 96000004 [#1] PREEMPT SMP [895638.312765] Modules linked in: tcp_diag inet_diag veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ ipv4 nf_tables nfnetlink br_netfilter bridge governor_performance rfkill ledtrig_netdev snd_soc_hdmi_codec hantro_vpu(C) rockchip_vdec(C) rockchipdrm dw_mipi_dsi v4l2_h264 rockchip_rga dw_hdmi videobuf2_dma_sg videobuf2_dma_contig snd_soc _rockchip_i2s analogix_dp videobuf2_vmalloc drm_kms_helper v4l2_mem2mem videobuf2_memops snd_soc_core videobuf2_v4l2 r8152 videobuf2_common videodev cec snd_pcm_dmaengine panfrost snd_pcm rc_core gpu_sched fusb302 drm mc snd_timer leds_pw m snd tcpm typec gpio_charger soundcore sg drm_panel_orientation_quirks gpio_beeper cpufreq_dt nfsd auth_rpcgss nfs_acl lockd grace sunrpc lm75 ip_tables x_tables autofs4 zfs(POE) zunicode(POE) zavl(POE) icp(POE) zlua(POE) zcommon(POE) zn vpair(POE) spl(OE) algif_skcipher af_alg dm_crypt dm_mod [895638.312835] realtek dwmac_rk stmmac_platform stmmac mdio_xpcs adc_keys pwm_fan [895638.321067] CPU: 5 PID: 0 Comm: swapper/5 Tainted: P C OE 5.9.14-rockchip64 #20.11.4 [895638.321837] Hardware name: Helios64 (DT) [895638.322190] pstate: 80000085 (Nzcv daIf -PAN -UAO BTYPE=--) [895638.322690] pc : perf_event_task_tick+0x8c/0x330 [895638.323101] lr : perf_event_task_tick+0x84/0x330 [895638.323511] sp : ffff800011b03d30 [895638.323809] x29: ffff800011b03d30 x28: 00000000003d0900 [895638.324281] x27: 0000000000000000 x26: ffff0000f77c4aa0 [895638.324753] x25: ffff800011839988 x24: ffff80001150f018 [895638.325224] x23: ffff0000f6eae580 x22: ffff800011524c40 [895638.325695] x21: ffff0000f77d1ab8 x20: ffff800011521aa0 [895638.326166] x19: ffff800011521a00 x18: 0000000000000000 [895638.326637] x17: 0000000000000000 x16: 0000000000000000 [895638.327108] x15: 0000000000000000 x14: 0000000000000000 [895638.327579] x13: 0000000000000000 x12: 000000000000026b [895638.328050] x11: 0000000000000000 x10: 0000000000000000 [895638.328521] x9 : 0000000000000004 x8 : 000000000000026b [895638.328993] x7 : 0000000000000000 x6 : ffff0000f77c8740 [895638.329464] x5 : 000000000000126b x4 : ffff8000e62a3000 [895638.329935] x3 : 0000000000010001 x2 : ffff80001150f018 [895638.330406] x1 : ffff80001122bf80 x0 : 0000000000000005 [895638.330876] Call trace: [895638.331099] perf_event_task_tick+0x8c/0x330 [895638.331482] scheduler_tick+0xc4/0x140 [895638.331821] update_process_times+0x5c/0x70 [895638.332195] tick_sched_handle.isra.19+0x40/0x58 [895638.332605] tick_sched_timer+0x58/0xb0 [895638.332948] __hrtimer_run_queues+0x104/0x3b0 [895638.333337] hrtimer_interrupt+0xf4/0x250 [895638.333697] arch_timer_handler_phys+0x30/0x40 [895638.334093] handle_percpu_devid_irq+0xa0/0x2b8 [895638.334497] 
generic_handle_irq+0x30/0x48 [895638.334855] __handle_domain_irq+0x94/0x108 [895638.335230] gic_handle_irq+0xc8/0x170 [895638.335566] el1_irq+0xb8/0x180 [895638.335849] arch_cpu_idle+0x14/0x20 [895638.336171] do_idle+0x1fc/0x270 [895638.336462] cpu_startup_entry+0x24/0x60 [895638.336815] secondary_start_kernel+0x16c/0x180 [895638.337221] Code: b821681f 94303ac2 f8756a95 eb15035f (f85d06b6) [895638.337767] ---[ end trace f36c5bd12d8d7b39 ]--- [895638.338178] Kernel panic - not syncing: Fatal exception in interrupt [895638.338741] SMP: stopping secondary CPUs [895638.339095] Kernel Offset: disabled [895638.339408] CPU features: 0x0240022,2000200c [895638.339789] Memory Limit: none [895638.340071] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
  9. @RockBian my suggestion could fix your issue since it gets rid of the gzip compression. The caveat, of course, is that the files will no longer be compressed. It might be possible to edit 02-armbian-compress-indexes and change the compression to lz4 (if that works) but I haven't tested it.
  10. It's probably indicating a kernel panic. You could try limiting CPU speed and setting the governor as suggested in the linked topic. You could also try hooking up the serial port to another machine and monitoring it for any output indicating why the crash happened.
  11. If you want to speed it up, you can quick-fix it by doing the following:
sudo rm /etc/apt/apt.conf.d/02-armbian-compress-indexes
sudo rm -rf /var/lib/apt/lists/*
sudo apt update
If you don't want to remove 02-armbian-compress-indexes you can edit it and change true to false. Do note that these will, unfortunately, be overwritten/replaced by future updates to the linux-buster-root-current-helios64 package.
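If you prefer the edit-in-place route, something like this should work, assuming the file sets the option to "true" on a single line (check its contents before running):
  sudo sed -i 's/"true"/"false"/' /etc/apt/apt.conf.d/02-armbian-compress-indexes
  sudo rm -rf /var/lib/apt/lists/*
  sudo apt update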
  12. @SR-G have you tried limiting min/max CPU speed as well as setting the governor as suggested in https://forum.armbian.com/topic/16664-system-crashes-after-sync-command/?tab=comments#comment-116681? I had good luck with only setting the governor to performance for a long time, but recently had a system lockup as well. Unfortunately I didn't have serial hooked up at the time, so I didn't get any reports. Will keep monitoring (with serial hooked up).
  13. Hmm, I think one reason you’re not seeing it now could also be that the disk access patterns changed when you removed the encryption. For me the issue is easily reproducible even without encryption. Simply do a read via dd on a disk. For me the speed of reproduction varies depending on which disk slot is read from. Here’s an example command:
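A minimal sketch of such a read; the device name (/dev/sda) and read size are assumptions, so adjust them for the slot you want to test and keep an eye on dmesg for SATA errors while it runs:
  sudo dd if=/dev/sda of=/dev/null bs=1M count=10240 status=progress
You may want to try this on each disk just to confirm the issue isn’t still lurking around. Good luck!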
  14. This looks like the same issue as I’ve been having; it’s possibly due to the SATA cables in the harness. I’ve tested with other SATA cables and can’t reproduce the issue with those. So I don't believe it’s likely to be bad disks, but that is of course a possibility. You can start by trying to limit SATA speed to 3 Gbps and see if that helps; if it does, it’s very likely the same issue. I also recommend doing SMART scans and a scrub _after_ limiting the SATA speed. See my other topic for more information: Good luck!
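For reference, the 3 Gbps limit can be set through a kernel parameter in /boot/armbianEnv.txt. The line below is a sketch: it caps all ports at once, and you should double-check the libata.force syntax against the kernel's parameter documentation for your kernel version:
  extraargs=libata.force=3.0Gbps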
  15. You probably have the answer you're looking for already, but yes, they do stop :).
  16. For spinning disks, you should use ashift=12 (and ashift=13 for SSDs) when creating your ZFS pool. It can't be changed after the fact and should match the physical block size of your HDD (12 = 4096). ZFS read speed also benefits from additional disks; if you e.g. created a mirror across two disks or a RAIDZ1 across 5 disks, you should see pretty good performance. Also, to make the above test unfair (in favor of ZFS), you could enable compression (-O compression=lz4).
zpool create -o ashift=12 ...
Personally I use:
zpool create \
  -o ashift=12 \
  -O acltype=posixacl -O canmount=off -O compression=lz4 \
  -O dnodesize=auto -O normalization=formD -O relatime=on \
  -O xattr=sa \
  ...
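If you want to verify the value after creating the pool, something like this should show it (the pool name tank is an assumption):
  zdb -C tank | grep ashift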
  17. @dancgn if it crashes occasionally, you could try to use armbian-config to change the CPU governor to 'performance' and limit the CPU speed (armbian-config > System > CPU). I personally use 1800000 for CPU speed (i.e. 1.8 GHz) but I've seen 1200000 recommended elsewhere on the forum, YMMV.
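If you'd rather test the effect before making it persistent through armbian-config, the same settings can be applied for the current boot via sysfs (values are in kHz; the paths assume the standard cpufreq layout):
  echo performance | sudo tee /sys/devices/system/cpu/cpufreq/policy*/scaling_governor
  echo 1800000 | sudo tee /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq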
  18. If you're seeing a few of those every now and then, it's nothing to be alarmed about. It can happen when the CPU is stressed and is normal. If you're constantly seeing it, then you might want to consider either reducing the CPU load or disabling NOHZ. For understanding what NOHZ does, I highly recommend this Stack Overflow answer to: "How NOHZ=ON affects do_timer() in Linux kernel?". If you want to get rid of the message, you can simply add the following to /boot/armbianEnv.txt:
extraargs=nohz=off
But if you decide to do this, make sure you fully understand what it does by reading the aforementioned SO answer and perhaps https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt.
  19. @grek I've been using compression since the get-go and can't say I've had similar issues so far. The only time I had a kernel panic was before I switched to the performance governor for the CPU. LZ4 is also a pretty lightweight compression algorithm, but it does increase CPU load to a small degree. Potential changes I have compared to your setup:
- I run root (/) on ZFS
- I use full disk encryption via LUKS (that's an additional layer between ZFS and the kernel)
- I use the performance governor
- My disks are limited to 3.0 Gbps SATA speed
Things you could try (the swap/zram steps are sketched below):
- Upgrade to the latest 5.9.14-rockchip64 kernel
- Upgrade to the latest ZFS 0.8.6 kernel modules
- Try the performance governor
- Do you use a swap partition? If yes, you could try to swapoff before repeating the process that panicked the kernel
- Wildcard idea: You could also try disabling the Armbian zram config in /etc/default/armbian-zram-config (I have it disabled)
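A minimal sketch of the swap and zram steps from the list above; the ENABLED switch is an assumption about the layout of /etc/default/armbian-zram-config, so check the file before running the sed:
  # Temporarily disable all swap before repeating the workload that panicked
  sudo swapoff -a
  # Disable the Armbian zram config (assuming an ENABLED=true switch), then reboot
  sudo sed -i 's/^ENABLED=true/ENABLED=false/' /etc/default/armbian-zram-config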
  20. @manman hmm, the configure step should’ve taken care of that. You could try to specify the correct path manually via the configure flag sketched below. If that still doesn’t help, see if you have any other linux-headers installed and remove them all.
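Assuming this is the ZFS ./configure step and the headers live in the usual Debian location, pointing it at the right kernel tree looks roughly like this:
  ./configure --with-linux=/usr/src/linux-headers-$(uname -r)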
  21. @Salamandar Yes, I've been using 5.8/5.9 since the get-go. You probably need the `linux-headers-legacy-rk3399` package (modify the apt-get download stage). Good job automating the process.
  22. I've also noted the sudden power loss to drives and that it only happens during reboot, not a regular shutdown. When doing a regular shutdown the drives seem to shut down one at a time, and when rebooting they lose power simultaneously. Now, I'm wondering, what happens if the system is running from the drives and we're using a systemd script to stop them? Is there a chance that they will spin up again because another script is loaded or something similar? It may be a stupid question but I'm not very familiar with systemd. I imagine it should be the very last thing that's executed before the actual restart. Edit: Side-question, is it actually necessary to power cycle the drives during reboot? Or could they be left running? Might that require changes to u-boot perhaps?
  23. If you want the fans of your Helios64 to start spinning earlier, look no further. This will allow the fans to spin at a constant speed from the earliest stage of initrd until fancontrol is started by the system. My use-case is full disk encryption and recovery in initramfs; sometimes I can have the machine powered on for quite a while before fancontrol starts. Or, if the boot process encounters an error, the fans may never start; this prevents that.
Step 1: Tell initramfs to include the pwm-fan module
echo pwm-fan | sudo tee -a /etc/initramfs-tools/modules
Step 2: Create a new file at /etc/initramfs-tools/scripts/init-top/fan with the following contents:
#!/bin/sh
PREREQ=""
prereqs() {
    echo "$PREREQ"
}
case $1 in
prereqs)
    prereqs
    exit 0
    ;;
esac

. /scripts/functions

modprobe pwm-fan

for pwm in /sys/devices/platform/p*-fan/hwmon/hwmon*/pwm1; do
    echo 150 >$pwm
done

exit 0
Feel free to change the value 150 to anywhere between 1-255; it decides how fast the fans will spin.
Step 3: Enable the fan script for inclusion in initramfs and update the initramfs image:
sudo chmod +x /etc/initramfs-tools/scripts/init-top/fan
sudo update-initramfs -u -k all
Step 4: Reboot and enjoy the cool breeze.
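After the reboot, one way to sanity-check that the script ran is to read back the PWM value it set (same path as in the script above; note that fancontrol may already have adjusted it again by the time you look):
  cat /sys/devices/platform/p*-fan/hwmon/hwmon*/pwm1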
  24. This is what I do as well, although I had to create a dummy zfs-dkms package to prevent it from being pulled in by zfsutils-linux.
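The dummy package can be built in different ways; one common approach is equivs (a sketch, not necessarily how it was done here):
  sudo apt install equivs
  equivs-control zfs-dkms.control
  # Edit zfs-dkms.control: set "Package: zfs-dkms" and a sufficiently high "Version:"
  equivs-build zfs-dkms.control
  sudo dpkg -i zfs-dkms_*.deb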
  25. Good initiative on this thread, grek, and nice to see a full set of instructions all the way from installing Docker! I would also echo the zfs-dkms recommendation, when possible. But at least for Buster users we need a workaround: the zfs-dkms package can't be compiled (the kernel is configured with features not supported by the old(er) gcc), and the zfs-dkms version in backports is 0.8.4, so unless patches have been backported from 0.8.5, kernels 5.8-5.9 aren't supported.