gprovost

Reputation Activity

  1. Like
    gprovost reacted to rupert in Does anyone actually have a stable system?   
    Hi,
     
    My poor little Kobol has to work very hard for a living:
     
    Plex
    NTP (GPSD) server
    SMB share
    Webmin
    Zabbix agent
    Zerotier client/bridge
    Docker
    Storj node
    Raid 5 across 5 disks
     
    And is as good as gold!
     
    Had a few niggles to start with (i.e. the first week); since then it's only rebooted when the OS requests it. Otherwise excellent!
     
    Running on a 64GB SD card
     
    CPU full speed
     
    Very pleased!
     
    Rup
     
  2. Like
    gprovost reacted to dieKatze88 in Does anyone actually have a stable system?   
    My system is still unstable and I have no idea what to do. I've given every suggestion I've seen on the internet a try and I think the only thing I can do is move on.
  3. Like
    gprovost reacted to aprayoga in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    Hi all, you can download the SATA firmware update at https://cdn-kobol-io.s3-ap-southeast-1.amazonaws.com/Helios64/SATA_FW/Helios64_SATA_FW_update_00021_200624.img.xz
     
    Instruction:
    1. Download the SD card image
    2. Flash the image onto a microSD card
    3. Insert the microSD card into the Helios64 and power on. The Helios64 should automatically boot from the microSD. If it still boots from eMMC, disable the eMMC.
    4. Wait a while; the system will reboot and then power off if the firmware flashing succeeds.
       If it fails, both the System Status and System Fault LEDs will blink.
    5. Remove the microSD card and boot the Helios64 normally. See if there is any improvement.
     
    Our officially supported stock firmware can be downloaded from https://cdn-kobol-io.s3-ap-southeast-1.amazonaws.com/Helios64/SATA_FW/Helios64_SATA_FW_factory_00020_190702.img.xz. If there is no improvement with the newer firmware, please revert to this stock firmware.
     
    SHA256SUM:
    e5dfbe84f4709a3e2138ffb620f0ee62ecbcc79a8f83692c1c1d7a4361f0d30f *Helios64_SATA_FW_factory_00020_190702.img.xz
    0d78fec569dd699fd667acf59ba7b07c420a2865e1bcb8b85b26b61d404998c5 *Helios64_SATA_FW_update_00021_200624.img.xz
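     
    For reference, a minimal sketch of steps 1 and 2 on a Linux machine; /dev/sdX is a placeholder for the microSD card reader device and must be double-checked with lsblk before writing:
     
    wget https://cdn-kobol-io.s3-ap-southeast-1.amazonaws.com/Helios64/SATA_FW/Helios64_SATA_FW_update_00021_200624.img.xz
    # compare against the SHA256SUM listed above
    sha256sum Helios64_SATA_FW_update_00021_200624.img.xz
    # decompress and write the image to the microSD card, then flush caches
    xzcat Helios64_SATA_FW_update_00021_200624.img.xz | dd of=/dev/sdX bs=1M conv=fsync status=progress
    sync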
  4. Like
    gprovost reacted to wurmfood in Migrate from ramlog to disk   
    Well, for anyone else interested in trying this, here's the basic order I did:
    1. stop armbian-ramlog
    2. disable armbian-ramlog
    3. create a zfs dataset and mount it at /var/log
    4. cp -ar everything from /var/log.hdd to the new /var/log
    5. modify /etc/logrotate to disable compression (since the dataset is already using compression)
    6. modify /etc/default/armbian-ramlog to disable it there as well
    7. modify /etc/default/armbian-zram-config to adjust for the new numbers (I have ZRAM_PERCENTAGE and MEM_LIMIT_PERCENTAGE at 15)
    8. reboot
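     
    A rough shell sketch of those steps; the pool name "rpool" is a placeholder, and the exact logrotate file and variable names in the two /etc/default files should be checked on your own install:
     
    systemctl stop armbian-ramlog
    systemctl disable armbian-ramlog
    # dataset mounted at /var/log; let ZFS handle compression
    # (assumes /var/log is empty/unmounted at this point)
    zfs create -o mountpoint=/var/log -o compression=lz4 rpool/log
    # copy the persisted logs into the new dataset
    cp -ar /var/log.hdd/. /var/log/
    # then edit by hand:
    #   /etc/logrotate.conf              -> disable "compress"
    #   /etc/default/armbian-ramlog      -> disable ramlog here as well
    #   /etc/default/armbian-zram-config -> adjust ZRAM_PERCENTAGE / MEM_LIMIT_PERCENTAGE
    reboot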
  5. Like
    gprovost reacted to jpegxguy in Problem with apt list   
    Just for anyone reading this, this might be related. Someone suggested prioritizing gzip in order to escape the slowness of LZ4 on ARM platforms.
    Here is the solution, which was also merged in Armbian itself:
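    As an illustration only (not necessarily the exact change that was merged), an apt configuration drop-in along these lines tells apt to prefer gzip-compressed indexes; the file name is arbitrary:
     
    # /etc/apt/apt.conf.d/99-prefer-gzip (hypothetical file name)
    Acquire::CompressionTypes::Order { "gz"; };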
     
  6. Like
    gprovost got a reaction from allen--smithee in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    @Fred Fettinger Your errors are most likely due to an unstable HDD harness. You can contact us at support@kobol.io to see if a replacement of the HDD wire harness is needed.
     
    @ShadowDance Thanks for the feedback and at least we can remove grounding issue from the list of possible root causes.
    It looks like, under very heavy HDD I/O load (arising mainly during scrubbing operations), too much noise is generated, resulting in those HSM violations.
    Thanks to all your tests, the issue unfortunately seems to point to a noise filtering problem. As previously mentioned, we will fix that in the next revision.
    We will see if the new SATA controller firmware has any impact, but we doubt it. I think the only solution for the current revision is to limit the SATA link speed to 3 Gbps when using btrfs or zfs.
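     
    For anyone who wants to try that on the current revision, one way would be a libata kernel parameter appended to the extraargs line in /boot/armbianEnv.txt; a sketch (this limits all ports, adapt as needed):
     
    # /boot/armbianEnv.txt
    extraargs=libata.force=3.0Gbps
    # after a reboot, confirm the negotiated speed:
    dmesg | grep -i "SATA link up"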
  7. Like
    gprovost reacted to deanl3 in nand-sata-install to eMMC, won't start kernel   
    Sorry to repeat another case of this. I have attempted a few times to move boot from the SD card to eMMC before the SD card fails.
     
    I followed the wiki doc using nand-sata-install. Since I have OMV, I excluded the exports, and afterward also copied the salt and pillar. In another attempt, on a suggestion from another thread, I updated the bootloader and copied again.
    I can boot to my SD card, mount the eMMC, and see the root structure.
     
    From what I can tell in reading the boot output, it can't find any storage device?
    4 USB Device(s) found
           scanning usb for storage devices... 0 Storage Device(s) found
    But I can see it when the OS is up, mount and read it, and nand-sata-install can write to it.
     
    Here's the boot output:
    DDR Version 1.24 20191016 In channel 0 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 1 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 0 training pass! channel 1 training pass! change freq to 416MHz 0,1 Channel 0: LPDDR4,416MHz Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB Channel 1: LPDDR4,416MHz Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB 256B stride channel 0 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 1 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 0 training pass! channel 1 training pass! channel 0, cs 0, advanced training done channel 1, cs 0, advanced training done change freq to 856MHz 1,0 ch 0 ddrconfig = 0x101, ddrsize = 0x40 ch 1 ddrconfig = 0x101, ddrsize = 0x40 pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD ddr_set_rate to 328MHZ ddr_set_rate to 666MHZ ddr_set_rate to 928MHZ channel 0, cs 0, advanced training done channel 1, cs 0, advanced training done ddr_set_rate to 416MHZ, ctl_index 0 ddr_set_rate to 856MHZ, ctl_index 1 support 416 856 328 666 928 MHz, current 856MHz OUT Boot1: 2019-03-14, version: 1.19 CPUId = 0x0 ChipType = 0x10, 253 SdmmcInit=2 0 BootCapSize=100000 UserCapSize=14910MB FwPartOffset=2000 , 100000 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 SdmmcInit=0 1 StorageInit ok = 67744 SecureMode = 0 SecureInit read PBA: 0x4 SecureInit read PBA: 0x404 SecureInit read PBA: 0x804 SecureInit read PBA: 0xc04 SecureInit read PBA: 0x1004 SecureInit read PBA: 0x1404 SecureInit read PBA: 0x1804 SecureInit read PBA: 0x1c04 SecureInit ret = 0, SecureMode = 0 atags_set_bootdev: ret:(0) GPT 0x3380ec0 signature is wrong recovery gpt... GPT 0x3380ec0 signature is wrong recovery gpt fail! LoadTrust Addr:0x4000 No find bl30.bin No find bl32.bin Load uboot, ReadLba = 2000 Load OK, addr=0x200000, size=0xe5674 RunBL31 0x40000 �NOTICE: BL31: v1.3(debug):42583b6 NOTICE: BL31: Built : 07:55:13, Oct 15 2019 NOTICE: BL31: Rockchip release version: v1.1 INFO: GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3 INFO: Using opteed sec cpu_context! INFO: boot cpu mask: 0 INFO: plat_rockchip_pmu_init(1190): pd status 3e INFO: BL31: Initializing runtime services WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK ERROR: Error initializing runtime service opteed_fast INFO: BL31: Preparing for EL3 exit to normal world INFO: Entry point address = 0x200000 INFO: SPSR = 0x3c9 U-Boot 2020.10-armbian (Jan 05 2021 - 00:07:57 +0100) SoC: Rockchip rk3399 Reset cause: POR DRAM: 3.9 GiB PMIC: RK808 SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB MMC: mmc@fe320000: 1, sdhci@fe330000: 0 Loading Environment from MMC... *** Warning - bad CRC, using default environment In: serial Out: serial Err: serial Model: Helios64 Revision: 1.2 - 4GB non ECC Net: eth0: ethernet@fe300000 DDR Version 1.24 20191016 In channel 0 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 1 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 0 training pass! channel 1 training pass! 
change freq to 416MHz 0,1 Channel 0: LPDDR4,416MHz Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB Channel 1: LPDDR4,416MHz Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB 256B stride channel 0 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 1 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 0 training pass! channel 1 training pass! channel 0, cs 0, advanced training done channel 1, cs 0, advanced training done change freq to 856MHz 1,0 ch 0 ddrconfig = 0x101, ddrsize = 0x40 ch 1 ddrconfig = 0x101, ddrsize = 0x40 pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD ddr_set_rate to 328MHZ ddr_set_rate to 666MHZ ddr_set_rate to 928MHZ channel 0, cs 0, advanced training done channel 1, cs 0, advanced training done ddr_set_rate to 416MHZ, ctl_index 0 ddr_set_rate to 856MHZ, ctl_index 1 support 416 856 328 666 928 MHz, current 856MHz OUT Boot1: 2019-03-14, version: 1.19 CPUId = 0x0 ChipType = 0x10, 254 SdmmcInit=2 0 BootCapSize=100000 UserCapSize=14910MB FwPartOffset=2000 , 100000 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 SdmmcInit=0 1 StorageInit ok = 67748 SecureMode = 0 SecureInit read PBA: 0x4 SecureInit read PBA: 0x404 SecureInit read PBA: 0x804 SecureInit read PBA: 0xc04 SecureInit read PBA: 0x1004 SecureInit read PBA: 0x1404 SecureInit read PBA: 0x1804 SecureInit read PBA: 0x1c04 SecureInit ret = 0, SecureMode = 0 atags_set_bootdev: ret:(0) GPT 0x3380ec0 signature is wrong recovery gpt... GPT 0x3380ec0 signature is wrong recovery gpt fail! LoadTrust Addr:0x4000 No find bl30.bin No find bl32.bin Load uboot, ReadLba = 2000 Load OK, addr=0x200000, size=0xe5674 RunBL31 0x40000 �NOTICE: BL31: v1.3(debug):42583b6 NOTICE: BL31: Built : 07:55:13, Oct 15 2019 NOTICE: BL31: Rockchip release version: v1.1 INFO: GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3 INFO: Using opteed sec cpu_context! INFO: boot cpu mask: 0 INFO: plat_rockchip_pmu_init(1190): pd status 3e INFO: BL31: Initializing runtime services WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK ERROR: Error initializing runtime service opteed_fast INFO: BL31: Preparing for EL3 exit to normal world INFO: Entry point address = 0x200000 INFO: SPSR = 0x3c9 U-Boot 2020.10-armbian (Jan 05 2021 - 00:07:57 +0100) SoC: Rockchip rk3399 Reset cause: POR DRAM: 3.9 GiB PMIC: RK808 SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB MMC: mmc@fe320000: 1, sdhci@fe330000: 0 Loading Environment from MMC... *** Warning - bad CRC, using default environment In: serial Out: serial Err: serial Model: Helios64 Revision: 1.2 - 4GB non ECC Net: eth0: ethernet@fe300000 scanning bus for devices... starting USB... Bus usb@fe380000: USB EHCI 1.00 Bus dwc3: usb maximum-speed not found Register 2000140 NbrPorts 2 Starting the controller USB XHCI 1.10 scanning bus usb@fe380000 for devices... 1 USB Device(s) found scanning bus dwc3 for devices... cannot reset port 4!? 4 USB Device(s) found scanning usb for storage devices... 0 Storage Device(s) found Hit any key to stop autoboot: 0 Card did not respond to voltage select! switch to partitions #0, OK mmc0(part 0) is current device Scanning mmc 0:1... 
Found U-Boot script /boot/boot.scr 3185 bytes read in 19 ms (163.1 KiB/s) ## Executing script at 00500000 Boot script loaded from mmc 0 359 bytes read in 15 ms (22.5 KiB/s) 16181146 bytes read in 1556 ms (9.9 MiB/s) 27507200 bytes read in 2631 ms (10 MiB/s) 81696 bytes read in 41 ms (1.9 MiB/s) Failed to load '/boot/dtb/rockchip/overlay/-fixup.scr' Moving Image from 0x2080000 to 0x2200000, end=3cd0000 ## Loading init Ramdisk from Legacy Image at 06000000 ... Image Name: uInitrd Image Type: AArch64 Linux RAMDisk Image (gzip compressed) Data Size: 16181082 Bytes = 15.4 MiB Load Address: 00000000 Entry Point: 00000000 Verifying Checksum ... OK ## Flattened Device Tree blob at 01f00000 Booting using the fdt blob at 0x1f00000 Loading Ramdisk to f4f81000, end f5eef75a ... OK Loading Device Tree to 00000000f4f04000, end 00000000f4f80fff ... OK Starting kernel ...  
  8. Like
    gprovost got a reaction from hartraft in Image Backup/Restore from Boot(Emmc)+System(M.2-SSD)   
    Yeah, recovering would require you to boot the board from a microSD card (using any recent image for Helios64). Then you will need to use the right tool (e.g. dd or fsarchiver) to write your system backup onto the M.2 SSD.
     
    If you use the eMMC only for u-boot, then it's unlikely to get corrupted and would be easy to restore even without a backup, since it's just u-boot there. But it costs you nothing to back up the first 100 MB of the eMMC with dd.
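     
    A minimal sketch of that dd backup, assuming the eMMC shows up as /dev/mmcblk2 (verify with lsblk, the device name can differ):
     
    # back up the first 100 MB of the eMMC (the u-boot area)
    dd if=/dev/mmcblk2 of=helios64-emmc-uboot.img bs=1M count=100 status=progress
    # restore later, if ever needed, by swapping if= and of=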
     
    This is the kind of question that will require some guidelines on our wiki at some point ;-)
  9. Like
    gprovost reacted to clostro in How to do a full hardware test?   
    May I suggest outputting dmesg live to a network location?
    I'm not sure if the serial console output is the same as 'dmesg', but if it is, you can 'nohup &' it live to any file. That way you wouldn't have to stay connected to the console or SSH all the time. Just don't output it to a local filesystem, as writing to a local filesystem during a crash might corrupt it and cause more problems.
     
    nohup dmesg --follow > /network/location/folder/helios64-log.txt 2>&1 &
    exit
    exit
     
    Needed to have a single >, and to exit the session with 'exit', apparently.
  10. Like
    gprovost got a reaction from hartraft in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    We will soon provide instructions on how to update the SATA controller (JMB585) firmware to the latest version. Let's see whether it has a positive impact or not.
     
    For the new revision, we will reinforce noise filtering on the SATA controller, in case this is part of the root cause in some cases.
  11. Like
    gprovost got a reaction from lanefu in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    We will soon provide instructions on how to update the SATA controller (JMB585) firmware to the latest version. Let's see whether it has a positive impact or not.
     
    For the new revision, we will reinforce noise filtering on the SATA controller, in case this is part of the root cause in some cases.
  12. Like
    gprovost reacted to ShadowDance in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    @Wofferl those are the exact same model as three of my disks (but mine aren't "Plus"). I've used these disks in another machine with ZFS and zero issues (ASM1062 SATA controller). So if we assume the problem is between the SATA controller and the disk (and while I agree with you that it's probably in part a disk issue), I'm convinced it's something that should be fixable in the SATA controller firmware. Perhaps these disks do something funny that the SATA controller doesn't expect? And based on all my testing so far, the SATA cable also plays a role, meaning perhaps there's a noise factor in play (as well).
     
    Side note: Western Digital really screwed us over with this whole SMR fiasco, didn't they? I'd be pretty much ready to throw these disks in the trash if it wasn't for the fact that they work perfectly on another SATA controller.
     
    @grek glad it helped! By the way, I would still recommend changing the IO scheduler to none, because bfq is CPU-intensive and ZFS does its own scheduling. It probably won't fix the issues, but it might reduce some CPU overhead.
  13. Like
    gprovost got a reaction from meymarce in Feature / Changes requests for future Helios64 board or enclosure revisions   
    @dieKatze88 Yes, it has already been announced here and there that we will replace the wire harness with a proper PCB backplane. There will still be wires, though, connecting the main board to the backplane, since we don't want a board that can only be used with a specific backplane. But these wires will be normal SATA cables, so it's easy to buy new ones anywhere if a replacement is needed.
  14. Like
    gprovost reacted to dieKatze88 in How to do a full hardware test?   
    I have disabled zram, as it was suggested by someone on Reddit.
     
    I am now running the latest kernel, but absolutely no kernel in my history of this thing has been stable.

    I got the following serial console output the last time it crashed (but could not edit my post due to limits):
     
    [10105.431800] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: rcu_sched_clock_irq+0x7a4/0xce0 [10105.432752] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G C 5.10.21-rockchip64 #21.02.3 [10105.433526] Hardware name: Helios64 (DT) [10105.433872] Call trace: [10105.434093] dump_backtrace+0x0/0x200 [10105.434418] show_stack+0x18/0x68 [10105.434714] dump_stack+0xcc/0x124 [10105.435016] panic+0x174/0x374 [10105.435288] __stack_chk_fail+0x3c/0x40 [10105.435626] rcu_sched_clock_irq+0x7a4/0xce0 [10105.436004] update_process_times+0x60/0xa0 [10105.436373] tick_sched_handle.isra.19+0x40/0x58 [10105.436778] tick_sched_timer+0x58/0xb0 [10105.437118] __hrtimer_run_queues+0x104/0x388 [10105.437502] hrtimer_interrupt+0xf4/0x250 [10105.437861] arch_timer_handler_phys+0x30/0x40 [10105.438258] handle_percpu_devid_irq+0xa0/0x298 [10105.438659] generic_handle_irq+0x30/0x48 [10105.439012] __handle_domain_irq+0x94/0x108 [10105.439384] gic_handle_irq+0xc0/0x140 [10105.439715] el1_irq+0xc0/0x180 [10105.439995] arch_cpu_idle+0x18/0x28 [10105.440310] default_idle_call+0x44/0x1bc [10105.440665] do_idle+0x204/0x278 [10105.440950] cpu_startup_entry+0x28/0x60 [10105.441298] secondary_start_kernel+0x170/0x180 [10105.441700] SMP: stopping secondary CPUs [10105.442057] Kernel Offset: disabled [10105.442365] CPU features: 0x0240022,6100200c [10105.442740] Memory Limit: none [10105.443021] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: rcu_sched_clock_irq+0x7a4/0xce0 ]---
    root@helios64:~# uname -a
    Linux helios64 5.10.21-rockchip64 #21.02.3 SMP PREEMPT Mon Mar 8 01:05:08 UTC 2021 aarch64 GNU/Linux
    root@helios64:~#
     
    my armbian monitor:
    http://ix.io/2U0J
  15. Like
    gprovost reacted to ShadowDance in ZFS or normal Raid   
    @scottf007 I think it would be hard for anyone here to really answer whether it's worth it or not [for you]. In your situation, I'd try to evaluate whether or not you need the features that ZFS gives you. For instance, ZFS snapshots are something you never really need, until you do. When you find that you deleted some data a month ago and can still recover it from a snapshot, it's a great comfort. If that's something you value, btrfs could be an alternative, and it's already built into the kernel. If all you need is data integrity, you could consider dm-integrity + mdraid with the filesystem of your choice on top (ext4, XFS, etc.). Skipping "raid" altogether would also be possible; LVM allows for great flexibility with disks.
     
    If you're worried about the amount of work you need to put in with ZFS, you can freeze the updates once you are satisfied with the stability of the system. Just run `sudo apt-mark hold linux-image-current-rockchip64 linux-dtb-current-rockchip64`, which prevents kernel/boot updates, and ZFS should not break on you any time soon. Conversely, `unhold` once you're ready to deal with the future.
     
    For me personally, ZFS is totally worth it. I have it on two servers/NAS at home. I use ZFS native encryption on one, and LUKS+ZFS on the Helios64 (due to its CPU capabilities). I also use a tool named zrepl for automatically creating, pruning and replicating snapshots. So, for instance, my most important datasets are backed up from one machine to the Helios64 in raw mode; this means the data is safe, but not readable by the Helios64 without loading the encryption keys. I also run Armbian on the Helios64 straight off of ZFS (root on ZFS), which gives me the ability to easily roll back the system if, say, an update breaks it.
     
    @hartraft It depends on your requirements/feature wishlist. RAID (mdraid), for instance, cannot guarantee data consistency (unless stacked with dm-integrity). What this means is that once data is written to the disk, it can still become corrupted and RAID can't catch it. ZFS guards against this via checksums on all data: once it's on disk, it is guaranteed either to be uncorrupted, or the corruption will be detected and likely be repairable from one of the redundant disks. ZFS also supports snapshots, meaning you can easily recover deleted files, etc.; RAID does not support anything like this. Looking at mergerfs, it seems to lack these features as well, and it runs in user space (via FUSE), so it's not as integrated. SnapRAID is a backup program, so not really comparable, and MooseFS I know nothing about, but it looks enterprise-y.
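     
    As a small illustration of the snapshot point (pool/dataset names are placeholders):
     
    # take a snapshot before a risky change
    zfs snapshot tank/data@before-cleanup
    # deleted something by accident? browse the read-only copy...
    ls /tank/data/.zfs/snapshot/before-cleanup/
    # ...or roll the whole dataset back to it
    zfs rollback tank/data@before-cleanup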
     
    The closest match-up for ZFS in terms of features is probably btrfs (in kernel) or bcachefs (have never used this).
  16. Like
    gprovost reacted to grek in Crazy instability :(   
    I have tried to change the CPU governor to ondemand:
    root@helios64:/# cat /etc/default/cpufrequtils
    ENABLE="true"
    GOVERNOR=ondemand
    MAX_SPEED=1800000
    MIN_SPEED=408000
    but after 5 minutes, my server rebooted.
    Previously I used:
    cat /etc/default/cpufrequtils
    ENABLE="true"
    GOVERNOR=performance
    MAX_SPEED=1416000
    MIN_SPEED=1416000
    for the last 3 kernel updates without any issue. Uptimes were mostly about 2-3 weeks without kernel panics or reboots.
     
    Unfortunately I have nothing in my logs. I have changed the verbosity to 7, and I'm logging my console to a file. I will update when I get an error.
    I have a ZFS mirror (LUKS on top) + ZFS cache.
    Also, I didn't change the disk scheduler. I have bfq, but it has worked since the last upgrade without any problem.
     
    Currently I'm running a zfs scrub... we will see.
     
    [update 1] : 
    root@helios64:/# uptime
     11:02:40 up  2:28,  1 user,  load average: 4.80, 4.81, 4.60
     
    The scrub is still in progress, but I got a few read errors. I'm not sure if it's a problem with the HDD connector, the HDD scheduler, or the CPU frequency...
    I didn't have these before.
     
     
     
     
     
    [update 2]:
    The server has rebooted...
    The last 'task' I remember was Nextcloud generating thumbnails.
    Unfortunately there was nothing in the console (it was connected all the time), even with:
    root@helios64:~# cat /boot/armbianEnv.txt
    verbosity=7
    console=serial
    extraargs=earlyprintk ignore_loglevel
    The 'failed command: READ FPDMA QUEUED' errors came from the zfs scrub a few hours earlier.
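     
    To see which drives are actually affected after a scrub like that, something along these lines usually helps:
     
    # which ata ports logged the resets?
    dmesg | grep -i "READ FPDMA QUEUED"
    # which devices/files does ZFS consider affected?
    zpool status -v
    # map block devices to models/serial numbers
    lsblk -o NAME,MODEL,SERIAL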
     
     

     
     
     
     
     
     
  17. Like
    gprovost reacted to SvenHz in ZFS on Helios4   
    Here is an update. TL;DR: happy to report that zfs 2.0.2 seems to work fine on Armbian Hirsute.
     
    I am planning a reinstall of our "family production NAS" with the aim to do as little hacking/customization as possible to get ZFS to work.
     
    First I tried both the current (Mar 2021) Buster and Focal images, but failed at the 'apt install zfs-dkms' step due to an "exec format error", which basically means it failed to produce a working kernel module for ZFS. I suspect this is caused by the older binutils on both images.
     
    So then I downloaded the Ubuntu Hirsute unstable image with kernel 5.10.23 of 13 Mar 2021. Supplied binutils is 2.36 I believe.

    After the clean install, these are the steps I took:
     
    # apt update
    # apt upgrade
    Use armbian-config to install kernel headers (turns out to be 5.10.17)
    Use armbian-config to downgrade the kernel from 5.10.23 to 5.10.17 to match the old headers...
    # apt install zfs-dkms
    # apt install zfsutils-linux
     
     
    After this and a reboot, I successfully imported an existing (zfs 0.7.x) pool.
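     
    For anyone following along, importing an existing pool is roughly (the pool name is a placeholder):
     
    # list pools visible on the attached disks, then import by name
    zpool import
    zpool import tank
    # add -f if the pool was last used on another host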

    So far so good. I will wait until we are stable with Hirsute.
     
  18. Like
    gprovost reacted to antsu in Crazy instability :(   
    @ShadowDance I think you're definitely on to something here! I just ran the rsync jobs after setting the scheduler to none for all disks, and they completed successfully without crashing.
    I'll keep an eye on it and report any other crashes, but for now thank you very much!
     
    Update: It's now a little over 3 hours into a ZFS scrub; I restarted all my VMs simultaneously *while* doing the scrub, and it has not rebooted nor complained about anything in dmesg. This is very promising!
     
    Update 2: Scrub finished without problems!
  19. Like
    gprovost got a reaction from TRS-80 in armbian-config RFC ideas   
    @TRS-80 I think it's important once in a while to re-question the whole thing, because it helps clarify the real objective that we might sometimes forget... or justify that energy should be focused somewhere else.
     
    I think ultimately the life of a distro will always depend on the size of its user base, and let's distinguish here between community and user base. The user base includes all the passive users that we never hear from; I have no clue how much that represents for Armbian. @Igor Maybe you have such numbers: number of downloads per image and number of active forum users? Even though this will not necessarily help to say whether armbian-config is useful, it will help give a sense of the proportion of users we never hear from, which I assume are often users that don't hack/tinker too much with their boards.
     
    Then I think it could be useful to run a poll on the forum & Twitter and simply ask what people are using their SBC + Armbian for:
    1/ Hacking / Tinkering
    2/ Headless server
    3/ Set-top TV box
    4/ Work Desktop
     
    Such a poll would help us understand the audience's usage. If the great majority vote for option 1, then it would back up @TRS-80's assumption. But if the majority is 3 and 4, then clearly we are talking about a user base that is not really CLI-oriented. As for option 2, it's a bit 50/50.
     
    We should also be aware of what's happening out there.
    1/ DietPi's user base is growing, and I guess it's because of their approach of making a big ecosystem of 3rd-party apps available to users via an interface a bit similar to armbian-config.
    2/ Debian / Ubuntu installs, even in headless mode, are actually not purely CLI, so we all get used to a bit of GUI even if we are advanced CLI users.
     
    I agree with Igor that the strength of armbian-config is also, to a large extent, the configuration features (without forgetting nand-sata-install, which also deserves its own attention). So personally I see a big plus in having armbian-config around, because it can only help to widen the user base, which in the end is beneficial to Armbian's sustainability.
     
    Yes, that's a lot of assumptions I've made; that's why I think the poll would really be a nice driver for this refactoring effort.
  20. Like
    gprovost reacted to ShadowDance in Crazy instability :(   
    @antsu you could try changing the IO scheduler for those ZFS disks (to `none`) and see if it helps; I wrote about it here:
     
  21. Like
    gprovost reacted to ShadowDance in Helios64 - freeze whatever the kernel is.   
    @jbergler I recently noticed the armbian-hardware-optimization script for Helios64 changes the IO scheduler to `bfq` for spinning disks; however, for ZFS we should be using `none` because it has its own scheduler. Normally ZFS would change the scheduler itself, but that only happens if you're using raw disks (not partitions) and if you import the zpool _after_ the hardware optimization script has run.
     
    You can try changing it (e.g. `echo none >/sys/block/sda/queue/scheduler`) for each ZFS disk and see if anything changes. I still haven't figured out if this is the cause of any problems, but it's worth a shot.
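     
    If that turns out to help, one way to make the setting stick across reboots is a udev rule; a sketch, assuming the ZFS members are whole sd* disks (note this hits every sd* device, and the hardware-optimization script may still override it later in boot):
     
    # /etc/udev/rules.d/90-io-scheduler.rules (hypothetical file name)
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="none"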
  22. Like
    gprovost reacted to Seneca in Helios64 - freeze whatever the kernel is.   
    I've tried to provoke a system freeze with high CPU and IO load, but it seems stable for now.
    20:05:49 up 4 days, 23:47, 1 user, load average: 0,14, 0,15, 0,11
    I'll update this thread if the issue reoccurs.
  23. Like
    gprovost reacted to lanefu in armbian-config RFC ideas   
    The Opi Zero is still insanely popular. IMHO those would be users wanting to "do something" in a hurry... which is where armbian-config shines.
     
    One could assume downloads with the legacy kernel are being used in some sort of media or desktop way.
     
     
     

  24. Like
    gprovost reacted to Igor in armbian-config RFC ideas   
    I can only add some numbers, more like estimations. Active monthly users according to Google Analytics are 70-80k, whatever that means. It's difficult to say much about the download numbers - between several hundred and several thousand per day - getting correct numbers would require some effort, since we have many points of download that only have standard logs, plus torrent downloads which are active all the time but for which data is not collected. We don't abuse users by calling home on install ... DietPi still does that by default. The most popular downloads are: Orangepi Zero, PC, Zero2, RK322x, Opi4, Tinkerboard ... No wonder we have 4 known enterprise-grade download mirrors in China, the majority of which were discovered last week.
     
     
    It's hard to decide which is more important, but yes, this part also needs attention. I know that in both cases there will be significant work, and we can't do it alone / in a purely amateur way. A crowdfunding campaign to additionally support this project is planned.
     
     
    In terms of internals, yes. We use the same package base, while Armbian was never just Debian or Ubuntu. Yes, we are aware of that, but most users aren't - the most expensive part is the hardware interface, where absolutely nothing connects us to Debian/Ubuntu. It's our product and we also package it differently. Networking defaults are ours, as are the welcome screen, the installer, the way the desktop looks, ... We don't use the Debian / Ubuntu package base as-is, but we select which packages we will use to form the default userland. What we do keep - in 99% of cases - are package versions and their relations, which is the key to staying "as close to Debian / Ubuntu" as possible. And no, we also don't want to replace utilities that work well with our own hastily-made rebranded replacements (which DietPi does in many cases). We provide functionally better replacements, like our own hostapd (which is closer to the powers of OpenWrt) or an improved htop that contains data important in the SBC world ...
  25. Like
    gprovost reacted to ebin-dev in Backup method for system installed on SSD (slot1)   
    I am using the following script to back up my root partition to SD (it is just a slightly modified Armbian script - please adapt the device names if necessary):
     
    # cat backuptosd.sh
    #!/bin/bash
    # Check if user is root
    if [ $(id -u) != "0" ]; then
        echo "Error: You must be root to run this script."
        exit 1
    fi
    cat > install-exclude <<EOF
    /dev/*
    /proc/*
    /sys/*
    /media/data1/*
    /media/data2/*
    /media/data3/*
    /media/data4/*
    /media/data5/*
    /mnt/sd/*
    /mnt/ssd/*
    /mnt/usb/*
    /mnt/hd/*
    /run/*
    # /tmp/*
    # /root/*
    EOF
    exec 2>/dev/null
    umount /mnt/sd
    exec 2>&1
    mount /dev/mmcblk1p1 /mnt/sd
    rsync -avxSE --delete --exclude-from="install-exclude" / /mnt/sd
    # change fstab
    sed -e 's/UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx3c/UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx2d/g' -i /mnt/sd/etc/fstab
    sed -e 's/UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx3c/UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx2d/g' -i /mnt/sd/boot/armbianEnv.txt
    umount /mnt/sd
    rm install-exclude
    The UUIDs need to be inserted (blkid is your friend) - the leftmost is the one of your root system, the other one is the UUID of the sd in this example.
    If you need a bootable system on sd - the easiest way would be to start with a fresh Armbian image flashed to the sd card and to boot from it at least once in order to expand the filesystem.
    Then you may boot from your main root partition and simply sync it to the sd card using the above script.
     