gprovost

Reputation Activity

  1. Like
    gprovost reacted to rupert in Does anyone actually have a stable system?   
    Hi,
     
    My poor little Kobol has to work very hard for a living:
     
    Plex
    NTP (GPSD) server
    SMB share
    Webmin
    Zabbix agent
    Zerotier client/bridge
    Docker
    Storj node
    Raid 5 across 5 disks
     
    And is as good as gold!
     
    Had a few niggles to start with (i.e. the first week); since then it's only rebooted when the OS requests it. Otherwise excellent!
     
    Running on a 64GB SD card
     
    CPU full speed
     
    Very pleased!
     
    Rup
     
  2. Like
    gprovost reacted to dieKatze88 in Does anyone actually have a stable system?   
    My system is still unstable and I have no idea what to do. I've given every suggestion I've seen on the internet a try and I think the only thing I can do is move on.
  3. Like
    gprovost reacted to aprayoga in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    Hi all, you can download the SATA firmware update at https://cdn-kobol-io.s3-ap-southeast-1.amazonaws.com/Helios64/SATA_FW/Helios64_SATA_FW_update_00021_200624.img.xz
     
    Instruction:
    1. Download the SD card image
    2. Flash the image onto a microSD card
    3. Insert the microSD card into the Helios64 and power on. The Helios64 should automatically boot from the microSD. If it still boots from eMMC, disable the eMMC.
    4. Wait a while; the system will reboot and then power off if the firmware flashing succeeds.
       If it fails, both the System Status and System Fault LEDs will blink.
    5. Remove the microSD card and boot the Helios64 normally. See if there is any improvement.
     
    Our officially supported stock firmware can be downloaded from https://cdn-kobol-io.s3-ap-southeast-1.amazonaws.com/Helios64/SATA_FW/Helios64_SATA_FW_factory_00020_190702.img.xz. If there is no improvement with the newer firmware, please revert to this stock firmware.
     
    SHA256SUM:
    e5dfbe84f4709a3e2138ffb620f0ee62ecbcc79a8f83692c1c1d7a4361f0d30f *Helios64_SATA_FW_factory_00020_190702.img.xz
    0d78fec569dd699fd667acf59ba7b07c420a2865e1bcb8b85b26b61d404998c5 *Helios64_SATA_FW_update_00021_200624.img.xz
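     
    For reference, a minimal sketch of steps 1 and 2 on a Linux machine; /dev/sdX is a placeholder for the microSD card reader device and must be double-checked with lsblk before writing:
     
    wget https://cdn-kobol-io.s3-ap-southeast-1.amazonaws.com/Helios64/SATA_FW/Helios64_SATA_FW_update_00021_200624.img.xz
    # compare against the SHA256SUM listed above
    sha256sum Helios64_SATA_FW_update_00021_200624.img.xz
    # decompress and write the image to the microSD card, then flush caches
    xzcat Helios64_SATA_FW_update_00021_200624.img.xz | dd of=/dev/sdX bs=1M conv=fsync status=progress
    sync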
  4. Like
    gprovost reacted to wurmfood in Migrate from ramlog to disk   
    Well, for anyone else interested in trying this, here's the basic order I did:
    1. stop armbian-ramlog
    2. disable armbian-ramlog
    3. create a zfs dataset and mount it at /var/log
    4. cp -ar everything from /var/log.hdd to the new /var/log
    5. modify /etc/logrotate to disable compression (since the dataset is already using compression)
    6. modify /etc/default/armbian-ramlog to disable it there as well
    7. modify /etc/default/armbian-zram-config to adjust for the new numbers (I have ZRAM_PERCENTAGE and MEM_LIMIT_PERCENTAGE at 15)
    8. reboot
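     
    A rough shell sketch of those steps; the pool name "rpool" is a placeholder, and the exact logrotate file and variable names in the two /etc/default files should be checked on your own install:
     
    systemctl stop armbian-ramlog
    systemctl disable armbian-ramlog
    # dataset mounted at /var/log; let ZFS handle compression
    # (assumes /var/log is empty/unmounted at this point)
    zfs create -o mountpoint=/var/log -o compression=lz4 rpool/log
    # copy the persisted logs into the new dataset
    cp -ar /var/log.hdd/. /var/log/
    # then edit by hand:
    #   /etc/logrotate.conf              -> disable "compress"
    #   /etc/default/armbian-ramlog      -> disable ramlog here as well
    #   /etc/default/armbian-zram-config -> adjust ZRAM_PERCENTAGE / MEM_LIMIT_PERCENTAGE
    reboot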
  5. Like
    gprovost reacted to jpegxguy in Problem with apt list   
    Just for anyone reading this, this might be related. Someone suggested prioritizing gzip in order to escape the slowness of LZ4 on ARM platforms.
    Here is the solution, which was also merged in Armbian itself:
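    As an illustration only (not necessarily the exact change that was merged), an apt configuration drop-in along these lines tells apt to prefer gzip-compressed indexes; the file name is arbitrary:
     
    # /etc/apt/apt.conf.d/99-prefer-gzip (hypothetical file name)
    Acquire::CompressionTypes::Order { "gz"; };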
     
  6. Like
    gprovost got a reaction from allen--smithee in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    @Fred Fettinger Your errors are most likely due to an unstable HDD harness. You can contact us at support@kobol.io to see if a replacement of the HDD wire harness is needed.
     
    @ShadowDance Thanks for the feedback and at least we can remove grounding issue from the list of possible root causes.
    It looks like, under very heavy HDD I/O load (arising mainly during scrubbing operations), too much noise is generated, resulting in those HSM violations.
    Thanks to all your tests, the issue unfortunately seems to point to a noise filtering problem. As previously mentioned, we will fix that in the next revision.
    We will see if the new SATA controller firmware has any impact, but we doubt it. I think the only solution for the current revision is to limit the SATA link speed to 3 Gbps when using btrfs or zfs.
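     
    For anyone who wants to try that on the current revision, one way would be a libata kernel parameter appended to the extraargs line in /boot/armbianEnv.txt; a sketch (this limits all ports, adapt as needed):
     
    # /boot/armbianEnv.txt
    extraargs=libata.force=3.0Gbps
    # after a reboot, confirm the negotiated speed:
    dmesg | grep -i "SATA link up"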
  7. Like
    gprovost reacted to deanl3 in nand-sata-install to eMMC, won't start kernel   
    Sorry to repeat another case of this. I have attempted a few times to move boot from the SD card to eMMC before the SD card fails.
     
    I followed the wiki doc using nand-sata-install. Since I have OMV, I excluded the exports, and afterward also copied the salt and pillar. In another attempt, on a suggestion from another thread, I updated the bootloader and copied again.
    I can boot to my SD card, mount the eMMC, and see the root structure.
     
    From what I can tell in reading the boot output, it can't find any storage device?
    4 USB Device(s) found
           scanning usb for storage devices... 0 Storage Device(s) found
    But I can see it when the OS is up, mount and read it, and nand-sata-install can write to it.
     
    Here's the boot output:
    DDR Version 1.24 20191016 In channel 0 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 1 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 0 training pass! channel 1 training pass! change freq to 416MHz 0,1 Channel 0: LPDDR4,416MHz Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB Channel 1: LPDDR4,416MHz Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB 256B stride channel 0 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 1 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 0 training pass! channel 1 training pass! channel 0, cs 0, advanced training done channel 1, cs 0, advanced training done change freq to 856MHz 1,0 ch 0 ddrconfig = 0x101, ddrsize = 0x40 ch 1 ddrconfig = 0x101, ddrsize = 0x40 pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD ddr_set_rate to 328MHZ ddr_set_rate to 666MHZ ddr_set_rate to 928MHZ channel 0, cs 0, advanced training done channel 1, cs 0, advanced training done ddr_set_rate to 416MHZ, ctl_index 0 ddr_set_rate to 856MHZ, ctl_index 1 support 416 856 328 666 928 MHz, current 856MHz OUT Boot1: 2019-03-14, version: 1.19 CPUId = 0x0 ChipType = 0x10, 253 SdmmcInit=2 0 BootCapSize=100000 UserCapSize=14910MB FwPartOffset=2000 , 100000 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 SdmmcInit=0 1 StorageInit ok = 67744 SecureMode = 0 SecureInit read PBA: 0x4 SecureInit read PBA: 0x404 SecureInit read PBA: 0x804 SecureInit read PBA: 0xc04 SecureInit read PBA: 0x1004 SecureInit read PBA: 0x1404 SecureInit read PBA: 0x1804 SecureInit read PBA: 0x1c04 SecureInit ret = 0, SecureMode = 0 atags_set_bootdev: ret:(0) GPT 0x3380ec0 signature is wrong recovery gpt... GPT 0x3380ec0 signature is wrong recovery gpt fail! LoadTrust Addr:0x4000 No find bl30.bin No find bl32.bin Load uboot, ReadLba = 2000 Load OK, addr=0x200000, size=0xe5674 RunBL31 0x40000 �NOTICE: BL31: v1.3(debug):42583b6 NOTICE: BL31: Built : 07:55:13, Oct 15 2019 NOTICE: BL31: Rockchip release version: v1.1 INFO: GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3 INFO: Using opteed sec cpu_context! INFO: boot cpu mask: 0 INFO: plat_rockchip_pmu_init(1190): pd status 3e INFO: BL31: Initializing runtime services WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK ERROR: Error initializing runtime service opteed_fast INFO: BL31: Preparing for EL3 exit to normal world INFO: Entry point address = 0x200000 INFO: SPSR = 0x3c9 U-Boot 2020.10-armbian (Jan 05 2021 - 00:07:57 +0100) SoC: Rockchip rk3399 Reset cause: POR DRAM: 3.9 GiB PMIC: RK808 SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB MMC: mmc@fe320000: 1, sdhci@fe330000: 0 Loading Environment from MMC... *** Warning - bad CRC, using default environment In: serial Out: serial Err: serial Model: Helios64 Revision: 1.2 - 4GB non ECC Net: eth0: ethernet@fe300000 DDR Version 1.24 20191016 In channel 0 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 1 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 0 training pass! channel 1 training pass! 
change freq to 416MHz 0,1 Channel 0: LPDDR4,416MHz Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB Channel 1: LPDDR4,416MHz Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Die Bus-Width=16 Size=2048MB 256B stride channel 0 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 1 CS = 0 MR0=0x18 MR4=0x1 MR5=0x1 MR8=0x10 MR12=0x72 MR14=0x72 MR18=0x0 MR19=0x0 MR24=0x8 MR25=0x0 channel 0 training pass! channel 1 training pass! channel 0, cs 0, advanced training done channel 1, cs 0, advanced training done change freq to 856MHz 1,0 ch 0 ddrconfig = 0x101, ddrsize = 0x40 ch 1 ddrconfig = 0x101, ddrsize = 0x40 pmugrf_os_reg[2] = 0x32C1F2C1, stride = 0xD ddr_set_rate to 328MHZ ddr_set_rate to 666MHZ ddr_set_rate to 928MHZ channel 0, cs 0, advanced training done channel 1, cs 0, advanced training done ddr_set_rate to 416MHZ, ctl_index 0 ddr_set_rate to 856MHZ, ctl_index 1 support 416 856 328 666 928 MHz, current 856MHz OUT Boot1: 2019-03-14, version: 1.19 CPUId = 0x0 ChipType = 0x10, 254 SdmmcInit=2 0 BootCapSize=100000 UserCapSize=14910MB FwPartOffset=2000 , 100000 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 mmc0:cmd8,20 mmc0:cmd5,20 mmc0:cmd55,20 mmc0:cmd1,20 SdmmcInit=0 1 StorageInit ok = 67748 SecureMode = 0 SecureInit read PBA: 0x4 SecureInit read PBA: 0x404 SecureInit read PBA: 0x804 SecureInit read PBA: 0xc04 SecureInit read PBA: 0x1004 SecureInit read PBA: 0x1404 SecureInit read PBA: 0x1804 SecureInit read PBA: 0x1c04 SecureInit ret = 0, SecureMode = 0 atags_set_bootdev: ret:(0) GPT 0x3380ec0 signature is wrong recovery gpt... GPT 0x3380ec0 signature is wrong recovery gpt fail! LoadTrust Addr:0x4000 No find bl30.bin No find bl32.bin Load uboot, ReadLba = 2000 Load OK, addr=0x200000, size=0xe5674 RunBL31 0x40000 �NOTICE: BL31: v1.3(debug):42583b6 NOTICE: BL31: Built : 07:55:13, Oct 15 2019 NOTICE: BL31: Rockchip release version: v1.1 INFO: GICv3 with legacy support detected. ARM GICV3 driver initialized in EL3 INFO: Using opteed sec cpu_context! INFO: boot cpu mask: 0 INFO: plat_rockchip_pmu_init(1190): pd status 3e INFO: BL31: Initializing runtime services WARNING: No OPTEE provided by BL2 boot loader, Booting device without OPTEE initialization. SMC`s destined for OPTEE will return SMC_UNK ERROR: Error initializing runtime service opteed_fast INFO: BL31: Preparing for EL3 exit to normal world INFO: Entry point address = 0x200000 INFO: SPSR = 0x3c9 U-Boot 2020.10-armbian (Jan 05 2021 - 00:07:57 +0100) SoC: Rockchip rk3399 Reset cause: POR DRAM: 3.9 GiB PMIC: RK808 SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB MMC: mmc@fe320000: 1, sdhci@fe330000: 0 Loading Environment from MMC... *** Warning - bad CRC, using default environment In: serial Out: serial Err: serial Model: Helios64 Revision: 1.2 - 4GB non ECC Net: eth0: ethernet@fe300000 scanning bus for devices... starting USB... Bus usb@fe380000: USB EHCI 1.00 Bus dwc3: usb maximum-speed not found Register 2000140 NbrPorts 2 Starting the controller USB XHCI 1.10 scanning bus usb@fe380000 for devices... 1 USB Device(s) found scanning bus dwc3 for devices... cannot reset port 4!? 4 USB Device(s) found scanning usb for storage devices... 0 Storage Device(s) found Hit any key to stop autoboot: 0 Card did not respond to voltage select! switch to partitions #0, OK mmc0(part 0) is current device Scanning mmc 0:1... 
Found U-Boot script /boot/boot.scr 3185 bytes read in 19 ms (163.1 KiB/s) ## Executing script at 00500000 Boot script loaded from mmc 0 359 bytes read in 15 ms (22.5 KiB/s) 16181146 bytes read in 1556 ms (9.9 MiB/s) 27507200 bytes read in 2631 ms (10 MiB/s) 81696 bytes read in 41 ms (1.9 MiB/s) Failed to load '/boot/dtb/rockchip/overlay/-fixup.scr' Moving Image from 0x2080000 to 0x2200000, end=3cd0000 ## Loading init Ramdisk from Legacy Image at 06000000 ... Image Name: uInitrd Image Type: AArch64 Linux RAMDisk Image (gzip compressed) Data Size: 16181082 Bytes = 15.4 MiB Load Address: 00000000 Entry Point: 00000000 Verifying Checksum ... OK ## Flattened Device Tree blob at 01f00000 Booting using the fdt blob at 0x1f00000 Loading Ramdisk to f4f81000, end f5eef75a ... OK Loading Device Tree to 00000000f4f04000, end 00000000f4f80fff ... OK Starting kernel ...  
  8. Like
    gprovost got a reaction from hartraft in Image Backup/Restore from Boot(Emmc)+System(M.2-SSD)   
    Yeah, recovering would require you to boot the board from a microSD card (using any recent image for Helios64). Then you will need to use the right tool (e.g. dd or fsarchiver) to write your system backup onto the M.2 SSD.
     
    If you use the eMMC only for u-boot, then it's unlikely to get corrupted and would be easy to restore even without a backup, since it's just u-boot there. But it costs you nothing to back up the first 100 MB of the eMMC with dd.
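     
    A minimal sketch of that dd backup, assuming the eMMC shows up as /dev/mmcblk2 (verify with lsblk, the device name can differ):
     
    # back up the first 100 MB of the eMMC (the u-boot area)
    dd if=/dev/mmcblk2 of=helios64-emmc-uboot.img bs=1M count=100 status=progress
    # restore later, if ever needed, by swapping if= and of=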
     
    This is the kind of question that will require some guidelines on our wiki at some point ;-)
  9. Like
    gprovost reacted to clostro in How to do a full hardware test?   
    May I suggest outputting dmesg live to a network location?
    I'm not sure if the serial console output is the same as 'dmesg', but if it is, you can 'nohup &' it live to any file. That way you wouldn't have to stay connected to the console or SSH all the time. Just don't output it to a local filesystem, as writing to a local filesystem during a crash might corrupt it and cause more problems.
     
    nohup dmesg --follow > /network/location/folder/helios64-log.txt 2>&1 &
    exit
    exit
     
    Needed to have a single >, and to exit the session with 'exit', apparently.
  10. Like
    gprovost got a reaction from hartraft in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    We will soon provide instructions on how to update the SATA controller (JMB585) firmware to the latest version. Let's see whether it has a positive impact or not.
     
    For the new revision, we will reinforce noise filtering on the SATA controller, in case this is part of the root cause in some cases.
  11. Like
    gprovost got a reaction from lanefu in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    We will soon provide instructions on how to update the SATA controller (JMB585) firmware to the latest version. Let's see whether it has a positive impact or not.
     
    For the new revision, we will reinforce noise filtering on the SATA controller, in case this is part of the root cause in some cases.
  12. Like
    gprovost reacted to ShadowDance in SATA issue, drive resets: ataX.00: failed command: READ FPDMA QUEUED   
    @Wofferl those are the exact same model as three of my disks (but mine aren't "Plus"). I've used these disks in another machine with ZFS and zero issues (ASM1062 SATA controller). So if we assume the problem is between the SATA controller and the disk (and while I agree with you that it's probably in part a disk issue), I'm convinced it's something that should be fixable in the SATA controller firmware. Perhaps these disks do something funny that the SATA controller doesn't expect? And based on all my testing so far, the SATA cable also plays a role, meaning perhaps there's a noise factor in play (as well).
     
    Side note: Western Digital really screwed us over with this whole SMR fiasco, didn't they? I'd be pretty much ready to throw these disks in the trash if it wasn't for the fact that they work perfectly on another SATA controller.
     
    @grek glad it helped! By the way, I would still recommend changing the IO scheduler to none, because bfq is CPU-intensive and ZFS does its own scheduling. It probably won't fix the issues, but it might reduce some CPU overhead.
  13. Like
    gprovost got a reaction from meymarce in Feature / Changes requests for future Helios64 board or enclosure revisions   
    @dieKatze88 Yes, it has already been announced here and there that we will replace the wire harness with a proper PCB backplane. There will still be wires, though, connecting the main board to the backplane, since we don't want a board that can only be used with a specific backplane. But these wires will be normal SATA cables, so it's easy to buy new ones anywhere if a replacement is needed.
  14. Like
    gprovost reacted to dieKatze88 in How to do a full hardware test?   
    I have disabled zram, as it was suggested by someone on Reddit.
     
    I am now running the latest kernel, but absolutely no kernel in my history of this thing has been stable.

    I got the following serial console output the last time it crashed (but could not edit my post due to limits):
     
    [10105.431800] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: rcu_sched_clock_irq+0x7a4/0xce0 [10105.432752] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G C 5.10.21-rockchip64 #21.02.3 [10105.433526] Hardware name: Helios64 (DT) [10105.433872] Call trace: [10105.434093] dump_backtrace+0x0/0x200 [10105.434418] show_stack+0x18/0x68 [10105.434714] dump_stack+0xcc/0x124 [10105.435016] panic+0x174/0x374 [10105.435288] __stack_chk_fail+0x3c/0x40 [10105.435626] rcu_sched_clock_irq+0x7a4/0xce0 [10105.436004] update_process_times+0x60/0xa0 [10105.436373] tick_sched_handle.isra.19+0x40/0x58 [10105.436778] tick_sched_timer+0x58/0xb0 [10105.437118] __hrtimer_run_queues+0x104/0x388 [10105.437502] hrtimer_interrupt+0xf4/0x250 [10105.437861] arch_timer_handler_phys+0x30/0x40 [10105.438258] handle_percpu_devid_irq+0xa0/0x298 [10105.438659] generic_handle_irq+0x30/0x48 [10105.439012] __handle_domain_irq+0x94/0x108 [10105.439384] gic_handle_irq+0xc0/0x140 [10105.439715] el1_irq+0xc0/0x180 [10105.439995] arch_cpu_idle+0x18/0x28 [10105.440310] default_idle_call+0x44/0x1bc [10105.440665] do_idle+0x204/0x278 [10105.440950] cpu_startup_entry+0x28/0x60 [10105.441298] secondary_start_kernel+0x170/0x180 [10105.441700] SMP: stopping secondary CPUs [10105.442057] Kernel Offset: disabled [10105.442365] CPU features: 0x0240022,6100200c [10105.442740] Memory Limit: none [10105.443021] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: rcu_sched_clock_irq+0x7a4/0xce0 ]---
    root@helios64:~# uname -a
    Linux helios64 5.10.21-rockchip64 #21.02.3 SMP PREEMPT Mon Mar 8 01:05:08 UTC 2021 aarch64 GNU/Linux
    root@helios64:~#
     
    my armbian monitor:
    http://ix.io/2U0J
  15. Like
    gprovost reacted to ShadowDance in ZFS or normal Raid   
    @scottf007 I think it would be hard for anyone here to really answer whether it's worth it or not [for you]. In your situation, I'd try to evaluate whether or not you need the features that ZFS gives you. For instance, ZFS snapshots are something you never really need, until you do. When you find that you deleted some data a month ago and can still recover it from a snapshot, it's a great comfort. If that's something you value, btrfs could be an alternative, and it's already built into the kernel. If all you need is data integrity, you could consider dm-integrity + mdraid with the filesystem of your choice on top (ext4, XFS, etc.). Skipping "raid" altogether would also be possible; LVM allows for great flexibility with disks.
     
    If you're worried about the amount of work you need to put in with ZFS, you can freeze the updates once you are satisfied with the stability of the system. Just run `sudo apt-mark hold linux-image-current-rockchip64 linux-dtb-current-rockchip64`, which prevents kernel/boot updates, and ZFS should not break on you any time soon. Conversely, `unhold` once you're ready to deal with the future.
     
    For me personally, ZFS is totally worth it. I have it on two servers/NAS at home. I use ZFS native encryption on one, and LUKS+ZFS on the Helios64 (due to its CPU capabilities). I also use a tool named zrepl for automatically creating, pruning and replicating snapshots. So, for instance, my most important datasets are backed up from one machine to the Helios64 in raw mode; this means the data is safe, but not readable by the Helios64 without loading the encryption keys. I also run Armbian on the Helios64 straight off of ZFS (root on ZFS), which gives me the ability to easily roll back the system if, say, an update breaks it.
     
    @hartraft It depends on your requirements/feature wishlist. RAID (mdraid), for instance, cannot guarantee data consistency (unless stacked with dm-integrity). What this means is that once data is written to the disk, it can still become corrupted and RAID can't catch it. ZFS guards against this via checksums on all data: once it's on disk, it is guaranteed either to be uncorrupted, or the corruption will be detected and likely be repairable from one of the redundant disks. ZFS also supports snapshots, meaning you can easily recover deleted files, etc.; RAID does not support anything like this. Looking at mergerfs, it seems to lack these features as well, and it runs in user space (via FUSE), so it's not as integrated. SnapRAID is a backup program, so not really comparable, and MooseFS I know nothing about, but it looks enterprise-y.
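     
    As a small illustration of the snapshot point (pool/dataset names are placeholders):
     
    # take a snapshot before a risky change
    zfs snapshot tank/data@before-cleanup
    # deleted something by accident? browse the read-only copy...
    ls /tank/data/.zfs/snapshot/before-cleanup/
    # ...or roll the whole dataset back to it
    zfs rollback tank/data@before-cleanup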
     
    The closest match-up for ZFS in terms of features is probably btrfs (in kernel) or bcachefs (have never used this).
  16. Like
    gprovost reacted to grek in Crazy instability :(   
    I have tried to change the CPU governor to ondemand:
    root@helios64:/# cat /etc/default/cpufrequtils
    ENABLE="true"
    GOVERNOR=ondemand
    MAX_SPEED=1800000
    MIN_SPEED=408000
    but after 5 minutes, my server rebooted.
    Previously I used:
    cat /etc/default/cpufrequtils
    ENABLE="true"
    GOVERNOR=performance
    MAX_SPEED=1416000
    MIN_SPEED=1416000
    for the last 3 kernel updates without any issue. Uptimes were mostly about 2-3 weeks without kernel panics or reboots.
     
    Unfortunately I have nothing in my logs. I have changed the verbosity to 7, and I'm logging my console to a file. I will update when I get an error.
    I have a ZFS mirror (LUKS on top) + ZFS cache.
    Also, I didn't change the disk scheduler. I have bfq, but it has worked since the last upgrade without any problem.
     
    Currently I'm running a zfs scrub... we will see.
     
    [update 1] : 
    root@helios64:/# uptime
     11:02:40 up  2:28,  1 user,  load average: 4.80, 4.81, 4.60
     
    The scrub is still in progress, but I got a few read errors. I'm not sure if it's a problem with the HDD connector, the HDD scheduler, or the CPU frequency...
    I didn't have these before.
     
     
     
     
     
    [update 2]:
    The server has rebooted...
    The last 'task' I remember was Nextcloud generating thumbnails.
    Unfortunately there was nothing in the console (it was connected all the time), even with:
    root@helios64:~# cat /boot/armbianEnv.txt
    verbosity=7
    console=serial
    extraargs=earlyprintk ignore_loglevel
    The 'failed command: READ FPDMA QUEUED' errors came from the zfs scrub a few hours earlier.
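     
    To see which drives are actually affected after a scrub like that, something along these lines usually helps:
     
    # which ata ports logged the resets?
    dmesg | grep -i "READ FPDMA QUEUED"
    # which devices/files does ZFS consider affected?
    zpool status -v
    # map block devices to models/serial numbers
    lsblk -o NAME,MODEL,SERIAL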
     
     

     
     
     
     
     
     
  17. Like
    gprovost reacted to SvenHz in ZFS on Helios4   
    Here is an update. TL;DR: happy to report that zfs 2.0.2 seems to work fine on Armbian Hirsute.
     
    I am planning a reinstall of our "family production NAS" with the aim to do as little hacking/customization as possible to get ZFS to work.
     
    First I tried both the current (Mar 2021) Buster and Focal images, but failed at the 'apt install zfs-dkms' step due to an "exec format error", which basically means it failed to produce a working kernel module for ZFS. I suspect this is caused by the older binutils on both images.
     
    So then I downloaded the Ubuntu Hirsute unstable image with kernel 5.10.23 of 13 Mar 2021. Supplied binutils is 2.36 I believe.

    After the clean install, these are the steps I took:
     
    # apt update
    # apt upgrade
    Use armbian-config to install kernel headers (turns out to be 5.10.17)
    Use armbian-config to downgrade the kernel from 5.10.23 to 5.10.17 to match the old headers...
    # apt install zfs-dkms
    # apt install zfsutils-linux
     
     
    After this and a reboot, I successfully imported an existing (zfs 0.7.x) pool.
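     
    For anyone following along, importing an existing pool is roughly (the pool name is a placeholder):
     
    # list pools visible on the attached disks, then import by name
    zpool import
    zpool import tank
    # add -f if the pool was last used on another host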

    So far so good. I will wait until we are stable with Hirsute.
     
  18. Like
    gprovost reacted to antsu in Crazy instability :(   
    @ShadowDance I think you're definitely on to something here! I just ran the rsync jobs after setting the scheduler to none for all disks, and they completed successfully without crashing.
    I'll keep an eye on it and report any other crashes, but for now thank you very much!
     
    Update: It's now a little over 3 hours into a ZFS scrub; I restarted all my VMs simultaneously *while* doing the scrub, and it has not rebooted nor complained about anything in dmesg. This is very promising!
     
    Update 2: Scrub finished without problems!
  19. Like
    gprovost got a reaction from TRS-80 in armbian-config RFC ideas   
    @TRS-80 I think it's important once in a while to re-question the whole thing, because it helps clarify the real objective that we might sometimes forget... or justify that energy should be focused somewhere else.
     
    I think ultimately the life of a distro will always depend on the size of its user base, and let's distinguish here between community and user base. The user base includes all the passive users that we never hear from; I have no clue how much that represents for Armbian. @Igor Maybe you have such numbers: number of downloads per image and number of active forum users? Even though this will not necessarily help to say whether armbian-config is useful, it will help give a sense of the proportion of users we never hear from, which I assume are often users that don't hack/tinker too much with their boards.
     
    Then I think it could be useful to run a poll on the forum & Twitter and simply ask what people are using their SBC + Armbian for:
    1/ Hacking / Tinkering
    2/ Headless server
    3/ Set-top TV box
    4/ Work Desktop
     
    Such a poll would help us understand the audience's usage. If the great majority vote for option 1, then it would back up @TRS-80's assumption. But if the majority is 3 and 4, then clearly we are talking about a user base that is not really CLI-oriented. As for option 2, it's a bit 50/50.
     
    We should also be aware of what's happening out there.
    1/ DietPi's user base is growing, and I guess it's because of their approach of making a big ecosystem of 3rd-party apps available to users via an interface a bit similar to armbian-config.
    2/ Debian / Ubuntu installs, even in headless mode, are actually not purely CLI, so we all get used to a bit of GUI even if we are advanced CLI users.
     
    I agree with Igor that the strength of armbian-config is also, to a large extent, the configuration features (without forgetting nand-sata-install, which also deserves its own attention). So personally I see a big plus in having armbian-config around, because it can only help to widen the user base, which in the end is beneficial to Armbian's sustainability.
     
    Yes, that's a lot of assumptions I've made; that's why I think the poll would really be a nice driver for this refactoring effort.
  20. Like
    gprovost reacted to ShadowDance in Crazy instability :(   
    @antsu you could try changing the IO scheduler for those ZFS disks (to `none`) and see if it helps; I wrote about it here:
     
  21. Like
    gprovost reacted to ShadowDance in Helios64 - freeze whatever the kernel is.   
    @jbergler I recently noticed the armbian-hardware-optimization script for Helios64 changes the IO scheduler to `bfq` for spinning disks; however, for ZFS we should be using `none` because it has its own scheduler. Normally ZFS would change the scheduler itself, but that only happens if you're using raw disks (not partitions) and if you import the zpool _after_ the hardware optimization script has run.
     
    You can try changing it (e.g. `echo none >/sys/block/sda/queue/scheduler`) for each ZFS disk and see if anything changes. I still haven't figured out if this is the cause of any problems, but it's worth a shot.
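     
    If that turns out to help, one way to make the setting stick across reboots is a udev rule; a sketch, assuming the ZFS members are whole sd* disks (note this hits every sd* device, and the hardware-optimization script may still override it later in boot):
     
    # /etc/udev/rules.d/90-io-scheduler.rules (hypothetical file name)
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="none"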
  22. Like
    gprovost reacted to Seneca in Helios64 - freeze whatever the kernel is.   
    I've tried to provoke a system freeze with high CPU and IO load, but it seems stable for now.
    20:05:49 up 4 days, 23:47, 1 user, load average: 0,14, 0,15, 0,11
    I'll update this thread if the issue reoccurs.
  23. Like
    gprovost reacted to lanefu in armbian-config RFC ideas   
    The Opi Zero is still insanely popular. IMHO those would be users wanting to "do something" in a hurry... which is where armbian-config shines.
     
    One could assume downloads with the legacy kernel are being used in some sort of media or desktop way.
     
     
     

  24. Like
    gprovost reacted to Igor in armbian-config RFC ideas   
    I can only add some numbers, more like estimations. Active monthly users according to Google Analytics are 70-80k, whatever that means. It's difficult to say much about the download numbers - between several hundred and several thousand per day - getting correct numbers would require some effort, since we have many points of download that only have standard logs, plus torrent downloads which are active all the time but for which data is not collected. We don't abuse users by calling home on install ... DietPi still does that by default. The most popular downloads are: Orangepi Zero, PC, Zero2, RK322x, Opi4, Tinkerboard ... No wonder we have 4 known enterprise-grade download mirrors in China, the majority of which were discovered last week.
     
     
    It's hard to decide which is more important, but yes, this part also needs attention. I know that in both cases there will be significant work, and we can't do it alone / in a purely amateur way. A crowdfunding campaign to additionally support this project is planned.
     
     
    In terms of internals, yes. We use the same package base, while Armbian was never just Debian or Ubuntu. Yes, we are aware of that, but most users aren't - the most expensive part is the hardware interface, where absolutely nothing connects us to Debian/Ubuntu. It's our product and we also package it differently. Networking defaults are ours, as are the welcome screen, the installer, the way the desktop looks, ... We don't use the Debian / Ubuntu package base as-is, but we select which packages we will use to form the default userland. What we do keep - in 99% of cases - are package versions and their relations, which is the key to staying "as close to Debian / Ubuntu" as possible. And no, we also don't want to replace utilities that work well with our own hastily-made rebranded replacements (which DietPi does in many cases). We provide functionally better replacements, like our own hostapd (which is closer to the powers of OpenWrt) or an improved htop that contains data important in the SBC world ...
  25. Like
    gprovost reacted to ebin-dev in Backup method for system installed on SSD (slot1)   
    I am using the following script to back up my root partition to SD (it is just a slightly modified Armbian script - please adapt the device names if necessary):
     
    # cat backuptosd.sh
    #!/bin/bash
    # Check if user is root
    if [ $(id -u) != "0" ]; then
        echo "Error: You must be root to run this script."
        exit 1
    fi
    cat > install-exclude <<EOF
    /dev/*
    /proc/*
    /sys/*
    /media/data1/*
    /media/data2/*
    /media/data3/*
    /media/data4/*
    /media/data5/*
    /mnt/sd/*
    /mnt/ssd/*
    /mnt/usb/*
    /mnt/hd/*
    /run/*
    # /tmp/*
    # /root/*
    EOF
    exec 2>/dev/null
    umount /mnt/sd
    exec 2>&1
    mount /dev/mmcblk1p1 /mnt/sd
    rsync -avxSE --delete --exclude-from="install-exclude" / /mnt/sd
    # change fstab
    sed -e 's/UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx3c/UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx2d/g' -i /mnt/sd/etc/fstab
    sed -e 's/UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx3c/UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx2d/g' -i /mnt/sd/boot/armbianEnv.txt
    umount /mnt/sd
    rm install-exclude
    The UUIDs need to be inserted (blkid is your friend) - the leftmost is the one of your root system, the other one is the UUID of the sd in this example.
    If you need a bootable system on sd - the easiest way would be to start with a fresh Armbian image flashed to the sd card and to boot from it at least once in order to expand the filesystem.
    Then you may boot from your main root partition and simply sync it to the sd card using the above script.
     