gprovost

Members
  • Content Count

    77
  • Joined

  • Last visited

1 Follower

About gprovost

  • Rank
    Advanced Member

Contact Methods

  • Website URL
    http://kobol.io

Profile Information

  • Gender
    Male
  • Location
    Singapore

  1. gprovost

    Helios4 Support

    Sorry to hear that you ended up with a wiped-out array. Hope your experience will serve as a warning to others. I had highlighted the 16TB partition size limit for 32-bit architectures in our wiki, but you're right, we should have emphasized it more. I will write a page on our wiki on how to set up backups: 1. How to use rsync and cron to back up to a USB drive (see the sketch below). 2. How to use Duplicati to back up to the cloud (e.g. Backblaze B2). I will also have to check whether OMV4 lets you make this kind of mistake and, if it does, highlight it to their team. Maybe during an rsync or copy, some of your files had chunks located at block addresses beyond what the 32-bit kernel can handle; reading them would then be like an illegal memory access, which would trigger a segfault.
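
    In the meantime, here is a minimal sketch of point 1, assuming the data to protect lives under /srv/dev-disk-by-label-public and the USB drive is mounted at /mnt/backup (both paths are just examples, adjust to your setup):

    # /etc/cron.d/nas-backup -- nightly mirror of the array onto the USB drive at 02:00
    0 2 * * * root rsync -aHAX --delete /srv/dev-disk-by-label-public/ /mnt/backup/

    The --delete flag keeps the copy an exact mirror of the source, so combine it with some retention scheme (or drop the flag) if you also want protection against accidental deletions.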
  2. gprovost

    Helios4 Support

    You will have to remember exactly what operations you did to set up this omv-public partition, in order for us to understand the issue. 1. You shouldn't have been able to create an LV bigger than 16TiB. 2. You shouldn't have been able to create an ext4 partition bigger than 16TiB. But somehow you managed to create an LV > 16TiB, and it looks like you forced the 64bit feature on the ext4 filesystem in order to grow it beyond 16TB. Maybe you can post the output of sudo tune2fs -l /dev/mapper/omv-public (not 100% sure of the path, but it should be that; see also the check below). This answer sums it up better than I could: https://serverfault.com/questions/462029/unable-to-mount-a-18tb-raid-6/536758#536758
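
    For reference, a quick way to check whether the 64bit feature is set on the filesystem (same assumed device path as above):

    sudo tune2fs -l /dev/mapper/omv-public | grep -i 'features'

    If the "Filesystem features" line lists 64bit, that confirms the filesystem was created or resized beyond the 16TiB size that is safe on a 32-bit kernel.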
  3. gprovost

    Helios4 Support

    @alexcp It sounds more like a filesystem issue / corruption that might result from one of the kernel modules crashing when accessing the filesystem on the array. Have you done an fsck on your array (see the sketch below)? Based on your log, some filesystem corruption is detected on dm-0, which is the logical volume you created with LVM. Something that puzzles me is that the size of the logical volume is 18.2T, which shouldn't be possible on Helios4: it's a 32-bit architecture, so each logical volume should max out at 16TB. Most likely this is what makes the kernel crash... but I don't understand how you were able to create and mount a partition bigger than 16TB in the first place. Can you provide the output of sudo lvdisplay
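
    If you want to run the check yourself, the sequence would roughly be the following (the LV path is taken from your lsblk output; never run fsck on a mounted filesystem):

    sudo umount /dev/mapper/omv-public        # make sure the volume is not mounted
    sudo fsck.ext4 -f /dev/mapper/omv-public  # force a full check of the ext4 filesystem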
  4. gprovost

    Helios4 Support

    @alexcp Have you tried a normal copy (cp)? The issue you are facing now with rsync seems to be more software related. Yup, you can hook up your HDDs to another rig, but why not first try a fresh Debian install on the Helios4? Prepare a new SD card with the latest Armbian Stretch release, then use the following command to redetect your arrays: mdadm --assemble --scan The new system should be able to detect your array. Check the md device number and status with cat /proc/mdstat, then mount it and try your rsync again (see the sketch below). If it fails again, then yes, try with your HDDs connected to another rig.
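
    A rough sketch of that sequence (the md device number and mount point are examples, check /proc/mdstat for the real name; if your data sits on LVM on top of the array, mount the logical volume, e.g. /dev/mapper/omv-public, instead of the md device):

    sudo mdadm --assemble --scan           # scan all drives and reassemble any known arrays
    cat /proc/mdstat                       # note the md device (e.g. md127) and its status
    sudo mount /dev/md127 /mnt             # mount it on a temporary mount point
    rsync -a /mnt/ /path/to/destination/   # retry the copy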
  5. gprovost

    Helios4 Support

    Just to correct a wrong interpretation on my side. It seems the labels set-A and set-B don't correspond to the mirror sets but rather to the stripe sets... and even of that I'm not sure, I cannot find a clear statement of what set-A/B/C means.

    Actually, Linux MD RAID10 is not really nested RAID1 in a RAID0 array like most of us would picture it, but when created with the default settings (layout = near, copies = 2) it fulfills the same characteristics and guarantees as nested RAID1+0. Layout near=2 writes the data as follows (each chunk is repeated 2 times across a 4-way stripe array), so it's equivalent to nested RAID1+0:

         | Device #0 | Device #1 | Device #2 | Device #3 |
    ------------------------------------------------------
    0x00 |     0     |     0     |     1     |     1     |
    0x01 |     2     |     2     |     3     |     3     |
     :   |     :     |     :     |     :     |     :     |

    So in your case @alexcp, you were lucky that RaidDevice #0 and #2 were still OK. That's why the state of the array was still showing clean even though degraded, and it still gives you a chance to back up or reinstate your RAID array by replacing the faulty disk.

    This is why it's important to set up mdadm to send an alert email as soon as something goes wrong, to avoid ending up with issues on 2 drives at the same time (a sketch follows below). We will need to add something to our wiki explaining how to configure the OS to trigger the System Fault LED when mdadm detects errors. Also, I will never repeat it enough: if you have critical data on your NAS, always perform regular backups (either to an external HDD or to the cloud, e.g. BackBlaze B2).

    To be honest, I still wonder why your system crashed during the rsync, because even a degraded but clean array (the faulty and unclean drives were tagged as removed) shouldn't cause such an issue. So once the resync of RaidDevice #0 is done, try to re-do your rsync now that the faulty drive is physically removed. I also don't know yet what to make of the rsync error messages you posted in your previous post.
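
    As a side note on the alert emails, the mdadm part is just two steps (assuming a working local mail setup; the address is an example):

    # /etc/mdadm/mdadm.conf
    MAILADDR you@example.com

    # send a test alert for each array to verify mail delivery
    sudo mdadm --monitor --scan --test --oneshot

    On Armbian/Debian the mdadm monitor daemon is normally already started by the mdadm package, so once MAILADDR is set you should receive a mail as soon as an array degrades.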
  6. gprovost

    Helios4 Support

    @alexcp Can you post the cat /proc/mdstat output.
  7. gprovost

    Helios4 Support

    @alexcp First, as shown in your mdadm -D /dev/md127 output, unfortunately you are right now missing half of your RAID, the set-A mirror is gone... which is very bad.

    Number   Major   Minor   RaidDevice   State
       -       0       0         0        removed
       1       8      16         1        active sync set-B   /dev/sdb
       -       0       0         2        removed
       3       8      48         3        active sync set-B   /dev/sdd

    Let's cross fingers that /dev/sda is not too far out of sync. Can you try to re-add /dev/sda to the array: mdadm --manage /dev/md127 --re-add /dev/sda Hopefully it works. If yes, can you post the mdadm -D /dev/md127 output here again. If it cannot be re-added, run the following command: mdadm --examine /dev/sd[abcd] >> raid.status and post the raid.status file here.
  8. gprovost

    Helios4 Support

    @alexcp That's useful information. Yes, we can discard the power budget issue. OK, from the armbian-monitor log I can already see 2 serious issues:

    1/ HDD /dev/sdc on port SATA 3 (U12) shows a lot of READ DMA errors, and even SMART commands are failing. So either it's a faulty HDD, or something is wrong with the SATA cable. I would advise first trying another SATA cable to see if it could be a cable issue. If the errors persist, then I'm afraid you have a faulty HDD. (Note: do a proper shutdown before changing the cable.) How long have you been running your rig? I guess your HDDs are still under warranty? If you run dmesg you will see a lot of the following errors, which show something is wrong with the HDD:

    [    8.113934] sd 2:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    [    8.113939] sd 2:0:0:0: [sdc] tag#0 Sense Key : 0x3 [current]
    [    8.113943] sd 2:0:0:0: [sdc] tag#0 ASC=0x31 ASCQ=0x0
    [    8.113947] sd 2:0:0:0: [sdc] tag#0 CDB: opcode=0x88 88 00 00 00 00 04 8c 3f ff 80 00 00 00 08 00 00
    [    8.113951] print_req_error: I/O error, dev sdc, sector 19532873600
    [    9.005672] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    9.005677] ata3.00: irq_stat 0x40000001
    [    9.005685] ata3.00: failed command: READ DMA
    [    9.005700] ata3.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 12 dma 4096 in
                            res 53/40:08:00:00:00/00:00:00:00:00/40 Emask 0x8 (media error)
    [    9.005704] ata3.00: status: { DRDY SENSE ERR }
    [    9.005709] ata3.00: error: { UNC }
    [    9.008370] ata3.00: configured for UDMA/133
    [    9.008383] ata3: EH complete
    [   60.347211] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [   60.347215] ata3.00: irq_stat 0x40000001
    [   60.347220] ata3.00: failed command: SMART
    [   60.347228] ata3.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 23
                            res 53/40:00:00:00:00/00:00:00:00:00/00 Emask 0x8 (media error)
    [   60.347231] ata3.00: status: { DRDY SENSE ERR }
    [   60.347233] ata3.00: error: { UNC }
    [   60.349912] ata3.00: configured for UDMA/133
    [   60.349940] ata3: EH complete

    2/ Your RAID is obviously degraded, but not only because of the /dev/sdc issue described above: /dev/sda has been removed from the array because mdadm considers it unclean. This could be the result of an ungraceful shutdown, which itself seems to have been triggered by issue number 1. Anyway, the issue with /dev/sda can be fixed. But first, can you run the following command and post the output: sudo mdadm -D /dev/md127 . I need to understand how your RAID layout is affected by those issues.

    [    8.054175] md: kicking non-fresh sda from array!
    [    8.065216] md/raid10:md127: active with 2 out of 4 devices

    NAME            FSTYPE              SIZE   MOUNTPOINT   UUID
    sda             linux_raid_member   9.1T                16d26e7c-3c2a-eef9-ec7c-df93ca0fbfa5
    sdb             linux_raid_member   9.1T                16d26e7c-3c2a-eef9-ec7c-df93ca0fbfa5
    └─md127         LVM2_member         18.2T               kC0nGt-RYKe-innN-7sKk-PQHi-g9mo-r67ATF
      └─omv-public  ext4                18.2T               c80cb9a5-cd2d-4dbe-8a93-af4eebe85635
    sdc                                 9.1T
    sdd             linux_raid_member   9.1T                16d26e7c-3c2a-eef9-ec7c-df93ca0fbfa5
    └─md127         LVM2_member         18.2T               kC0nGt-RYKe-innN-7sKk-PQHi-g9mo-r67ATF
      └─omv-public  ext4                18.2T               c80cb9a5-cd2d-4dbe-8a93-af4eebe85635
    mmcblk0                             29.7G
    └─mmcblk0p1     ext4                29.4G  /            078e5925-a184-4dc3-91fb-ff3ba64b1a81
    zram0                               50M    /var/log

    Conclusion: this could explain the system crash. I see you have a dm-0 device, did you encrypt a partition? Ideally, send me by PM a copy of your log files (/var/log and /var/log.hdd).
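
    If you want to double-check the drive health yourself before or after swapping the cable, smartctl (from the smartmontools package) is the usual tool; a minimal check would be the following, assuming /dev/sdc is still the suspect drive (and keeping in mind that SMART commands are currently erroring out on that port, so this may only work after the cable swap):

    sudo apt install smartmontools
    sudo smartctl -H /dev/sdc    # overall health self-assessment
    sudo smartctl -a /dev/sdc    # full SMART attributes and error log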
  9. gprovost

    Helios4 Support

    @alexcp By default the SPI NOR flash is already disabled. But just to be sure, can you execute the command lsblk and confirm you don't see the block device mtdblock0. If the SPI NOR flash is confirmed to be disabled, what you are describing sounds more like a power budget issue. Can you tell me which model of HDD you are using? Also, have you tried doing your rsync over SMB without any device connected to the USB ports (in order to narrow down the issue)? Finally, can you execute armbianmonitor -u and post the output link here. Thanks.
  10. gprovost

    Helios4 Support

    @NickS Thanks for sharing. I will test your mod later. However, it would have been nice to fork the sys-oled project on GitHub and put your modifications there ;-)
  11. I configured SSH to work with cryptodev using the AES-128-CBC cipher. I did an scp download of the same 1GB file and got the following performance:

    Throughput : 56.6MB/s
    CPU Utilization : User 12.3%, Sys 31.2%

    Pretty good :-)

    Important note: as concluded in the previous post, in the case of Helios4, using cryptodev only for encryption and not for authentication (authentication involves hashing) is the only mode that provides some network performance and CPU load improvement. The other advantage of this mode is that cryptodev will be completely skipped by sshd... because otherwise sshd raises an exception during authentication, since cryptodev tries to do some ioctl() calls that are forbidden by the seccomp filter in the sshd sandbox.

    If you still want to test using cryptodev for ssh, the easy workaround is to use normal privilege separation in sshd instead of the sandbox (UsePrivilegeSeparation yes). Then, as for apache2, you will have to force a cipher that is supported by the CESA engine (e.g. aes128-cbc)... and most probably you will also have to do the same on the client side (see the sketch below). Disclaimer: this sshd tweaking is not recommended for security reasons. Only experiment with it if you know what you are doing.

    For reference, with cipher encryption algorithms not supported by CESA:

    AES128-CTR
    CPU Utilization : User 39.1%, Sys 16.4%
    Throughput : 39.9MB/s

    CHACHA20-POLY1305 (default cipher for ssh)
    CPU Utilization : User 40.6%, Sys 17.0%
    Throughput : 29.8MB/s
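
    For anyone who wants to reproduce this, the relevant settings would be roughly the following (a sketch only, and keep the disclaimer above in mind; host and path are examples):

    # /etc/ssh/sshd_config (server side)
    UsePrivilegeSeparation yes    # normal privilege separation instead of the seccomp sandbox
    Ciphers aes128-cbc            # restrict sshd to a CESA-supported cipher

    # client side: force the same cipher for the transfer
    scp -c aes128-cbc user@helios4:/path/to/file .

    Restart sshd after editing the config, and keep a second session open in case the new settings lock you out.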
  12. It's been on my TODO list for a while: write a guide on how to activate and use the Marvell Cryptographic Engines And Security Accelerator (CESA) on Helios4. Previously I already shared some numbers related to the CESA engine while using @tkaiser's sbc-bench tool. I also shared some findings on the openssl support for the kernel modules (cryptodev and af_alg) that interact with the CESA engine. My conclusions were:

    1. Performance wise: cryptodev effectively performs slightly better than af_alg.
    2. openssl / libssl support: very messy and broken, it all depends which version of openssl you use.

    Since many Debian Stretch apps depend on the "old" libssl (1.0.2), I felt the cryptodev approach was the best way to go, since it can expose all encryption and authentication algorithms supported by the CESA engine... even though it requires some patching in openssl. Plus, the cryptodev implementation in the new LTS openssl version 1.1.1 has been completely reworked, so long term it should be the right way. Anyhow, I'm not going to describe the step-by-step setup here, I'm already writing a page on our wiki for that; once it's ready I will post the link here. I also won't risk discussing the relevance of some of the ciphers, that deserves a topic of its own. I'm just going to share benchmark numbers for a concrete use case, HTTPS file download.

    I set up Apache2 on my Helios4 to serve a 1GB file hosted on an SSD drive. Then I did 3 batches of download tests; for each batch I configured Apache2 to use a specific cipher that I know is supported by the CESA engine (a configuration sketch follows below):

    AES_128_CBC_SHA
    AES_128_CBC_SHA256
    AES_256_CBC_SHA256

    For each batch, I did the following 3 tests:
    1. download without the cryptodev module loaded (100% software encryption)
    2. download with cryptodev loaded and libssl (openssl) compiled with -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
    3. download with cryptodev loaded and libssl (openssl) compiled only with -DHAVE_CRYPTODEV, which means hashing operations are still done 100% in software.

    Here are the results:

    Note: CPU utilization is for both cores. Each test is just a single process running on a single core, therefore when you see CPU utilization around 50% (User% + Sys%) it means the core used for the test is fully loaded. Maybe I should have reported the numbers just for the core used, which would be roughly 2x the values you see in the table.

    For reference:

    AES_128_GCM_SHA256 (default Apache 2.4 TLS cipher; GCM mode is not something that can be accelerated by CESA)
    CPU Utilization : User 42.9%, Sys 7.2%
    Throughput : 30.6MB/s

    No HTTPS
    CPU Utilization : User 1.0%, Sys 29.8%
    Throughput : 112MB/s

    CONCLUSION
    1. Hashing operations are slower on the CESA engine than on the CPU itself, therefore HW encryption with hashing is less performant than 100% software encryption.
    2. HW encryption without hashing provides a 30 to 50% throughput increase while decreasing the CPU load by 20 to 30%.
    3. Still pondering whether it's worth the effort to encourage people to make the move... but I think it's still a cool improvement.
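
    For reference, forcing one of those cipher suites in Apache looks roughly like this (a sketch; AES128-SHA is the OpenSSL name of AES_128_CBC_SHA, use AES128-SHA256 or AES256-SHA256 for the other two batches):

    # in the SSL vhost, e.g. /etc/apache2/sites-enabled/default-ssl.conf
    SSLCipherSuite AES128-SHA
    SSLHonorCipherOrder on

    # then reload apache to apply
    sudo systemctl reload apache2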
  13. gprovost

    Helios4 Support

    @NickS The conclusion is that the two GPIOs driving the PWM pins of the fan headers are dead on your board. After spending quite some time investigating, we are still puzzled about what could have caused the issue; we don't yet have a theory that explains both occurrences. Anyhow, you can PM me and we will see how we can arrange a replacement of your board.
  14. gprovost

    Helios4 Support

    @NickS We are still investigating. If we come to the conclusion that the GPIOs used for PWM are indeed damaged / faulty on your board, then we will proceed with an exchange ;-) Will keep you updated this week.
  15. gprovost

    Support of Helios4 - Intro

    @malvcr This is going quite off topic and isn't hardware specific anymore. Please create a dedicated thread if you want to discuss your zram / swap topic further. FYI, the zram module is available in Armbian (therefore available on Helios4). You can easily build your own startup script that sets up swap on zram (see the sketch below), or if you are using Ubuntu Bionic you could also use the zram-config package. Actually, swap on zram has already been configured by default in Armbian for a few releases now.
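
    For whoever wants to experiment anyway, a bare-bones swap-on-zram setup looks like this (sizes and compression algorithm are arbitrary examples, and lz4 availability depends on your kernel):

    modprobe zram num_devices=1
    echo lz4 > /sys/block/zram0/comp_algorithm    # pick a compression algorithm
    echo 512M > /sys/block/zram0/disksize         # uncompressed size of the zram device
    mkswap /dev/zram0
    swapon -p 100 /dev/zram0                      # higher priority than disk-based swap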