frod0r

Members
  • Posts: 6
  • Joined
  • Last visited

  1. Thanks for the clarification. Just to add to your test cases: I have tested your image with memtester 3280M for 15 loops (running half a day; the invocation is sketched after the post list below) without any errors. I then re-tested the Armbian_20.11.10_Nanopim4v2_buster_current_5.9.14.img image with memtester 3280M and received a failure in the first loop (at Bit Flip, after ~30 min of testing). Edit: I just saw that I am a bit late with this additional test, as your pull request got merged 16 hours ago. Sorry I took so long.
  2. I also tested the RAID array with 100 × 1 GiB blocks and got no errors; this is great!
  3. Thank you! I tested the image you sent and have received no errors so far. As before, I have written and verified 100 × 1 GiB files over each of the 4 SATA ports. Next I will test writing on the RAID array. Just out of interest: in the GitHub issue you describe that the limit on the voltage change is required because the voltage regulator circuit of the M4V2 cannot handle more. I am curious whether I understood correctly that this is not dependent on the power input method (5 V USB-C vs. 12 V ATX). If it is affected by that, I want to add the information that I am using the 12 V ATX port with a 10 A PSU. Again, thank you for this image; it really seems to have fixed my problem, and I will post an update tomorrow once I have tested the RAID as well.
  4. I did some more testing, and I am now pretty certain that I can exclude hardware faults: again with fresh Armbian_20.11.10_Nanopim4v2_buster_current_5.9.14 and Armbian_20.11.10_Nanopim4v2_buster_legacy_4.4.213_desktop installations, I received write errors on a drive that only has ext4 on it (no RAID or LVM layer). I tested every SATA port with 100 × 1 GiB blocks (a rough sketch of this per-port test is included after the post list); I only received 1 error each in two of the runs and none in the other two, but errors nonetheless. I then also downloaded the proprietary FriendlyCore image (to be precise, rk3399-sd-friendlycore-focal-4.19-arm64-20201226.img), tested with the same per-SATA-port method as described above as well as 200 × 1 GiB on the original RAID array, and received no errors at all. I really am at a loss as to what the problem could be here.
  5. First of all, thank you for your answer; I appreciate that you took time out of your day to reply in my thread. Some comments:

I am aware that WiringPi is EOL; however, I was not yet aware of, and had not yet looked into, a replacement, so thank you for the resource.

I am not quite sure I can follow. I installed the Buster desktop 4.4.y image from https://www.armbian.com/nanopi-m4-v2. Does this not include an open-source kernel?

AFAIK Arch does not use the same kernel as Debian, and I was not entirely sure whether some of the relevant packages might be different, so I mentioned that, but I agree this should play no role. What I confirmed (or at least what I aimed to confirm) was that the error appears on armv8 but not on amd64 (both architectures are 64-bit), and that it is not an issue with the 4.19 base kernel per se, but appears in this specific setting.

But if that were the case, would I not get higher UDMA_CRC_Error_Count S.M.A.R.T. values? Also, this conflicts with the observation that the partition that is not managed by mdadm works fine with the otherwise identical hardware and software setup. However, I also tested the Armbian_20.11.10_Nanopim4v2_buster_current_5.9.14 image, and when writing 70 × 1 GiB blocks, one was faulty, so you are probably right that there is some hardware issue somewhere in my setup.
  6. Hello there,

I am using Armbian 20.11.6 Buster with Linux 4.4.213-rk3399 on a NanoPi M4V2. I have connected 3 SATA HDDs and use mdadm (for a RAID 5 array of those drives), LUKS, LVM and ext4 (in that order, more details below) to store my backups.

`armbianmonitor -u` link: http://ix.io/2NYv

Recently I have received errors in Borg backup about mismatching checksums. To confirm that this is not a Borg-specific error, I wrote this little zsh script to write data and generate checksums:

```
for ((i = 0; i < 100; i++)); do
  dd if=/dev/urandom bs=4M count=256 | tee $i | md5sum | sed '/^a/!s/.$/'"$i"'/' > $i.sum
done
```

To be more precise: in each loop iteration I acquire 1 GiB of random data using `dd`. `tee` writes it to the file system and to stdout, which is piped into `md5sum`, which generates the checksum. The `sed` directive removes the `-` in the checksum file and replaces it with the appropriate filename.

I then verify the checksums like so:

```
for ((i = 0; i < 100; i++)); do md5sum -c $i.sum; done
```

A lot of the checks fail.

My HDD structure looks as follows:

```
nanopim4v2:~:% lsblk
NAME                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                             8:0    1 931.5G  0 disk
└─md127                         9:127  0   1.8T  0 raid5
  └─md0_crypt                 253:0    0   1.8T  0 crypt
    └─vg00-lv00_nas           253:1    0   1.8T  0 lvm   /media/raid
sdb                             8:16   1 931.5G  0 disk
└─md127                         9:127  0   1.8T  0 raid5
  └─md0_crypt                 253:0    0   1.8T  0 crypt
    └─vg00-lv00_nas           253:1    0   1.8T  0 lvm   /media/raid
sdc                             8:32   1   3.7T  0 disk
├─sdc1                          8:33   1 931.5G  0 part
│ └─md127                       9:127  0   1.8T  0 raid5
│   └─md0_crypt               253:0    0   1.8T  0 crypt
│     └─vg00-lv00_nas         253:1    0   1.8T  0 lvm   /media/raid
└─sdc2                          8:34   1   2.7T  0 part
  └─sdc2_crypt                253:2    0   2.7T  0 crypt
    └─vg01_non_redundant-lv00 253:3    0   2.7T  0 lvm   /media/non_redundant
sdd                             8:48   1  29.8G  0 disk
└─sdd1                          8:49   1  29.5G  0 part  /home/frieder/prevfs
mmcblk0                       179:0    0  59.5G  0 disk
└─mmcblk0p1                   179:1    0  58.9G  0 part  /
zram0                         251:0    0   1.9G  0 disk  [SWAP]
zram1                         251:1    0    50M  0 disk  /var/log
```

So, as you can see, I also use LVM on LUKS on a second partition of sdc. I tested the scripts on sdc2 and received no errors.

According to mdadm, my array is not in a broken state:

```
nanopim4v2:~:% cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active raid5 sdc1[3] sda[1] sdb[0]
      1953260928 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
```

Also of note: I have tried check (`mdadm --action=check /dev/md127`) and repair (`mdadm --action=repair /dev/md127`) several times (a sketch of this check/repair cycle is included after the post list). I saw a nonzero value (<200) in mismatch_cnt (`/sys/block/md127/md/mismatch_cnt`), which I was able to get down to zero by repeatedly calling repair, but this number went up again after writing data and calling check again.

I also tested the script on other hardware (with other drives, amd64 architecture, running Arch with kernel 5.10.11) and there I never received even a single failed checksum, so I have reason to believe the script does what it should. To confirm that I don't just have a broken RAID, I connected the drives to an amd64 computer running a live distribution of Debian (debian-live-10.7.0-amd64-gnome.iso, kernel 4.19-lts) and repeated my checksum test, this time without any failures. I also noted that the files and checksums I wrote from the NanoPi still mismatched when checked on amd64, and the files and checksums I wrote from amd64 still matched when checked on the NanoPi. So whatever is causing this error must only affect writing.

I did not forget to check the S.M.A.R.T. values, but I did not find anything suspicious there (the smartctl calls are sketched after the post list).
To leave some readability, I attached the S.M.A.R.T. values via Pastebin:
sda: https://pastebin.com/S06RfAgC
sdb: https://pastebin.com/5TQYbWvu
sdc: https://pastebin.com/i4esRqit

As a final test, I booted a fresh image (Armbian_20.11.10_Nanopim4v2_buster_legacy_4.4.213_desktop.img), just installed the necessary tools (mdadm, cryptsetup and lvm2; the setup steps are sketched after the post list) and repeated the test. While I did not receive as many checksum mismatches as before, of the 70 × 1 GiB blocks checked, 4 still failed.

Other possibly relevant outputs:

```
nanopim4v2:% sudo lvm version
  LVM version:     2.03.02(2) (2018-12-18)
  Library version: 1.02.155 (2018-12-18)
  Driver version:  4.34.0
  Configuration:   ./configure --build=aarch64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/aarch64-linux-gnu --libexecdir=${prefix}/lib/aarch64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --exec-prefix= --bindir=/bin --libdir=/lib/aarch64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/aarch64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-dbus-service --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-readline --enable-udev_rules --enable-udev_sync

nanopim4v2:% sudo cryptsetup --version
cryptsetup 2.1.0

nanopim4v2:% sudo cryptsetup status md0_crypt
/dev/mapper/md0_crypt is active and is in use.
  type:    LUKS2
  cipher:  aes-xts-plain64
  keysize: 512 bits
  key location: dm-crypt
  device:  /dev/md127
  sector size:  512
  offset:  32768 sectors
  size:    3906489088 sectors
  mode:    read/write

nanopim4v2:% sudo mdadm -V
mdadm - v4.1 - 2018-10-01
```

As a TL;DR: I believe there is something wrong with mdadm writes on the Armbian rk3399 legacy Buster build. Can someone reproduce this behavior? If not, do you have an(other) idea where this problem could be coming from, and what steps I can take to get a reliably working RAID again?

Sidenote: I would prefer to stay on the LTS kernel for now, as WiringPi is AFAIK not supported on newer kernels, but of course, if it is the only way, I am willing to switch.

I appreciate any help and suggestions, cheers!
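
For reference, a minimal sketch of the memtester run described in post 1 above, using the size and loop count from that post; running it via sudo lets memtester lock the memory it tests:

```
# Test 3280 MiB of RAM for 15 consecutive loops (this ran for about half a day).
sudo memtester 3280M 15
```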
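
The per-SATA-port write test mentioned in posts 3 and 4 is, as far as the posts describe it, the same write-and-verify loop from post 6 run once per drive; a rough sketch follows, where the mount points are hypothetical placeholders rather than the ones actually used:

```
# Hypothetical mount points, one per SATA port / drive under test.
for mnt in /mnt/sata1 /mnt/sata2 /mnt/sata3 /mnt/sata4; do
  cd "$mnt" || continue                     # skip ports whose filesystem is not mounted
  for ((i = 0; i < 100; i++)); do
    # Write 256 * 4 MiB = 1 GiB of random data per file and record its checksum.
    dd if=/dev/urandom bs=4M count=256 | tee $i | md5sum | sed 's/-$/'"$i"'/' > $i.sum
  done
  # Read everything back and compare against the recorded checksums.
  for ((i = 0; i < 100; i++)); do md5sum -c $i.sum; done
done
```

The sed here simply replaces md5sum's trailing `-` with the file name; post 6 uses a slightly different sed expression to the same effect.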
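
Post 5 refers to UDMA_CRC_Error_Count and post 6 to checking S.M.A.R.T. values; a small sketch of how those attributes can be read with smartmontools, using the array member devices from the lsblk output above:

```
# Print the S.M.A.R.T. attribute table for each array member and
# pick out the UDMA CRC error counter discussed in post 5.
for dev in /dev/sda /dev/sdb /dev/sdc; do
  echo "== $dev =="
  sudo smartctl -A "$dev" | grep -i udma_crc
done
```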
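
A sketch of the check/repair cycle described in post 6, assuming the same /dev/md127 array; `mdadm --wait` blocks until the running sync action finishes, so the mismatch counter is only read once the pass is complete:

```
# Trigger a consistency check, wait for it to finish, then read the mismatch counter.
sudo mdadm --action=check /dev/md127
sudo mdadm --wait /dev/md127
cat /sys/block/md127/md/mismatch_cnt

# If the counter is nonzero, run a repair pass the same way and re-check.
sudo mdadm --action=repair /dev/md127
sudo mdadm --wait /dev/md127
cat /sys/block/md127/md/mismatch_cnt
```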
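
For the fresh-image retest at the end of post 6, the setup is roughly the following; the package names come from the post, and the array, LUKS mapping, volume group and mount point names are taken from the lsblk and cryptsetup output above:

```
# Install the storage tools mentioned in the post.
sudo apt-get update
sudo apt-get install mdadm cryptsetup lvm2

# Assemble the existing array, open the LUKS container on top of it,
# activate the LVM volume group and mount the logical volume.
sudo mdadm --assemble --scan
sudo cryptsetup open /dev/md127 md0_crypt
sudo vgchange -ay vg00
sudo mount /dev/vg00/lv00_nas /media/raid
```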