Hello there,
I am using Armbian 20.11.6 Buster with Linux 4.4.213-rk3399 on a NanoPi M4V2. I have connected three SATA HDDs and use mdadm (a RAID 5 array across those drives), LUKS, LVM, and ext4 (layered in that order, more details below) to store my backups.
`armbianmonitor -u` link: http://ix.io/2NYv
Recently I started receiving errors from Borg Backup about mismatching checksums.
To confirm that this is not a Borg-specific error, I wrote this little zsh snippet to write data and generate checksums:
for ((i = 0; i < 100; i++)); do dd if=/dev/urandom bs=4M count=256 | tee "$i" | md5sum | sed "s/-\$/$i/" > "$i.sum"; done
To be more precise, each loop iteration acquires 1 GiB of random data from `dd`. `tee` writes it to the file system and to stdout, which is piped into md5sum to generate the checksum. The `sed` expression replaces the trailing `-` in the checksum line with the matching filename, so that `md5sum -c` can find the file later.
I then verify the checksums like so:
for ((i = 0; i < 100; i++)); do md5sum -c "$i.sum"; done
A lot of the checks fail.
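For reference, here is a scaled-down, self-contained version of the whole test (3 × 1 MiB instead of 100 × 1 GiB, run in a scratch directory so it can be tried anywhere; the file and checksum names mirror the loops above):

```shell
# Minimal sketch of the write-then-verify test; small sizes so it runs fast.
cd "$(mktemp -d)"            # scratch directory so no real data is touched
n=3
for ((i = 0; i < n; i++)); do
  # write random data to file $i and record its checksum in $i.sum
  dd if=/dev/urandom bs=1M count=1 2>/dev/null \
    | tee "$i" | md5sum | sed "s/-\$/$i/" > "$i.sum"
done
fails=0
for ((i = 0; i < n; i++)); do
  md5sum -c --quiet "$i.sum" || fails=$((fails + 1))
done
echo "failed checks: $fails"   # on healthy storage this should print 0
```

On a healthy machine every check passes; on the NanoPi a large fraction of them fail.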
My HDD structure looks as follows:
nanopim4v2:~:% lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 931.5G 0 disk
└─md127 9:127 0 1.8T 0 raid5
└─md0_crypt 253:0 0 1.8T 0 crypt
└─vg00-lv00_nas 253:1 0 1.8T 0 lvm /media/raid
sdb 8:16 1 931.5G 0 disk
└─md127 9:127 0 1.8T 0 raid5
└─md0_crypt 253:0 0 1.8T 0 crypt
└─vg00-lv00_nas 253:1 0 1.8T 0 lvm /media/raid
sdc 8:32 1 3.7T 0 disk
├─sdc1 8:33 1 931.5G 0 part
│ └─md127 9:127 0 1.8T 0 raid5
│ └─md0_crypt 253:0 0 1.8T 0 crypt
│ └─vg00-lv00_nas 253:1 0 1.8T 0 lvm /media/raid
└─sdc2 8:34 1 2.7T 0 part
└─sdc2_crypt 253:2 0 2.7T 0 crypt
└─vg01_non_redundant-lv00 253:3 0 2.7T 0 lvm /media/non_redundant
sdd 8:48 1 29.8G 0 disk
└─sdd1 8:49 1 29.5G 0 part /home/frieder/prevfs
mmcblk0 179:0 0 59.5G 0 disk
└─mmcblk0p1 179:1 0 58.9G 0 part /
zram0 251:0 0 1.9G 0 disk [SWAP]
zram1 251:1 0 50M 0 disk /var/log
So as you can see, I also use LVM on LUKS on a second partition of sdc (sdc2), outside the RAID. I ran the same test on sdc2 and received no errors.
According to mdadm, my array is not in a broken state:
nanopim4v2:~:% cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active raid5 sdc1[3] sda[1] sdb[0]
1953260928 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
bitmap: 0/8 pages [0KB], 65536KB chunk
unused devices: <none>
Also of note: I have tried check (`mdadm --action=check /dev/md127`) and repair (`mdadm --action=repair /dev/md127`) several times.
I then saw a nonzero value (<200) in mismatch_cnt (`/sys/block/md127/md/mismatch_cnt`). I was able to get it down to zero by repeatedly running repair, but the number went up again after writing data and running another check.
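For anyone wanting to repeat this, the check cycle spelled out as a sketch (md127 assumed; needs root on the real box). `MD_SYS` points at the md sysfs directory; the fallback to a mock directory with idle/0 contents is only there so the sketch can run on a machine without the array:

```shell
# Sketch of checking mismatch_cnt after a check pass (assumes /dev/md127).
MD_SYS=/sys/block/md127/md
if [ ! -w "$MD_SYS/sync_action" ]; then
  MD_SYS=$(mktemp -d)                  # mock directory, illustration only
  echo idle > "$MD_SYS/sync_action"
  echo 0 > "$MD_SYS/mismatch_cnt"
fi
echo check > "$MD_SYS/sync_action"     # same effect as: mdadm --action=check /dev/md127
# ...in practice, wait until `cat /proc/mdstat` no longer shows the check running...
cat "$MD_SYS/mismatch_cnt"             # nonzero means parity mismatches were found
```

Substituting `repair` for `check` rewrites mismatching parity instead of just counting it.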
I also tested the script on other hardware (different drives, amd64 architecture, running Arch with kernel 5.10.11) and never got a single failed checksum there, so I have reason to believe the script does what it should.
To confirm that I don't just have a broken RAID, I connected the drives to an amd64 computer running a Debian live distribution (debian-live-10.7.0-amd64-gnome.iso, kernel 4.19-lts) and repeated my checksum test. No checks failed there.
I also noticed that files+checksums written from the NanoPi still mismatched when verified on amd64, while files+checksums written from amd64 still matched when verified on the NanoPi. So whatever is causing this error must only affect writes.
I did not forget to check the S.M.A.R.T. values, but I found nothing suspicious there. To keep this post readable, I attached them via pastebin:
sda: https://pastebin.com/S06RfAgC sdb: https://pastebin.com/5TQYbWvu sdc: https://pastebin.com/i4esRqit
As a final test, I booted a fresh image (Armbian_20.11.10_Nanopim4v2_buster_legacy_4.4.213_desktop.img), installed just the necessary tools (mdadm, cryptsetup, and lvm2) and repeated the test.
While I did not get as many checksum mismatches as before, 4 out of 70 1 GiB blocks still failed.
Other possibly relevant outputs:
nanopim4v2:% sudo lvm version
LVM version: 2.03.02(2) (2018-12-18)
Library version: 1.02.155 (2018-12-18)
Driver version: 4.34.0
Configuration: ./configure --build=aarch64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/aarch64-linux-gnu --libexecdir=${prefix}/lib/aarch64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --exec-prefix= --bindir=/bin --libdir=/lib/aarch64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/aarch64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-dbus-service --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-readline --enable-udev_rules --enable-udev_sync
nanopim4v2:% sudo cryptsetup --version
cryptsetup 2.1.0
nanopim4v2:% sudo cryptsetup status md0_crypt
/dev/mapper/md0_crypt is active and is in use.
type: LUKS2
cipher: aes-xts-plain64
keysize: 512 bits
key location: dm-crypt
device: /dev/md127
sector size: 512
offset: 32768 sectors
size: 3906489088 sectors
mode: read/write
nanopim4v2:% sudo mdadm -V
mdadm - v4.1 - 2018-10-01
As a TL;DR: I believe something is wrong with mdadm writes in the Armbian rk3399 legacy Buster build.
Can someone reproduce this behavior? If not, do you have another idea where this problem could be coming from, and what steps can I take to get a reliably working RAID again?
Sidenote: I would prefer to stay on the LTS kernel for now, as WiringPi is AFAIK not supported on newer kernels, but if that is the only way, I am of course willing to switch.
I appreciate any help and suggestions, cheers!