qtmax Posted January 24, 2021

I have a Helios4 (2nd batch) with 4 HDDs, and before I started using it, I decided to test its performance on the planned workload. However, I am not satisfied with the numbers.

First, the SSH speed is about 30 MB/s, while the CPU core where sshd runs is at 100% (so I suppose it's the bottleneck).

I create a sparse file and copy it over SSH:

    helios4# dd if=/dev/null bs=1M seek=2000 of=/tmp/blank

With scp I get up to 31.2 MB/s:

    pc# scp root@helios4:/tmp/blank /tmp

With rsync I get pretty much the same numbers:

    pc# rm /tmp/blank
    pc# rsync -e ssh --progress root@helios4:/tmp/blank /tmp/
    blank
      2,097,152,000 100%   31.06MB/s    0:01:04 (xfr#1, to-chk=0/1)

(I tested both because with another server rsync over ssh runs faster, up to 80 MB/s, while scp is still at about 30 MB/s.)

As I'm planning to use sftp/ssh to access the files, such speed is absolutely unsatisfactory. Is it possible to tune it somehow to get much closer to the line rate of 1 Gbit/s?

The second question is about the best RAID level for this setup. The read speed of each of my HDDs is about 250 MB/s, and if I combine 4 of them into a RAID0, the read speed reaches 950 MB/s on another computer, but when I put these disks into the Helios4, the max read speed is about 350 MB/s. Given that the network speed is 125 MB/s, it doesn't seem useful to use any form of striping, such as RAID0, RAID10 (far), etc., does it?

What is the recommended RAID configuration for this system then, given that I have 4 HDDs?

Would RAID10 (near) be better than RAID10 (far)? RAID10 (far) should provide the same sequential read speed as RAID0. However, as on the Helios4 the speed is far from 4x (it's just 1.4x), and the network line rate is lower anyway, maybe it makes more sense to optimize for parallel reads.
RAID10 (near) will occupy only 2 disks on a single-stream read (as opposed to 4 with RAID10 (far)), so a second reading stream can use the other two disks and improve performance by avoiding seeks to different locations on a single disk. Does this layout make sense, or am I missing something?

I'm also considering RAID6, which has better redundancy than RAID10 at the cost of lower speed and higher CPU usage. While lower speed might not be a problem if it's still higher than the line rate of 1 Gbit/s, the CPU may be the bottleneck. As I see from the SSH test above, even such a simple test occupies 100% of one core and part of the other core, so running RAID6 may increase the CPU demands and decrease performance.

One more option is RAID1 over LVM that would combine pairs of disks, but I don't see any apparent advantage over RAID10 (near).

Does anyone have experience running RAID10 (near/far), RAID6 or maybe other configurations with Helios4? What would you advise building out of 4 HDDs of 8 TB each if my goal is to have redundancy and read/write speeds close to the line rate?
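
For reference, the arithmetic behind the figures in this post can be sketched in a few lines of shell. The function names are made up for illustration, decimal units are assumed (1 MB = 10^6 bytes), and protocol overhead is ignored, so the real ceiling is somewhat below the value computed here.

```shell
#!/bin/sh
# Illustrative helpers, not part of any tool mentioned in the thread.

# Convert a link speed in Mbit/s to MB/s (decimal units, no protocol overhead).
line_rate_mbs() {
    awk -v mbit="$1" 'BEGIN { printf "%.1f\n", mbit / 8 }'
}

# Express a measured throughput as a percentage of a given rate.
percent_of() {
    awk -v measured="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", 100 * measured / rate }'
}

# Ratio of an array's read speed to the read speed of a single disk.
scaling() {
    awk -v array="$1" -v single="$2" 'BEGIN { printf "%.1f\n", array / single }'
}

rate=$(line_rate_mbs 1000)    # 1 Gbit/s link
echo "line rate:              ${rate} MB/s"
echo "scp  (31.2 MB/s):       $(percent_of 31.2 "$rate")% of line rate"
echo "rsync (31.06 MB/s):     $(percent_of 31.06 "$rate")% of line rate"
echo "RAID0 on other machine: $(scaling 950 250)x a single disk"
echo "RAID0 on Helios4:       $(scaling 350 250)x a single disk"
```

In other words, the SSH transfers reach roughly a quarter of what the link could carry, while even the reduced RAID0 read speed on the Helios4 is well above the network limit.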
Igor Posted January 24, 2021

32 minutes ago, qtmax said: "such speed is absolutely unsatisfactory"

Agreed, but on a 3000+ USD NAS things aren't much better. The client is a high-end workstation running Mint, with a 10Gb connection:

... 138.4MB/s
224.29MB/s ...

while NFS transfer speed is normal:

33 minutes ago, qtmax said: "Is it possible to tune it somehow to be much closer to the line rate of 1 Gbit/s?"

IMO hardware performance is just fine.
lanefu Posted January 25, 2021

I'd focus on tuning your means of transfer to your Helios4 rather than on disk tuning.

SSH has never been good for raw throughput. You can tune ciphers and compression to get better performance. Native rsync rather than rsync over ssh is also more performant. With rsync there are also tuning opportunities.

Given 4 drives, I recommend against RAID6. I'd choose RAID5 for capacity, or RAID10 if you really intend to have a lot of concurrent disk I/O.

What type of data are you trying to protect? (If it's just large video files, the standard advice is to forgo RAID and use SnapRAID instead.) How often do you expect writes, and from how many devices?
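
One way to act on the cipher suggestion is to measure each cipher separately. The following is only a sketch: `HOST`, `SIZE_MB`, `RUN_BENCH` and the function names are assumptions made for illustration, and the short cipher list is an arbitrary selection, not a recommendation. It relies on the standard OpenSSH options `ssh -Q cipher` (list supported ciphers), `-c` (select a cipher) and `-o Compression=no`, and assumes a `dd` that reports throughput on completion. Which cipher wins depends on the CPUs at both ends, so the numbers have to be measured rather than guessed.

```shell
#!/bin/sh
# Sketch only: HOST, SIZE_MB, RUN_BENCH and the function names are illustrative.
HOST=${HOST:-root@helios4}
SIZE_MB=${SIZE_MB:-500}

# Keep only a few commonly available ciphers from a list read on stdin.
pick_ciphers() {
    grep -E '^(aes128-ctr|aes128-gcm@openssh\.com|chacha20-poly1305@openssh\.com)$'
}

# Pull SIZE_MB of zeros from the remote host through ssh with one cipher.
# The local dd prints the achieved throughput on stderr when it finishes.
# Compression is disabled so that the zeros are not simply compressed away.
bench_cipher() {
    echo "== $1 =="
    ssh -c "$1" -o Compression=no "$HOST" \
        "dd if=/dev/zero bs=1M count=$SIZE_MB 2>/dev/null" |
        dd of=/dev/null bs=1M
}

# Nothing touches the network unless RUN_BENCH=1 is set.
if [ "${RUN_BENCH:-0}" = 1 ]; then
    ssh -Q cipher | pick_ciphers | while read -r cipher; do
        bench_cipher "$cipher"
    done
fi
```

Run it as `RUN_BENCH=1 HOST=root@helios4 sh bench.sh` (the file name is arbitrary). Because the data comes from /dev/zero and goes to /dev/null, the disks are out of the picture and only the SSH path is measured.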
gprovost Posted January 25, 2021

@qtmax You can refer to our wiki page on the CESA engine to accelerate crypto: https://wiki.kobol.io/helios4/cesa/

There is also an old thread on that topic:

Unfortunately it might not be of much help because, as stated on our wiki, by default you cannot use the CESA engine for OpenSSH on Debian Buster, while it is possible under Debian Stretch. However, it seems possible if you recompile OpenSSH to disable sandboxing.

https://forum.doozan.com/read.php?2,89514

The above forum link, which is more up to date than our wiki, gives some instructions on how to use cryptodev on Buster, but it's a bit experimental... worth a try.

Regarding RAID, I would go for RAID10 if you are looking for the best balance between redundancy and CPU load + resync time.
qtmax Author Posted January 25, 2021

Thanks for your replies! For some reason this forum rate-limits me to one message a day. Is there a way to lift this limitation?

I see now that many people recommend against SSH. I think it's a combination of implementation drawbacks (because strong hardware also has poor SSH speed) and weak hardware (because I can still reach higher speeds between two x86 machines). Options to try are CESA (although cryptodev looks highly experimental and requires manual selection of ciphers, and the numbers in the linked thread are still far from perfect) and HPN-SSH (a patchset that I found).

What protocol would you suggest instead? I need encrypted (no FTP?) data transfer at a rate close to 1 Gbit/s (no SSH?), it should be easy to connect from Linux over the internet (no NFS?), and it should have no stupid limitations on the characters allowed in file names (no SMB?). Does WebDAV over HTTPS sound better, or are there better options?

23 hours ago, Igor said: "IMO hardware performance is just fine."

@Igor, I did some RAID read/write tests yesterday, and I found a serious performance regression between kernel 5.8.18 and 5.9.14 (currently bisecting). The read speed improved, but the write speed is garbage with kernel 5.9.
Here is the script that I used for tests, and here are the performance numbers that I got:

    root@helios4 ~ # uname -r
    5.8.18-mvebu
    root@helios4 ~ # ./test-raid.sh
    Direct:       read: 255 MB/s   write: 221 MB/s
    RAID0:        read: 284 MB/s   write: 229 MB/s
    RAID10 far:   read: 264 MB/s   write: 181 MB/s
    RAID10 near:  read: 272 MB/s   write: 184 MB/s
    RAID6:        read: 289 MB/s   write: 116 MB/s

    root@helios4 ~ # uname -r
    5.9.14-mvebu
    root@helios4 ~ # ./test-raid.sh
    Direct:       read: 256 MB/s   write: 145 MB/s
    RAID0:        read: 396 MB/s   write: 107 MB/s
    RAID10 far:   read: 321 MB/s   write: 62.5 MB/s
    RAID10 near:  read: 355 MB/s   write: 62.2 MB/s
    RAID6:        read: 387 MB/s   write: 62.1 MB/s

The write speed in all tests (even when writing directly to a single disk) has dropped severely. Even RAID0 write became slower than the direct write. At the same time, the read speed in the RAID tests increased, which doesn't matter to me as long as it stays above 125 MB/s. The issue is not fixed in kernel 5.10.

Do you have any idea how to get the old performance back with new kernels? 60 MB/s would be a bottleneck even with a fast network protocol.

Also, I'm quite surprised to see that RAID10 (near) is faster (both read and write) than RAID10 (far). Far should be 2x faster than near on read (reading from 4 disks instead of 2), and write speeds should be about the same.

I'm even more surprised by RAID6, which has the fastest read and, as expected, the slowest write (although the XOR offload is in use). Why would RAID6 read be faster than, for example, RAID10 (near) read if they both utilize two disks?

Does anyone have any insight that could explain this weirdness?

14 hours ago, lanefu said: "What type of data are you trying to protect. (If just large video files standard advise is to forgo raid and use snap raid instead) How often and how many devices do you expect writes?"
I'll have mixed data of three main types: large files written once and very rarely read; smaller files (a few MB each) written once and occasionally read; and a lot of tiny files accessed often (both read and write). I'd like to optimize for the first two use cases (the tiny files are small enough that speed doesn't matter much), and both write and read speeds matter to me, even though each file is written only once. It'll be single-stream most of the time; two concurrent writers are possible but should be rare.

Also thanks for the SnapRAID suggestion, this is something entirely new to me.
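
To put the regression in this post into numbers, here is a small sketch that computes the relative change in write speed between 5.8.18 and 5.9.14 from the figures quoted above. The helper name `pct_change` and the hyphenated labels are made up for illustration; the speeds are copied from the two test runs.

```shell
#!/bin/sh
# Relative change from an old to a new value, in percent (negative = slower).
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", 100 * (new - old) / old }'
}

# Columns: configuration, write MB/s on 5.8.18, write MB/s on 5.9.14.
while read -r name old new; do
    printf '%-12s %6s -> %6s MB/s  (%s%%)\n' \
        "$name" "$old" "$new" "$(pct_change "$old" "$new")"
done <<'EOF'
Direct       221 145
RAID0        229 107
RAID10-far   181 62.5
RAID10-near  184 62.2
RAID6        116 62.1
EOF
```

By this calculation the single-disk write loses about a third of its speed, and both RAID10 layouts lose roughly two thirds.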
gprovost Posted January 26, 2021

@qtmax OK, let us reproduce that on our side and confirm your observations.
qtmax Author Posted January 26, 2021

I found the offending commit with bisect: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5ff9f19231a0e670b3d79c563f1b0b185abeca91

Reverting it on 5.10.10 restores the write performance without hurting read performance (i.e. it stays high).

I'm going to submit the revert upstream very soon. It would be great if it could also be applied as a patch to Armbian.

(I would post a link here after I submit the patch, but this forum won't allow me to post more than one message a day. Is there any way to lift this limitation?)

@gprovost, could you reproduce the bug on your NAS?
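
For readers who have not used it, a manual `git bisect` session of this kind might look roughly like the sketch below. This is not the exact procedure used in the thread: the choice of the `v5.8` and `v5.9` mainline tags as endpoints, the `THRESHOLD` value and the `verdict` helper are all assumptions made for illustration. Since each candidate kernel has to be built and booted on the Helios4 before it can be measured, the git steps are shown as comments and only the good/bad decision is executable.

```shell
#!/bin/sh
# Hypothetical helper: map a measured direct-write speed (MB/s) to a bisect verdict.
# THRESHOLD is an assumed midpoint between the 221 MB/s (5.8.18) and
# 145 MB/s (5.9.14) results reported earlier in the thread.
THRESHOLD=${THRESHOLD:-180}

verdict() {
    awk -v speed="$1" -v limit="$THRESHOLD" \
        'BEGIN { print ((speed + 0 >= limit + 0) ? "good" : "bad") }'
}

# Manual bisect loop, run inside a clone of the mainline kernel tree:
#   git bisect start
#   git bisect good v5.8
#   git bisect bad v5.9
#   ...build the checked-out kernel, boot it on the Helios4, measure, then:
#   git bisect good        # or: git bisect bad, according to verdict
#   ...repeat until git names the first bad commit, then:
#   git bisect reset
#
# Once the commit is known, it can be reverted on the tree to be fixed:
#   git revert 5ff9f19231a0e670b3d79c563f1b0b185abeca91
```

For example, `verdict 145` prints `bad` and `verdict 221` prints `good` with the default threshold.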
Mangix Posted January 27, 2021

A more performant alternative to AF_ALG... https://github.com/cotequeiroz/afalg_engine

Not a solution to your issues, unfortunately.

IIRC, there was a way to overclock the Helios to 2.0 GHz or 1.8 GHz. I don't remember the details.
gprovost Posted January 29, 2021

@qtmax Yes, I observe the same thing with your test script between LK 5.8 and 5.9. It's great that you managed to pinpoint the upstream change that is the root cause. I haven't had the time to test the patch though.

You could just raise a PR to add the revert patch here: https://github.com/armbian/build/tree/master/patch/kernel/mvebu-current

What do you think, @Heisath?

    root@helios4:~# uname -r
    5.9.14-mvebu
    root@helios4:~# bash test_script
    Direct:       read: 361 MB/s   write: 120 MB/s
    RAID0:        read: 399 MB/s   write: 92.4 MB/s
    RAID10 far:   read: 407 MB/s   write: 49.0 MB/s
    RAID10 near:  read: 398 MB/s   write: 50.3 MB/s
    RAID6:        read: 396 MB/s   write: 51.6 MB/s
    -----------------------------------------
    root@helios4:~# uname -r
    5.8.18-mvebu
    root@helios4:~# bash test_script
    Direct:       read: 328 MB/s   write: 146 MB/s
    RAID0:        read: 335 MB/s   write: 237 MB/s
    RAID10 far:   read: 337 MB/s   write: 130 MB/s
    RAID10 near:  read: 335 MB/s   write: 153 MB/s
    RAID6:        read: 335 MB/s   write: 87.8 MB/s
qtmax Author Posted January 29, 2021

My upstream patch was accepted for 5.11: https://www.spinics.net/lists/linux-block/msg64471.html (it also includes a summary table with performance numbers before/after the offending commit and before/after my revert patch).

I'll push the same patch to your GitHub, so that we can have it in Armbian earlier.
Heisath Posted January 30, 2021

Great that you found and fixed it so quickly. If you add the patch via PR, please mention (in the patch) the upstream version that contains the fix, so we can easily identify the patch and remember to remove it once we reach 5.11+.
SvenHz Posted February 16, 2021

So I'm just curious, what's the status of this fix? Do we need to wait until 5.11 lands in Armbian, or has it been fixed (temporarily) downstream in Armbian's 5.10.x code? Great find by the way, thanks.
Heisath Posted February 16, 2021

This has been fixed via a patch to mvebu-current (LK 5.10), so it should be available via nightly builds or with the next Armbian release: https://github.com/armbian/build/blob/master/patch/kernel/mvebu-current/0001-Revert-block-simplify-set_init_blocksize-to-regain-l.patch

It is fixed in LK 5.11 in mainline, but it is unclear whether Armbian will get LK 5.11 on mvebu or whether we will wait until the next LTS kernel version.