
Helios4 performance concerns


qtmax


I have a Helios4 (2nd batch) with 4 HDDs, and before I started using it, I decided to test its performance on the planned workload. However, I was quite unsatisfied with the numbers.

 

First, the SSH speed is about 30 MB/s, while the CPU core where sshd runs is at 100% (so I suppose it's the bottleneck).
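
(A quick way to confirm this, assuming the sysstat package is installed, is to watch per-core load and sshd's CPU usage during the transfer:)

helios4# mpstat -P ALL 2          # per-core utilisation, refreshed every 2 seconds
helios4# pidstat -u -C sshd 2     # CPU usage of the sshd processes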

 

I create a 2000 MiB sparse file and copy it over SSH:

helios4# dd if=/dev/null bs=1M seek=2000 of=/tmp/blank
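
(Since the file is sparse, reading it costs no disk I/O, so the copy measures only SSH and the network. The sparseness can be checked with du:)

helios4# du -h --apparent-size /tmp/blank   # ~2.0G apparent size
helios4# du -h /tmp/blank                   # ~0 actually allocated on disk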

With scp I get up to 31.2 MB/s:

pc# scp root@helios4:/tmp/blank /tmp

With rsync I get pretty much the same numbers:

pc# rm /tmp/blank
pc# rsync -e ssh --progress root@helios4:/tmp/blank /tmp/
blank
  2,097,152,000 100%   31.06MB/s    0:01:04 (xfr#1, to-chk=0/1)

(I tested both, because with another server rsync over ssh runs faster, up to 80 MB/s, while scp is still at about 30 MB/s.)

 

As I'm planning to use sftp/ssh to access the files, such speed is absolutely unsatisfactory. Is it possible to tune it somehow to be much closer to the line rate of 1 Gbit/s?

 

The second question is about the best RAID level for this setup. Each of my HDDs reads at about 250 MB/s, and if I combine 4 of them into a RAID0, the read speed reaches 950 MB/s on another computer; but when I put these disks into the Helios4, the maximum read speed is about 350 MB/s. Given that the network speed is 125 MB/s, it doesn't seem useful to use any form of striping, such as RAID0, RAID10 (far), etc., does it? What is the recommended RAID configuration for this system then, given that I have 4 HDDs?

 

Would RAID10 (near) be better than RAID10 (far)? RAID10 (far) should provide the same sequential read speed as RAID0; however, since on the Helios4 the speedup is far from 4x (it's just 1.4x) and the network line rate is lower anyway, maybe it makes more sense to optimize for parallel reads. RAID10 (near) occupies only 2 disks for a single-stream read (as opposed to 4 with RAID10 (far)), so a second reading stream can use the other two disks and improve performance by avoiding seeks to different locations on a single disk. Does this layout make sense, or am I missing something?
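
(For reference, the two layouts differ only in the mdadm layout option; a sketch of how they would be created, with the device names being placeholders:)

helios4# mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 /dev/sd[abcd]   # RAID10 "far 2"
helios4# mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=4 /dev/sd[abcd]   # RAID10 "near 2" (mdadm default)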

 

I'm also considering RAID6, which has better redundancy than RAID10 at the cost of lower speed and higher CPU usage. While the lower speed might not be a problem as long as it stays above the line rate of 1 Gbit/s, the CPU may become the bottleneck. As the SSH test above shows, even such a simple transfer occupies 100% of one core and part of the other, so RAID6 would add to the CPU load and could degrade performance.
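
(One data point on the parity cost: the kernel benchmarks its xor and raid6 routines when the md modules load, so the achievable parity throughput can be read back from the kernel log, something like:)

helios4# dmesg | grep -iE 'raid6:|xor:'   # should show lines like "raid6: using algorithm ..." and "xor: using function ..."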

 

One more option is RAID1 over LVM, combining pairs of disks, but I don't see any apparent advantage over RAID10 (near).

 

Does anyone have experience running RAID10 (near/far), RAID6 or maybe other configurations on the Helios4? What would you advise building out of four 8 TB HDDs if my goal is to have redundancy and read/write speeds close to the line rate?


32 minutes ago, qtmax said:

such speed is absolutely unsatisfactory

 

Agreed, but on a 3000+ USD NAS things aren't much better :mellow: The client is a high-end workstation running Mint with a 10 Gb connection ...
138.4 MB/s
224.29 MB/s

 

... while NFS transfer speed is normal:

(screenshot of NFS transfer speeds)

 

33 minutes ago, qtmax said:

Is it possible to tune it somehow to be much closer to the line rate of 1 Gbit/s?


IMO hardware performance is just fine. 


I'd focus on tuning your means of transfer to your Helios4 rather than disk tuning.

 

SSH has never been good for raw throughput.  You can tune ciphers and compression to get better performance.   
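
For example (exact cipher availability depends on the OpenSSH build, so treat this as a sketch):

pc# ssh -Q cipher                                                         # list ciphers the client supports
pc# scp -c aes128-ctr -o Compression=no root@helios4:/tmp/blank /tmp      # force a cheap cipher, no compression
pc# scp -c chacha20-poly1305@openssh.com -o Compression=no root@helios4:/tmp/blank /tmp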

 

Native rsync rather than rsync over ssh is also more performant.

 

With rsync there are also tuning opportunities.
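
For example, a throwaway rsync daemon on the Helios4 skips ssh entirely (the module name and path below are placeholders, and note this sends data unencrypted):

# /etc/rsyncd.conf on the helios4
[tmp]
    path = /tmp
    read only = yes

helios4# rsync --daemon                                  # or enable the distro's rsync service
pc# rsync --progress rsync://helios4/tmp/blank /tmp/     # pull over the rsync protocol instead of ssh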

 

Given 4 drives, I recommend against RAID6. I'd choose RAID5 for capacity, or RAID10 if you really intend to have a lot of concurrent disk IO.

 

What type of data are you trying to protect? (If it's just large video files, the standard advice is to forgo RAID and use SnapRAID instead.)

 

How often, and from how many devices, do you expect writes?

 

 


@qtmax

 

You can refer to our wiki page on the CESA engine to accelerate crypto: https://wiki.kobol.io/helios4/cesa/

 

There is also an old thread on that topic:

 

Unfortunately it might not be of much help because, as stated on our wiki, by default you cannot use the CESA engine with OpenSSH on Debian Buster, while it is possible under Debian Stretch. However, it seems possible if you recompile OpenSSH to disable sandboxing: https://forum.doozan.com/read.php?2,89514

 

The above forum link, which is more up to date than our wiki, gives some instructions on how to use cryptodev on Buster, but yeah, it's a bit experimental... worth a try.
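
As a quick sanity check before going down that road, you can verify whether the CESA driver (and cryptodev, if built) is actually loaded; the module names below are the usual mainline ones, so treat them as assumptions:

helios4# lsmod | grep -E 'marvell_cesa|cryptodev'
helios4# grep -i 'driver' /proc/crypto | sort -u   # look for CESA-backed implementations among the listed drivers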

 

Regarding RAID, I would go for RAID10 if you are looking for the best balance between redundancy, CPU load and resync time.

 

 

 

 

 


Thanks for your replies! For some reason this forum rate limits me down to one message a day. Is there a way to lift this limitation?

 

I see now that many people recommend against SSH. I think it's a combination of implementation drawbacks (because strong hardware also has poor SSH speed) and weak hardware (because I can still reach higher speeds between two x86 machines). Opportunities to try are CESA (although cryptodev looks highly experimental and requires manual selection of ciphers, and the numbers in the linked thread are still far from perfect) and HPN-SSH (a patchset that I found).

 

What protocol would you suggest instead? I need encrypted data transfer (no FTP?) at a rate close to 1 Gbit/s (no SSH?), it should be easy to connect from Linux over the internet (no NFS?), and there should be no stupid limitations on the allowed characters in file names (no SMB?). Does WebDAV over HTTPS sound better, or are there better options?

 

23 hours ago, Igor said:

IMO hardware performance is just fine. 

 

@Igor, I did some RAID read/write tests yesterday, and I found a serious performance regression between kernel 5.8.18 and 5.9.14 (currently bisecting). The read speed improved, but the write speed is garbage with kernel 5.9.

 

Here is the script that I used for tests, and here are the performance numbers that I got:
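
(Not the actual script, but the kind of dd-based sequential test that produces these numbers looks roughly like the sketch below; the device and size are placeholders, and writing to the raw device is destructive:)

#!/bin/bash
# Hypothetical sketch, not the original test script.
# WARNING: writes directly to the block device and destroys its contents.
DEV=/dev/md0      # placeholder: the array (or single disk) under test
SIZE_MB=4096      # placeholder test size in MiB

echo "write:"
dd if=/dev/zero of="$DEV" bs=1M count="$SIZE_MB" oflag=direct conv=fdatasync 2>&1 | tail -n1

echo 3 > /proc/sys/vm/drop_caches

echo "read:"
dd if="$DEV" of=/dev/null bs=1M count="$SIZE_MB" iflag=direct 2>&1 | tail -n1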

 

root@helios4 ~ # uname -r
5.8.18-mvebu
root@helios4 ~ # ./test-raid.sh
Direct:
  read: 255 MB/s
  write: 221 MB/s
RAID0:
  read: 284 MB/s
  write: 229 MB/s
RAID10 far:
  read: 264 MB/s
  write: 181 MB/s
RAID10 near:
  read: 272 MB/s
  write: 184 MB/s
RAID6:
  read: 289 MB/s
  write: 116 MB/s

root@helios4 ~ # uname -r
5.9.14-mvebu
root@helios4 ~ # ./test-raid.sh
Direct:
  read: 256 MB/s
  write: 145 MB/s
RAID0:
  read: 396 MB/s
  write: 107 MB/s
RAID10 far:
  read: 321 MB/s
  write: 62.5 MB/s
RAID10 near:
  read: 355 MB/s
  write: 62.2 MB/s
RAID6:
  read: 387 MB/s
  write: 62.1 MB/s

The write speed in all tests (even when writing directly to a single disk) has dropped severely. Even RAID0 write became slower than the direct write. At the same time, the read speed in the RAID tests increased, which I couldn't care less about as long as it stays above 125 MB/s.

 

The issue is not fixed in kernel 5.10. Do you have any idea how to get the old performance back with the new kernels? 60 MB/s would be a bottleneck even with a fast protocol on the network.

 

Also, I'm quite surprised to see that RAID10 (near) is faster (both read and write) than RAID10 (far). Far should be 2x faster than near on read (reading from 4 disks instead of 2), and write speeds should be about the same.

 

I'm even more surprised about RAID6, which has the fastest read and, as expected, the slowest write (although the XOR offload is in use). Why would RAID6 read be faster than, for example, RAID10 (near) read if they both utilize two disks?

 

Does anyone have an insight that could explain this weirdness?

 

14 hours ago, lanefu said:

What type of data are you trying to protect? (If it's just large video files, the standard advice is to forgo RAID and use SnapRAID instead.)

 

How often, and from how many devices, do you expect writes?

 

I'll have mixed data of three main types: large files written once and very rarely read, smaller files (a few MB) written once and occasionally read, and a lot of tiny files accessed often (both read and write). I'd like to optimize for the first two use cases (the tiny files are small enough that speed doesn't matter much); both write and read speeds matter to me, even though writes happen only once. It will be single-stream most of the time; two concurrent writers are possible but should be rare.

 

Also thanks for the SnapRAID suggestion, this is something entirely new to me.


I found the offending commit with bisect: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5ff9f19231a0e670b3d79c563f1b0b185abeca91

 

Reverting it on top of 5.10.10 restores the write performance without hurting the read performance (i.e. it stays high).
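
(For anyone who wants to try it before a patch lands, the revert is a one-liner on top of the stable tree, using the commit id linked above:)

$ git checkout v5.10.10
$ git revert 5ff9f19231a0e670b3d79c563f1b0b185abeca91
# then rebuild and reinstall the kernel as usual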

 

I'm going to submit the revert upstream very soon; it would be great if it could also be applied as a patch in Armbian.

 

(I would post a link here after I submit the patch, but this forum won't allow me to post more than one message a day. Is there any way to lift this limitation?)

 

@gprovost, could you reproduce the bug on your NAS?


@qtmax Yes, I observe the same thing with your test script between LK 5.8 and 5.9. It's super great that you managed to pinpoint the upstream change that is the root cause. I haven't had time to test the patch though.

 

You could just raise a PR to add the revert patch here: https://github.com/armbian/build/tree/master/patch/kernel/mvebu-current
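
Something along these lines should produce the patch file to drop into that directory (paths are indicative only):

$ git revert --no-edit 5ff9f19231a0e670b3d79c563f1b0b185abeca91
$ git format-patch -1 HEAD -o patch/kernel/mvebu-current/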

What do you think @Heisath ?

 

root@helios4:~# uname -r
5.9.14-mvebu

root@helios4:~# bash test_script 
Direct:
  read: 361 MB/s
  write: 120 MB/s
RAID0:
  read: 399 MB/s
  write: 92.4 MB/s
RAID10 far:
  read: 407 MB/s
  write: 49.0 MB/s
RAID10 near:
  read: 398 MB/s
  write: 50.3 MB/s
RAID6:
  read: 396 MB/s
  write: 51.6 MB/s

-----------------------------------------

root@helios4:~# uname -r
5.8.18-mvebu

root@helios4:~# bash test_script 
Direct:
  read: 328 MB/s
  write: 146 MB/s
RAID0:
  read: 335 MB/s
  write: 237 MB/s
RAID10 far:
  read: 337 MB/s
  write: 130 MB/s
RAID10 near:
  read: 335 MB/s
  write: 153 MB/s
RAID6:
  read: 335 MB/s
  write: 87.8 MB/s

 


Great that you found & fixed it so quickly. If you add the patch via PR, please include a mention (in the patch) of the upstream fixed version, so we can easily identify and remember to remove the patch again once we reach 5.11+.


This has been fixed via a patch to mvebu-current (LK 5.10), so it should be available via nightly builds or with the next Armbian release: https://github.com/armbian/build/blob/master/patch/kernel/mvebu-current/0001-Revert-block-simplify-set_init_blocksize-to-regain-l.patch

It is fixed in LK 5.11 in mainline, but I'm unsure whether Armbian will get LK 5.11 on mvebu or whether we wait until the next LTS kernel version.

