zfs read vs. write performance



Hi,

 

I have successfully installed my new Helios64 and enabled zfs. Everything works so far, but it seems I'm missing something "important" with regard to ZFS.

 

I did the following tests:

I took one of my WD 4TB Plus drives (in slot 3) and formatted the disk with a) ext4 (mkfs.ext4 /dev/sdc), b) btrfs (mkfs.btrfs /dev/sdc) and c) zfs (zpool create test /dev/sdc).

On each of these three filesystems I ran "dd if=/dev/zero of=test2.bin bs=1GB count=5" to measure write performance and "dd if=test2.bin of=/dev/null" to get the read performance (all tests were done 3 times, results averaged).
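To make the procedure explicit, the per-filesystem sequence looked roughly like this (a sketch; the mount point /mnt/test is an assumption for the ext4 / btrfs cases, the zfs pool mounts itself under /test):

# ext4 shown; btrfs is the same with mkfs.btrfs
mkfs.ext4 /dev/sdc
mount /dev/sdc /mnt/test
cd /mnt/test

# write test (5 blocks of 1 GB), then read the file back
dd if=/dev/zero of=test2.bin bs=1GB count=5
dd if=test2.bin of=/dev/null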

 

The results I got look like this:

a) ext4 ... avg write 165MB/s - read 180MB/s

b) btrfs ... avg write 180MB/s - read 192MB/s

c) zfs ... avg write 140MB/s - read 45MB/s (!)

 

So while ext4 and btrfs produce pretty much the same (and expected) results for sequential read / write, zfs seems to be significantly slower on writes and dramatically slower on reads.

 

I have to admit I'm new to zfs ... but is this expected / normal?

zfs has tons of parameters to tune performance - are there any settings that are "needed" to achieve roughly the same values as e.g. with btrfs?
(Honestly, I expected zfs to perform on par with btrfs - as zfs should be the more "mature" file system ...)

 

thanks!
A


For spinning disks you should use ashift=12 (and ashift=13 for SSDs) when creating your ZFS pool. It can't be changed after the fact and should match the physical block size of your HDD (ashift=12 corresponds to 4096-byte sectors). ZFS read speed also benefits from additional disks: if you e.g. create a mirror across two disks or a RAIDZ1 across five disks, you should see pretty good performance. Also, to make the above test unfair (in favor of ZFS), you could enable compression (-O compression=lz4).

zpool create -o ashift=12 ...
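If you want to double-check what the drive actually reports before picking an ashift, something like this should do (a sketch; /dev/sdc is the disk from the tests above, PHY-SEC / LOG-SEC are lsblk's physical and logical sector size columns):

lsblk -o NAME,PHY-SEC,LOG-SEC /dev/sdc
# or read it straight from sysfs
cat /sys/block/sdc/queue/physical_block_size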

 

Personally I use:

zpool create \
        -o ashift=12 \
        -O acltype=posixacl -O canmount=off -O compression=lz4 \
        -O dnodesize=auto -O normalization=formD -O relatime=on \
        -O xattr=sa \
        ...
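And to verify which ashift an existing pool actually ended up with, something along these lines should work (zdb ships with the zfs utilities; "test" is the pool name from the first post):

zdb -C test | grep ashift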

 


Thank you for your answer.

I know about ashift=12; my tests were made with this setting. I also tried your settings (without compression!), but they make no real difference.

Using three disks brings things up to ~80 MB/s - which is still unacceptable.

 

But I think I found my problem:

It is actually related to the block size - not the block size of the zfs pool, but the block size dd is using:

 

This is what I'm usually doing:
 

root@helios64:/test# dd if=test2.bin of=/dev/null
9765625+0 records in
9765625+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 107.466 s, 46.5 MB/s

 

dd uses its default block size of 512 bytes here. For ext4 / btrfs this seems to be no issue - but zfs has a problem with it.
If I explicitly use a block size of 4096, I get these results:

 

root@helios64:/test# dd if=test2.bin of=/dev/null bs=4096
1220703+1 records in
1220703+1 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 27.4704 s, 182 MB/s

 

Which gives the expected figures!
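Any reasonably large block size should behave the same, by the way - the point is just to avoid issuing millions of 512-byte reads - so I'd expect e.g. this to give similar numbers:

dd if=test2.bin of=/dev/null bs=1M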

 

So, as so often: "Layer 8 problem" - the problem sits in front of the screen.


Read speeds are quite hard to measure. If you test the read speed of the same file multiple times, ZFS will cache it.
I'm getting up to 1 GB/s read speed after reading the same file 4 times in a row. Your file might also be cached if you just wrote it.

By the way, if someone knows how to flush that cache, that would be helpful for running tests.
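Two things that should at least take the cache out of the equation for a benchmark (a sketch, assuming the pool from the earlier posts is named "test"):

# temporarily cache only metadata (not file data) for the pool
zfs set primarycache=metadata test
# ... run the read test ...
zfs set primarycache=all test

# or export and re-import the pool, which drops its cached data
zpool export test && zpool import test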

23 hours ago, tionebrr said:

Read speeds are quite hard to measure. If you test the read speed of the same file multiple times, ZFS will cache it.
I'm getting up to 1 GB/s read speed after reading the same file 4 times in a row. Your file might also be cached if you just wrote it.
 

 

That's why I wrote / read 5 GB ... the Helios64 has "only" 4 GB of RAM, so if 5 GB are read / written, the cache should have no usable copy of the data.
