Alexander Eiblinger Posted December 26, 2020

Hi, I have successfully installed my new Helios64 and enabled ZFS. Everything works so far, but it seems I'm missing something "important" in regards to ZFS.

I did the following tests: I took one of my WD 4TB Plus drives (in slot 3) and formatted the disk with a) ext4 (mkfs.ext4 /dev/sdc), b) btrfs (mkfs.btrfs /dev/sdc) and c) zfs (zpool create test /dev/sdc). Based on these 3 formats I ran "dd if=/dev/zero of=test2.bin bs=1GB count=5" to measure write performance and "dd if=test2.bin of=/dev/null" to get the read performance (all tests were done 3 times, results averaged).

The results I got look like this:

a) ext4 ... avg write 165MB/s - read 180MB/s
b) btrfs ... avg write 180MB/s - read 192MB/s
c) zfs ... avg write 140MB/s - read 45MB/s (!)

So while ext4 and btrfs produce pretty much the same (and expected) results for sequential read/write, ZFS seems to be significantly slower on writes and dramatically slower on reads. I have to admit I'm new to ZFS ... but is this expected/normal? ZFS has tons of parameters to tune performance - are there any settings which are "needed" to achieve roughly the same values as e.g. with btrfs? (Honestly, I expected ZFS to perform on par with btrfs - as ZFS should be the more "mature" file system ...)

thanks!
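For reference, the whole procedure as a minimal sketch (device /dev/sdc and a mountpoint of /test as used above; run each dd three times and average the reported rates):

# a) ext4
mkfs.ext4 /dev/sdc && mount /dev/sdc /test

# b) btrfs
mkfs.btrfs -f /dev/sdc && mount /dev/sdc /test

# c) zfs (a pool named "test" mounts at /test by default)
zpool create test /dev/sdc

# write test: 5 GB of zeroes
dd if=/dev/zero of=/test/test2.bin bs=1GB count=5

# read test (note: no bs given, so dd falls back to 512-byte blocks)
dd if=/test/test2.bin of=/dev/null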
ShadowDance Posted December 26, 2020

For spinning disks, you should use ashift 12 (and ashift 13 for SSDs) when creating your ZFS pool. It can't be changed after the fact and should match the physical block size of your HDD (12 = 4096). ZFS read speed also benefits from additional disks; if you e.g. created a mirror across two disks, or say a RAIDZ1 across 5 disks, you should see pretty good performance. Also, to make the above test unfair (in favor of ZFS), you could also enable compression (-O compression=lz4).

zpool create -o ashift=12 ...

Personally I use:

zpool create \
  -o ashift=12 \
  -O acltype=posixacl -O canmount=off -O compression=lz4 \
  -O dnodesize=auto -O normalization=formD -O relatime=on \
  -O xattr=sa \
  ...
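If it helps, one way to sanity-check the ashift choice (a rough sketch, assuming the disk is /dev/sdc and the pool is named test as above, and that the pool is imported so zdb can read its cached config):

# physical vs. logical sector size as reported by the kernel
lsblk -o NAME,PHY-SEC,LOG-SEC /dev/sdc

# ashift recorded in an existing pool's config (12 = 4096-byte blocks)
zdb -C test | grep ashift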
Alexander Eiblinger Posted December 27, 2020 (edited)

Thank you for your answer. I know about ashift=12; my tests were already made with this setting. I also tried your settings (without compression!), but they make no real difference. Using three disks brings things up to ~80MB/s - which is still unacceptable.

But I think I found my problem: it is indeed related to the blocksize - not the blocksize of the ZFS pool, but the blocksize dd is using. This is what I'm usually doing:

root@helios64:/test# dd if=test2.bin of=/dev/null
9765625+0 records in
9765625+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 107.466 s, 46.5 MB/s

dd is using a blocksize of 512 here. For ext4/btrfs this seems to be no issue - but ZFS has a problem with it. If I explicitly use a blocksize of 4096, I get these results:

root@helios64:/test# dd if=test2.bin of=/dev/null bs=4096
1220703+1 records in
1220703+1 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 27.4704 s, 182 MB/s

Which gives the expected figures! So, as so often: "Layer 8 problem" - the problem sits in front of the screen.

Edited December 27, 2020 by Alexander Eiblinger
tionebrr Posted December 27, 2020

Read speeds are quite hard to measure. If you test the read speed of the same file multiple times, ZFS will cache it; I'm getting up to 1 GB/s read speed after reading the same file 4 times in a row. Your file might also be cached if you just wrote it. By the way, if someone knows how to flush that cache, that would be helpful for running tests.
yay Posted December 28, 2020

sync; echo 3 > /proc/sys/vm/drop_caches

Check Documentation/sysctl/vm.txt for details on drop_caches.
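For example, sketched into the read test from earlier in the thread (file name and path as used above), dropping caches before each run so every read starts cold:

for i in 1 2 3; do
  sync
  echo 3 > /proc/sys/vm/drop_caches   # free page cache, dentries and inodes
  dd if=/test/test2.bin of=/dev/null bs=1M
done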
Alexander Eiblinger Posted December 28, 2020

23 hours ago, tionebrr said: Read speeds are quite hard to measure. If you test the read speed of the same file multiple times, ZFS will cache it. I'm getting up to 1 GB/s read speed after reading the same file 4 times in a row. Your file might also be cached if you just wrote it.

That's why I wrote/read 5 GB ... the Helios64 has "only" 4 GB RAM, so if 5 GB are read/written, the cache should have no usable copy of the data.
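As a rough way to double-check that assumption, the ARC statistics exposed by the ZFS kernel module show the current cache size and how large it is allowed to grow (a small sketch):

# current ARC size and its configured maximum, in bytes
awk '/^size|^c_max/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats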
tionebrr Posted December 28, 2020

45 minutes ago, yay said: sync; echo 3 > /proc/sys/vm/drop_caches - Check Documentation/sysctl/vm.txt for details on drop_caches.

So if I understand correctly: the memory caching I've seen is not actually managed by ZFS but by the kernel?
tionebrr Posted December 28, 2020

41 minutes ago, Alexander Eiblinger said: That's why I wrote/read 5 GB ... the Helios64 has "only" 4 GB RAM, so if 5 GB are read/written, the cache should have no usable copy of the data.

Good trick ;p
ploubi Posted March 7, 2021

Hello, I have similar issues on my Helios64 and I don't see how to correct them. I'm trying to replace my classic 4-disk raid5 ext4 with an equivalent raidz1. What I've done so far:

1 - Added an 8TB disk in my 5th slot and formatted it with btrfs
2 - Backed up all my data onto it with rsync; I got an avg write speed of 105-110MB/s on big files (HD movies, usually around 10-15GB each), which is really OK for me as my NAS sits on a 1Gb network, more isn't necessary
3 - Created the pool the way it's described in the Helios NFS tutorial (I've since created a few datasets with different recordsize)
4 - rsynced my data back onto the new raidz

And I get very poor results. Initially I got around 35 MB/s; after tinkering a lot with recordsize (now it's 1M) and other settings I managed to go up to 55 MB/s, but it's still way too slow for my taste.

I've benchmarked a few use cases with a single 5GB file and here are the results. It seems that when rsync works on only one file, it goes slightly faster (74MB/s).

raidz -> btrfs
root@helios64:/mypool/video# rsync -av --progress pipot/ /srv/dev-disk-by-id-ata-ST8000NM0055-1RM112_ZA1K2SN4-part1/backup/video/pipot/
sending incremental file list
...
sent 4,916,400,124 bytes received 138 bytes 95,464,082.76 bytes/sec
total size is 4,915,200,000 speedup is 1.00

btrfs -> raidz
root@helios64:/mypool/video# rsync -av --progress /srv/dev-disk-by-id-ata-ST8000NM0055-1RM112_ZA1K2SN4-part1/backup/video/pipot/ ./pipot2/
sending incremental file list
...
sent 4,916,400,124 bytes received 69 bytes 73,930,829.97 bytes/sec
total size is 4,915,200,000 speedup is 1.00

raidz -> raidz
root@helios64:/mypool/video# rsync -av --progress pipot/ /pipot3/
sending incremental file list
...
sent 4,916,400,124 bytes received 68 bytes 57,501,756.63 bytes/sec
total size is 4,915,200,000 speedup is 1.00

btrfs -> btrfs
root@helios64:/mypool/video# rsync -av --progress /srv/dev-disk-by-id-ata-ST8000NM0055-1RM112_ZA1K2SN4-part1/backup/video/pipot/ /srv/dev-disk-by-id-ata-ST8000NM0055-1RM112_ZA1K2SN4-part1/backup/video/pipot2/
...
sent 4,916,400,124 bytes received 139 bytes 68,760,842.84 bytes/sec
total size is 4,915,200,000 speedup is 1.00

I've tried messing with dd and I get results that match Alexander's. It really seems to be a read-speed issue: writing the original testfile from /dev/zero with bs=32k already maxes out at around 400 MB/s.

Creating the testfile from /dev/zero:
root@helios64:/mypool/video# dd if=/dev/zero of=testfile1 count=150000 bs=32k status=progress
4791074816 bytes (4.8 GB, 4.5 GiB) copied, 12 s, 399 MB/s
150000+0 records in
150000+0 records out
4915200000 bytes (4.9 GB, 4.6 GiB) copied, 12.2836 s, 400 MB/s

Just copying the file with bs=512 (I interrupted it before it finished):
root@helios64:/mypool/video# dd if=testfile1 of=testfile2 bs=512 status=progress
17506304 bytes (18 MB, 17 MiB) copied, 24 s, 729 kB/s

bs 4k -> 5.8 MB/s
bs 8k -> 10.9 MB/s
bs 16k -> 22.8 MB/s
bs 32k -> 43.5 MB/s
bs 64k -> 76.0 MB/s
bs 128k -> 129 MB/s
bs 512k -> 264 MB/s
bs 1M -> 371 MB/s
bs 4M -> 397 MB/s

So how can I make rsync (or cp for that matter) perform the same way as dd? Thanks
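For reference, the recordsize tuning mentioned in point 3 boils down to something like this sketch, using the pool/dataset names from the prompt shown above; note that recordsize only applies to files written after the change, so existing data has to be copied again to pick it up:

# check the current value
zfs get recordsize mypool/video

# switch to 1M records for large sequential media files (affects new writes only)
zfs set recordsize=1M mypool/video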
gprovost Posted March 8, 2021

@ShadowDance Since you are giving good recommendations on ZFS, I wonder if there should be some clear recommendation on the pros/cons of using ZFS compression and/or deduplication on a system with limited RAM?
wurmfood Posted March 9, 2021

Definitely don't use dedupe. Compression seems to work fine for me, but most of my data so far isn't very compressible. When I've done tests using things that are mostly text, it works out well. For most things that are binary or encrypted, it doesn't seem to do much; I'm getting a compressratio of only 1.01 using either zstd or lz4 on most of my datasets. If you know you're going to have places that will benefit from compression, use it there, but I'm tempted to turn it off for a lot of my stuff (media, encrypted backups).
ShadowDance Posted March 9, 2021 (edited)

@gprovost regarding compression, there shouldn't be any RAM constraints that need to be considered; ZFS compression operates in recordsize'd chunks (i.e. between 4KiB and 1MiB with ashift=12). Personally I think it's a good idea to set LZ4 as the default compression for the entire pool and then adjust on a per-dataset basis where needed. LZ4 is very cheap in CPU terms and can give some easy savings (see below). I would not advise using Gzip as the CPU overhead is quite significant; if higher compression is required, OpenZFS 2.0+ with Zstandard (zstd) compression might be a better alternative, as it can achieve Gzip-level compression at a much lower (de)compression cost.

As @wurmfood pointed out, however, it's quite rare for media or encrypted data to benefit from compression; the only exception is when that data contains padding/zeroes, which compresses well under LZ4. So that's a good example of when not to use compression. Also keep in mind that disabling compression will not decompress existing data - uncompressed, LZ4-compressed, Gzip-compressed, etc. data can coexist within one dataset. A full rewrite of all data is needed to change the compression of all the data.

As for deduplication, unless the zpool is extremely small, dedup is not really an option on the Helios64 as it requires very large amounts of RAM. The man page recommends at least 1.25 GiB of RAM per 1 TiB of storage.

For reference, some of my space savings from using compression:

NAME                              USED  COMPRESS  RATIO
rpool/ROOT/debian                15.5G  lz4       1.73x
rpool/data/service               18.7G  lz4       2.08x
rpool/data/service/avahi          208K  lz4       1.00x
rpool/data/service/grafana       48.4M  lz4       1.83x
rpool/data/service/loki           736K  lz4       1.00x
rpool/data/service/mariadb        676M  lz4       3.25x
rpool/data/service/mariadb/db     214M  lz4       1.88x
rpool/data/service/mariadb/dump   455M  gzip      3.88x
rpool/data/service/prometheus    10.9G  lz4       2.29x
rpool/data/service/promtail       768K  lz4       1.00x
rpool/data/service/samba         2.73M  lz4       6.04x
rpool/data/service/unifi         3.41G  lz4       1.34x
rpool/data/service/unifi/db      2.38M  lz4       1.00x
rpool/var/log                    18.0G  lz4       2.26x

Edited March 9, 2021 by ShadowDance (add man page recommendation about dedup RAM usage)
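To make the per-dataset tuning concrete, a minimal sketch (the media dataset name is hypothetical, the others are taken from the listing above):

# LZ4 as the pool-wide default, inherited by all datasets
zfs set compression=lz4 rpool

# heavier compression only where it clearly pays off
zfs set compression=gzip rpool/data/service/mariadb/dump

# no compression for already-compressed media (hypothetical dataset)
zfs set compression=off rpool/data/media

# check what each dataset ended up with and how well it compresses
zfs list -o name,used,compression,compressratio -r rpool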
gprovost Posted March 10, 2021

4 hours ago, ShadowDance said: As for deduplication, unless the zpool is extremely small, dedup is not really an option on the Helios64 as it requires very large amounts of RAM. The man page recommends at least 1.25 GiB of RAM per 1 TiB of storage.

Yes, exactly, and I think it's important to recommend disabling it at pool creation.
wurmfood Posted March 11, 2021

21 hours ago, gprovost said: Yes, exactly, and I think it's important to recommend disabling it at pool creation.

I believe it's not enabled by default - you have to choose to set it. At least, none of my pools were created with it and I never specified not to use it.

As a side note, I double-checked the compression on all of my datasets and noticed that some of my docker data sees massive compression with zstd on. Most are in the 1-3x range, but I have several in the 8-9x range.
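That's easy to confirm on an existing pool, e.g. (assuming the pool is named rpool):

# dedup shows "off" unless someone explicitly enabled it
zfs get -r dedup rpool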