tkaiser Posted April 3, 2017

Anyone ever dealt with OpenMediaVault (OMV) on an SBC? Great piece of software but pretty slow, right? One of the many reasons is outdated, crappy kernels combined with bad settings. I looked at an OMV image for A20 based boards like Banana Pi Pro recently, ended up at http://simplenas.com/download/bananas-simple-system and simply thought: Are you kidding me? Your stable release is based on horribly outdated Debian Wheezy, a kernel 3.4.something and a 2.x OMV release?! What a mess...

Igor has had an OMV install routine in his Debian-micro-home-server post-processing script for a long time, and since an Armbian user evaluated using Armbian's build system to generate OMV images we thought... why not integrate it into Armbian? We started yesterday to prepare all kernels for the Ethernet equipped boards to meet the OMV requirements (just Quota/ACL/NFS stuff) and then played around with an installation routine that can be used from our image customization script to generate OMV images without the need to tamper with them manually later (one of Armbian's design goals).

A first preview is now available. All that's needed to create a fresh OMV image from scratch is checking out the Armbian build system as outlined in the docs, using the new customize-image.sh.template as userpatches/customize-image.sh, uncommenting the InstallOpenMediaVault line and then creating a Debian Jessie Armbian image that will be a full OMV 3 release afterwards. This will take some time, but then you have an OS image that can be booted on your device of choice, needs 2-3 minutes for housekeeping (finishing the installation -- network connection required!), reboots and will then be accessible through http://openmediavault.local (if your router's DHCP server is not broken then http://openmediavault will also work). No need to use any smelly OMV images you find in dark corners of the Internet. Just create one yourself if needed, or ping your hardware vendor of choice to look into the possibilities they get when relying on a solid foundation for creating OS images.

Quick performance test with the beefiest device we currently support (Solid-Run's Clearfog Pro): 86/98 MB/s (write/read) over a wired connection isn't bad, but with other/slower devices we've seen that there's some room for improvement. I will join the openmediavault forum soon to discuss this there. Especially the fact that Armbian's cpufreq scaling settings get overwritten (/etc/default/cpufrequtils) is a bad thing, since the settings OMV uses are ok for x86 boxes but not for these small SBCs.

All quick tests looked good. I tried hard to destroy configurations on OrangePi Plus 2E, Banana Pi Pro, Clearfog Pro and Pine64+... but to no avail. Even nasty things worked flawlessly: creating btrfs stripes on the command line (USB3+SATA with kernel 4.10) and then importing them into OMV, running Armbian's nand-sata-install mechanism while OMV was running to transfer the installation to a SATA disk, and even downgrading the above btrfs stripe to 4.4.59 and moving the USB3 connected disk out of its enclosure and behind an ASM1062 PCIe SATA controller (though performance sucks for whatever reasons). I'm pretty impressed by OMV. No need to fiddle around with the command line, everything can be configured through a browser... nice. This is what the interface looks like:
tkaiser Posted April 4, 2017

I'll now use this thread to take some notes on where to look for further improvements, since I'll stop working on this for a few weeks (waiting for a 'NEO Plus 2' dev sample from FriendlyELEC):

- The installation routine I wrote suffers from a bug. When running inside the chroot of our build system the Postfix configuration fails due to 'newaliases: fatal: inet_addr_local[getifaddrs]: getifaddrs: Address family not supported by protocol' (see Ubuntu bug 1531299). This leaves Postfix and OMV unconfigured, which then has to happen at firstrun, which is bad. We need a fix or workaround -- found a workaround in the meantime.
- On 'official' OMV images (I checked the one for RPi 2/3) all OMV Extras are ready to be installed. With our build it takes some time to download them after the first boot. We need a fix. Also the 'Jessie backports' repo is missing.
- We need to investigate how we can prevent OMV from overwriting /etc/default/cpufrequtils, since the settings Armbian uses are the result of intensive testing and are set both 'per device' and 'per kernel base' (e.g. we use 'interactive' with Allwinner legacy kernels but 'schedutil' or 'ondemand' with mainline kernel). As soon as one touches power management settings in OMV this gets overwritten with 'conservative', which is a good choice on x86/x64 but not with most kernels we have to use on our ARM boards. Performance drops notably while there are no consumption savings.
- Disks can be connected/disconnected to a NAS dynamically (think of USB disks/sticks), so it's worth a look to identify the device type (HDD or flash media, based on /sys/block/*/queue/rotational) and then set /sys/block/*/queue/scheduler accordingly. We can use noop, deadline or cfq here and it needs some testing how they correlate with other NAS settings on weak ARM devices.
- Memory constraints on boards with little memory, especially those using 64-bit CPU cores (thinking of NEO 2 and NEO Plus 2 here at the moment). It has to be evaluated whether these boards perform better with a 32-bit userland or not, since a Gigabit equipped NAS with slow storage (again NEO 2) greatly benefits from RAM being used as buffers when writing to the NAS (for reads it makes almost no difference). See also https://github.com/igorpecovnik/lib/issues/645
- Filesystems and transparent file compression: using btrfs with LZO compression is a great way to improve performance on NAS devices that have plenty of network bandwidth but are bottlenecked by storage (again NEO 2: with mainline kernel, though the Ethernet driver is still WiP, we can already saturate Gigabit Ethernet but USB storage bottlenecks). Note: this only helps with compressible data, so for the typical media center that only stores data in already compressed file formats it won't improve that much. Haven't looked into how to tell OMV to use compression yet.
- Partitioning: currently the official OMV image for RPi contains a boot partition (not needed on non-Raspberries) and a root partition of fixed size. The remaining space on the SD card is then turned into one partition usable for shares. We should implement the same. Limiting the rootfs resize is easy since the mechanism is already implemented: 'echo 6905856s >/root/.rootfs_resize' prior to first boot, and the remaining space can then be partitioned later or automagically (e.g. as btrfs when detecting kernel 4.4 or above).
- We support at least one board where the SD card implementation is faster than USB (ASUS Tinkerboard), and it could also make sense to use a large SD card for storage (energy efficient, and when storage is set up correctly connected disks can spin down pretty often). The rootfs will be resized to only 4 GB; what's missing is a routine that partitions the unused space. Later.
- Making IO more snappy: the Linux kernel implements different I/O scheduling classes and priorities. The defaults are made for systems with many (even hundreds of) concurrent users. The NAS use case we're talking about here is usually quite the opposite, since most of the time only one single user will access the shares. Therefore letting NAS server tasks run with the realtime instead of the 'best-efforts' scheduling class makes a huge difference on small ARM boards while it doesn't hurt. A background daemon using 'ionice' can make a great difference (or even starting whole subsystems like Samba or Netatalk with the realtime scheduling class through ionice) -- see the sketch at the end of this post.

Now some numbers: many people believe a Raspberry Pi 2 or 3 would make a great small OMV NAS. Couldn't be more wrong, since all Raspberries suffer horribly from the lowest IO bandwidth seen on any SBC. The CPU has only a single USB2 connection to the outside world, which all 4 USB ports and the Fast Ethernet implementation have to share. This sounds as bad as the performance numbers look -- see below. You can't even fix this by using a Gigabit Ethernet dongle since 'shared USB2' is the bottleneck here, and it gets even worse due to USB bus collisions.

Test setup is a Raspberry Pi 3 with OMV updated to the most recent 3.0.69 version and a Pine64+ with 3.0.69 freshly built from scratch (GbE equipped and with Armbian default settings). Test disk is a 'SAMSUNG MZ7TE128HMGR-00004' in a JMS567 enclosure (externally powered to prevent underpowering hassles -- both Raspberries and Pine64+ unfortunately use the horrible/crappy Micro USB connector responsible for all sorts of problems including shitty performance!). Network is Gigabit Ethernet capable and known to allow 940+ Mbits/sec.

Pine64+ with '1 GbE' settings (using a 300 MB testfile which fits perfectly in RAM -- this benchmark is not remotely testing 'disk' but only CPU and DRAM, so it's both a great example of 'benchmarking gone wrong' and a nice way to test some bottlenecks and use cases individually. The more RAM a NAS has, the more often this situation -- data buffered in DRAM while waiting to be written to slow storage -- will occur --> overall write performance improves)

Now testing Pine64+ with '10 GbE' settings (using a 3 GB testfile which has to be written to disk since it doesn't fit into 1 GB RAM any more: we're now seeing IO bottlenecking the test, and also that, even if it's just USB2, we get some extra performance due to 'USB Attached SCSI' being possible with those cheap Allwinner boards!)

Now testing the RPi 3 with Fast Ethernet and the official OMV image (updated to latest versions including kernel): all numbers much worse, but also as expected:

Now trying to help the RPi 3 with Gigabit Ethernet: one of those great and also cheap RTL8153 Ethernet dongles. But performance still sucks as expected, since the single USB2 port is the real bottleneck here (forget about 480 Mbits/sec, in reality it's a lot less, and collisions on the bus further decrease real-world throughput). And while read performance looks great this is just another 'benchmarking gone wrong' symptom, since in this case data is read from memory and no disk access is involved.
If I repeated the test with '10 GbE' settings then read performance would also drop drastically, since USB disk and USB network access would then start to interfere and destroy performance totally. It's obvious that this makes no sense at all since Raspberries suffer from IO/network bandwidth that is simply too low for an ARM NAS.
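A minimal sketch of the ionice idea mentioned above (the daemon names and the choice of realtime class/priority are illustrative, not what OMV or Armbian ship):

#!/bin/bash
# Move already running NAS daemons into the realtime I/O scheduling class
# (class 1, highest priority 0). Assumption: smbd/nfsd/afpd are the daemons
# of interest on this particular installation -- adjust to taste.
for daemon in smbd nfsd afpd ; do
    pgrep -x "${daemon}" | while read pid ; do
        ionice -c1 -n0 -p "${pid}"
    done
done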
tkaiser Posted April 4, 2017

And another small update on Raspberry Pi 3 and OMV defaults. I noticed that on the Raspberry OMV also used the 'conservative' cpufreq governor, which leads to the kernel sitting at the lowest clockspeed (600 MHz) almost all the time. So I simply gave 'performance' a try, which leads to some significant improvements (not regarding bandwidth, since Fast Ethernet is the bottleneck here, but storage latency greatly improved):

So on RPi maybe the best cpufreq governor setting would be ondemand with the following tweaks:

echo 1 >/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
echo 25000 >/sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
echo 2 >/sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

Please remember that when running any OS on an RPi 3 the kernel doesn't know the whole truth. If you need exact information about the clockspeeds your ARM cores are running at you need the vcgencmd command, since this is the only way to get information from the 'firmware' (like the actual CPU clockspeed as soon as throttling happens). So something like this is highly recommended when doing any test:

while true; do vcgencmd measure_temp && vcgencmd measure_clock arm; sleep 3; done

A more verbose variant of this simple script that also reports problem N°1 on Raspberry Pis (under-voltage) can be found here: https://github.com/bamarni/pi64/issues/4#issuecomment-292707581
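Related and possibly useful here: the firmware also exposes a throttling/under-voltage bitmask. A small sketch follows; the bit meanings in the comments are my reading of the RPi firmware documentation, so treat them as an assumption and double-check against current docs:

# Query the firmware's throttle status register
vcgencmd get_throttled
# Example output: throttled=0x50005
# Commonly documented bits:
#   bit 0  - under-voltage detected right now
#   bit 1  - ARM frequency capped right now
#   bit 2  - currently throttled
#   bit 16 - under-voltage has occurred since boot
#   bit 17 - ARM frequency capping has occurred since boot
#   bit 18 - throttling has occurred since boot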
tkaiser Posted April 12, 2017

In the meantime I tested the following boards with our 'Armbian went OMV' image customization routine:

- OrangePi Plus 2E (H3, Gbit Ethernet)
- Pine64+ (A64, 64-bit, Gbit Ethernet)
- Banana Pi/Pro (A20, Gbit Ethernet, lame SATA)
- Clearfog Pro (Armada 38x, fast Gbit Ethernet, insanely fast SATA), two different kernels that perform totally differently
- NanoPi NEO (H3, Fast Ethernet)

The OPi Plus 2E shows slightly lower scores with the same settings compared to the A64 (so we can assume H5 based boards will also perform slightly better than H3 ones). With the Fast Ethernet equipped NanoPi NEO there's something seriously wrong (settings, driver, whatever -- at least with the current Armbian settings we use for the nightlies, performance is a bit lower than on Raspberries, so settings/driver need some attention).

Anyway, I wanted to test a potential worst case scenario: a RAID is set up on host A, host A dies, and the array then has to be used on host B, C or D. I used the OMV GUI with two UAS attached SSDs to create an mdraid stripe set on an Orange Pi Plus 2E. So this is the 'USB RAID' everyone laughs about. 40/60 MB/s write/read isn't great but that's just 'you get what you paid for'.

Now trying to re-use this RAID set on another machine. I freshly created an OMV build for my Banana Pi and attached one of the 2 USB disks directly to the Banana's SATA port (the other SSD was still USB/UAS attached). As expected 'native SATA' lost, since Allwinner's SATA implementation is horribly slow and specs don't matter that much. The challenge was to check whether a RAID created USB-only on platform A could be imported on platform B with different host-disk connections --> easy, but performance starts to suck even more; the old and slow dual-core A20 is not capable of dealing with this.

Can it get worse? Of course. We've seen NAS performance of 40/60 MB/s with H3 and USB2, slowing down to 36/36 MB/s with 'native SATA', so let's have a look at what happens with SBC-RAID relying on SATA only. I stopped the board, added the crappy JMB321 SATA port multiplier I abandoned almost 2 years ago for being just a pile of crap, put both SATA disks behind the PM, adjusted the boot script to cope with SATA port multipliers, and... now with 'native SATA' combined with shitty port multipliers we're able to decrease performance even more (we're at 26/33 MB/s already):
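For reference, the command-line equivalent of creating and later importing such an mdraid stripe set looks roughly like this (a sketch only -- device names are examples and the OMV GUI may pass additional options):

# On host A: create a 2-disk mdraid stripe set (RAID-0) and put a filesystem on it
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb
mkfs.ext4 /dev/md0

# Later, on host B/C/D: scan and assemble the existing array, regardless of how
# the member disks are now attached (USB, native SATA, behind a port multiplier)
mdadm --assemble --scan
cat /proc/mdstat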
tkaiser Posted April 12, 2017

For testing purposes I continued with the above '2 SSD mdraid stripe set': now the same RAID set is imported into a freshly created OMV image for ODROID-XU4. All the people focussing on 'judging by specs only' will be pretty confident that performance must be way higher here since 'USB3'. Well, let's have a look. Armbian supports two kernel flavours: Hardkernel's 3.10 and the 4.9 branch (the latter more or less a 4.9 LTS with board specific fixes). Let's have a look at 3.10.105 first:

And now '10 GbE' settings (increasing testfile size and also blocksize):

If we compare with the USB2 RAID on Orange Pi Plus 2E above it seems 'Hi-Speed' (USB2) vs. 'SuperSpeed' (USB3) doesn't make that much of a difference. But on the cheap Allwinner board we were able to make use of UASP, so let's try it out here too:

Again with '10 GbE' settings as well:

So this looks pretty nice for such a board, and when testing with Windows Explorer or macOS' Finder sequential transfer speeds are much higher (exceeding 100 MB/s -- explanation here http://www.helios.de/web/EN/support/TI/157.html)

Funnily, when using the 4.9 kernel on ODROID-XU4 we run into another problem: the SoC implements big.LITTLE and combines 4 fast A15 ARM cores (cpus 4-7) with 4 slow ones (cpus 0-3 are the A7 cores, which are also clocked lower). With Hardkernel's 3.10 kernel so-called heterogeneous multi-processing (HMP or 'global task scheduling') works correctly while it does not with mainline (yet). So we have to help the kernel by manually assigning different tasks to different CPUs. My approach for now is to let all interrupt processing happen on the little cores (network IRQs on cpu 0, 1, 2 and Gigabit Ethernet on cpu 3) and to limit the NAS relevant daemons to the big cores 4-7. I tested various situations and the stuff running on the little cores was never bottlenecked by CPU, so IMO it's fine to let this stuff happen on cpus 0-3. We end up with these tweaks executed on startup on every Armbian installation for XU4 from now on (confirmed to work on both kernels): https://github.com/igorpecovnik/lib/blob/ebd555081e5a9001bd40571bc77e1e8f5baa7ea8/scripts/armhwinfo#L82-L92

When Armbian is used to create an OMV image then the following will also be added to /etc/rc.local automagically (confirmed to work on both kernels):

echo ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
sleep 0.1
cd /sys/devices/system/cpu
for i in cpufreq/ondemand cpu0/cpufreq/ondemand cpu4/cpufreq/ondemand ; do
    if [ -d $i ]; then
        echo 1 >${i}/io_is_busy
        echo 25 >${i}/up_threshold
        echo 10 >${i}/sampling_down_factor
    fi
done

And then to help OMV perform better we tweak IO scheduling and move the NAS daemons to the big cores (the latter only on ODROID-XU4): https://github.com/igorpecovnik/lib/blob/b13e92911e91e34b0b9189c704f3186a0b3788f0/scripts/customize-image.sh.template#L134-L137

I decided against cgroups or modifying daemon startup scripts since I don't want to mess with OMV internals and want to avoid situations where later OMV updates break performance optimizations. That's the only reason this is done as a cron job executed every minute. Works quite well in real-world situations, and when doing benchmarks one has to wait at least 60 seconds after connecting a share to the test machine for the cron job to do its magic.
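The linked cron job isn't reproduced in the post, but the idea is roughly the following (a sketch only, under the assumption that smbd/nfsd/afpd are the daemons being moved; the real script in the Armbian repo is authoritative):

# /etc/cron.d/make_nas_processes_faster (illustrative sketch)
# Every minute: pin NAS daemons to the big cores 4-7 and give them the
# realtime I/O scheduling class so the single NAS client isn't starved.
* * * * * root for pid in $(pgrep -f 'smbd|nfsd|afpd'); do taskset -cp 4-7 $pid >/dev/null 2>&1; ionice -c1 -p $pid; done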
tkaiser Posted April 13, 2017

In the meantime I added a lot of tweaks to Armbian's OMV installation routine and started to create OMV images from scratch for the more interesting boards: http://kaiser-edv.de/tmp/NumpU8/ (based on the most recent OMV 3.0.70 and Armbian 5.27 releases). I stay with the legacy kernel where appropriate (that's currently the Clearfogs, ODROID-C1/C2 and A64) and skip both H3 and H5 boards for now since there's too much WiP at the moment. Some of the tweaks lead to much better performance numbers even though we can't use 'USB Attached SCSI' with Pine64 for example: sequential write performance looks way better compared to the Pine64 numbers in post #2 above (made with mainline kernel and UAS, but also with cpufreq limited to just 864 MHz back then and no IO scheduler tweaks).

The nice thing with these OMV images is that they already use the following repos:

- Official Debian Jessie repos (here you get updates and security fixes for normal packages)
- OMV (OMV and OMV Extras updates)
- Armbian (kernel/bootloader updates and also potential performance tweaks)

In the screenshot it's obvious that read performance dropped down to just 32 MB/s in one run (check the orange triangles). I'm currently investigating this, and if it's fixable with better settings applied through Armbian's prepare_board mechanism at startup then the fix will magically find all installations in the future, since it will be applied through the usual 'apt upgrade'.
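One of the to-dos from the notes in post #2 -- picking an I/O scheduler per disk type via /sys/block/*/queue/rotational -- is the kind of setting that could live in such a startup mechanism. A sketch; which scheduler actually wins on weak ARM devices is exactly the open question mentioned there:

#!/bin/bash
# Pick an I/O scheduler per block device depending on whether it rotates.
# Assumption: cfq for spinning disks, noop for flash media -- to be validated.
for dev in /sys/block/sd* ; do
    if [ "$(cat ${dev}/queue/rotational)" = "1" ]; then
        echo cfq >${dev}/queue/scheduler
    else
        echo noop >${dev}/queue/scheduler
    fi
done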
tkaiser Posted April 15, 2017

Another take on the 'cheap ARM thingie running OMV' approach: what about taking 3 things for $7 each and combining them into a compact DIY NAS?

- Orange Pi Zero (with 256MB DRAM)
- NAS Expansion board
- An external USB3 Gigabit Ethernet dongle

The above setup uses the 'Orange Pi Zero Plus 2' that doesn't even have Ethernet (only el cheapo AP6212 Wi-Fi); I used it just because I'm working on this board at the moment anyway (still waiting for Xunlong to release a Gigabit Ethernet equipped Zero variant just like the NanoPi NEO 2 -- then it gets interesting). With the above setup you can add an mSATA SSD (maybe not that smart behind a USB2 connection, which is the reason I used a passive mSATA to SATA converter to connect an external 3.5" HDD, they're as cheap as $1.5). The other alternative is to attach a 2.5" HDD/SSD (maybe using Xunlong's SATA+power cable) but then it's mandatory to power the whole setup through the 4.1/1.7mm barrel plug that can be seen on the right.

Anyway, back to the software side of things: in Armbian's special OMV installation variant we support the above setup, so even boards without Ethernet at all can simply be used together with one of the two USB3 Gigabit Ethernet dongles available (there's only ASIX AX88179 or RealTek RTL8153). We activate both drivers in /etc/modules and deactivate those not attached on first boot (see the sketch at the end of this post). So you could also run this OMV installation variant on H3 boards lacking Ethernet or with only Fast Ethernet and get some extra speed by attaching such a USB3 GbE dongle on first boot and from then on just when needed (for example you could use an Orange Pi PC this way for hourly incremental networked backups over Fast Ethernet, where 'NAS performance' is only needed for large restores or 'disaster recovery' -- then you attach the GbE dongle and performance is 2-3 times higher).

Unfortunately I ran into 2 problems -- in the meantime down to 1 problem -- when trying out the only reasonable variant: OMV running with mainline kernel. First, for whatever reason Armbian's build system currently seems somewhat broken regarding H3 boards (this should also affect nightly builds). It seems cpufreq/DVFS doesn't work and currently H3 boards with recent mainline builds are slow as hell. And the 2nd problem with the NAS Expansion board is that disks attached to either of the two JMS578 USB-to-SATA bridges don't make use of UAS (needs investigation).

Since I'm currently limited to USB's Mass Storage protocol anyway I used the legacy kernel to create an OMV image. Performance is not really stellar but ok-ish for a board lacking Ethernet entirely:

I also tried one of my (older) JMS567 USB-to-SATA bridges (main difference: no TRIM support, unlike the JMS578 on the NAS Expansion board) and the numbers look slightly better (also needs some investigation):

I won't publish OS images for H3/H5 boards using the legacy kernel now, since patience is the better option. When we're able to use mainline kernel on these boards performance will get a slight boost and hopefully we can then also make use of UAS on the NAS Expansion board. Why H3/H5? Because of up to 4 independent real USB host ports that do not have to share bandwidth (compare with the RPi 3 above -- no way to exceed poor Fast Ethernet performance there with a similar looking setup since RPi's USB receptacles have to share bandwidth).

Interesting detail: the two USB type A receptacles have higher priority than the 2 JMS578 chips on the board.
With an OPi Zero connected to the NAS board, as soon as you connect a USB peripheral to one of the two USB ports the respective JMS578 disappears from the bus (the left receptacle interacts with mSATA, the right one with the normal SATA connector).
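The first-boot driver handling mentioned above could look roughly like this (a sketch only -- the actual firstrun logic lives in the Armbian OMV customization script; the USB IDs below are the usual ones for these dongles but should be double-checked):

#!/bin/bash
# Both GbE dongle drivers are listed in /etc/modules; on first boot keep
# only the one whose hardware is actually present on the USB bus.
lsusb | grep -qi '0b95:1790' || sed -i '/^ax88179_178a/d' /etc/modules   # ASIX AX88179 (ID assumed)
lsusb | grep -qi '0bda:8153' || sed -i '/^r8152/d' /etc/modules          # RealTek RTL8153 (ID assumed)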
tkaiser Posted April 15, 2017

And the last update for some time: one of the best ARM boards currently available for NAS purposes, Solid-Run's Clearfog Pro based on the Marvell ARMADA 38x. This board has 3 independent GbE channels (one WAN port the red Ethernet cable is connected to, the SFP cage next to it supports both GbE and 2.5GbE, and the 6 RJ45 ports on the left are behind an integrated switch IC). There's an M.2 slot on the back (where the Intel SSD is connected using a passive M.2-to-SATA adapter) and 2 mPCIe slots that can be converted to mSATA if desired: https://docs.armbian.com/Hardware_Marvell/

That means you can have 1 to 9 real SATA ports with this board:

- the M.2 port on the PCB back can take M.2 SSDs or a passive SATA adapter
- you can use one ASM1062 mPCIe card per slot to get 2 SATA ports each
- instead of the cheap ASM stuff you can use Marvell 88SE9215 based mPCIe cards that add 4 SATA ports per mPCIe slot and support performant FIS based switching SATA port multipliers (but here it gets stupid since you don't want to put more than 4 SATA disks behind a single PCIe 2.x lane!)
- or you patch u-boot to convert one or two of the mPCIe slots to (m)SATA (see the aforementioned link to the documentation; if you want SATA instead of mSATA it's just a passive adapter, as can be seen above in the right mPCIe/mSATA slot)

In the above chaos that's 4 SATA ports (2 of them currently in use with a 2.5" Intel SSD and a 3.5" Seagate HDD powered by a dual-voltage PSU that also powers the board). OMV happily deals with all connected disks, but we faced some problems trying to optimize performance. The good news: since we were in 'active benchmarking' mode, in the meantime performance increased a lot and we also got DFS/cpufreq scaling to work (see this thread for performance numbers and progress eliminating the problems).

In the above picture OMV is running off the miniature SanDisk USB3 thumb drive inserted into the ORICO GbE/hub combination (there's also a fourth RTL8153 based GbE NIC inside, behind the VIA812 USB3 hub), but tests with OMV's flashmemory plugin (relying on 'folder2ram' functionality) also confirmed that it's ok to use quality SD cards for the OS (allowing HDDs to spin down when not in use!).

Currently we provide OMV images for Clearfog Base/Pro with the Marvell LTS kernel (4.4) or mainline kernel (4.10): http://kaiser-edv.de/tmp/NumpU8/ Both are not perfect at the moment, but the good news is that improvements (like the working DFS/cpufreq scaling that @zador.blood.stained ported to Armbian's kernel today) will be provided through apt.armbian.com later, so they're just an 'apt upgrade' away a few weeks after testing.
tkaiser Posted April 15, 2017

And another update... In case anyone is wondering why there aren't OMV images available for Cubox-i, Hummingboard or UDOO (all i.MX6 based with 'real SATA' and Gigabit Ethernet): it's just not worth the effort since unfortunately on these devices Gigabit Ethernet is limited to ~405/580 Mbits/sec, so 'real SATA' is of no use when it's about NAS/OMV. For some storage performance numbers please refer to the following:

But in a NAS scenario Ethernet is so limited that overall performance is not that much better than on el cheapo USB2 equipped boards like NanoPi NEO 2 or Orange Pi PC 2 (both Allwinner H5 based and below the $20 barrier). In case you already own such an i.MX6 board and want to enjoy OMV on it... it's really just a matter of checking out Armbian's build system and letting it create an OMV image with top performance from scratch: https://docs.armbian.com/Developer-Guide_Build-Preparation/ All you need then is to execute this

sed 's/\t\t# Ins/\t\tIns/' lib/scripts/customize-image.sh.template >userpatches/customize-image.sh

prior to the compile.sh invocation in the next step:

./compile.sh KERNEL_ONLY=no RELEASE=jessie PROGRESS_DISPLAY=plain
tkaiser Posted April 22, 2017 Author Share Posted April 22, 2017 And another update wrt 'active benchmarking': Not collecting numbers without meaning but looking at benchmark numbers with the only aim to try to understand what's happening and to improve these numbers. In the meantime I gave OPi Zero with its NAS Expansion board and an el cheapo RTL8153 GbE dongle another try. As reference the numbers from a few days ago: 21.6/26 MB/s (write/read) vs. 33/34 MB/s now. That's a nice improvement of +11 MB/s in write direction and still 8MB/s more in read direction. Before: After: What's different? The hardware is not (well, almost not). We're running with mainline kernel now that means way better driver quality compared to the smelly Android 3.4.x kernel the legacy H3 images have to use This commit increased CPU clockspeed from 1GHz with mainline currently to 1.2GHz ( ~1MB/s more NAS throughput in each direction) This commit resolved an IRQ collision problem (network and storage IRQs both processed by cpu0 which caused a CPU bottleneck, this greatly improved write throughput) And the simple trick to use an USB enclosure with JSM567 instead of the JMS578 on the NAS Expansion board is responsible also for a few MB/s more since now we're using UASP (improved read performance) Regarding the latter I also consider this a software tweak since now for whatever reasons JMS578 with mainline kernel makes only use of slower 'Bulk-Only' instead of 'USB Attached SCSI'. I'm confident that this can be resolved if interested users get active and discuss this at linux-usb. Some 'storage only' performance numbers can be found here as comparison. Edit: 'lsusb -v' output for both JMS567 and JMS578 with mainline kernel: ### JMS567 / UAS Bus 003 Device 002: ID 152d:3562 JMicron Technology Corp. / JMicron USA Technology Corp. Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 2.10 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 idVendor 0x152d JMicron Technology Corp. / JMicron USA Technology Corp. 
idProduct 0x3562 bcdDevice 63.02 iManufacturer 1 ADMKIV iProduct 2 AD TO BE II iSerial 3 DB123456789699 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 85 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0xc0 Self Powered MaxPower 30mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 8 Mass Storage bInterfaceSubClass 6 SCSI bInterfaceProtocol 80 Bulk-Only iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 1 bNumEndpoints 4 bInterfaceClass 8 Mass Storage bInterfaceSubClass 6 SCSI bInterfaceProtocol 98 iInterface 10 MSC USB Attached SCSI Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Command pipe (0x01) Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Status pipe (0x02) Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x83 EP 3 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Data-in pipe (0x03) Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x04 EP 4 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Data-out pipe (0x04) Binary Object Store Descriptor: bLength 5 bDescriptorType 15 wTotalLength 22 bNumDeviceCaps 2 USB 2.0 Extension Device Capability: bLength 7 bDescriptorType 16 bDevCapabilityType 2 bmAttributes 0x00000f0e Link Power Management (LPM) Supported SuperSpeed USB Device Capability: bLength 10 bDescriptorType 16 bDevCapabilityType 3 bmAttributes 0x00 wSpeedsSupported 0x000e Device can operate at Full Speed (12Mbps) Device can operate at High Speed (480Mbps) Device can operate at SuperSpeed (5Gbps) bFunctionalitySupport 1 Lowest fully-functional device speed is Full Speed (12Mbps) bU1DevExitLat 10 micro seconds bU2DevExitLat 32 micro seconds Device Status: 0x0001 Self Powered ### JMS578 / Bulk-Only Bus 003 Device 002: ID 152d:0578 JMicron Technology Corp. / JMicron USA Technology Corp. Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 2.10 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 idVendor 0x152d JMicron Technology Corp. / JMicron USA Technology Corp. 
idProduct 0x0578 bcdDevice 4.04 iManufacturer 1 JMicron iProduct 2 USB to ATA/ATAPI Bridge iSerial 3 0123456789ABCDEF bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 32 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0x80 (Bus Powered) MaxPower 500mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 8 Mass Storage bInterfaceSubClass 6 SCSI bInterfaceProtocol 80 Bulk-Only iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Binary Object Store Descriptor: bLength 5 bDescriptorType 15 wTotalLength 22 bNumDeviceCaps 2 USB 2.0 Extension Device Capability: bLength 7 bDescriptorType 16 bDevCapabilityType 2 bmAttributes 0x00000f0e Link Power Management (LPM) Supported SuperSpeed USB Device Capability: bLength 10 bDescriptorType 16 bDevCapabilityType 3 bmAttributes 0x00 wSpeedsSupported 0x000e Device can operate at Full Speed (12Mbps) Device can operate at High Speed (480Mbps) Device can operate at SuperSpeed (5Gbps) bFunctionalitySupport 1 Lowest fully-functional device speed is Full Speed (12Mbps) bU1DevExitLat 10 micro seconds bU2DevExitLat 32 micro seconds Device Status: 0x0000 (Bus Powered) 1 Link to comment Share on other sites More sharing options...
tkaiser Posted May 23, 2017

EDIT: Hardkernel explains various issues with Cloudshell 1 and 2. And another one: USB reset issues with Cloudshell 1 explained.

I thought about opening a new thread, but since this is just a list of known issues OMV users will most probably run into I'll add it to this thread. Here is the list of problems with the ODROID XU4 when used as a NAS, especially when combined with Hardkernel's Cloudshell 2 gimmick:

1) Everything is USB here. Gigabit Ethernet on the XU3 is provided by an internal ASIX AX88179 USB chip, while Hardkernel replaced this with the somewhat better RTL8153 on the ODROID XU4. This is not necessarily bad, especially since Gigabit Ethernet bandwidth can be fully saturated, but the amount of interrupts to be processed as well as latency are higher than necessary. Also it seems that general USB problems on other ports (cable somewhat loose) affect network stability (see this report here for both available kernels).

2) The other USB3 port of the Samsung Exynos 5422 SoC is connected to a Genesys Logic GL3521 USB hub on the PCB, showing up in lsusb output as 05e3:0616 (XHCI/USB3) and 05e3:0610 (EHCI/USB2). This is not necessarily bad as long as you simply ignore one of the two USB3 receptacles and connect 1 disk only. Performance when accessing two disks in parallel sucks (as is always the case when hubs are involved). These are two Samsung SSDs in two UAS (USB Attached SCSI) capable enclosures running as RAID-1:

/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 5000M
        |__ Port 1: Dev 5, If 0, Class=Mass Storage, Driver=uas, 5000M
        |__ Port 2: Dev 6, If 0, Class=Mass Storage, Driver=uas, 5000M

                                                              random    random
        kB  reclen    write   rewrite      read    reread      read     write
    102400       4    14197     15286     15750     16128     14502     14419
    102400      16    38999     39800     48671     51743     45884     39061
    102400     512    50223     51847    111445    160264    157900     60487
    102400    1024    50706     48524    141865    162106    160770     50088
    102400   16384    48546     50328    218430    221731    221526     47258

Performance is worse than a single SSD being accessed (as expected -- you always try to avoid USB hubs between disks and host) and problems are also reported by users, for example search for 'USB HDD connected as USB2 device' here: https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=50#p187838 (for whatever reason only the EHCI part of the internal GL3521 hub comes up, cutting the SuperSpeed data lines to the ASM1153E, and so the user in question ends up with a USB 2.0 data connection -- Hi-Speed and not SuperSpeed). As usual UAS will be blamed for this problem but it is totally unrelated.

3) Adding to this already somewhat fragile situation, the USB3 receptacles are known to cause trouble. Please see here and here and problem 1) I reported here yesterday: my problem could be solved by exchanging the USB cable/plug -- the one I used later (and with another disk enclosure before) fits more tightly. Warning: if you ever connect something to any of the USB3 ports of an ODROID XU4 ensure that the USB plug and receptacle fit 100 percent (some force should be needed to insert the plug), otherwise you'll run into trouble for sure. Symptoms may vary; in my case the disk did not even mount, but it's also possible that this works and you run into trouble later when accessing the storage. In case you use a UAS capable disk enclosure be aware that dmesg will show messages mentioning 'uas_eh_abort_handler' since this driver is dealing with the mess.
Again, this is totally unrelated to UAS, it's just this driver reporting hardware trouble now.

4) Underpowering issues: see problem 2) reported here yesterday: https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=150#p190909 (or here). After I ensured that USB cable and receptacle fit tightly I ran a benchmark and after some time problems occurred ('uas_eh_abort_handler' showing up in dmesg output). UAS is still not the cause of this problem, but UAS might be related, since with UAS communication between host and disk gets way more efficient and as such power requirements increase too. So with this XU4-powered 2.5" SSD UAS triggers the underpowering issue, while the same SSD might run fine without UAS (Mass Storage / Bulk Only) or on a USB2 bus (less efficient protocol --> lower power requirements). I have used these JMS567 based 2.5" disk enclosures for years with a variety of SSDs and 2.5" HDDs powered by the USB3 ports of various Macs and never encountered problems (except for an EVO750 with rather high peak consumption when being powered on). To confirm, I used the same EVO840 in the same JMS567 enclosure powered by my MacBook Pro only (system_profiler reporting 'Current Available (mA): 900') and ran a quick but demanding benchmark there too. I don't know whether that's just the combination of my XU4 with this enclosure (a different ASM1153 based one worked fine powered by the XU4 only) or a more general problem. The reason why I list this is that it looks like 'yet another UAS problem' but it's only partially related, UAS' higher efficiency merely triggering potential underpowering problems earlier. This very well known problem has already been described by one of Linux's USB gurus and has been happily ignored since then: https://github.com/hardkernel/linux/commit/c18781b1ef56a800572e8342488504e4e818013a#commitcomment-21996557

5) Potential USB3 host controller issues as outlined by Hans de Goede in the aforementioned link, maybe needing a two line patch to add another quirk. Status: no one cares.

Can it get worse? By adding a Cloudshell 2 you might avoid problem 4) above but might still suffer from 2) and 3) and add a bunch of new problems:

6) On the first batch of Cloudshell 2 units Hardkernel simply forgot to flash a serial number. Problem reported here, fix there, workaround added in the latest OMV image, so please ensure that you're using the OMV 3.0.76 image or above if you use a Cloudshell 2.

7) Using a Cloudshell involves connecting a USB cable between one of the XU4's USB3 ports and the Cloudshell PCB. So you can run into issue 3) -- USB3 receptacle crappiness -- at any time (now or later; just adding some vibrations might break your whole storage setup if the USB plug doesn't fit absolutely perfectly -- some force should be needed to insert the plug). Here a user reports 'UAS issues': https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=150#p191018 that are obviously none but will just be reported differently after disabling UAS https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=150#p191082 (of course you can't fix USB connection or cable issues with different drivers, and of course we can't blame users for reporting random erratic behaviour as 'UAS issues' since they don't know better, shouldn't have to be experts and are even victims of an insanely stupid 'UAS is broken' campaign still running over at the ODROID forums).

8) Cloudshell 2 uses a JMicron JMS561 USB RAID chip which is said to rely on the JMS567 for the USB-to-SATA bridge functionality.
The JMS567 needs specific quirks to work flawlessly with UAS (see here for an example of how this is diagnosed/developed and how to proceed with a new device); the JMS561 has a different USB product ID so the quirk will not be applied (but may be needed). Status: no one cares.

9) The JMS561 in the Cloudshell 2 can work in various modes: two stupid RAID variants, Span (adding the capacity of two disks) and JBOD ('just a bunch of disks', so 2 disks can be accessed individually -- no idea how that's achieved in this mode, maybe someone can post 'lsusb -t' output with 2 disks inside the Cloudshell 2 in JBOD mode; in this mode the two disks appear behind one USB device node but as different devices, since the JMS561 acts as a SATA port multiplier here). Depending on what you choose you might get different or no SMART data from the connected drives. The JMS561 is also missing from smartmontools' drivedb.h (so you just get an 'Unknown USB bridge [0x152d:0x0561 (0x8037)]' if you're not forcing smartctl to make use of SCSI / ATA Translation). Status: no one cares.

10) The Cloudshell 2 LCD is currently used to display some fancy data instead of useful information (highest peak temperature since last reboot, CRC errors that occurred, LCC increasing too fast, firmware updates available for attached disks or SSDs, any signs of connection errors according to dmesg and so on). The software support is provided by a Hardkernel employee (using his private Ubuntu PPA). The first version showed efficiency problems (also increasing idle temperatures/consumption) but this should be fixed in the meantime, so please ensure to either run 'apt upgrade' on the command line or use OMV's 'Update Management' menu item.

11) Cloudshell 2 in most modes (RAID-0, RAID-1 and Span) currently doesn't provide vital SMART health data for the drives. It is only available when using PM/JBOD mode (PM means port multiplier -- since the JMS561 product brief doesn't mention the switching method I would assume that the JMS561 only implements the slow CBS mode and not FIS based switching). Details regarding SMART (let's hope Hardkernel gets in touch with JMicron and can provide a fix for this, but this would require flashing a new firmware to the JMS561 anyway): http://forum.openmediavault.org/index.php/Thread/17855-Building-OMV-automatically-for-a-bunch-of-different-ARM-dev-boards/?postID=144752#post144752

12) Even in PM/JBOD mode SMART is broken and needs a Cloudshell 2 firmware upgrade (the Cloudshell 2 gimmick fails to do 'SCSI / ATA Translation' correctly, which leads to USB resets).

13) It seems no quality control happens. This is both somewhat surprising and inexcusable. We're talking here about a combination of SBC and add-on sold and produced by the same vendor and advertised as a complete NAS solution. The cable connecting both pieces, being part of the add-on product, is one of the most crucial parts, since this NAS kit doesn't rely on reliable storage implementations ('reliable' includes the design of connectors, protocol implementation and monitoring possibilities) but on USB instead. Real storage protocols implement some sort of error detection mechanism (CRC -- cyclic redundancy check) and expose occurring mismatches as a counter (with SATA it's SMART attribute 199). But this CRC counter only reports data corruption between the SATA host controller and the drives; in Cloudshell's case that's the JMS561 on the Cloudshell PCB connected directly to the drives without cables in between.
Data corruption due to a defective USB cable or USB plugs not fitting perfectly into the receptacle is not handled at this low layer. It's reported at a higher layer, and if you see such reports your filesystem is most probably already corrupted (check dmesg output for either 'I/O error' or 'uas_eh_abort_handler', since the messages vary depending on whether UAS or the older Mass Storage protocol is used). BTW: These are SATA connectors and a 'Mini SAS to SATA breakout cable'. Guess why there are metal latches on each end of the cable:
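As for the SMART situation in 9) and 11): forcing SCSI/ATA Translation with smartctl can be done like this (a sketch; whether the JMS561 answers correctly depends on its firmware, as described above):

# Ask smartmontools to use SAT (SCSI/ATA Translation) explicitly
# instead of relying on its USB bridge database
smartctl -d sat -a /dev/sda
# Some bridges only cope with the 12-byte SAT command variant:
smartctl -d sat,12 -a /dev/sda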
tkaiser Posted May 24, 2017

Update: wasted another half an hour diagnosing problems with a device I'll never use again: on my XU4 the lower USB3 receptacle seems to be the problematic one. I used btrfs for obvious reasons (checksumming reports data corruption pretty early -- see my 'crappy JMicron SATA port multiplier adventure' from years ago) and let an iozone bench run. On the lower receptacle 1 out of 4 cables failed, on the upper none. Only after starting to slowly move the XU4 could I trigger 'uas_eh_abort_handler' and friends occurring in dmesg output (still not reporting 'UAS issues', since this is not related to UAS at all, just the uas driver having to deal with USB receptacle crappiness). May I introduce you to my 4 different USB3 SuperSpeed cables: the little guy on the left is called 'Evil UAS' in ODROID XU4 world (but I have to admit that "shooting the messenger" is always more fun than trying to solve the real problems).

Edit: Meanwhile in ODROID XU4 world some already start to realize that the uas driver reporting contact/cabling/underpowering problems doesn't mean UAS is at fault (since when blacklisting UAS nothing gets better, only dmesg output changes --> different drivers --> different messages). But it might still take some time until affected users start to focus on the appropriate side of the USB cable (the type A plug has to fit extremely tightly into the XU4's USB3 receptacles -- if you don't need some force to insert a USB plug, simply forget about reliable data transmissions), and maybe in a few months or a year the ODROID micro community will accept that UAS is not bad, evil or broken in general but just new to them.
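Why btrfs for this kind of cable testing: its checksums let you ask the filesystem directly whether corruption made it to disk, roughly like this (a sketch, assuming the filesystem is mounted at /mnt/test):

# Verify all data and metadata checksums on the filesystem
btrfs scrub start -Bd /mnt/test
# Per-device error counters (read/write/corruption errors) since creation
btrfs device stats /mnt/test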
manuti Posted June 5, 2017

So many thanks for your images, now running OMV_3_0_71_Bananapipro_4.10.12 after a weekend of restoring and resetting my first Armbian deploy... from quite some time ago.
tkaiser Posted June 5, 2017

1 hour ago, manuti said: So many thanks for your images, now running OMV_3_0_71_Bananapipro_4.10.12 after a weekend of restoring and resetting my first Armbian deploy... from quite some time ago.

Somewhat off-topic here, but a great way to say thank you is to support Armbian in practical ways, e.g. seeding torrents: this is the smallest OPi Zero, seeding our torrents 24/7 for a few weeks now (33% of the 256MB used, consumption below 700mW, running off a 128 GB Samsung EVO).
tkaiser Posted June 26, 2017

Another update wrt the upcoming ROCK64 board. It's still not decided whether Armbian will support this board anytime soon or at all, but NAS results already look promising. USB3 performance is excellent (easily outperforming ODROID-XU4) but we have currently discovered a few network issues while testing/evaluating the board (most probably we need to tweak the so called TX/RX GbE delay settings). But even without this tweak it already looks as follows.

These are the 'Enterprise network settings' (test size is 3GB and transmission block size 1MB):

These are the 'Gigabit Ethernet' settings with just 300 MB filesize and 128KB blocksize:

It's pretty obvious that there's something wrong with the network settings when we look at the sequential transfer speeds. Just compare with the Pine64 above where USB2 storage is the bottleneck: there the test with 300 MB filesize runs entirely in RAM and Pine64 scores 66/78 MB/s write/read, while with the Enterprise settings speeds drop down to storage performance: ~38MB/s in both directions. With ROCK64 all tests run entirely in RAM (I test with the 4GB model) and there's a significant drop in transfer speeds if we compare 1MB with 128KB blocksize.

My settings (TCP/IP offloading deactivated, eth0 IRQs on cpu3, USB3 IRQs on cpu2, performance governor, Receive Packet Steering enabled):

root@rock64:~# tail /etc/rc.local
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
for i in 1 2 3 ; do
    echo 4 >/proc/irq/$(awk -F":" "/xhci/ {print \$1}" </proc/interrupts | sed 's/\ //g')/smp_affinity
    echo 8 >/proc/irq/$(awk -F":" "/eth0/ {print \$1}" </proc/interrupts | sed 's/\ //g')/smp_affinity
done
echo 7 >/sys/class/net/eth0/queues/rx-0/rps_cpus
echo 32768 >/proc/sys/net/core/rps_sock_flow_entries
echo 32768 >/sys/class/net/eth0/queues/rx-0/rps_flow_cnt
exit 0

root@rock64:~# cat /etc/network/interfaces.d/eth0
allow-hotplug eth0
iface eth0 inet dhcp
    offload-tx off
    offload-rx off
    hwaddress ether 42:40:08:16:da:2b

The good news: this is all still WiP (work in progress) and I'm pretty confident that we will see a performance boost in both directions soon (15-20 MB/s more each -- with Windows Explorer throughput numbers exceeding 100 MB/s should already be possible with appropriate OMV/Samba settings -- see here for an example and here for an explanation of why Explorer performs better).
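For what it's worth, the offload-tx/offload-rx stanzas above are, as far as I know, handled by the ethtool package's ifupdown hook on Debian; the same toggles can be applied and checked at runtime (a sketch, assuming the ethtool package is installed):

# Disable TX/RX checksum offloads on eth0 and verify the result
ethtool -K eth0 tx off rx off
ethtool -k eth0 | grep -E 'tx-checksumming|rx-checksumming'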
tkaiser Posted June 29, 2017

On 23.5.2017 at 4:06 PM, tkaiser said: Here a user reports 'UAS issues': https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=150#p191018 that are obviously none but will just be reported differently after disabling UAS https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=150#p191082 (of course you can't fix USB connection or cable issues with different drivers and also of course we can't blame users for reporting random erratic behaviour as 'UAS issue' since they don't know better, shouldn't be experts and are even victims of an insanely stupid 'UAS is broken' campaign still running over at ODROID forums).

And it continues: https://forum.odroid.com/viewtopic.php?f=146&t=26016&start=200#p194654 Another user reports general XU4 USB3 receptacle issues as 'UAS problems' for unknown reasons. The symptoms are exactly those user @Kosmatik reported with his defective Cloudshell 2 cable when blacklisting UAS (see the two quoted links above). So if @joaofl really used UAS he would run into the same errors, but this time of course not with 'reset SuperSpeed USB device number 3 using xhci-hcd' in the logs but with various occurrences of 'uas_eh_abort_handler' and 'uas_zap_pending' (same problem, different driver --> different dmesg output). What do we see happening in this funny micro community creating its own micro reality over there? A user not using UAS and reporting an XU4 problem (USB3 receptacle troubles) gets the recommendation to 'Disable UAS'. And at the same time the usual suspects still run their insanely stupid 'UAS is broken in Linux everywhere' campaign in the ODROID forum.

Why am I mentioning this? Because the situation with ROCK64 is very promising in this area too. We're currently in touch with Rockchip engineers since I reported UAS trouble regarding 'ERROR Transfer event for disabled endpoint or incorrect stream ring' --> http://sprunge.us/HURC A Rockchip engineer almost immediately confirmed being able to reproduce it (only with RK3328 but not RK3399 -- same xHCI host controller as RK3328 but a different USB3 PHY). Currently they're buying the USB-to-SATA bridges we recommended for testing and I hope to be able to report some progress soon.
tkaiser Posted July 6, 2017

And another small ROCK64/RK3328 update. We're still struggling with some USB problems (RK engineers are aware of them and I hope we'll see a fix soon), but since ayufan in the meantime also creates OMV images for ROCK64 automagically I thought let's give it a try again. The settings we came up with over the last 3 months here and over in the OMV forum are almost all adopted, but I ran into some minor glitches with ayufan's build. I had trouble with rrdcached keeping all cores busy at 100% (had to deactivate it manually), then it seems /usr/local/sbin/rock64_fix_irqs.sh doesn't get executed at boot (this script does the usual tuning Armbian does in /etc/init.d/armhwinfo on all platforms -- a lot of time/effort went into this over the years), and also /var/lib/netatalk/CNID isn't handled by folder2ram, which negatively affects some of the Netatalk performance numbers below (everything dealing with small files/requests). Also /etc/cron.d/make_nas_processes_faster doesn't work, and manually trying to set I/O scheduling class and priority has no effect either (maybe a missing kernel config?).

Anyway: executing the /usr/local/sbin/rock64_fix_irqs.sh script increases NAS throughput numbers by 15-20 MB/s, and even though not all tunables work right now and we will still get 1 or even 2 USB3/UAS fixes, it now looks like this (first test made with a JMS567, 2nd with an ASM1153 powered by the ROCK64): identical performance (every variation below 5% is identical!), and since the CNID database currently lives on the rootfs, which is on an SD card, the numbers for 'create' and 'lock/unlock' will automagically decrease once folder2ram takes care of the CNID databases. With Windows Explorer or macOS Finder sequential transfer speeds exceeding 100 MB/s should already work (see here for why/how LanTest test results differ).
tkaiser Posted August 25, 2017

Another small update on why settings/details matter. FriendlyELEC's NanoPi M3 isn't the best choice for a NAS since, while featuring Gigabit Ethernet (940+ Mbits/sec possible), it has only USB2/Hi-Speed as potential storage interface and all USB2 receptacles are behind an internal USB hub (and therefore have to share bandwidth). So we're talking about 37 MB/s max anyway with the Mass Storage/BOT protocol and maybe up to 42-43 MB/s if USB Attached SCSI (UAS) were available. I tested around a bit with the NanoPi M3 last year, but since storage performance was already way too low (26.5/31 MB/s write/read) I didn't test NAS performance. Since we're in the meantime able to run this board with mainline kernel I gave it a try again: https://forum.armbian.com/index.php?/topic/1285-nanopi-m3-cheap-8-core-35/&do=findComment&comment=38120

Now we get a nice performance increase when testing storage locally: 26.5/31 MB/s write/read back then, now 35/37.5 MB/s (or 8.5/6.5 MB/s more). I had to make a minor correction to our build system to get decent storage performance with our chosen ondemand cpufreq governor too, and the other important difference compared to the situation with the M3's 3.4 kernel is that with the old kernel IRQs are processed on all CPU cores in parallel, while we can now choose fixed IRQ affinity (letting interrupts for various stuff always be processed on the same CPU core): https://github.com/armbian/build/blob/4f08c97d745892b6d664b96f8a4f84f48ee1f53f/packages/bsp/common/etc/init.d/armhwinfo#L255-L267

Without that fix it would look like this, since all IRQs are processed on cpu0:

After fixing IRQ affinity and with the ondemand tweaks active (to get the CPU cores clocking up from 400 MHz to 1400 MHz instantly as soon as IO activity happens) we're talking about:

This is not as impressive as with ROCK64 above where we get 15-20 MB/s more just by taking care of settings, but it's a nice little performance boost we get for free (just by taking time and care over settings). What's missing is UAS support; given that the Nexell SoC's USB IP can be UAS enabled (which most probably requires changes to the USB host controller driver) we would get an additional ~5 MB/s in both directions. Then we would be talking about 20/25 MB/s NAS throughput with the old kernel (and active IRQ balancing and inappropriate settings) vs. 38/39 (or even 39/40) MB/s with mainline kernel, fixed/optimized IRQ affinity and UAS. Settings matter.
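The kind of fixed IRQ affinity the linked armhwinfo code sets up can be inspected and reproduced manually roughly like this (a sketch -- interrupt names differ per board, so check /proc/interrupts first; 'eth0' and 'usb' below are placeholders):

# See which CPU currently services which interrupt
cat /proc/interrupts
# Pin the Ethernet IRQ to cpu3 (bitmask 0x8) and a USB host IRQ to cpu2 (0x4)
echo 8 >/proc/irq/$(awk -F':' '/eth0/ {print $1; exit}' /proc/interrupts | tr -d ' ')/smp_affinity
echo 4 >/proc/irq/$(awk -F':' '/usb/ {print $1; exit}' /proc/interrupts | tr -d ' ')/smp_affinity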
Oni Posted September 29, 2017 Share Posted September 29, 2017 Great work with all the benchmarks! I greatly appreciate your job. I don't know if I'm doing it wrong, but instead of building the OMV image for my board (Orange Pi One) I downloaded the pre-built Jessie image from the Armbian website and ran customize-image.sh.template:
wget https://raw.githubusercontent.com/armbian/build/master/config/templates/customize-image.sh.template
mv customize-image.sh.template omv.sh
nano omv.sh # uncomment InstallOpenMediaVault
chmod 700 omv.sh
./omv.sh jessie sun8i orangepione no
Is it equivalent to building the whole image from the ground up? By the way, the above process fails because the space on the SD card becomes full. 1 Link to comment Share on other sites More sharing options...
Igor Posted September 29, 2017 Share Posted September 29, 2017 51 minutes ago, Oni said: By the way, the above process fails because the space on the SD card becomes full. There is a warning at first login that you need to reboot ... to expand the SD card to full size. Is this the problem? 1 Link to comment Share on other sites More sharing options...
tkaiser Posted September 30, 2017 Author Share Posted September 30, 2017 10 hours ago, Oni said: I don't know if I'm doing it wrong, but instead of building the OMV image for my board (Orange Pi One) I downloaded the pre-built Jessie image from the Armbian website and ran customize-image.sh.template This is wrong, yes. Not expected to work since customize-image.sh has to be run as part of the build process. If you want to install OMV later, today you would download softy, execute it and choose 'Install OMV':
wget https://raw.githubusercontent.com/armbian/config/dev/softy
/bin/bash softy
In a few weeks this will all be part of the next major Armbian release, then it's simply: armbian-config --> Software --> Install OMV (this also works today but the installation is not optimized yet, so for now better use the wget workaround above) 1 Link to comment Share on other sites More sharing options...
Oni Posted October 7, 2017 Share Posted October 7, 2017 On 30/9/2017 at 9:07 AM, tkaiser said: This is wrong, yes. Not expected to work since customize-image.sh has to be run as part of the build process. Softy worked perfectly! 1 Link to comment Share on other sites More sharing options...
Igor Posted November 4, 2017 Share Posted November 4, 2017
# The loopback network interface
auto lo
iface lo inet loopback

# eth0 network interface
auto eth0
allow-hotplug eth0
iface eth0 inet static
    address 172.16.100.190
    gateway 172.16.100.1
    netmask 255.255.255.0
    dns-nameservers 172.16.100.1 8.8.8.8 84.255.209.79
iface eth0 inet6 manual
    pre-down ip -6 addr flush dev $IFACE

@tkaiser https://sourceforge.net/projects/openmediavault/files/Odroid images/ Broken interfaces file after the 1st reboot. Not sure if this is an OMV or an Armbian problem? Link to comment Share on other sites More sharing options...
tkaiser Posted November 4, 2017 Author Share Posted November 4, 2017 3 minutes ago, Igor said: Broken interfaces file after the 1st reboot. Not sure if this is an OMV or an Armbian problem? My OMV installation routine was at fault (cause/fix). It only affects installations with IPv6 propagated through DHCP (that's why I did not notice it since my lab is IPv4 only) and is fixed at least on the XU4/HC1 image in the meantime. Most probably only the C2 image is still affected (and maybe the C1 image). Still waiting for the C2 to become usable with btrfs and the next kernel branch in general; then I'll regenerate the image. In the meantime affected users should take care that IPv6 is temporarily disabled when they boot their OMV image for the first time. 1 Link to comment Share on other sites More sharing options...
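In case someone wonders how to temporarily disable IPv6 on an affected image: done from a local console right after the board comes up it can be as simple as the two sysctl calls below (runtime only, nothing persisted -- which is exactly what's wanted here):

sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1

Alternatively, simply switch off IPv6 router advertisements/DHCPv6 on the local network until the first-boot housekeeping has finished.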
tkaiser Posted November 16, 2017 Author Share Posted November 16, 2017 After 7 months of work on this, let's now close the chapter. Today I generated the last OMV image based on Armbian for a nice little guy who arrived yesterday (since the board followed the usual conservative Xunlong hardware development style it was easy to add support within 24 hours: half an hour for generating the wiki page, some basic benchmarks to get behaviour/performance with the BSP, editing the wiki page a bit. Now it's added to the build system and the only remaining issues are the Wi-Fi driver -- which I'm not interested in at all -- and DVFS / voltage regulation to kick this little guy up to 1.2 GHz cpufreq). I've been waiting for this test for over 9 months now (Orange Pi NAS Expansion board arriving and yesterday finally the GbE enabled companion). Prerequisites:
- Xunlong's NAS Expansion board needs firmware updates first for top performance (though still USB2 only, so we're talking about slightly above 40MB/s as the maximum here!)
- Since the application in question needs some fault tolerance, a redundant RAID mode has been set as mandatory. While I really hate the combination of RAID-1 with an incapable filesystem, I chose RAID-10 with just 2 USB disks and far layout, with a btrfs on top (btrfs then provides data integrity, snapshots and somehow also backup functionality)
- Since the average data on this appliance will benefit from transparent file compression (1:1.4), btrfs is used with 'compress=lzo' for the benchmarks below (which is kinda stupid since benchmarks using highly compressible test data will perform totally differently from real-world data. But I was looking for the worst-case scenario and thermal issues now)
- So I let the build system generate an OMV image for Orange Pi Zero Plus automatically, booted it and adjusted only one value: max cpufreq has been lowered to 912 MHz to ensure operation at the lowest consumption/thermal level (keeping VDD_CPUX at 1.1V all the time)
Such a RAID10 across 2 USB2 disks is limited to ~40MB/s write and 80MB/s read with cheap Allwinner boards if they can run a mainline kernel to make use of UAS (USB Attached SCSI). Network performance of an H5 board at 912 MHz is limited to around 750/940 MBits/sec and then there's also some compression related CPU utilization when making use of transparent file compression. This is a standard LanTest on this construct (with real-world data the write performance would be a lot lower but for my use cases this doesn't matter that much): Now I pulled the plug of one of the 2 disks to turn the RAID10 into a degraded RAID10 without any redundancy any more: As we can see performance is not affected at all, but that's just the result of using a benchmarking setup that's not focused on highest storage performance but on thermal behaviour (with real-world data read performance would've dropped significantly). The H5 on the OPi Zero Plus wears a 16x16mm heatsink with sufficient airflow around it. Ambient temperature is at ~24°C, the H5 idles at 40°C. With the above benchmark/settings the maximum SoC temperature reported was 62°C with the H5 clocking at 912 MHz the whole time. Now let's look at a RAID resync. I attached another disk, now appearing as /dev/sdc, added it to the RAID10 and immediately the resync started to regain full redundancy: (with real-world data write performance would've also dropped dramatically but I was interested in thermal results, and the whole time during the rebuild while benchmarks were running the H5 reported 60°C max).
Due to the ARMv8 AES crypto extensions available, the H5 should be able to serve as a VPN endpoint too without sacrificing the above performance, so it really looks promising for those small boards being used in such an appliance (replacing some +40TB NAS devices again) Some technical details: the RAID10 has been created as in the link above, the btrfs has been created with defaults, only change: adding 'compress=lzo' to the mount options. After removing one disk, adding the new one and starting the resync with the benchmark running in parallel the array looked like this:

root@orangepizeroplus:/srv# mdadm --manage /dev/md127 -a /dev/sdc
mdadm: added /dev/sdc
root@orangepizeroplus:/srv# cat /proc/mdstat
Personalities : [raid10]
md127 : active raid10 sdc[2] sda[0] sdb[1](F)
      117155288 blocks super 1.2 4K chunks 2 far-copies [2/1] [U_]
      [>....................]  recovery =  0.1% (191092/117155288) finish=51.0min speed=38218K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
root@orangepizeroplus:/srv# mdadm -D /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Nov 16 17:43:32 2017
     Raid Level : raid10
     Array Size : 117155288 (111.73 GiB 119.97 GB)
  Used Dev Size : 117155288 (111.73 GiB 119.97 GB)
   Raid Devices : 2
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Nov 16 21:16:51 2017
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1

         Layout : far=2
     Chunk Size : 4K

 Rebuild Status : 0% complete

           Name : orangepizeroplus:127  (local to host orangepizeroplus)
           UUID : 1eff01f2:c42fc407:09b73154:397985d4
         Events : 1087

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       2       8       32        1      spare rebuilding   /dev/sdc
       1       8       16        -      faulty

root@orangepizeroplus:/srv# cat /proc/mdstat
Personalities : [raid10]
md127 : active raid10 sdc[2] sda[0] sdb[1](F)
      117155288 blocks super 1.2 4K chunks 2 far-copies [2/1] [U_]
      [=>...................]  recovery =  7.6% (8905052/117155288) finish=106.7min speed=16898K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

Without heavy disk activity in parallel the resync performs at between 30 and 39 MB/s. Link to comment Share on other sites More sharing options...
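For reference, recreating this setup from the command line boils down to the following -- device names and the mount point are examples, chunk size and far layout match the mdadm output above:

# 2-disk RAID10 with far layout (f2) and 4K chunks
mdadm --create /dev/md127 --level=10 --layout=f2 --chunk=4 --raid-devices=2 /dev/sda /dev/sdb
# btrfs with defaults on top, mounted with transparent LZO compression
mkfs.btrfs /dev/md127
mount -o compress=lzo /dev/md127 /srv/raid

The far layout is what lets reads be striped across both disks, which is why reads (~80 MB/s) scale roughly twice as high as writes (~40 MB/s) over USB2 in the numbers above.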
trohn_javolta Posted November 17, 2017 Share Posted November 17, 2017 2 hours ago, tkaiser said: After 7 months of work on this, let's now close the chapter. Oh no, I hope this doesn't mean you completely stop doing your reviews altogether I'm waiting for the upcoming ODROID HC2 and wanted to make a purchase dependent on a review of yours including its NAS capabilities. I also plan to use an OMV Armbian image with possible tweaks of yours... I guess I'll have to use the existing one for XU4/HC1/MC1 and hope there are no significant changes in the upcoming ODROID HC2 1 Link to comment Share on other sites More sharing options...
t-minik Posted November 17, 2017 Share Posted November 17, 2017 20 hours ago, tkaiser said: After 7 months of work on this, let's now close the chapter. Hi tkaiser. I just saw this topic because of your last post. Do you mean OMV support will be stopped, or just that you won't provide more benchmarks as you did before? I'm just trying a self-compiled OMV build for the OpiPC+ and it seems to work flawlessly! I'm now compiling an ODROID XU4 image for my friend, an HC1 owner. Link to comment Share on other sites More sharing options...
tkaiser Posted November 20, 2017 Author Share Posted November 20, 2017 (edited) On 17.11.2017 at 2:20 AM, trohn_javolta said: hope there are no significant changes in the upcoming ODROID HC2 Well, I expect the HC2 (or HC1+ or whatever HC$something will be released by Hardkernel next time) to be fully software compatible. It will just be another variant using the same SoC and a JMicron USB-to-SATA bridge (maybe, just like with their Cloudshell 2, using a JMS561 which will then lead to the same issues as today with this Cloudshell thing). The thread title there exists for a reason. On 17.11.2017 at 7:59 PM, t-minik said: do you mean OMV support will be stopped No, since all the results of our work are still there (we don't fiddle around manually with OS images, every improvement gets committed to the build system so everyone on this planet can build fresh OMV images from scratch as often as they want to). It's just that there's currently nothing to improve and it also makes no sense to provide more OMV images since every Debian based Armbian image using kernel 4.x can be transformed into an OMV installation with 'armbian-config --> Software --> Softy --> Install OMV' (which takes care of almost all the performance tweaks our OMV images contain). To finalize this, a summary of what 'Armbian + OMV' exactly means and what the result of this 7-month journey was. It was about identifying problems (both technical and user experience) and developing optimal settings for performance and energy efficiency. And now we're there. So if you want to run OMV on an SBC simply check whether the board is supported by Armbian with a recent kernel and you're almost done. If you want to do yourself a favour then click here on GbE (to avoid boards that are bottlenecked by slow networking) and keep in mind that some stuff that looks nice on paper, like Allwinner's 'native SATA', performs pretty poorly in reality (check the SBC storage performance overview). While you can basically turn every SBC Debian Jessie or Stretch OS image into either an OMV 3 or 4 installation, the real differences are as follows:
1) Armbian as base:
- we care about kernel support, providing pretty recent kernel versions for all relevant platforms that enable all the features needed by OMV. Difference to other distros or OS images: they often use horribly outdated kernels that lack features needed for OMV to work properly (please compare with the OMV related kernel config changes)
- our OS images contain several performance and/or consumption tweaks, e.g. we optimize the IO scheduler based on the type of storage, we take care of IRQ affinity (on almost all other SBC distros all interrupts are processed on the first CPU core which results in lower performance) and optimized cpufreq governor settings (allowing the board to idle at minimal consumption but switching to highest performance immediately if needed)
- we try to use modern protocols, e.g. enabling 'USB Attached SCSI' (UAS) where possible while also taking care of broken USB enclosures that get blacklisted automatically. UAS allows for higher USB storage performance with less CPU utilization at the same time.
- Armbian contains powerful support tools that make it very easy to diagnose problems remotely (only problem: here in the forum we play mind-readers instead of asking for armbianmonitor output all the time)
2) Armbian's OMV/NAS performance/reliability tweaks:
- we use improved Samba settings to increase SMB performance especially on the weaker boards
- several file sharing daemons that usually store caches on the rootfs are forced to use RAM instead (heavily increases performance in some areas and also helps with SD cards wearing out too fast)
- enabling driver support for the only two USB3 GbE dongles (ASIX AX88179 and the better RealTek RTL8153) so even boards with only Fast Ethernet or without Ethernet can be used as a NAS with those dongles
3) Armbian's OMV integration tweaks:
- we take care that we set three variables in /etc/default/openmediavault that heavily influence board performance once a user clicks around in OMV's 'Power Management' settings (see the sketch further below). Without this tweak OMV defines the 'powersave' cpufreq governor, which is totally fine on x86 systems but can result in a horrible performance decrease on ARM boards whose kernels then remain on the lowest possible CPU frequency all the time (on some systems like ODROID-XU4 or HC1 this can make a difference of 200 MHz vs. 2 GHz!)
- we install and enable OMV's flashmemory plugin by default to reduce wear on the SD card and to speed certain things up. Without this plugin OMV installations running off SD cards or eMMC might fail pretty fast due to the flash media wearing out (the much higher write amplification without the plugin will lead to your storage media dying much faster)
All these 9 tweaks above make the difference and are responsible for such an 'Armbian OMV' consuming less energy while performing a lot better than distros that ignore all of this. Over the last months I also tested a lot with other OS images where OMV had been installed without any tweaks and Armbian's way was always way faster (biggest difference on ODROID-XU4 where I tested with OS images that showed not even 50% of our performance). That being said it's really as easy as: choose a sufficient board (GbE, fast storage) supported by Armbian's next branch (so the kernel version is recent enough for all features to work), choose either Jessie (for OMV 3) or Stretch (OMV 4) and call armbian-config to let OMV be installed (a few minutes on a fast SD card; if you have to wait ages it's highly recommended to throw the SD card away and start over from here).
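To illustrate the third group of tweaks: the cpufreq-related part of /etc/default/openmediavault ends up looking roughly like this (variable names as used by OMV 3/4 at the time -- double-check on your own installation; the concrete values are only an example, Armbian picks them per device and kernel):

OMV_CPUFREQUTILS_GOVERNOR=ondemand
OMV_CPUFREQUTILS_MINSPEED=480000
OMV_CPUFREQUTILS_MAXSPEED=1200000

With these pre-seeded, clicking around in OMV's 'Power Management' settings no longer replaces the governor Armbian chose with something that cripples the board.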
Just as a reference: the dedicated OMV images I generated over the last half year do include another bunch of minor tweaks compared to installing OMV with armbian-config:
- Disable Armbian's log2ram in favour of OMV's folder2ram
- Automatically setting the correct timezone at first boot based on geo location (IP address)
- Device support for Cloudshell 2 (checks presence on the I2C bus and then installs Hardkernel's support stuff to get display and fan working)
- Device workaround for the buggy Cloudshell 1 (checks presence on the USB bus)
- Cron job executed every minute to improve I/O snappiness of filesharing daemons and to move them to the big cores on ODROID XU4
- Making syslog less noisy via /etc/rsyslog.d/omv-armbian.conf
- Replacing swap entirely with zram (see the sketch below)
- Limiting rootfs resize to 7.3 GB and automatically creating a 3rd partition using the remaining capacity that only needs to be formatted manually and can then be used as a data share
All tweaks can be studied here and by using this script as customize-image.sh you can still build fully optimized OMV images for every board Armbian supports with a next kernel branch. And now may I ask a moderator to lock this thread from now on so that new issues will be discussed separately? Thank you! Edited November 20, 2017 by chwe done :) 3 Link to comment Share on other sites More sharing options...
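The zram swap replacement mentioned in the list above boils down to something like this minimal sketch (the real implementation sizes the device relative to the available RAM and may use one device per CPU core; lzo is chosen here only because practically every kernel ships it):

modprobe zram num_devices=1
echo lzo > /sys/block/zram0/comp_algorithm
echo 256M > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 5 /dev/zram0

Compared to a swap file or partition on SD card this avoids both the flash wear and the horrible latency once the board comes under memory pressure.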
tkaiser Posted November 21, 2017 Author Share Posted November 21, 2017 After the latest armbian-config updates the following will now also be tweaked if OMV is installed this way:
- Disable Armbian's log2ram in favour of OMV's folder2ram
- Device workaround for Cloudshell 1 (checks presence on the USB bus -- for this to work the Cloudshell 1 must be connected when installing OMV!)
- Device support for Cloudshell 2 (checks presence on the I2C bus and then installs Hardkernel's support stuff to get display and fan working -- for this to work the Cloudshell 2 must be connected when installing OMV!)
- Cron job executed every minute to improve I/O snappiness of filesharing daemons and to move them to the big cores on ODROID XU4
- Making syslog less noisy via /etc/rsyslog.d/omv-armbian.conf
So the only remaining differences to the 'dedicated' OMV images are now the following: the dedicated OMV images
- set the correct timezone at first boot
- replace swap entirely with zram
- limit rootfs resize to 7.3 GB and automatically create a 3rd partition
BTW: There is currently no OMV uninstall routine and there will most probably never be one. While you could probably succeed with an 'apt purge openmediavault' this is also not recommended since too many leftovers will remain. If for whatever reason you want to stop using OMV after installing it, it's strongly recommended to start over from scratch with a freshly burned new image. 2 Link to comment Share on other sites More sharing options...