

Since I've seen some really weird disk/IO benchmarks made on SBCs in the last few days, and both a new SBC and a new SSD arrived in the meantime, I thought I'd give it a try with a slightly better test setup.

I tested with 4 different SoCs/SBCs: NanoPi M3 with an octa-core S5P6818 Samsung/Nexell SoC, ODROID-C2 featuring a quad-core Amlogic S905, Orange Pi PC with a quad-core Allwinner H3 and an old Banana Pi Pro with a dual-core A20. The device considered the slowest (dual-core A20 clocked at just 960 MHz) is in reality the fastest when it comes to disk IO.
Since most if not all storage 'benchmarks' for SBCs moronically focus on sequential transfer speeds only and completely forget that random IO is way more important on any SBC (it's not a digital camera or video recorder!), I tested this too. And since it's also somewhat moronic to test the storage implementation of a computer with a disk that is itself the bottleneck, the main test device is a brand new Samsung SSD 750 EVO 120GB. I first tested it on a PC to make sure the SSD is OK and to get a baseline of what to expect.
Since NanoPi M3, ODROID-C2 and Orange Pi PC only feature USB 2.0, I tested with 2 different USB enclosures that are known to be USB Attached SCSI (UAS) capable. The nice thing about UAS is that, while it's an optional USB feature that arrived together with USB 3.0, we can also use it with more recent sunxi SoCs when running the mainline kernel (A20, H3, A64 -- all only USB 2.0 capable).
When clicking on the link you can also see how different USB enclosures (to be more precise: the USB-to-SATA bridges inside) perform. Keep that in mind when you see 'disk performance' numbers somewhere and people write that SBC A is 2MB/s faster than SBC B: the variation in numbers is not necessarily caused by the SBC, it is for sure also influenced by both the disk used and the enclosure / USB-to-SATA bridge inside! The same applies to the kernel the SBC is running. So never trust any numbers you find on the internet that result from tests made at different times, with different disks or with different enclosures. Numbers compared that way are just BS.
The two enclosures I tested with are equipped with a JMicron JMS567 and an ASMedia ASM1153. With sunxi SBCs running the mainline kernel UAS will be used; with other SoCs/SBCs, or when running legacy kernels, it will be USB Mass Storage instead. Banana Pi Pro is an exception since its SoC features true SATA (with limited sequential write speeds), which will outperform every USB implementation. With this device I also tested a rather fast SD card and a normal HDD connected via USB in a non-UASP-capable disk enclosure to show how badly this affects the important performance factors (again: random IO!).
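Whether a given board/kernel/enclosure combination actually ends up with UAS or plain USB Mass Storage shows up in the driver bound to the storage interface ('uas' vs. 'usb-storage'), e.g. in the output of 'lsusb -t'. A minimal Python sketch that parses that text; the sample lines are illustrative rather than captured from the boards above, and the exact 'lsusb -t' format may differ between usbutils versions:

```python
import re

# Matches mass-storage interface lines of `lsusb -t`, e.g.
# "    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 480M"
IFACE_RE = re.compile(r"Class=Mass Storage, Driver=([\w-]+)")


def storage_drivers(lsusb_tree: str) -> list[str]:
    """Return the driver name bound to every mass-storage interface."""
    return IFACE_RE.findall(lsusb_tree)


def uses_uas(lsusb_tree: str) -> bool:
    """True if at least one mass-storage interface is driven by 'uas'."""
    return "uas" in storage_drivers(lsusb_tree)


if __name__ == "__main__":
    # Illustrative sample text, not a capture from the SBCs in this thread.
    sample = """/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 480M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 480M"""
    print(storage_drivers(sample))  # ['uas', 'usb-storage']
    print(uses_uas(sample))         # True
```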
I used iozone with 3 different runs:
  • 1 MB test size with 1k, 2k and 4k record sizes
  • 100 MB test size with 4k, 16k, 512k, 1024k and 16384k (16 MB) record sizes
  • 4 GB test size with 4k and 1024k record sizes
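The three runs could be scripted roughly like this. It's a sketch only: the post doesn't list the exact iozone flags, so '-e' (include flush in timing) and '-I' (O_DIRECT, bypassing the page cache) are assumptions, as is leaving out the random IO test ('-i 2') for the large run, which is what the missing random columns in the tables below suggest. '-s 4000M' corresponds to the 4096000 kB shown in the tables.

```python
import shlex

# (test size, record sizes, include random IO test?) for the three runs.
RUNS = [
    ("1M", ["1k", "2k", "4k"], True),
    ("100M", ["4k", "16k", "512k", "1024k", "16384k"], True),
    ("4000M", ["4k", "1024k"], False),  # tables show no random columns here
]


def iozone_cmd(size, records, with_random):
    """Build an iozone argument list.

    -a auto mode, -s test file size, -r record size (repeatable),
    -i 0 write/rewrite, -i 1 read/reread, -i 2 random read/write.
    -e and -I are assumptions, not taken from the original post.
    """
    cmd = ["iozone", "-e", "-I", "-a", "-s", size]
    for rec in records:
        cmd += ["-r", rec]
    cmd += ["-i", "0", "-i", "1"]
    if with_random:
        cmd += ["-i", "2"]
    return cmd


if __name__ == "__main__":
    for size, records, with_random in RUNS:
        print(shlex.join(iozone_cmd(size, records, with_random)))
```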
The variation in results is interesting. If 4K results differ between the 1 MB and 100 MB test sizes, you know that your benchmark is not testing disk throughput but the (pretty small) disk cache instead. Using 4 GB for sequential transfer speeds ensures that the whole amount of data exceeds DRAM size.
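That cache check can be expressed as a tiny heuristic: compare the same 4K test at 1 MB and 100 MB test size and flag it if the small run is clearly faster. The 25% tolerance is an arbitrary assumption; the example values are 4K random read results taken from the tables in this post.

```python
def cache_suspect(small_kbps: float, large_kbps: float, tolerance: float = 0.25) -> bool:
    """Return True if a 1 MB run looks cache-bound.

    small_kbps: result (KB/s) of a 4K test with 1 MB test size
    large_kbps: the same test with 100 MB test size
    If the small run beats the large one by more than `tolerance`,
    the 1 MB run most likely measured a cache, not the medium.
    """
    if large_kbps <= 0:
        raise ValueError("large_kbps must be positive")
    return small_kbps / large_kbps > 1 + tolerance


if __name__ == "__main__":
    # 4K random read, 1 MB vs. 100 MB test size (KB/s), from the tables here
    samples = {
        "HDD on USB 2.0 (Banana Pi Pro)": (6384, 447),
        "SSD on SATA (Banana Pi Pro)": (26363, 24953),
        "ODROID-C2, JMS567": (10662, 10658),
    }
    for name, (small, large) in samples.items():
        print(f"{name}: cache-bound={cache_suspect(small, large)}")
```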
The results:
NanoPi M3 @ 1400 MHz / 3.4.39-s5p6818 / jessie / USB Mass Storage:
Sequential transfer speeds with USB: 30MB/s with 1MB record size and just 7.5MB/s at 4K/100MB, the lowest random IO numbers of all. All USB ports are behind a USB hub, and it's already known that performance on the USB OTG port is higher. Unfortunately my SSD prevented negotiating a USB connection on the OTG port with both enclosures, since each time I connected the SSD the following happened: WARN::dwc_otg_hcd_hub_control:2544: Overcurrent change detected
JMicron JMS567                                                random    random
              kB  reclen    write  rewrite    read    reread    read     write
            1024       1     2396     2589     2616     2665     2666     2657
            1024       2     4397     5033     5101     5323     5334     5296
            1024       4     7454     7138     7495     8000     8000     7924
          102400       4     7063     7476     7531     7570     7536     7573
          102400      16    15812    17276    20397    20326    20421    16990
          102400     512    25465    25454    29117    28545    29114    25501
          102400    1024    25843    25401    29048    29279    29420    25899
          102400   16384    26592    26600    31280    31306    30841    26472
         4096000       4    28107    28145    29994    29795
         4096000    1024    29253    29578    31328    31123

ASMedia ASM1153                                               random    random
              kB  reclen    write  rewrite    read    reread    read     write
            1024       1     2281     2600     2594     2659     2663     2642
            1024       2     4171     5047     5050     5299     4411     5236
            1024       4     6929     7386     7381     7969     7985     7782
          102400       4     7568     7968     7995     7999     7985     7993
          102400      16    15939    18160    21231    21294    18241    18176
          102400     512    26865    26985    29784    29873    29609    26958
          102400    1024    27191    27273    30211    30290    30222    27271
          102400   16384    28427    28479    32559    32655    32676    28533
         4096000       4    29275    29483    31404    31175
         4096000    1024    29182    29500    31392    31178
ODROID-C2 @ 1536 MHz / 3.14.74-odroidc2 / xenial / USB Mass Storage:
Sequential transfer speeds with USB: ~39MB/s with 1MB record size and ~10.5MB/s at 4K/100MB. All USB ports are behind a USB hub, and the performance numbers look like there's always some buffering involved (not a true disk test, the kernel's caches are partially involved).
JMicron JMS567                                                random    random
              kB  reclen    write  rewrite    read    reread    read     write
            1024       1     2630     2562     2623     2665     2665     2658
            1024       2     5177     4884     5109     5328     5330     5305
            1024       4     9868     9352    10188    10636    10662    10543
          102400       4    10639    10633    10647    10656    10658    10649
          102400      16    21288    21246    21288    21327    21326    21319
          102400     512    37439    37382    40035    40134    39815    37461
          102400    1024    38051    38081    40874    40894    40773    38235
          102400   16384    38512    38401    41363    41346    41298    38542
         4096000       4    42327    42383    37781    37792                
         4096000    1024    42387    42563    37811    37743
ASMedia ASM1153                                               random    random
              kB  reclen    write  rewrite    read    reread    read     write
            1024       1     2592     2564     2609     2662     2663     2643
            1024       2     5147     4899     5055     5318     5321     5250
            1024       4     9454     9397     9987    10596    10631    10319
          102400       4    10634    10634    10646    10655    10656    10651
          102400      16    21292    21248    21279    21324    21326    21307
          102400     512    37118    37117    38804    38851    37478    37045
          102400    1024    37782    37944    39151    39277    37829    37927
          102400   16384    38062    37957    39299    39360    39405    38061
         4096000       4    42186    42465    36276    36251                
         4096000    1024    41990    42020    36177    36174                
Orange Pi PC @ 1296 MHz / 4.7.2-sun8i / xenial / UAS:
Sequential transfer speeds with USB: ~40MB/s with 1MB record size and ~9MB/s at 4K/100MB, best random IO with very small files. All USB ports are independent (just like on Orange Pi Plus 2E, where identical results will be achieved since it has the same SoC and uses the same settings when running Armbian).
JMicron JMS567                                                random    random
              kB  reclen    write  rewrite    read    reread    read     write
            1024       1     2965     3670     3898     3955     2655     3843
            1024       2     4953     7366     7175     7817     5326     7545
            1024       4     8606     9394     9968    10370     7989    10168
          102400       4     8859    10293    10622    10644     7996     8645
          102400      16    22642    24860    24839    22971    21334    22089
          102400     512    37057    34611    40005    40184    39697    35039
          102400    1024    37555    37682    40681    40739    40398    37713
          102400   16384    36809    38030    41050    41183    41172    38063
         4096000       4    39228    39266    40931    40941
         4096000    1024    38889    39037    40939    40950                

ASMedia ASM1153                                               random    random
              kB  reclen    write  rewrite    read    reread    read     write
            1024       1     2431     2933     3079     3063     2664     2962
            1024       2     4395     5716     5926     6262     5326     6007
            1024       4     7442     7937     8351     8762     7990     8146
          102400       4     7976     8352     7993     8000     7976     8060
          102400      16    21294    21838    22744    21874    21321    21576
          102400     512    36848    34647    39241    39386    38959    34878
          102400    1024    37451    37531    40050    40248    39940    37685
          102400   16384    36937    38107    40884    41145    41138    38181
         4096000       4    39124    39329    39994    39975
         4096000    1024    39122    39179    39884    39792                
Banana Pi Pro @ 960 MHz / 4.6.3-sunxi / xenial / SATA-SSD vs. USB-HDD:
This test setup is totally different since the SSD is connected via SATA, and I use a normal HDD in a UAS-incapable disk enclosure to show how huge the performance differences are.
SATA sequential transfer speeds are unbalanced for still unknown reasons: write/read ~40/170MB/s with 1MB record size, 16/44MB/s with 4K/100MB (that's huge compared to all the USB numbers above!). Best random IO numbers by far (much faster since there is no USB-to-SATA bottleneck as with every USB disk).
The HDD test shows the worst numbers: just 29MB/s sequential speed at 1MB record size and only ~5MB/s at 4K/100MB. The huge difference in 4K random IO between the 1MB and 100MB data sizes also clearly shows that the 1MB test only measured the HDD's internal DRAM cache (the platters weren't involved): this was not a disk test but a disk cache test.
SSD on SATA port                                              random    random
              kB  reclen    write  rewrite    read    reread    read     write
            1024       1     5416     7880    12172    12603     7071     7808
            1024       2     8876    12552    20764    21865    13054    12337
            1024       4    14467    18387    37819    42846    26363    17098
          102400       4    14932    19301    42873    45622    24953    19840
          102400      16    27841    31117   103168   103871    73178    31151
          102400     512    38764    38931   188829   189697   175944    38861
          102400    1024    39369    39437   207168   208312   199500    39406
          102400   16384    39922    39889   217207   218838   218113    40048
         4096000       4    40185    40168   181351   183561                
         4096000    1024    39714    39707   162229   162155                

HDD on USB 2.0 port                                           random    random
              kB  reclen    write  rewrite    read    reread    read     write
            1024       1     1557     1718     1914     1682     1790     1782
            1024       2     2553     3796     3035     3577     3995     3218
            1024       4     5091     4287     4590     6395     6384     4208
          102400       4     5019     5782     5829     5798      447     1172
          102400      16    14832    16491    15018    15632     1716     4115
          102400     512    30147    30755    29154    29686    20518    29839
          102400    1024    30665    31523    28922    29804    17880    30775
          102400   16384    32828    32296    30602    30819    29758    33069
         4096000       4    32263    32633    26844    26893                
         4096000    1024    32010    32418    26609    26736                

Samsung Pro 64 GB SD card                                     random    random
              kB  reclen    write  rewrite    read    reread    read     write
            1024       1      884      942     2684     2723     2695      987
            1024       2     1494     1831     4636     4685     4748     1885
            1024       4     3158     3273     8050     7617     7849     3097
          102400       4     2617     2818     7848     7818     7812     2272
          102400      16     6555     5656    13380    13396    13329     4932
          102400     512    20756    20904    21966    21971    21969    20983
          102400    1024    21415    21558    22204    22207    22200    21520
          102400   16384    21768    21836    22418    22418    22413    21857
         4096000       4    22922    22692    21888    21901                
         4096000    1024    22995    22593    21132    21110                
Lessons to learn?
  • HDDs are slow. So slow that they become the bottleneck and invalidate every performance test when you want to test the performance of the host (the SBC in question).
  • With HDDs data size matters, since you get different results depending on whether the benchmark runs inside the HDD's internal caches or not. SSDs behave differently here since they contain no ultra-slow rotating platters, and their different types of internal storage (DRAM cache and flash) don't differ that much in performance.
  • When you have both USB and SATA, not using the latter is almost always simply stupid (even if sequential write performance looks identical: sequential read speeds are way higher, and random IO will always be superior, which is more important).
  • It always depends on the use case in question. Imagine you want to set up a lightweight web server dealing with static content on an SBC that features only USB. Most of the accessed files are rather small, especially when you configure your web server to deliver all content already pre-compressed. If you compare random reads with 4k and 16k record size and 100MB data size, you'll notice that a good SD card performs an order of magnitude faster! For small files (4k) it's ~110 IOPS (447 KB/s) vs. ~1950 IOPS (7812 KB/s), so the SD card is ~18 times faster; for 16k it's ~110 IOPS (1716 KB/s) vs. ~830 IOPS (13329 KB/s), so the SD card is still almost 8 times faster than the USB disk. File (record) size has to reach 512K before the USB disk performs about as well as the SD card! Please note that I used a Samsung Pro 64GB for this test. The cheaper EVO/EVO+ with 32 and 64GB show identical sequential transfer speeds while being a lot faster when it comes to random IO with small files. So you save money and get better performance by choosing the cards that look worse by specs!
  • Record size always matters. Most filesystem accesses on an SBC are not large data streams but small chunks of randomly read/written data. Therefore check random IO results with small record sizes, since this is what matters, and compare 1MB vs. 100MB data size to get an idea of when you're only testing your disk's caches and when you're testing the disk itself.
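The IOPS arithmetic from the web server bullet can be reproduced from the 100 MB random read column of the Banana Pi Pro tables. A minimal Python sketch; the only inputs are the numbers from those tables, and it also shows where the USB HDD catches up with the SD card:

```python
# Random read results (KB/s) from the 100 MB iozone runs on Banana Pi Pro:
# USB HDD (UAS-incapable enclosure) vs. Samsung Pro 64 GB SD card.
RANDOM_READ_KBPS = {
    # record size in KB: (usb_hdd, sd_card)
    4: (447, 7812),
    16: (1716, 13329),
    512: (20518, 21969),
    1024: (17880, 22200),
    16384: (29758, 22413),
}


def iops(kbps, record_kb):
    """iozone reports KB/s; IOPS = throughput / record size."""
    return kbps / record_kb


def compare(results):
    """Return (record_kb, hdd_iops, sd_iops, sd_speedup) per record size."""
    rows = []
    for rec, (hdd, sd) in sorted(results.items()):
        rows.append((rec, iops(hdd, rec), iops(sd, rec), sd / hdd))
    return rows


if __name__ == "__main__":
    for rec, hdd_iops, sd_iops, speedup in compare(RANDOM_READ_KBPS):
        print(f"{rec:>6}k  HDD {hdd_iops:8.1f} IOPS  SD {sd_iops:8.1f} IOPS  "
              f"SD/HDD x{speedup:.2f}")
```

The speedup column comes out at roughly 17.5 for 4k and 7.8 for 16k, drops close to 1 at 512k and falls below 1 at 16 MB, where the HDD's sequential speed wins.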
If you compare random IO numbers from crap SD cards (Kingston, noname, Verbatim, noname, PNY, noname, Intenso, noname and so on) with the results above, then even the slow HDD connected through USB can shine. But better SD cards exist, as do some pretty fast eMMC implementations on some boards (ODROID-C2 being the best performer here). By comparing with the SSD results you get an idea how to improve performance when your workload depends on it (desktop Linux, web server, database server). Even a simple 'apt-get upgrade' done after months without upgrades depends heavily on fast random IO (especially writes).
So by relying on the usual bullshit benchmarks showing only sequential transfer speeds, an HDD (30 MB/s) and an SD card (23 MB/s) seem to perform nearly identically, while in reality the far more important random IO performance can differ a lot. And this depends solely on the SD card you bought and not on the SBC you use! For many server use cases with small file accesses, good SD cards or eMMC will be up to an order of magnitude faster than HDDs (again, it's mostly about random IO and not sequential transfer speeds).
I have personally used/tested SD cards that show only 37 KB/s in the 16K random write test (some cheap Intenso crap). Compared with the same test on A20 with a SATA SSD (31000 KB/s), that's 'just' over 800 times slower. Compared with the best performer we currently know (EVO/EVO+ with 32/64GB, 12000 KB/s), it's still 325 times slower. And this speed difference (again: random IO) is responsible for an 'apt-get upgrade' with 200 packages taking hours on the Intenso card, while it finishes in less than a minute on the SATA disk and in 2 minutes on the good Samsung cards, given your Internet connection is fast enough.

"some pretty fast eMMC implementations on some boards (ODROID-C2 being the best performer here)"

Odroid eMMC performance is very impressive, but so is the price: almost the price of an SSD!

The eMMC of my BPI M2+ is quite good at small IO and bad at large IO. I suppose the controller inside the chip is optimized for that. And it's what I need anyway for a system disk: good performance when handling numerous small files.

                                                    random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       4     4753     5525   15134     15134   12817      3385
102400     512     9794     7042   61619     61412   61511      5389
102400   16384     7093     7712   77026     77053   52573     10426

Question: can you tell me how much one can expect for 4K random reads over the network with an SSD on SATA, Fast Ethernet or USB Ethernet and a good kernel? The target use case is diskless clients.

  On 8/31/2016 at 3:55 PM, arox said:

Odroid eMMC performance is very impressive, but so is the price: almost the price of an SSD!


Well, but cheap SSDs perform worse than these eMMC modules, and since we're solely talking about SBCs here the question remains how to connect an SSD. If you have an A20 device you're lucky since you can use SATA; if you have to rely on USB, even fast SD cards might easily outperform any USB-connected SSD as long as it's about small file sizes and random IO.


Regarding BPi M2+: it seems the eMMC they use now is way slower than before (I have PCB revision 1.0, maybe they used better eMMC there to get better reviews?). Please compare with post #45 here.


And please open a separate thread regarding diskless clients since this is a totally unrelated subject (at least based on my experience so far, the key to success is low latency and caching as much as possible on the client side).

  On 8/31/2016 at 4:29 PM, tkaiser said:

Well, but cheap SSDs perform worse than these eMMC modules, and since we're solely talking about SBCs here the question remains how to connect an SSD. If you have an A20 device you're lucky since you can use SATA

I have a SanDisk 64 GB SSD (quite cheap, good transfer rates but power consumption a bit high compared to Samsung SSDs). Installed on an Intel Atom 1.6 GHz with a 4.4.6 kernel (Gentoo, optimized), it makes SD, eMMC and HDD look ridiculous anyway:

                                                    random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       4    30645    36430   44139     44268   16871     13850
102400     512   152591   147527  240342    255497  201405    109911
102400   16384   164887   184238  266631    266543  264865    179944

(I need to compile NFS support in order to benchmark NFS or iSCSI solutions.)


BTW 1) Is there a solution to connect 2 SATA drives to one port?


BTW 2) My eMMC on BPI M2+ shows:

Disk /dev/mmcblk0: 7.3 GiB, 7818182656 bytes, 15269888 sectors

If yours with PCB 1.0 really is 8 GiB, that's a clue that they did indeed make some cost "optimization"...

  On 8/31/2016 at 10:02 PM, arox said:

I have a SanDisk 64 GB SSD (quite cheap, good transfer rates but power consumption a bit high compared to Samsung SSDs). Installed on an Intel Atom 1.6 GHz with a 4.4.6 kernel (Gentoo, optimized), it makes SD, eMMC and HDD look ridiculous anyway


Nope, not really. The way iozone displays results might be confusing. Random IO is about IOPS (input/output operations per second), but iozone shows KB/s instead, so you have to do the math yourself by dividing the KB/s figure by the record size:

  • 13850 KB/s at 4k record size: 3462 IOPS writing
  • 109911 KB/s at 512k record size: 215 IOPS writing
  • 179944 KB/s at 16384k record size: 11 IOPS writing
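The same conversion as a minimal Python sketch, using arox's random write numbers. Note that Python's round() rounds halves to even, which is why 3462.5 ends up as 3462:

```python
def kbps_to_iops(kbps: float, record_kb: int) -> int:
    """Convert an iozone KB/s figure into I/O operations per second."""
    return round(kbps / record_kb)


# Random write results (KB/s, record size in KB) from the SanDisk SSD run above
random_write = [(13850, 4), (109911, 512), (179944, 16384)]

if __name__ == "__main__":
    for kbps, rec in random_write:
        print(f"{kbps} KB/s at {rec}k -> {kbps_to_iops(kbps, rec)} IOPS")
```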

The larger the record sizes grow, the more sequential transfer speed limits random IO results (see your read results: random and sequential reads are the same at 16MB record size), and there's also no use case for it (at least I know of no application that reads/writes 16 MB chunks randomly to disk; as soon as chunks get that large we're talking about 'streaming' use cases, where sequential performance matters, and with HDDs we would start to worry about disk fragmentation).


Taking the use case into consideration, testing random IO is useful for smaller record sizes (1 - 128 KB). And then please have a look at how the smallest (slowest) eMMC module from Hardkernel performs in an SD card adapter on an old, slow, boring Banana Pi (almost as fast as your SSD at 4K random writes, and the eMMC itself is even faster in reality -- see below):

8 GB used with Banana Pi                            random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       4    11485    12921    7044      7060    7010     11957

And these are the 8GB and the 32 GB eMMC tested on ODROID-C2 itself:

8 GB                                                random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       4    21526    21546     9848     9788     9547    21010       
102400     512    43325    43320   108983   109771   109912    42450
102400   16384    42912    42980   107567   107679   107656    42876

32 GB
102400       4    21506    21871    10587    10589    10285    21568
102400     512   119884   119129   120658   120204   120285   117944
102400   16384   123984   124286   118559   118461   118260   121185

In all cases sequential and random results are the same, which means the benchmark doesn't tell the real truth, since random IO measurements are bottlenecked by transfer speeds (this applies especially to the test with ODROID's eMMC in an SD card adapter on the Banana Pro, since it really suffers from a slow SDIO implementation)! So with our test setup we did not test the eMMC/SD card in isolation but the host's interface between SoC and storage as well.

If you get these numbers in passive benchmarking mode you would now simply stop, create graphs and write a blog post ('benchmarking gone wrong' as usual).
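A rough way to spot this automatically: if random reads reach (nearly) the sequential read figure at the same record size, the random number is most likely capped by the interface rather than showing the medium's IOPS. The 0.9 threshold is an arbitrary assumption; the sample values are 4K read/random read results quoted in this thread.

```python
def bottlenecked(seq_read_kbps: float, rand_read_kbps: float, threshold: float = 0.9) -> bool:
    """Heuristic: random reads at >= threshold of sequential reads (same
    record size) suggest the interface caps the result, so the figure
    does not show the medium's real IOPS. The threshold is arbitrary."""
    return rand_read_kbps >= threshold * seq_read_kbps


if __name__ == "__main__":
    # (sequential read, random read) in KB/s at 4K record size
    samples = {
        "8 GB eMMC in SD adapter, Banana Pi": (7044, 7010),
        "8 GB eMMC, ODROID-C2": (9848, 9547),
        "SSD on SATA, Banana Pi Pro (100 MB)": (42873, 24953),
        "HDD on USB 2.0, Banana Pi Pro (100 MB)": (5829, 447),
    }
    for name, (seq, rnd) in samples.items():
        print(f"{name}: interface-bound={bottlenecked(seq, rnd)}")
```

A 'False' does not mean the result is good: the USB HDD is limited by seek times instead, which is exactly what its tiny random read number shows.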


In active benchmarking mode you try to understand what's happening: random IO is not visible because the test file size is too large, so we need more tests with smaller file sizes. The random IO results above do not reflect the IOPS possible, since even at 4K record size sequential and random numbers are already identical. So we need to decrease the record sizes too (1k, 2k, 4k). Also, the 32 GB eMMC does not show higher random IO values; they just look better because sequential transfer speed is higher (more parallelism on the larger eMMC modules) and random IO gets bottlenecked at higher numbers (but is still 100% bottlenecked).


And we have to relate the results of a new round of tests to reality (benchmarks per se are useless). If, for example, you measure that your eMMC above shows 21568 KB/s at 4K (~5400 IOPS, that's already way more than your cheap SSD!) and then repeat the test with 2K and 1K and get 15000KB/s (7500 IOPS at 2K) and 12000KB/s at 1K (here IOPS and KB/s are the same for obvious reasons), then you see the real potential of the eMMC. Does this matter in real world situations? Only if we remain in active benchmarking mode.


If you know that your storage implementation jumps from 5400 IOPS at 4K to 12000 IOPS at 1K and you use a filesystem with 4K blocksize, well then...


The whole stuff is nothing for the average user. But stuff like this matters for us as the Armbian community/project: raising awareness of the issues that really matter (which means constantly repeating that 99% of all benchmark numbers you find on the internet are crap; for SBCs it gets close to 100%), coming up with better defaults and enabling our users to choose the optimal devices for their use case. That's IMO the biggest advantage of Armbian: it lets you choose between a huge variety of different SBCs without having to fear that the software sucks. So you choose the device that perfectly fits your needs.


On a beefy server you can throw in more hardware if you realize that you're bottlenecked. That's different with these SBCs: there you have to choose the combination of device and peripherals wisely, and then the correct settings, always keeping the use case in mind.


Now a real world example showing where you end up with the usual 'benchmarking gone wrong' approach. Imagine you need to set up a web server for static contents only. 100 GB of pure text files (not that realistic but just to show you how important it is to look closer). The server will be behind a leased line limited to 100 Mbits/sec. Which SBC to choose?


The usual benchmark approaches tell you to measure sequential transfer speeds and nothing else (which is OK for streaming use cases where DVD images several GB in size are provided, but absolutely useless when we're talking about accessing 100 GB of small files in a random fashion -- the 'web server use case'). Then the usual benchmark tells you to measure throughput with iperf (web serving small files is about latency, quite the opposite) and to run some silly moronic test measuring how fast your web server is over the loopback interface, with server and test tool on the same machine and no network tested at all (how does that translate to any real world web server usage? Exactly: not at all).


If we rely on the passive benchmarking numbers and keep in mind that we have to serve 100 GB at a reasonable cost, we end up thinking about an externally connected HDD, a board with GbE (since iperf numbers look many times faster than with Fast Ethernet) and a board that shows the highest page request numbers when testing on the local machine (where the whole 'benchmark' turns into a multi-threaded CPU test and has nothing to do with web serving at all). Please don't laugh, but that's how the usual SBC comparisons deal with this.


So from the list above you choose an external 500 GB HDD as storage implementation, since USB performance looks OK-ish with all boards (30+ MB/s), and a NanoPi M3 since its iperf numbers look nice (GbE) and, most importantly, it will perform best on the loopback interface since it has the most and the fastest CPU cores.


This way you end up with a really slow implementation, since accessing files is more or less only random IO. The usual 2.5" notebook HDD you use on the USB port achieves only ~110 IOPS (see the results above for the USB HDD on Banana Pro with a UASP-incapable enclosure). And by looking at iperf performance on the GbE interface you also overlooked that your web server is bottlenecked by the 100 Mbits/sec leased line anyway.


What to do? Use HTTP content compression, since text documents show a compression ratio of more than 1:3, many even 1:10 (every modern web server and most browsers support this). With this activated, the NanoPi reads the text documents from disk and compresses them on the fly, and based on a 1:3 compression ratio we can deliver 300 Mbits/sec worth of content through our 100 Mbits/sec line. Initially accessing files is still slow as hell (lowest random IO performance possible by choosing a USB HDD), but at least once a file has been read from disk it can saturate the leased line.
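Whether your own content really reaches 1:3 is easy to check before relying on it. A minimal Python sketch that gzips every file below a directory in memory and reports the overall ratio plus the resulting content rate through the line. It ignores HTTP/TCP overhead and the fact that very small files compress worse, so treat the result as an upper bound; gzip level 9 is an assumption (servers compressing on the fly often use lower levels).

```python
import gzip
import os
import sys
import tempfile


def gzip_ratio(root: str, level: int = 9) -> float:
    """Return total raw bytes / total gzip bytes for all files under root."""
    raw = packed = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as f:
                data = f.read()  # whole file in memory: fine for a sketch
            raw += len(data)
            packed += len(gzip.compress(data, compresslevel=level))
    if packed == 0:
        raise ValueError("no files found under " + root)
    return raw / packed


def effective_mbit(line_mbit: float, ratio: float) -> float:
    """Uncompressed content rate a link can carry at a given ratio."""
    return line_mbit * ratio


if __name__ == "__main__":
    if len(sys.argv) > 1:
        root = sys.argv[1]  # e.g. /var/www
    else:
        # No path given: build a throwaway sample so the sketch runs anywhere.
        root = tempfile.mkdtemp()
        with open(os.path.join(root, "index.html"), "w") as f:
            f.write("<p>static text content</p>\n" * 1000)
    ratio = gzip_ratio(root)
    print(f"gzip ratio 1:{ratio:.1f}, a 100 Mbit/s line carries "
          f"~{effective_mbit(100, ratio):.0f} Mbit/s of uncompressed content")
```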


So relying on passive benchmarking, we chose a combination of devices (NanoPi M3 + 500 GB HDD) that costs more than $100 including shipping/taxes and is slow as hell for the use case in question.


If we stop relying on passive benchmarking, really look at our use case and switch on our brains, we can not only save a lot of money but also improve performance by magnitudes. With an active benchmarking approach we identify the bottlenecks first:

  • Leased line with 100 Mbits/sec only: we need to use HTTP content-stream compression to overcome this limitation
  • Random access to many files: we need to take care of random IO more than sequential transfer speeds
  • We need to tune our network settings to make the most out of the situation. Being able to use the most recent kernel version is important!
  • We're on an SBC and have to take care of CPU resources: so we use a web server with minimal resource usage and should find a way to avoid reading uncompressed content from disk only to compress it immediately on the fly, since this wastes CPU resources

So let's take an approach that would look horribly slow in the usual benchmarks but improves performance a lot: an Orange Pi One together with a Samsung EVO 64 GB as hardware, plus mainline kernel + btrfs + nginx + gzip_static configuration. Why and how does this work?

  • Orange Pi One has only Fast Ethernet and not GbE. Does this matter? Nope, since our leased line is limited to 100 Mbits/sec anyway
  • we know that the cheap EVO/EVO+ with 32/64 GB perform excellently when it comes to random reads. At 4K we get 875 IOPS (3500 KB/s, see comparison of results); that's 8 times faster than an external USB HDD
  • we use pre-compressed content: a cron job compresses each and every one of our static files and creates a compressed version with a .gz suffix. If nginx talks to a browser capable of that, it delivers the already compressed content directly (no CPU cycles wasted; if we configure nginx with the sendfile option, not even time in userspace is wasted since the kernel shoves the file directly to the network interface!). Combine the sequential read limit of SD cards on most boards (~23MB/s) with a 1:3 compression ratio and you end up at ~70MB/s with this trick. Twice as fast as uncompressed content on a USB disk
  • unfortunately we also need the uncompressed data on disk since some browsers (behind proxies) do not support content compression. How to deal with that? Using the mainline kernel, btrfs and btrfs' own transparent file compression. So the 'uncompressed' files are also compressed, but at a lower layer, and while we now have each and every file twice on disk (SD card in fact), based on a 1:3 compression ratio we only need roughly 67 GB of storage capacity (2 × ~33 GB) for 100 GB of original content. Sequential read performance is still doubled since decompression happens on the fly.
  • Not directly related to the filesystem, but by tweaking network settings for low latency and many concurrent connections we might be able to improve requests per second with many parallel clients, also by a factor of 2, compared to an old smelly Android 3.x kernel we still have to use on many SBCs (relationship with storage: if we tune network settings this way, we need storage with high IOPS even more)
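The pre-compression cron job from the list could look roughly like this minimal Python sketch (the post doesn't show the actual script). It writes 'name.gz' next to every compressible file that has no up-to-date copy yet, which is the layout nginx's gzip_static looks for. The list of extensions, the compression level and the temp-file-then-rename approach are assumptions:

```python
import gzip
import os
import shutil

# Assumption: which file types are worth pre-compressing; adjust to your content.
COMPRESSIBLE = (".html", ".htm", ".css", ".js", ".txt", ".xml", ".svg", ".json")


def precompress(root: str, level: int = 9) -> int:
    """Create or refresh 'name.gz' next to every compressible file below root.

    A .gz copy is rewritten only if it is missing or older than its source.
    Returns the number of .gz files written.
    """
    written = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(COMPRESSIBLE):
                continue  # also skips existing .gz files
            src = os.path.join(dirpath, name)
            dst = src + ".gz"
            src_stat = os.stat(src)
            if os.path.exists(dst) and os.stat(dst).st_mtime_ns >= src_stat.st_mtime_ns:
                continue  # .gz copy is up to date
            tmp = dst + ".tmp"
            with open(src, "rb") as fin, open(tmp, "wb") as raw:
                # filename= stores the original name in the gzip header
                with gzip.GzipFile(filename=name, mode="wb",
                                   compresslevel=level, fileobj=raw) as fout:
                    shutil.copyfileobj(fin, fout)
            # copy the source timestamps so the next run can detect changes
            os.utime(tmp, ns=(src_stat.st_atime_ns, src_stat.st_mtime_ns))
            # atomic rename: the web server never sees a half-written .gz file
            os.replace(tmp, dst)
            written += 1
    return written
```

Called e.g. as precompress("/var/www") from cron, a second run only touches files that changed in between, since the .gz copies carry their source's modification time.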

An Orange Pi One together with an EVO 64GB costs a fraction of a NanoPi M3 + USB HDD and consumes next to nothing while being far faster for the 'static files web server' use case if set up correctly, whereas the usual moronic benchmarks testing CPU horsepower, GbE throughput and sequential speeds would show exactly the opposite.
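The arithmetic behind the bullets above, written out as a tiny Python sketch (the helper names are illustrative only). It also shows why Fast Ethernet is no limitation here: a 100 Mbit/s line carries at most 12.5 MB/s before protocol overhead, well below what the SD card can deliver with pre-compressed contents.

```python
def iops_to_kb_s(iops: float, record_kb: float = 4.0) -> float:
    """Throughput in KB/s for a given number of IOPS at a fixed record size."""
    return iops * record_kb


def effective_read_mb_s(medium_mb_s: float, compression_ratio: float) -> float:
    """Original (uncompressed) content per second when only compressed bytes are read."""
    return medium_mb_s * compression_ratio


def line_rate_mb_s(mbit_s: float) -> float:
    """Raw line rate in MB/s, ignoring Ethernet/TCP overhead."""
    return mbit_s / 8


random_4k = iops_to_kb_s(875)              # EVO card, 4K random reads
sd_with_gz = effective_read_mb_s(23, 3)    # ~23 MB/s SD limit, 1:3 gzip ratio
uplink = line_rate_mb_s(100)               # leased line / Fast Ethernet
print(random_4k, sd_with_gz, uplink)       # 3500.0 69 12.5
```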


And you get this reduction in costs and this increase in performance just by no longer believing in all these 'benchmarking gone wrong' numbers spread everywhere and switching to active benchmarking: testing the stuff that really matters, checking how it correlates with reality (your use case and the average workload) and then setting things up the right way.


Final note: Of course an Orange Pi One is not the perfect web server due to its low amount of DRAM. The best way to overcome slow storage is to avoid accessing it. As soon as files are in Linux's filesystem cache the speed of the storage implementation doesn't matter any more.


So, with our web server use case in mind: if we do further active benchmarking and identify a set of files that are accessed most frequently, we could add another Orange Pi One and a Pine64+ with 2GB. The new OPi One acts as load balancer and SSL accelerator for the first OPi One, while the Pine64+ does SSL encryption on its own and holds the 1.7 GB of most frequently accessed files in RAM ('grep -r foobar /var/www' at startup in the background; please keep in mind that this is still 5+ GB of content in reality if we're talking about a 1:3 compression ratio. Simply by switching on our brain we get 5 GB of contents cached in memory on a device that features only 2 GB physical RAM!). And the best part: both new boards do not even need local storage since they can be FEL booted from our first OPi One.
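The 'grep -r foobar /var/www' trick simply reads every file once so the kernel keeps it in the page cache. A slightly more targeted Python sketch (a hypothetical helper, not part of any existing tool) reads only the pre-compressed `.gz` files, which is what lets 1.7 GB of RAM hold roughly 5 GB of original content at a 1:3 ratio. Whether the pages stay cached depends on memory pressure; the kernel may evict them again at any time.

```python
from pathlib import Path
from typing import Optional


def warm_page_cache(root: str, suffix: Optional[str] = ".gz",
                    chunk_size: int = 1 << 20) -> int:
    """Read every file below root (optionally only those ending in suffix) once.

    The data is thrown away; the useful side effect is that the kernel now holds
    the file contents in its page cache. Returns the number of bytes read.
    """
    total = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if suffix is not None and not path.name.endswith(suffix):
            continue
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total
```

Started in the background at boot, e.g. `warm_page_cache("/var/www")`; passing `suffix=None` reproduces the read-everything behaviour of the grep call.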


Benchmark update using Hardkernel's 8GB eMMC module (the 'slowest' one):





That's an unbelievable 35000+ IOPS at 1k! And 2300 MB/s sequential read speeds!!


Nope, that's just the result of testing this eMMC module in Hardkernel's SD card adapter on OS X, plugged into the crappiest SD card reader available here. The write speeds are limited by the SD card reader, the read speeds are inflated by OS X's filesystem caching. Benchmarking gone wrong as usual.


Same benchmark tool (iozone) on the 'host' (ODROID-C2) that can take advantage of its eMMC (two runs):






Strange results: with small record sizes performance is low while sequential write and especially read speeds exceed the numbers from before. As usual: don't trust these numbers. Evaluate them.


Based on the numbers above, ODROID's eMMC modules do not seem to be limited when it comes to random IO (always the same as sequential); instead they seem to use a strategy where 'sequential vs. random' no longer matters. :)


As an exercise for the reader, the same eMMC module tested with fio:







Since I currently have a few disks lying around I decided to start a storage benchmark with our most beefy board in this regard. You might think I'm talking about an octa-core SBC with lots of DRAM. Nope, quite the opposite: it's just a dual-core Cortex-A9 with 1 GB DRAM: Solid-Run's Clearfog based on the ARMADA A388, a SoC specially designed for storage and networking applications. I tested with the more expensive Clearfog Pro but the results below should be valid for the cheaper Clearfog Base too (no internal GbE switch and one mPCIe/mSATA port fewer than the Pro).

The Clearfogs have one USB 3.0 port (not UASP capable, at least with the current kernel), one M.2 slot and 1 or 2 MiniPCIe/mSATA slots. Both M.2 and mSATA slots can be turned into normal SATA 3.0 ports by using simple/cheap mechanical converters (like this or that). I also tested whether cheap JMB321 based SATA port multipliers work: they don't (this might be fixable with kernel patches, but to be honest using such a crappy PM with a Clearfog is not a good idea anyway since these cheap PMs are slow and prone to severe data corruption when overheating).
I tested all 5 disks connected to the M.2 slot via a mechanical SATA converter (the connection was always established with the highest SATA version the drive supports). The USB3 results were made with a JMS567 enclosure for the 3 SSDs and the 2.5" notebook HDD and with an ASMedia ASM1051E based enclosure for the 3.5" Seagate Barracuda.
The disks 
The Samsung SSDs are somewhat special since they implement 'TurboWrite': a smaller buffer on the SSD behaves like (expensive) SLC NAND, but as soon as this buffer is full, write speeds, especially on the smaller SSDs, drop to pretty low values (for details see here)
For the test I used these two iozone calls:
iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
iozone -a -g 4000m -s 4000m -i 0 -i 1 -r 4K -r 1024K

The first test simply uses a 100 MB test file, iterates through record sizes from 4K to 16M and also tests random IO. Since iozone reports random IO not as IOPS but in KB/s, I 'translated' the values as I already did before. The second iozone call simply reads/writes a 4GB file, once with 4K record size and once with 1M:

                  Random IO in IOPS           Sequential IO in MB/s
                 SATA           USB3           SATA            USB3
            4K read/write  4K read/write   1M read/write   1M read/write
  EVO 750     6995/9008      4898/6073       341 / 214       260 / 169
    PM851     6154/8621      4452/6240       339 / 133       254 / 134
  EVO 840    10148/19184     6207/8734       507 / 116       250 / 146 
Barracuda      288/4730       282/3730       172 / 182       152 / 171
 Momentus      133/3528       131/3810        38 / 40         38 / 40
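The KB/s to IOPS 'translation' is just the random read/write throughput divided by the record size. A small Python sketch (the function name is made up for illustration) does it for one line of iozone auto-mode output. Fed the Wandboard EVO 840 SATA raw row listed further down, it gives the same 4141/5073 IOPS as the i.MX6 table; rows whose quotient ends in exactly .5 may differ by one from the tables in this post, since Python rounds half to even.

```python
from typing import Tuple


def iozone_row_to_iops(row: str) -> Tuple[int, int]:
    """Convert one iozone auto-mode result row into (random read, random write) IOPS.

    Assumed column order (as in the tables in this post):
    kB  reclen  write  rewrite  read  reread  random-read  random-write
    with all throughput columns in KB/s.
    """
    fields = [int(value) for value in row.split()]
    if len(fields) < 8:
        raise ValueError("row has no random read/write columns")
    reclen_kb = fields[1]
    random_read_kb_s, random_write_kb_s = fields[6], fields[7]
    # IOPS = throughput / record size, rounded to whole operations per second
    return round(random_read_kb_s / reclen_kb), round(random_write_kb_s / reclen_kb)


row = "          102400       4    16767    20489    21208    21239    16563    20291"
print(iozone_row_to_iops(row))  # (4141, 5073)
```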


  • Obviously SATA 3.0 and USB 3.0 performance of ARMADA 38x are pretty high: read speeds exceed 500 MB/s (and it's confirmed by Solid-Run engineers that Clearfog Pro can max out 3 x SATA 3.0 in parallel with fast SSDs :) )
  • The 2 HDDs do not get bottlenecked that much when accessed via USB 3.0 (in fact for the slow notebook HDD nothing changes at all, and the faster Barracuda is also not affected that much, regarding both random and sequential IO)
  • The really fast SSDs are heavily affected when it comes to random IO: the fast EVO 840 is only about half as fast behind a USB-to-SATA bridge compared to SATA (this might improve if/when ARMADA 38x is able to support UASP)
  • Sequential reads with USB 3.0 get bottlenecked at around 250-260 MB/s (same SSD in the same enclosure on a UASP-capable MacBook running OS X: 400+ MB/s)
  • Sequential write performance seems to be weird (EVO 840 for example being faster when USB 3.0 is used?)
So why does write performance look strange? Because my benchmark attempt sucks ;)
A tester should know that these SSDs implement TurboWrite, so we need to test that individually to get the whole picture of how these disks behave. It's also important to know that for all 3 SSD families the members with small capacities perform worse when it comes to sequential writes; even by choosing the slightly larger 250/256GB models write performance might be a lot faster (and then probably already bottlenecked by the host's SATA implementation!)
So let's look more closely at how the 3 SSDs perform with 1GB, 2GB and 3GB test sizes (so that we don't already exceed the TurboWrite buffer size, or only do so at the rewrite stage with the EVO 750, which has the largest buffers):
              kB  reclen    write  rewrite    read    reread
EVO 750  1024000    1024   365707   367293   371606   374167
         2048000    1024   361948   222524   329422   358586
         3072000    1024   360469   141784   345507   361666

PM851    1024000    1024   151648   150172   350515   377275
         2048000    1024   142591   130159   337665   353065
         3072000    1024   139294   133203   347399   356333
EVO 840  1024000    1024   264257   423689   532733   545420
         2048000    1024   293179   418408   514973   517536
         3072000    1024   183027   239014   498971   514133

(So by testing with smaller amounts of data the sequential write performance is both higher and varies a lot more between the 3 SSDs. But it's still strange that the EVO 840 gets close to the SATA 3.0 maximum with sequential reads while the other 2 remain at ~360 MB/s here, since they're known to also reach/exceed 500 MB/s.)
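Why the average write speed degrades gradually, rather than abruptly, once more data is written than the fast buffer can absorb can be shown with a simple two-speed model. This Python sketch is an idealisation under stated assumptions (the buffer is empty when the write starts and is not flushed during it); the function name and the example numbers are hypothetical and are not Samsung specifications.

```python
def avg_write_mb_s(file_gb: float, buffer_gb: float,
                   fast_mb_s: float, slow_mb_s: float) -> float:
    """Average sequential write speed with a fast write buffer in front of slow NAND.

    Assumed model: the first `buffer_gb` are absorbed at `fast_mb_s`, everything
    beyond that is written at `slow_mb_s`.
    """
    if file_gb <= buffer_gb:
        return fast_mb_s
    fast_s = buffer_gb * 1000 / fast_mb_s              # seconds spent in the buffer
    slow_s = (file_gb - buffer_gb) * 1000 / slow_mb_s  # seconds after it filled up
    return file_gb * 1000 / (fast_s + slow_s)


for gb in (1, 2, 3):
    print(gb, round(avg_write_mb_s(gb, buffer_gb=1, fast_mb_s=400, slow_mb_s=100)))
# prints one line each: 1 400, 2 160, 3 133
```

The shape (full speed while the buffer lasts, then a steadily falling average) is qualitatively similar to the EVO 750 rewrite column above (367, 222 and 141 MB/s), although the real buffer sizes and speeds are not known from these tests.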


All measurements here in detail:


Crosscheck with Banana Pro
To see how the same disks perform with another device we support I used a Banana Pro (you can use any other A20 based device; results will only differ slightly based on DRAM clockspeed, since IO performance with tablet SoCs depends on both CPU clockspeed and, to a lesser extent, memory bandwidth). I chose mainline kernel and the only difference compared to 'stock Armbian' was using the performance governor, since we currently use 4.6 with vanilla images and the new schedutil cpufreq governor (now showing superior IO performance compared to ondemand) will only be available starting with 4.7:
                  Random IO in IOPS           Sequential IO in MB/s
                 SATA           USB2           SATA            USB2
            4K read/write  4K read/write   1M read/write   1M read/write
  EVO 750     2779/893       1277/619        106 / 39         34 / 35
    PM851     2562/859       1283/614        109 / 39         34 / 34
  EVO 840     3943/3478      1535/1526       122 / 37         34 / 34
Barracuda      259/728        237/600        104 / 38         34 / 34
 Momentus      126/716        120/598         38 / 38         33 / 35


  • A20's SATA implementation is only 2.0 and the SoC shows, for whatever reason, a pretty limited write throughput (that's why all write tests perform identically; the ~38 MB/s are a simple host limit)
  • A20's sequential SATA read speed is also limited but seems to depend a bit on the device in question (see EVO840's better score here)
  • The random IO values are way lower, but that's partially because at just 4K record size random IO gets limited by the interface. Anyway, if we look again at the EVO 840, it's 10148/19184 read/write IOPS with ARMADA 38x vs. just 3943/3478 with A20. This difference is huge, and with smaller record sizes it would be even larger!
  • When having to rely on USB 2.0 all disks perform identically when it comes to sequential IO, since USB 2.0 is the bottleneck here. USB 2.0 also lowers random IO values: the faster the disk, the more it gets slowed down (with the ultra slow notebook HDD it makes almost no difference)

We also have to keep in mind how HDDs work: they use ZBR (zone bit recording) and store more data on the outer tracks, so they're faster when less capacity is used. For the Momentus tested above that means this disk is limited to 40 MB/s when empty. As soon as data is stored, sequential transfer speeds decrease, and most disks show just half the speed when written completely full. Fragmentation also becomes an issue on HDDs and will slow things down further. I mention this since so many people start whining when they hear A20 devices are only able to write at ~40 MB/s. Once you actually use the disk (put data on it) that's fine for most 2.5" disks anyway, since the SATA interface is not the bottleneck (and unfortunately 99 percent of all disk benchmarks don't tell the truth since they always measure empty disks).


Anyway: I had wanted to do this test for a long time since it's quite interesting how different disk devices interact with the host's SATA implementation (on ARM boards; in x86 land nothing interesting happened in the last decade since all SATA host implementations are fast enough). This is something one should keep in mind when stumbling across 'SBC storage benchmarks' on the net. Different disks (and when USB is involved, different USB-to-SATA bridges) massively influence the storage performance attributed to the host.


Final words regarding the Clearfogs: they're known to show both excellent IO performance and network performance in parallel, even though the SoC is just a 'dual core ARMv7'. When comparing with the results from above (quad- and octa-core Cortex-A53 tablet/OTT SoCs) it's pretty easy to get the idea of what really matters in this area. It's definitely not CPU horsepower but the SoC being optimized for the use case :)


Update: I also tried kernel 4.4.20 but still no UASP and way lower performance; kernel 3.10.102 as used for the tests above should be preferred when it comes to storage performance.





Here's an interesting benchmark from a low power storage solution for OPI ONE:

    Command line used: iozone -e -I -a -s 10m -r 4k -r 16k -r512k -i 0 -i 1 -i 2

                                                              random    random
              kB  reclen    write  rewrite    read    reread    read     write
           10240       4     3946     4287     9421     7887     5294     3984
           10240      16    11740    11651    21115    21087    14994     8719
           10240     512    24376    24454    35105    35191    32879    24729  

Beats SD cards in performance (by half a bit/s), eMMC in price and any SSD or HDD in power consumption. Great for power optimized systems running on battery.


SanDisk Ultra Fit USB3 flash drive. Tested with Armbian 5.16, rootfs on /dev/sda1 (USB flash). Max. power consumption (incl. WiFi and flash) during testing was 550mA, board running from battery.

  On 9/12/2016 at 10:05 AM, rodolfo said:

Here's an interesting benchmark from a low power storage solution for OPI ONE



Thanks for the numbers. OPi ONE means H3, which means USB 2.0 sequential transfer speeds are bottlenecked by the host implementation (~35 MB/s with legacy, ~40 MB/s with vanilla kernel if UASP can be used; I haven't found any thumb drive supporting that so far). It should also be noted that benchmarks with just 10 MB file size might not tell the truth since many USB thumb drives are prone to throttling: they perform nicely for 2-3 minutes and then drop down to laughable performance numbers (I've seen thumb drives slowing down from 80 MB/s to 2.x MB/s).
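To catch such throttling, a test has to run long enough and has to bypass the page cache. A minimal Python sketch (a hypothetical helper; iozone or fio with large file sizes can do the same job) writes fixed-size chunks, calls fsync() after each one and records the speed per chunk, so a drive that slows down after a few minutes shows up as falling values. Pick `chunks` × `chunk_mb` so the run lasts several minutes; at ~35 MB/s the defaults of 100 × 64 MiB take a bit over three minutes. Note that the target file gets overwritten.

```python
import os
import time
from typing import List


def sustained_write_mb_s(path: str, chunk_mb: int = 64, chunks: int = 100) -> List[float]:
    """Write `chunks` chunks of `chunk_mb` MiB to `path`, returning MB/s per chunk.

    fsync() after every chunk forces the data onto the drive, so the page cache
    cannot hide a drive that throttles after a while.
    """
    block = os.urandom(1024 * 1024)  # 1 MiB of incompressible data, reused
    speeds = []
    with open(path, "wb") as f:
        for _ in range(chunks):
            start = time.perf_counter()
            for _ in range(chunk_mb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
            elapsed = max(time.perf_counter() - start, 1e-9)
            speeds.append(chunk_mb * 1.048576 / elapsed)  # MiB -> MB (10^6 bytes)
    return speeds
```

For example `speeds = sustained_write_mb_s("/mnt/usb/throttle.bin")` (path hypothetical); comparing the first and the last values shows whether the drive throttles.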


Anyway back to 'real storage' ;)


Since we support another family of SBCs with native SATA (i.MX6 quad core based boards) now let's have a look also at this platform. The Freescale/NXP i.MX6 features SATA 2.0, USB 2.0 and native GbE (the latter being limited to ~400 Mbits/sec due to internal bus limitations). When I started testing IO throughput on SBCs I went with a Cubietruck and a Wandboard Quad (with kernel 3.0.35). SATA throughput was limited to 100/90 MB/s back then.

Since we know that random IO matters a lot more for many use cases let's have a look how the Wandboard performs also in this regard and also using latest kernel (I used 4.7.3-armv7-x2 and followed Robert C Nelson's instructions here https://eewiki.net/display/linuxonarm/Wandboard  since we don't support the Wandboard yet in Armbian). So while I test on an unsupported board the results should be valid for all quad-core i.MX6 boards using vanilla kernel (only the quad-core i.MX6 has SATA!).
                  Random IO in IOPS           Sequential IO in MB/s
                 SATA           USB2           SATA            USB2
            4K read/write  4K read/write   1M read/write   1M read/write
  EVO 750     3138/2980      1594/1523       100 / 100        29 / 25
    PM851     3056/3149      1473/1417       100 / 100        29 / 25
  EVO 840     4141/5073      1664/1983       100 / 100        25 / 25
Barracuda      268/2330       245/1146       100 / 100        25 / 22
 Momentus      130/2109       126/1273        38 / 40         27 / 25
Sequential USB and SATA speeds were not tested individually since the host's implementation is the bottleneck here and it's just a waste of time. I only tested with a few devices and used the appropriate values for all other sequential numbers above.
Interpreting the results:
  • Sequential SATA speeds are limited by i.MX6, you won't exceed 100 MB/s read/write
  • USB2.0 performance with i.MX6 is rather low, if you attach disks here better choose SATA
  • For whatever reason the EVO840 in the JMS567 enclosure shows 4 MB/s lower USB 2.0 read performance (keep such differences in mind if someone presents you a collection of 'SBC storage results' made with different disks and different enclosures and attributes the numbers to the SBC in question, forgetting about the influence disk and USB-to-SATA bridge have!)
  • i.MX6's USB controller seems to dislike the ASM1051E USB-to-SATA bridge used to test the 3.5" Seagate Barracuda. Sequential transfer speeds are pretty low (again: keep this in mind when stumbling across 'SBC storage benchmarks' made with unknown equipment)
  • Random IO results with the SSDs are not as good as with the Clearfog but look way better compared to A20. Main reason: random IO gets limited by the sequential interface limitation. So if only 1K chunks and not 4K had been used, random IO results with the 3 different SATA implementations would not vary as much as they do with 4K now
The latter is something we (as distro people) should have in mind when choosing our defaults. If we deal with SSDs, mainline kernel and btrfs for example then due to both USB and SATA interface limitations choosing a smaller block size than the default 4K might result in way better performance when dealing with a lot of small files (to be confirmed with real world workloads since synthetic benchmarks tend to be misleading).
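The reasoning in the last bullet can be illustrated with a toy queue-depth-1 model in which every request costs a fixed access time plus the time needed to move the record over the interface. The 50 µs access time and the 100 vs. 500 MB/s link speeds are assumptions picked for illustration (roughly i.MX6-class vs. ARMADA 38x-class SATA sequential limits), not measured values.

```python
def qd1_iops(record_kb: float, access_us: float, link_mb_s: float) -> float:
    """Toy queue-depth-1 IOPS: fixed access time plus transfer time over the link."""
    transfer_us = record_kb * 1024 / link_mb_s  # bytes / (10^6 bytes/s) = microseconds
    return 1e6 / (access_us + transfer_us)


ACCESS_US = 50.0  # assumed per-request latency of a fast SSD (illustrative only)
for record_kb in (1, 4):
    slow = qd1_iops(record_kb, ACCESS_US, 100)   # SATA limited to ~100 MB/s
    fast = qd1_iops(record_kb, ACCESS_US, 500)   # SATA 3.0 at ~500 MB/s
    print(f"{record_kb}K: {slow:.0f} vs {fast:.0f} IOPS, ratio {fast / slow:.2f}")
```

In this model the faster interface gives a ratio of about 1.16 at 1K but about 1.56 at 4K: the smaller the record, the more the fixed access time dominates and the less the interface speed matters.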
Now time to come to some conclusions regarding Armbian devices and disk usage: I do not differentiate between individual boards that much since all devices using the same SoC will perform more or less identically unless memory bandwidth differs a lot (e.g. NanoPi NEO with just 408 MHz while all other H3 devices use 624 MHz DRAM clockspeed; to be confirmed) or strange design decisions limit the board's IO capabilities (e.g. Orange Pi Plus and Plus 2 having an ultra slow USB-to-SATA bridge on the board limiting sequential transfer speeds to just 15/30 MB/s while all 4 USB ports are behind an internal USB hub, so every cheap H3 device exposing its USB ports directly outperforms these 2 more expensive H3 devices).
Armada 38x (Clearfog Pro/Base currently): Best storage performance, able to max out multiple SATA 3.0 lines in parallel, high random IO values (depending on the disks used). USB 3.0 somewhat limited but no deal breaker for HDDs. Using mechanical adapters to convert M.2 and mPCIe/mSATA slots into real SATA ports you can attach 2 or 3 (Clearfog Pro) high performance SATA disks; the USB3 port is ok for one slower SSD or up to 2 3.5" or 3 2.5" disks (using a USB3 hub in between of course)
Allwinner A20 (see list of recommended devices): SATA throughput limited but ok-ish, especially when used together with 2.5" disks. USB 2.0 performance ok-ish; on most A20 boards 2 USB host and 1 OTG port are available (no shared bandwidth), so up to 4 disks are possible. With mainline kernel and appropriate enclosures UASP can be used, and then both random and sequential performance increase (~40 MB/s).
Allwinner A10: IO performance might be as good as with A20 but since A10 lacks GbE and is a single core SoC you might not be able to benefit from IO performance.
NXP/Freescale i.MX6 (Hummingboard, Cubox-i, Wandboard): On boards with a quad-core CPU there's 1 SATA 2.0 port with limited sequential transfer speeds available, 1 USB OTG (usable as host port) and, depending on the board, 1 - 3 USB host ports. USB performance is pretty low, and anyone thinking about NAS use cases should keep in mind that GbE is limited to ~400 Mbits/sec here (on some i.MX6 boards (m)PCIe is available, which could be used to add a 2nd dedicated GbE NIC)
Allwinner H3/H5: No real SATA (the 'SATA port' on Orange Pi Plus and Plus 2 is an ultra slow USB-to-SATA bridge) but the SoC features 1 USB OTG and 3 USB host ports. It depends on the board in question how many of these ports are available and whether an internal USB hub is used or not. With mainline kernel UASP can be used so GbE equipped H3/H5 boards that expose their USB ports directly make a nice low-cost NAS (covered already in a separate post)
All other boards we currently support are rather uninteresting when it's about storage performance, the ODROID-Cs being the exceptions when you need really high random IO performance and can afford their expensive eMMC modules.
Some information regarding USB 3.0 performance of ODROID-XU4 is available here. Some USB3 information/numbers for Actions Semi's S500 here. I tried to run the whole set of tests with my Roseapple Pi dev sample too, but it was necessary to increase input voltage to 5.4V to prevent boot loops with a USB3 disk connected (that's what you get when you can only power the board through crappy Micro USB!) and every test crashed, so I consider USB3 with Roseapple Pi and LeMaker's Guitar as being broken (LeMaker's USB3 implementation is broken anyway since they chose a standards-violating pin scheme for the USB3 receptacle).
Edit: Really impressive USB3 results with ODROID-XU4 in the meantime due to UASP enabled/fixed in HK's 4.9 kernel branch: http://xu4.keltike.de/performance/odroidxu4-with-and-without-uas-support/ 
All results collected with Wandboard Quad before as a reference:
EVO 750 SATA                                                   random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1     1674     1398    33091   139321   115147     1267
            1024       2     3827     2739    36232   215040   192264     2561
            1024       4    11813     9821    15210    16188    12720     7488
          102400       4    13527    14588    16472    16029    12554    11921
          102400      16    28064    27647    32029    31988    28320    25568
          102400     512    55824    56316    46212    46300    45289    55679
          102400    1024    61382    61426    52161    52336    51207    61328
          102400   16384    70819    70556    93527    93750    93882    70720

EVO 750 USB2                                                   random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1      866      746    14196   137470   119096      751
            1024       2     1978     1442    14393   215851   197607     1436
            1024       4     5661     5311     7178     7585     6402     5268
          102400       4     6260     6191     6895     6789     6377     6094
          102400      16    11941    12047    14203    14111    13958    11506
          102400     512    20059    19996    22195    22185    21971    20000
          102400    1024    20409    20381    22814    22873    22837    20350
          102400   16384    22705    22693    27661    27710    27685    22679
         4096000       4    25133    25053    29143    29247
         4096000    1024    25101    25044    29095    29766

PM851 SATA                                                     random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1     1508     1260    31120   136081   117067     1264
            1024       2     3430     2545    24522   217777   194678     2876
            1024       4    10365     8721    13536    16064    16033     7020
          102400       4    13347    14398    16243    16404    12223    12598
          102400      16    27878    27524    32129    32262    28169    25248
          102400     512    56968    56551    46874    47186    46878    56191
          102400    1024    59024    58994    47177    47467    47338    58805
          102400   16384    68729    68342    91944    92299    92458    68484

PM851 USB2                                                     random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1      820      740    11443   137356   118489      799
            1024       2     1934     1459    12738   217590   195868     1520
            1024       4     4931     4942     6595     7475     7463     4885
          102400       4     6155     6104     7159     7004     5892     5667
          102400      16    11622    11628    14132    14222    13436    11507
          102400     512    19996    19969    22296    22353    22189    19961
          102400    1024    20436    20417    22949    23034    22996    20333
          102400   16384    22609    22560    27654    27729    27689    22514

EVO 840 SATA                                                   random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1     3302     5266     6271     6618     6527     4957
            1024       2     7764     9267    11440    12428    12227     8748
            1024       4    12147    14361    18865    21864    21567    14000
          102400       4    16767    20489    21208    21239    16563    20291
          102400      16    44285    50872    44976    44989    39181    50656
          102400     512    96689    98347    73832    74044    73459    98207
          102400    1024    96753    98834    73374    73534    73313    99334
          102400   16384   111089   128889   109333   109760   109738   127815
         4096000       4    79955   134886   105653   106075
         4096000    1024   100780   131599   100493   101337

EVO 840 USB2                                                   random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1     1702     2450     2551     2660     2657     2389
            1024       2     3593     4295     4842     5229     5222     4115
            1024       4     6023     6459     7245     7972     7959     6341
          102400       4     7266     7902     7980     7984     6658     7934
          102400      16    14178    15733    16016    16064    15996    15472
          102400     512    22935    23053    26182    26255    26136    23053
          102400    1024    23549    23671    27108    27131    27097    23716
          102400   16384    23796    24704    28483    28517    28511    24697
         4096000       4    25101    25219    25013    25113
         4096000    1024    25219    25248    24378    24491

Barracuda SATA                                                 random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1     1159      974     4970   133767   115407      878
            1024       2     2858     2119    15525   207578   196621     1815
            1024       4     8194     7686    10299    14218    10713     7791
          102400       4    12908    12106    15717    15816     1073     9320
          102400      16    24002    26542    31438    31734     3988    19055
          102400     512    54652    54886    45274    45949    33188    54503
          102400    1024    57768    57216    45870    46525    38651    57300
          102400   16384    65332    65425    80825    82306    89452    64314

Barracuda USB2                                                 random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1      682      501     4419   137119   118642      596
            1024       2     1380     1117     4574   213735   195085     1151
            1024       4     3965     3724     4407     5553     5078     3780
          102400       4     5128     5086     5336     5318      980     4584
          102400      16    10770    10503    13145    12480     3447     9943
          102400     512    19085    19026    21011    21080    17964    18770
          102400    1024    19060    19117    21299    21513    19242    18787
          102400   16384    22926    22850    26155    25255    25265    22868
         4096000       4    22746    22640    25947    25679
         4096000    1024    22581    22393    25873    25787

Momentus SATA                                                  random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1     1010      777     6605   132165   115742      436
            1024       2     2132     1643     5997   206998   195975     1070
            1024       4     5334     3602     8201    10564     4152     4495
          102400       4     7927     9441    10999    11034      522     8438
          102400      16    21230    21626    25049    24639     1948    18737
          102400     512    38716    37962    33651    33601    18513    38903
          102400    1024    38719    37988    27804    27801    21447    38663
          102400   16384    31710    32190    33497    34030    34066    32311
         4096000       4    40193    38990    38438    38522
         4096000    1024    40231    39216    38583    38431

Momentus USB2                                                  random   random
              kB  reclen    write  rewrite    read    reread     read    write
            1024       1      623      583     4782   137969   118381      380
            1024       2     1435     1069     5062   214997   193608     1034
            1024       4     3961     3116     5184     6381     5036     3965
          102400       4     5159     5068     6354     6290      503     5091
          102400      16    10902    10750    14029    13199     1875    10394
          102400     512    19584    19481    21127    21117    17410    19641
          102400    1024    20012    19991    21244    21380    19366    20146
          102400   16384    22449    22363    26166    26183    26313    22191
         4096000       4    24968    24726    28279    28311
         4096000    1024    24970    24784    27559    27988

And a final look at how HDDs work and why you should care. As already said, all modern HDDs use zone bit recording, which means they show higher sequential transfer speeds on the outer tracks compared to the inner ones. When an HDD is empty, data is written to these outer tracks, so measuring a totally empty HDD will show 'best case' performance. As soon as you start to fill the HDD with data (which is obviously the only reason you attach an HDD to a host) data will be written to the inner tracks and sequential transfer speeds will slow down. How exactly depends on the drive in question (and the use case the drive has been made for).

To demonstrate that I went back to the Clearfog Pro (SATA is no bottleneck there), used the 2 HDDs, created 10 partitions of nearly equal size and tested the first, the last and one in the middle (emulating empty, full and half full capacity).
Seagate Momentus (2.5", 5400 rpm, 60 GB): Sequential transfer speeds start at ~39 MB/s when empty, drop to 34 MB/s when half of the capacity is used and end up at just 22 MB/s when the disk is nearly full (I used just a 2 GB test file size, so the write speeds are slightly skewed by Linux filesystem buffers):
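An idealised model shows where this slowdown comes from. Assuming constant areal density and constant rpm, throughput at a given radius is proportional to that radius, and the capacity between the outer edge and radius r grows with R² − r² (R being the outer radius). Filling the disk from the outside in then gives the Python sketch below. The inner/outer radius ratio of 0.5 in the example is an assumption (it differs between drives), and real drives change speed in discrete zones rather than continuously.

```python
import math


def zbr_speed_mb_s(fill_fraction: float, outer_mb_s: float, radius_ratio: float) -> float:
    """Idealised sequential speed after `fill_fraction` of the disk has been written.

    radius_ratio is inner radius / outer radius of the data area. With constant
    areal density, capacity outside radius r is proportional to R^2 - r^2, and with
    constant rpm the transfer rate is proportional to r.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be between 0 and 1")
    # Solve R^2 - r^2 = f * (R^2 - r_in^2) for r, with R normalised to 1.
    r = math.sqrt(1.0 - fill_fraction * (1.0 - radius_ratio ** 2))
    return outer_mb_s * r


for fill in (0.0, 0.5, 1.0):
    print(fill, round(zbr_speed_mb_s(fill, 40.0, 0.5), 1))
```

With these assumptions an empty, half full and full disk deliver 40, ~31.6 and 20 MB/s, which is the 'half the speed when full' rule of thumb; the Momentus measurements that follow (39, 34 and 22 MB/s) show a similar trend.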
  Reveal hidden contents
                                                               random   random
              kB  reclen    write  rewrite     read   reread     read    write
Empty (first partition):
            1024       1     1613     1336     7839   251656   209922      783
            1024       2     3269     2298     7271   407964   361717     1573
            1024       4     5657     7810    10225    15710     4060     6745
          102400       4    15766    14710    15567    15627      530    14105
          102400      16    32941    32630    37075    35280     2011    30115
          102400     512    38435    37894    32740    29356    21070    38251
          102400    1024    37622    38043    34885    34407    26804    38733
          102400   16384    38131    37420    38057    38469    37374    37716
         2048000       4    41877    41543    39181    39754
         2048000    1024    41887    41397    39121    39723

Half full (middle partition):
            1024       1     1436     1134     5072   251538   210994      448
            1024       2     2956     2045     8195   404429   361717     1270
            1024       4     7475     5233    10238    15856     7345     7161
          102400       4    15367    15266    15815    15904      535    13230
          102400      16    31096    30177    34725    35191     2011    28391
          102400     512    34110    33199    33635    33951    19654    33545
          102400    1024    33283    33657    31310    31021    24765    34202
          102400   16384    33772    33661    33714    34104    33161    33487
         2048000       4    37189    35951    33942    34314
         2048000    1024    37078    35895    33972    34355

Full (last partition):
            1024       1     1175     1018     3836   250613   209074      625
            1024       2     2223     2065     5413   413900   366096     1384
            1024       4     3924     6318     9634    16351     7138     4609
          102400       4    15642    15316    15673    15759      500    13776
          102400      16    21882    19901    23201    23448     1836    20109
          102400     512    23027    22473    23357    23566    15241    22685
          102400    1024    22827    22585    21969    21736    18059    22984
          102400   16384    22699    22379    22583    22653    22327    22665
         2048000       4    24608    23575    22234    22464
         2048000    1024    24670    23518    22147    22446
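iozone reports throughput in kB/s; the MB/s figures quoted in the text are those values divided by 1000 and rounded. A small sketch (hypothetical helper names, plain awk) that pulls the 2 GB sequential read rate out of rows like the ones above and expresses the ZBR slowdown as a percentage:

```shell
# Read column (field 5) of the "2048000 1024" rows in MB/s, one line per
# run; dividing iozone's kB/s by 1000 matches the rounded figures above.
seq_read_mbps() {
  awk '$1 == 2048000 && $2 == 1024 { printf "%.1f\n", $5 / 1000 }'
}

# Percentage of throughput lost between two values (e.g. empty vs. full).
zbr_drop_pct() {  # zbr_drop_pct BEFORE AFTER
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (1 - b / a) * 100 }'
}
```

For the Momentus, `zbr_drop_pct 39121 22147` (empty vs. full read rate with 1024 kB records) gives 43, i.e. roughly 43 % less sequential read throughput on the innermost partition.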

Seagate Barracuda (3.5", 7200 rpm, 3 TB): sequential transfer speeds start at ~170 MB/s when empty, increase slightly to 180 MB/s when half of the capacity is used and end up at 120 MB/s when the disk is nearly full (again with just a 2 GB test file size, so the write speeds are slightly skewed by Linux fs buffers):
                                                               random   random
              kB  reclen    write  rewrite     read   reread     read    write
Empty (first partition):
            1024       1     2459     2733    20758   450687   381781     2904
            1024       2     4923     3325    17927   406843   360806     4064
            1024       4    12340    12346    13771    29327    19628    10760
          102400       4    45184    36575    45931    39182     1138    17316
          102400      16    73604    81948   110793    76730     4236    53874
          102400     512   155736   141373   100301   103667    45085   118979
          102400    1024   153853   120599   135423   138861    62901   114639
          102400   16384   107160   105440   120605   134529   134225   105122
         2048000       4   183645   179076   171697   173854
         2048000    1024   181433   182583   169158   173991

Half full (middle partition):
            1024       1     2928     2510    20796   252099   209706     1600
            1024       2     4733     3242    13781   410616   363647     3390
            1024       4    11214    11214    11251    28084    19536    10067
          102400       4    44854    35828    42258    38080     1127    17715
          102400      16    91030    87158   109827   114544     4172    62722
          102400     512   109817   132187    99452    96045    45056   112181
          102400    1024   143043   104197   148451   135735    66263   106762
          102400   16384   126271   110233   127666   142910   151631   105234
         2048000       4   190118   187826   178616   180056
         2048000    1024   191598   187306   179970   180271

Full (last partition):
            1024       1     2858     2508    20625   443841   374132     2260
            1024       2     4551     3419    12341   411087   362481     3807
            1024       4    10244    10242    11210    27866    16311     9476
          102400       4    42817    33391    40694    37554     1119    16443
          102400      16    73148    77292   109766    81873     4209    54197
          102400     512    97568   107834    99448    99980    41625   106760
          102400    1024    96071   106004   112473   109551    58857   105308
          102400   16384    91855    90803    95528    98959   105164    88006
         2048000       4   129782   125175   118716   120473
         2048000    1024   129244   126688   119378   120973

So the Barracuda obviously uses a different ZBR strategy; let's test all 10 partitions (one row per partition, from the first to the last). This disk retains full performance over 2/3 of its capacity and starts to slow down only once the last third of the capacity is also in use:
              kB  reclen    write  rewrite    read    reread
         2048000    1024   183365   181698   174750   174246
         2048000    1024   210040   209458   199876   200919
         2048000    1024   196397   209302   199451   201650
         2048000    1024   200381   198844   188294   192124
         2048000    1024   186946   186665   178561   180075
         2048000    1024   182293   179762   167753   172225
         2048000    1024   173535   168544   159783   160495
         2048000    1024   159402   158012   149518   150317
         2048000    1024   146046   140609   133340   134079
         2048000    1024   129375   127552   118809   121299

To be honest, I don't trust these numbers that much, since h2benchw results (which also test across the whole capacity) show lower numbers and sequential performance that degrades steadily over the whole capacity: http://www.tomshardware.com/reviews/4tb-3tb-hdd,3183-10.html
Anyway: since all HDDs use ZBR and nobody uses HDDs that stay empty forever, we must keep in mind how these disks behave. They get slower the more data is stored on them, and fragmentation adds to this problem when you really stuff as much as possible onto them. An average 2.5" HDD that scores +90 MB/s in moronic benchmark tests performs differently in reality:
At half of the capacity we're already talking about just 70 MB/s, and if the disk is used heavily and a lot of fragmentation has happened, the average sequential access speed will be at 50 MB/s or even below. That means that even our old A20 boards won't be a real bottleneck, since with good settings the 1 GB or 2 GB of DRAM will act as filesystem cache (so A20's SATA write bottleneck won't be that much of a problem), and SATA read speeds on A20 exceed 100 MB/s anyway.
Unfortunately A20 also shows unbalanced GbE performance (in the opposite direction to SATA), but there's still some hope that A20's quad-core successor (called R40) uses the new IP blocks already known from H3 boards, so maybe we will soon support another SoC family with a native SATA implementation that performs absolutely ok with 2.5" disks (excluding stuff like WD's VelociRaptor ;) ).
For reference, the stuff used to test the disks:
root@armada:~# cat /usr/local/test-slow-disk.sh
cd /mnt/empty
iozone -e -I -a -s 1M -r 1k -r 2k -r 4k -i 0 -i 1 -i 2 | grep "         1024"
iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2  | grep "         1024"
iozone -a -g 2000m -s 2000m -i 0 -i 1 -r 4K -r 1024K | grep "         2048"
cd /mnt/half-full
iozone -e -I -a -s 1M -r 1k -r 2k -r 4k -i 0 -i 1 -i 2 | grep "         1024"
iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2  | grep "         1024"
iozone -a -g 2000m -s 2000m -i 0 -i 1 -r 4K -r 1024K | grep "         2048"
cd /mnt/full
iozone -e -I -a -s 1M -r 1k -r 2k -r 4k -i 0 -i 1 -i 2 | grep "         1024"
iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2  | grep "         1024"
iozone -a -g 2000m -s 2000m -i 0 -i 1 -r 4K -r 1024K | grep "         2048"

root@armada:~# cat /proc/partitions 
major minor  #blocks  name

 179        0    7761920 mmcblk0
 179        1    7605648 mmcblk0p1
   8        0   58605120 sda
   8        1     204800 sda1
   8        2    5860512 sda2
   8        3    5860512 sda3
   8        4    5860512 sda4
   8        5    5860512 sda5
   8        6    5860512 sda6
   8        7    5860512 sda7
   8        8    5860512 sda8
   8        9    5860512 sda9
   8       10    5860512 sda10
   8       11    4344952 sda11

root@armada:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       7.2G  1.2G  5.9G  17% /
devtmpfs        503M     0  503M   0% /dev
tmpfs           503M     0  503M   0% /dev/shm
tmpfs           503M  7.8M  496M   2% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           503M     0  503M   0% /sys/fs/cgroup
tmpfs           503M     0  503M   0% /tmp
tmpfs           101M     0  101M   0% /run/user/1000
/dev/sda6       5.6G  1.1M  5.1G   1% /mnt/half-full
/dev/sda2       5.6G  2.7M  5.1G   1% /mnt/empty
/dev/sda11      4.2G  608K  3.8G   1% /mnt/full

root@armada:~# for i in 2 3 4 5 6 7 8 9 10 11 ; do
> mkfs.btrfs -f /dev/sda${i}
> mkdir /mnt/sda${i}
> mount /dev/sda${i} /mnt/sda${i}
> cd /mnt/sda${i}
> iozone -a -g 2000m -s 2000m -i 0 -i 1 -r 1024K | grep "         2048"
> done

  On 9/12/2016 at 11:32 AM, tkaiser said:


It should also be noted that benchmarks with just 10 MB file size might not tell the truth, since many USB thumb drives are prone to throttling. They perform nicely for 2-3 minutes and then drop to laughable performance numbers (I've seen thumb drives slow down from 80 MB/s to 2.x MB/s).


Common USB2 thumb drives (like most other USB2 peripherals) are pretty useless, but not the USB3 drive I mentioned. The benchmark numbers were published only after actually verifying the results (a test loop running iostat for > 1 h).
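Verifying a sustained-throughput claim like this comes down to sampling throughput at fixed intervals during a long write (for example with sysstat's `iostat -d -m <device> <interval>`) and checking whether it collapses. A minimal sketch of the checking part; the helper name and the threshold are hypothetical, and parsing the iostat output is left out:

```shell
# Given a threshold percentage and throughput samples (MB/s) taken at fixed
# intervals, print the 1-based index of the first sample that falls below
# that percentage of the first sample, or "none" if throughput holds.
# Usage: first_throttled_sample 50 80 79 78 2.5
first_throttled_sample() {
  pct=$1; shift
  awk -v pct="$pct" 'BEGIN {
    for (i = 1; i < ARGC; i++) {
      v = ARGV[i] + 0
      if (i == 1) base = v
      else if (v < base * pct / 100) { print i; exit }
    }
    print "none"
  }' "$@"
}
```

With the example from the usage line, the fourth sample (2.5 MB/s) is the first one below half of the initial 80 MB/s, which is exactly the kind of drop a 10 MB test would never see.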



  On 9/12/2016 at 11:32 AM, tkaiser said:

Anyway back to 'real storage' ;)


Real storage for real use cases? Some people might be interested in cheap, fast, low-power storage for their OPI ONE/LITE/PC appliances running on batteries, solar-powered devices or a simple low-volume wireless NAS, private cloud ..... ;)

  On 9/12/2016 at 2:08 PM, rodolfo said:

Real storage for real use cases? Some people might be interested in cheap, fast, low-power storage for their OPI ONE/LITE/PC appliances running on batteries, solar-powered devices or a simple low-volume wireless NAS, private cloud ..... ;)


Hmm... thanks for mentioning this specific SanDisk USB thumb drive, since both price and performance are way better than the "Samsung Memory Fit USB Flash Drive Speicherstick 128GB" I bought 2 months ago (it shows really nice performance with USB3 on a MacBook but horribly low performance on an H3 device negotiating USB 2.0 -- no idea why, but it's at least interesting/alarming that a USB3 device performs that badly on a USB2 port).


But I still fail to understand the 'use case' here, since a cheap Samsung EVO with 32GB or 64GB outperforms your USB stick -- see below -- and on WiFi-only or Fast Ethernet devices like those you mentioned, sequential performance is more or less irrelevant, isn't it? I ordered 4 x 64GB EVO on Friday for 52€ in total (shipping/VAT included) and can't seem to find the thumb drive you mentioned at the same capacity for a lower price (but at 128GB it gets interesting, since the Samsung EVO SD cards with 128GB are more expensive and show less performance).


Samsung EVO 64GB, tested with ten times the file size:

                                                               random   random
              kB  reclen    write  rewrite     read   reread     read    write
          102400       4     3233     3339     7547     7557     7561     3392
          102400      16    11326    12256    14628    14618    14636    12237
          102400     512    21248    21333    22682    22684    22680    21402

Random IO is slightly better when it comes to read rates, and the lower sequential performance shouldn't matter, since I at least fail to see which use case could benefit from sequential performance exceeding 10 MB/s on small H3 devices. If we were talking about the larger H3 models featuring GbE, it would make more sense to keep an eye on sequential storage performance (that's also why I immediately stopped testing with the Roseapple Pi: 2 GB DRAM combined with one USB3 port but only Fast Ethernet is already a fail, so even if Actions Semi's S500 showed decent USB3 storage performance... where's the use case with just Fast Ethernet?)
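The Fast Ethernet argument can be made explicit: a NAS can never deliver more than the slower of its network link and its storage. A rough sketch with a hypothetical helper name; it ignores protocol overhead, so real numbers will be lower:

```shell
# Rough NAS throughput ceiling in MB/s: the slower of the network link
# (given in Mbit/s, divided by 8) and the storage (given in MB/s).
# Protocol overhead is ignored.
# Usage: nas_ceiling_mbps LINK_MBIT STORAGE_MBPS
nas_ceiling_mbps() {
  awk -v net="$1" -v disk="$2" 'BEGIN {
    n = net / 8
    printf "%.1f\n", (n < disk ? n : disk)
  }'
}
```

With the EVO card's ~22.7 MB/s sequential read from the table above, `nas_ceiling_mbps 100 22.7` gives 12.5, so on Fast Ethernet the network, not the card, is the limit.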


BTW: We replaced a few NAS boxes (40 TB each) with BananaPi M2+ a few months ago, each using a 64GB EVO and two 64GB USB thumb drives with btrfs' raid-0 mode and transparent file compression. Local access is as fast as before (even though they're still running 4.6-rc1 with a rather old/slow version of montjoie's H3 Ethernet driver), but updating the contents is way faster (using the btrfs send/receive feature to/from a rather distant location on the other side of the world).
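A rough sketch of a setup along those lines, assuming the two thumb drives form the striped data volume (the post doesn't say exactly how the EVO card is used). Device names, paths, host and the helper functions are hypothetical; with DRY_RUN set the commands are only printed:

```shell
# Sketch only: device names, paths and host are hypothetical.
# With DRY_RUN set, commands are printed instead of executed.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Stripe two thumb drives as btrfs raid0 for data (metadata profile left
# at the mkfs.btrfs default) and mount with transparent lzo compression.
make_striped_fs() {  # make_striped_fs DEV1 DEV2 MOUNTPOINT
  run mkfs.btrfs -f -d raid0 "$1" "$2"
  run mount -o compress=lzo "$1" "$3"
}

# Incremental update: send only the difference between two read-only
# snapshots to a btrfs filesystem on a remote host via ssh.
send_incremental() {  # send_incremental PARENT_SNAP NEW_SNAP HOST DEST_DIR
  if [ -n "${DRY_RUN:-}" ]; then
    echo "btrfs send -p $1 $2 | ssh $3 btrfs receive $4"
  else
    btrfs send -p "$1" "$2" | ssh "$3" btrfs receive "$4"
  fi
}
```

`btrfs send -p` transfers only the difference between two read-only snapshots (created with `btrfs subvolume snapshot -r`), which is what makes remote updates cheap; the pipe works in either direction, so the NAS can just as well be the receiving side.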


In other words: I totally agree that flash storage might be a great option for specific use cases, especially on SBCs where we can make use of the mainline kernel and advanced filesystem/raid features. But the purpose of this thread was to fight all those moronic SBC storage benchmarks on the net that only focus on sequential transfer speeds and totally forget about all the important stuff (random IO, host vs. device bottlenecks, the fact that performance depends not only on the host but also on the disk and the USB bridge chip used, and so on).

  On 9/12/2016 at 4:09 PM, tkaiser said:


But I still fail to understand the 'use case' here, since a cheap Samsung EVO with 32GB or 64GB outperforms your USB stick -- see below -- and on WiFi-only or Fast Ethernet devices like those you mentioned, sequential performance is more or less irrelevant, isn't it? I ordered 4 x 64GB EVO on Friday for 52€ in total (shipping/VAT included) and can't seem to find the thumb drive you mentioned at the same capacity for a lower price (but at 128GB it gets interesting, since the Samsung EVO SD cards with 128GB are more expensive and show less performance).


I've generally found SD cards less reliable than their USB counterparts, and the physical handling of USB3 sticks is usually much smoother with PCs and notebooks. This is just a personal preference. For OPI ONE/LITE I put the boot stuff on an old SD card of any class/size and the rootfs on a USB3 stick.


A fast flash drive of course nicely complements a fast SD card if you need to add low-power storage. As I already mentioned, OPI ONE and OPI LITE, both with USB flash and WiFi, run from a simple dual-18650 battery pack. HDDs and SSDs will hit some serious limits there.


You are of course perfectly right in pointing out the speed nonsense promoted in benchmark infomercials. An SBC will always be a carefully balanced system of storage, computing and I/O. It just happens that the small OPI H3 boards running stock legacy Armbian show very pleasing, balanced performance.


Cubietruck+ (H8) and CT Raid Subboard

1x Samsung EVO 750 256GB SSD

1x Corsair Force 120GB SSD

Disks contain ~1GB of previous test data in Raid1 Mode (120GB array).

CT Raid Subboard is connected to CT+ via SATA connector. Subboard also offers USB3 data connection.

I would test additional RAID modes, but this board requires soldering tiny surface-mount resistors to change modes. RAID 1 is what I need for my application, so I won't be able to test RAID 0, JBOD or PM modes for a while. I'd expect all the IO to be terrible, thanks to the Cubietruck+.


Command line used: iozone -e -I -a -s 1M -r 1k -r 2k -r 4k -i 0 -i 1 -i 2

                                                            random  random
              KB  reclen   write rewrite    read    reread    read   write
            1024       1    1644    1997     7363     7732    7008    2004
            1024       2    2727    3359    12774    13928   12051    3345
            1024       4    4565    5268    23506    26897   20566    5257

Command line used: iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2

                                                            random  random
              KB  reclen   write rewrite    read    reread    read   write
          102400       4    6156    6876    25990    26144   25722    6284                                                          
          102400      16   10489   11175    58778    58498   57430   10719                                                          
          102400     512   12853   13055    86507    87503   86036   12977                                                          
          102400    1024   12916   14186    78472    85506   85974   12914                                                          
          102400   16384   12949   14395    94904    99168   97182   13241    

Command line used: iozone -a -g 4000m -s 4000m -i 0 -i 1 -r 4K -r 1024K

              KB  reclen   write rewrite    read    reread
         4096000       4    5765    7425    30598    29413                                                                          
         4096000    1024

This last one took so long I had to terminate it. Got things to do ;)



Via USB2 on Windows 10, I got writes of ~40 MB/s (according to Explorer). I'll test properly with Windows 10 + SATA3/USB3 later, but I think the moral of the story is that the Cubietruck+'s SATA->USB bridge is as bad as you heard.


Via USB3 on Windows 10



Got the same results for 50MB, 1GB, and 4GB tests.

I feel like these 4k results may not be so great, but I don't have benchmarks without the RAID board to cross-check. Maybe later.

  On 9/16/2016 at 4:29 AM, cmirra said:

CT Raid Subboard is connected to CT+ via SATA connector. Subboard also offers USB3 data connection.

I would test with additional Raid Modes but this board requires soldering tiny surface mount resistors to change modes.


LOL, really? Ok, this board is not interesting at all, no further tests/numbers needed. I was interested in RAID-0 performance because, with the ultra slow GL830 USB-to-SATA bridge limiting real writes/reads to 15/30 MB/s max, this mode would be absolutely useless (that's easy to understand, but the average customer of this gadget most probably won't believe it until he sees numbers/graphs).


BTW: I would never use RAID-1 with a proprietary controller like this without first testing how the controller deals with a data mismatch between the two drives. Easy test: create a text file containing "1", then attach each RAID member directly to a SATA port. If the filesystem is not accessible, it's already time to throw the gadget away. If it is accessible, modify the file on one disk to read "2" and on the other to read "3". Then check what you get back once the 2 disks are RAID members again. If it's either "2" or "3", throw the gadget away. It's really that easy.
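The manual parts of that test (attaching the disks to the controller or directly to SATA) can't be scripted, but writing the marker and judging the result can. A sketch with hypothetical helper names and mount points:

```shell
# Helpers for the RAID-1 mismatch test; mount points are hypothetical.
# Attaching the disks to the controller or directly to SATA is manual,
# and if a member's filesystem isn't accessible when attached directly,
# the controller already failed the test.
MARKER=raid-mismatch-test.txt

# Write a marker value: "1" through the RAID controller first, then
# "2" on one member and "3" on the other while attached directly.
write_marker() {  # write_marker MOUNTPOINT VALUE
  echo "$2" > "$1/$MARKER" && sync
}

# Read the file back through the controller after reassembly.
# Getting one member's copy ("2" or "3") without complaint means the
# controller silently ignores mismatches.
judge_raid1() {  # judge_raid1 MOUNTPOINT
  case "$(cat "$1/$MARKER" 2>/dev/null)" in
    2|3) echo "FAIL: mismatch silently hidden, throw the gadget away" ;;
    *)   echo "no silent pick: check how the controller reports the mismatch" ;;
  esac
}
```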


One final update regarding Roseapple Pi (using Actions Semi S500 just like LeMaker Guitar or the announced Cubieboard 6). Since I booted the board one last time anyway, I thought I'd give USB3 one last try as well. I connected a Samsung PM851 in a JMS567 enclosure (with its own power supply!) to the USB3 port and had a look with the most recent 3.10.105 kernel:

root@roseapple:~# lsusb
Bus 002 Device 002: ID 152d:3562 JMicron Technology Corp. / JMicron USA Technology Corp. 
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
root@roseapple:~# lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
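The interesting bit of that output is the driver bound to the mass-storage interface: `uas` means UAS is in use, `usb-storage` means the old Bulk-Only mode. A small sketch (hypothetical helper name) that pulls this out of `lsusb -t` output:

```shell
# Print the driver bound to each mass-storage interface in `lsusb -t`
# output read from stdin: "uas" or the older "usb-storage".
# Usage: lsusb -t | storage_driver
storage_driver() {
  awk '/Class=Mass Storage/ {
    if (match($0, /Driver=[^,]*/)) print substr($0, RSTART + 7, RLENGTH - 7)
  }'
}
```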

That looks nice, since UAS seems to be usable. Let's give it a try with the 2 iozone calls from the Clearfog measurements above:

iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2

                                                    random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       4    13525    16451    19141    24275    14287    16492
102400      16    39343    48649    56409    63777    40203    45654
102400     512    68873    75835    89871   102977    98620    94677
102400    1024   115288   111747   170742   176837   172585   104936
102400   16384   117025   105977   195316   196457   196582   117819

iozone -a -g 4000m -s 4000m -i 0 -i 1 -r 4K -r 1024K

     kB  reclen    write  rewrite    read    reread
4096000       4   124421   132386   134795   134760
4096000    1024   127135   134943   127559   128026

If you compare with the PM851 numbers made with Clearfog above, it's obvious that the S500 numbers are not that great. And since S500 features only Fast Ethernet, sequential transfer speeds are irrelevant anyway, at least for NAS use cases. I then tried to use an external VIA812 USB3 hub with integrated RTL8153 Gigabit Ethernet, but this only led to error messages, /dev/sda1 disappearing and the board failing to boot afterwards. Fortunately this Roseapple Pi (formerly, and more aptly, called Lemon Pi) has never been sold. Just a few dev/review samples exist that were sent out around the globe.


Maybe the above numbers help some future Cubieboard 6 owners who got tricked into believing CB6 would have 'real SATA' :)  Funnily enough, USB-SATA on Cubieboard 6 will be much faster than on older Cubieboards (using A20's 'real SATA' or the horrible GL830 USB-to-SATA bridge on Cubieboard 5), but for most use cases this won't help much since there's only Fast Ethernet on the board. So even when adding an RTL8153 Gigabit Ethernet dongle to one of the 2 USB2 ports, 'NAS performance' won't exceed that of Cubieboard 3 (the so-called Cubietruck).

  On 9/12/2016 at 11:32 AM, tkaiser said:

Edit: Really impressive USB3 results with ODROID-XU4 in the meantime due to UASP enabled/fixed in HK's 4.9 kernel branch: http://xu4.keltike.de/performance/odroidxu4-with-and-without-uas-support/ 



XU4 is easily outperformed by my new arrival, the ROCK64. Testing with a Samsung EVO840 (the same as above) with ext4, UAS enabled, a 4.4.70 kernel (ayufan build) and an external JMS567 disk enclosure:

iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2

                                                              random    random
              kB  reclen    write  rewrite    read    reread    read     write
          102400       4    19727    23938    25879    24231    18475    23886
          102400      16    67302    80160    87210    87008    68301    79484
          102400     512   278860   292958   292461   301834   292974   291585
          102400    1024   297825   310695   313499   324097   316276   313253
          102400   16384   325951   330928   340258   351640   352109   323207

That's sequential write performance of more than 325 MB/s, and reads are even better at ~345 MB/s. So now let's wait and see what pricing will look like (I guess we get the 1GB variant for less than $25).


Debug log here (I simply used the arm64 OMV rootfs from ODROID-C2): http://sprunge.us/HURC


Time for a small '2017 follow-up' on this issue, this time focusing only on 'disk performance' (HDD/SSD, not onboard storage like eMMC or SD cards).


Since we all know that we don't need to differentiate between individual boards here but can look at board families (the SoC is the important thing), I only check the following 'families':

  • cheap Allwinner stuff running with A64 or H5 (that's the Pine64 numbers below, H3 boards score a bit lower)
  • Allwinner A20, R40 or V40 (that's the Banana Pi Pro numbers below, maybe random IO numbers look better with the R40/V40 boards but since those are only available from manufacturers also known as brain damaged retards that's not an option today)
  • i.MX6 (Wandboard Quad; not supported by Armbian, but a few other boards using the same SoC are)
  • Exynos 5422 (ODROID XU3 or XU4)
  • RK3328 (ROCK64)
  • Armada 38x (Clearfog Pro, same numbers will apply to Turris Omnia, Clearfog Base or Helios4 soon)

Why are all other boards missing? Because they're uninteresting when it comes to storage. SATA or USB3 is a must; the only exceptions below are the UAS-capable USB2 Allwinner devices, since they show a few more MB/s of sequential throughput and way better random IO numbers than USB2 solutions that only support the old/anachronistic USB Mass Storage mode.

All tests were done with Samsung EVO SSDs (my EVO840 for all tests except the ODROID-XU4 results, which were made with a much faster EVO850 instead, and the Pine64 numbers, made with a slower EVO750, so the Pine64 random IO numbers should be multiplied by 1.3 or even 1.4):

                      Random IO in IOPS     Sequential IO in MB/sec
                        4K read/write           1M read/write
Pine64 (USB2/UAS)         2260/1948                42 /  41
Banana Pi Pro (SATA)      3943/3478               122 /  37
Wandboard Quad (SATA)     4141/5073               100 / 100
ODROID-XU4 (USB3/UAS)     4637/5126               262 / 282
ROCK64 (USB3/UAS)         4619/5972               311 / 297
Clearfog Pro (SATA)      10148/19184              507 / 448
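The two halves of this table use different units. Assuming 4 KiB per operation, random IOPS translate into MB/s as sketched below (hypothetical helper name):

```shell
# Convert IOPS at a given block size (KiB) into MB/s (10^6 bytes/s).
# Usage: iops_to_mbps IOPS BLOCK_KIB
iops_to_mbps() {
  awk -v iops="$1" -v kib="$2" 'BEGIN { printf "%.1f\n", iops * kib * 1024 / 1e6 }'
}
```

Even the Clearfog Pro's 10148 read IOPS correspond to only about 41.6 MB/s, and the Pine64's 2260 to about 9.3 MB/s, far below their sequential numbers.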

The testing methodology is somewhat flawed, but I refuse to waste much of my time on proper benchmarking, since it's only about getting an idea of what to expect. So, as a summary:

  • USB2 SBCs that lack UAS capabilities are simply too slow to even be listed here
  • UAS capable cheap Allwinner USB2 boards show ok-ish performance for low-end setups (think of NanoPi NEO2 as NAS for example)
  • SBCs featuring 'real SATA' have to be differentiated: Armada 38x shows top performance like x86/x64 boards, i.MX6 is somewhat ok, A20/R40/V40 simply suck (you'd better not buy this crap any more)
  • USB3 SBCs like ODROID-XU4/HC1 or ROCK64 show way better performance. You only have to take care of the USB downsides (XU4, with an internal USB hub and receptacle problems, is in a bad position here) and ensure that you're using good USB-to-SATA bridges to connect SSD or HDD

If we also take the price/performance ratio into account, then ROCK64 or the other RK3328 boards that will follow are really hard to beat. The 1GB ROCK64 variant will most probably stay below the $25 mark, and you should keep in mind that more DRAM is pretty useless when you think about (NAS) performance. Only on devices where IO throughput is way lower than network throughput might having more DRAM than needed improve NAS write performance, if settings are appropriate (since then amounts of data that fit into RAM will be written at network speed and flushed to disk later).
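Which settings count as 'appropriate' isn't spelled out here. On Linux the relevant knobs are the `vm.dirty_ratio` and `vm.dirty_background_ratio` sysctls; the sketch below (hypothetical helper name, illustrative values) estimates how large a write burst the page cache can absorb before writers are throttled to disk speed:

```shell
# Rough upper bound for the write burst the page cache absorbs at network
# speed before writers are throttled: vm.dirty_ratio percent of memory.
# The kernel applies the ratio to available (free + reclaimable) memory,
# so using total RAM overestimates it.
# Usage: dirty_burst_mb 1024 "$(cat /proc/sys/vm/dirty_ratio)"
dirty_burst_mb() {  # dirty_burst_mb RAM_MB DIRTY_RATIO_PERCENT
  echo $(( $1 * $2 / 100 ))
}
```

On a 1 GB board with a dirty_ratio of 20 (a common default), `dirty_burst_mb 1024 20` gives 204, i.e. only around 200 MB arrive at network speed before the disk becomes the limit.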

Edited by tkaiser
Updated XU4/HC1 performance numbers since tested myself: https://forum.odroid.com/viewtopic.php?f=146&t=27548&start=50#p201175

@tkaiser Man, these numbers are absolutely awesome! Thanks for this first test to get an idea of this little piece of gold. I'm really looking forward to using it to replace my current Banana Pi (Gigabit LAN and SATA) as my Seafile cloud server. Your thoughts about RAM sound sane. I guess 2GB will be enough for Seafile and perhaps NAS altogether.


What do you think about the capability to install armhf (instead of aarch64) software? Currently the Seafile server and some other stuff is mostly RPi 2/3 compatible, which means it is armhf. I don't see any downside so far compared to aarch64. RAM usage is also lower with 32-bit. I guess armhf stuff will work on ROCK64 as well?


Regarding USB3: since USB3 is specified to supply quite a lot of power... how is that taken into account in the ROCK64 board design? Any idea what the board can deliver? I thought about buying a simple USB3-to-SATA bridge with UASP support and powering a Crucial MX300 256GB without an external power supply (just leaving it out). Such an SSD should not consume too much, so I hope that the ROCK64 USB3 port can handle it directly.


Does anybody have something to say about the points I mentioned?



On topic: I totally feel for you developers. You want to restrict development to worthy boards, which would also increase the quality of development for those few boards, because work is more focused. On the other hand, I see boards like my old Banana Pi, and I really think it would never have become what it is, or as usable as it is, without all your effort (which certainly started without much discussion beforehand). From a user perspective it is a very hard topic that you're discussing here. Obviously everybody in the community wants as much as possible. Unfortunately, new builds/initial support only mature after quite some time, and then comes the breakthrough (which sometimes cannot be anticipated). It would also be a shame if nobody even tried to establish some initial support, since then nobody would know what the board is capable of :/. I really don't want to be in your position, having to decide how this should be handled in the future. I'm just very thankful and happy that Armbian was founded some years ago and that it has become such a name in the ARM business :)


About the ROCK64:

One thing is clear: in my opinion the ROCK64 should be supported, as it combines so many good things. The interest is there, and quite a few developers are apparently working on it already. Even more interesting is that two totally independent interest groups (Armbian with a server/CLI focus and LibreELEC with a multimedia focus) show real interest and have already put some effort into it. This can push it very far, and both camps can benefit from one another. Really, really great to see this!


Early 2018 update


Time for another small update. It's 2018 now, and since it seems Armbian will support a couple of RK3399 devices later this year, let's have a closer look at their storage performance.


RK3399 provides 2 individual USB3 ports which seem to share a bandwidth limitation (with a fast SSD you get close to 400 MB/s on each USB3 port, but with two SSDs accessed in parallel the total bandwidth won't exceed 400 MB/s). RK3399 also features a PCIe 2.1 implementation with 4 lanes that should operate at Gen2 link speeds (that's 5 GT/s per lane, so in total we would be talking about 20 GT/s if the SoC is able to cope with such high bandwidth). Rockchip changed their latest RK3399 TRM (technical reference manual) last year and downgraded the specs from Gen2 to Gen1 (2.5 GT/s per lane, or 10 GT/s with 4 lanes). So there was some doubt whether there's an internal overall bandwidth limitation or something like that (see speculation and links here).


Fortunately a Theobroma engineer recently did a test using Theobroma Systems' RK3399-Q7 with a pretty fast Samsung EVO 960 NVMe SSD: https://irclog.whitequark.org/linux-rockchip/2018-03-14 -- it seems RK3399 is able to deal with storage access at up to 1.6 GB/s (yes, GB/s and not MB/s). This is not only important if you want ultra fast NVMe storage (directly attached via PCIe and using a way more modern and efficient protocol than the ancient AHCI used by SATA) but also if RK3399 device vendors want to put PCIe-attached SATA controllers on their boards. The ODROID guys chose to go with an ASM1061 (single lane) on their upcoming N1 since they feared switching to an x2 (dual lane) chip would only increase costs while providing no benefits. But Theobroma's test results indicate that even x4-attached controllers using all PCIe lanes could make reasonable use of the full PCIe bandwidth of 20 GT/s.
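The GT/s figures translate into payload bandwidth via the 8b/10b line coding used by PCIe Gen1/Gen2 (and by USB 3.0 SuperSpeed): 8 of every 10 bits on the wire are data, so 1 GT/s per lane is 100 MB/s before protocol overhead. A small sketch with a hypothetical helper name:

```shell
# Theoretical payload ceiling in MB/s of a link using 8b/10b line coding
# (PCIe Gen1/Gen2, USB 3.0 SuperSpeed): 1 GT/s per lane = 100 MB/s,
# before any protocol overhead.
# Usage: link_ceiling_mbps GT_PER_SECOND LANES
link_ceiling_mbps() {
  awk -v gt="$1" -v lanes="$2" 'BEGIN { printf "%.0f\n", gt * lanes * 100 }'
}
```

`link_ceiling_mbps 2.5 4` gives 1000 and `link_ceiling_mbps 5 4` gives 2000, so the ~1.6 GB/s measured on the RK3399-Q7 exceeds what a Gen1 x4 link could carry. A single 5 GT/s lane, as with USB3 SuperSpeed or a Gen2 x1 controller like the ASM1061, tops out at 500 MB/s.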


Below we'll now have a look at USB3/UAS performance and PCIe-attached SATA using an ASM1061 (both done on an ODROID N1 developer sample some weeks ago). These tests still use my usual EVO840 SATA SSD, so the results are comparable. You see two ASM1061 numbers since one was made with active PCIe link state power management and the other without (more or less only affecting access patterns with small block sizes).
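If the two ASM1061 configurations correspond to the kernel's PCIe ASPM policy (an assumption; the post doesn't say how the setting was switched), the active policy can be read from `/sys/module/pcie_aspm/parameters/policy`, which lists all policies and brackets the active one. A sketch with a hypothetical helper name:

```shell
# Extract the active (bracketed) policy from the pcie_aspm policy file,
# e.g. "default performance [powersave] powersupersave" -> powersave.
# Usage: active_aspm_policy < /sys/module/pcie_aspm/parameters/policy
active_aspm_policy() {
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}
```

Switching as root would then be `echo performance > /sys/module/pcie_aspm/parameters/policy`; whether that is allowed depends on the kernel configuration and platform.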


Then of course beeble's NVMe SSD tests are listed (fio here and iozone there). The numbers should also be valid for the other RK3399 devices where you can access all 4 PCIe lanes via M.2 key M or a normal PCIe slot: Rock960, NanoPC-T4 or RockPro64 (an M.2 adapter is needed for the latter, of course; ayufan tested and got even better iozone numbers than beeble). And maybe later I'll add SATA and USB3 results from EspressoBin with the latest bootloader/kernel.


(for an explanation which boards represent which SoC and why please see my last post above)

                      Random IO in IOPS     Sequential IO in MB/sec
                        4K read/write           1M read/write
RPi 2 under-volted        2033/2009                29 /  29
RPi 2                     2525/2667                30 /  30
Pine64 (USB2/UAS)         2836/2913                42 /  41
Banana Pi Pro (SATA)      3943/3478               122 /  37
Wandboard Quad (SATA)     4141/5073               100 / 100
ODROID-XU4 (USB3/UAS)     4637/5126               262 / 282
ROCK64 (USB3/UAS)         4619/5972               311 / 297
EspressoBin (SATA)        8493/16202              361 / 402
Clearfog Pro (SATA)      10148/19184              507 / 448
RK3399 (USB3/UAS)         5994/6308               330 / 340
ASM1061 powersave         6010/7900               320 / 325 
ASM1061 performance       9820/16230              330 / 330
RK3399-Q7 (NVMe)         11640/36900             1070 / 1150
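To put the 4K random IO columns in perspective, IOPS can be converted into throughput by multiplying with the block size. This small Python sketch assumes exactly 4096 bytes per access and defines MB as 10^6 bytes; the helper name iops_to_mb_per_s is illustrative.

```python
# 4K random IO results from the table above: (read IOPS, write IOPS)
RESULTS = {
    "RPi 2": (2525, 2667),
    "ASM1061 performance": (9820, 16230),
    "RK3399-Q7 (NVMe)": (11640, 36900),
}

BLOCK_SIZE = 4 * 1024  # 4K accesses, in bytes


def iops_to_mb_per_s(iops: int, block_size: int = BLOCK_SIZE) -> float:
    """Throughput in MB/s (10^6 bytes per second) implied by an IOPS figure."""
    return iops * block_size / 1_000_000


for name, (read_iops, write_iops) in RESULTS.items():
    print(f"{name:22s} {iops_to_mb_per_s(read_iops):6.1f} / "
          f"{iops_to_mb_per_s(write_iops):6.1f} MB/s")
```

Even the NVMe SSD's 36900 write IOPS correspond to only about 151 MB/s, which is why random IO has to be looked at separately from sequential transfer speeds.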

As we can see, RK3399 USB3 performance improved slightly compared to RK3328 (ROCK64). It should also be obvious that 'USB SATA', in this case USB3/SuperSpeed combined with a great UAS capable USB-to-SATA bridge (JMicron JMS567 or JMS578, ASMedia ASM1153 or ASM1351), is not that much worse than either PCIe attached SATA or 'native SATA'. When it comes to sequential performance, USB3 even slightly outperforms PCIe attached SATA. The 2 USB3 ports RK3399 provides are really worth a look for attaching storage when combined with great UAS capable bridges.


NVMe obviously outperforms all SATA variants. And while combining an ultra fast and somewhat expensive NVMe SSD with a dev board is usually nothing that happens in the wild, it's still important to know what the limitations look like. As we've seen from the RK3399-Q7 tests with fio and larger block sizes, we get close to 1600 MB/s at the application layer, which is kinda impressive for devices of this type. Another interesting thing is how NVMe helps keep performance up: this is /proc/interrupts after an iozone run bound to the 2 big cores (taskset -c 4-5): https://gist.github.com/kgoger/768e987eca09fdb9c02a85819e704122 -- the IRQ processing happens on the same cores automagically, no more IRQ affinity issues with all interrupts ending up on cpu0 :) 
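To check IRQ distribution yourself after such a run, /proc/interrupts can be summed per CPU. A minimal Python sketch, assuming the usual layout (a header row of CPUn columns, then one row per IRQ with per-CPU counts); the filter string "nvme" and the helper name irqs_per_cpu are illustrative and not taken from the linked gist.

```python
from pathlib import Path


def irqs_per_cpu(text: str, match: str = "") -> list:
    """Sum interrupt counts per CPU over all /proc/interrupts rows containing `match`."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ... CPUn
    totals = [0] * ncpu
    for line in lines[1:]:
        if match not in line:
            continue
        fields = line.split()[1:]  # drop the "NN:" IRQ label
        for cpu, field in enumerate(fields[:ncpu]):
            if not field.isdigit():  # short rows such as "ERR:" stop early
                break
            totals[cpu] += int(field)
    return totals


if __name__ == "__main__":
    proc = Path("/proc/interrupts")
    if proc.exists():
        print(irqs_per_cpu(proc.read_text(), "nvme"))
```

If the benchmark was pinned to the big cores, the non-zero totals for the NVMe queue interrupts should show up in the columns of those same cores rather than all on CPU0.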


Edit 1: Replaced Pine64 numbers made with EVO750 from last year with fresh ones done with a more recent mainline kernel and my usual EVO840


Edit 2: Added Raspberry Pi 2 results from here.


Edit 3: Added EspressoBin numbers from here.


  On 4/29/2018 at 2:58 PM, fossxplorer said:

Amazing results of RK3399. Thx for sharing such details with us. I'd like to order RockPro64 when it comes available again.


My use case is NAS, and I think this could be a good board for it. Pine only sells a PCIe-SATA adapter with 2 SATA ports, but an adapter with more SATA ports would be nice since it can handle 1.6GB/s. So for my spinning disks, I could use up to 8 disks obviously. Hope we will be able to compile ZFS on this, as I wasn't able to on another RK3328 device.


EDIT1: Also I could imagine using this board as my desktop, but I wonder whether we have the necessary drivers etc. for the GPU? It would also be awesome to have such a low power desktop.




I posted my ZFS solution here; I was able to compile/use the standard zfs-dkms packages with some modifications:



  On 4/29/2018 at 2:58 PM, fossxplorer said:

RK3399 ... more SATA ports ... can handle 1.6GB/s



Nope, please look at the details and at how storage works. The 1.6GB/s above is just a confirmation that RK3399 has no crippled internal bandwidth (Hardkernel assumed otherwise and designed their ODROID N1 around the assumption that RK3399's PCIe implementation would be bottlenecked to 400 MB/s or something like that).


These 1.6GB/s are just proof that an x4 Gen2 connection can be established with RK3399 (4 x 5 GT/s with 8b/10b encoding) and that an NVMe (!!!) SSD shows nice high bandwidth numbers. NVMe is not a protocol from the stone age (like SATA/AHCI); it's made for modern CPUs with multiple cores.
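The arithmetic behind the x4 Gen2 figure, as a small Python sketch: transfer rate times lanes, minus the 8b/10b line coding overhead. It ignores PCIe packet/protocol overhead, so application-layer numbers like the 1.6GB/s above will always sit below the result; the function name is made up for illustration.

```python
def pcie_usable_mb_per_s(lanes: int, gt_per_s: float,
                         payload_bits: int, line_bits: int) -> float:
    """Link bandwidth after line coding overhead, in MB/s (10^6 bytes/s)."""
    bits_per_s = lanes * gt_per_s * 1e9 * payload_bits / line_bits
    return bits_per_s / 8 / 1e6


# PCIe Gen2: 5 GT/s per lane, 8b/10b encoding (8 payload bits per 10 line bits)
print(pcie_usable_mb_per_s(4, 5.0, 8, 10))  # x4, all RK3399 lanes: 2000.0
print(pcie_usable_mb_per_s(1, 5.0, 8, 10))  # x1, single Gen2 lane (ASM1061): 500.0
```

So the 20GT/s of the x4 link boil down to 2000 MB/s before protocol overhead, while a single-lane controller like the ASM1061 is capped at 500 MB/s for all disks attached to it.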


If you attach an old SATA or SAS controller to the PCIe bus you will run into IRQ affinity issues for sure, and you have to keep in mind that you're now using the PCIe bus to talk to a separate storage controller that implements protocols from the last decade/century made for single CPU core systems with spinning rust connected (that's what SCSI, PATA, SATA and AHCI are all about). Lots of overhead and inefficiency is involved, and when you then also try to combine a bunch of disks in special ways (e.g. RAID, which in my opinion is a really stupid setup at home), performance will drop drastically again.


RK3399 is a general purpose ARM SoC and not a NAS SoC. Those exist though -- look at Marvell Armada 7k/8k for example. These SoCs differentiate between the AP (application processor, the part with the ARM cores) and the CP (communication processor(s)), and all the relevant work is offloaded to the CP. That's why they're fast as NAS.


BTW: RK3399 and PCIe GPUs won't work, but please no off-topic discussions here. Let's focus on storage performance on SBCs and try to avoid flooding this thread with babbling. So many 'educational' threads here meant as tutorials to share knowledge have already been destroyed by babbling.


I thought it would be interesting to also have some numbers from an SSHD, more specifically a FireCuda 2.5" 2TB@5400rpm. The board is an HC1 with the latest firmware update from JMicron, and kernel 4.9.61.


I ran two tests, one with a BTRFS partition and another with an EXT4 partition, both using the whole disk and only about 5% full. These are the results:


BTRFS (iozone, all values in kB/s):

                                                     random    random 
     kB  reclen    write  rewrite    read    reread    read     write 
 102400       4    11433    13426    18452    18513     1653     5039
 102400      16    35302    41951    56583    57930     6069    36474
 102400     512    95472   102089   114966   116459    46364   103092
 102400    1024   102495   102538   115515   116898    54691    89851
 102400   16384    78096    81337   114334   134513   110433    76181


EXT4 (iozone, all values in kB/s):

                                                       random   random    
     kB  reclen    write  rewrite    read    reread    read     write    
 102400       4    12427    13485    19307    19430     1709     1887
 102400      16    38360    55768    70089    70510     6280    16125
 102400     512   106688   109609   122658   124822    47793    93470
 102400    1024   108331   109693   123302   124806    64158    98570
 102400   16384    80573    87536   120168   104656   121852    91237

EXT4, 4GB test file (kB/s):

     kB  reclen    write  rewrite    read    reread    
4096000       4   122603   123460   131950   131997
4096000    1024   118265   118122   127649   127780

(I didn't do the 4GB file test with BTRFS.)


I think it is interesting to see the big difference in IOPS at small write sizes between the two filesystems. It seems like, for some reason, BTRFS is using the SSD cache more efficiently than EXT4.  @tkaiser any insight on this?

  On 5/9/2018 at 12:19 PM, JMCC said:

I think it is interesting to see the big difference in IOPS at small write sizes between the two filesystems. It seems like, for some reason, BTRFS is using the SSD cache more efficiently than EXT4.



Nope, since the same happens when testing with a normal HDD or SSD as well (see numbers and comment here). With btrfs writes and iozone the -I switch (direct IO) has no effect, so in reality this is testing filesystem buffers. That's why the numbers are higher. You simply can't use iozone -I to test btrfs writes; instead you need to adjust the test file size to 3 or 4 times the amount of DRAM to eliminate filesystem buffers. Just another example that you always have to benchmark the benchmark first to be able to generate insights from it.


Two more notes:

  • Almost all btrfs code lives inside the kernel so different kernels --> different behaviour and performance possible
  • Doing such tests without first switching to the performance cpufreq governor only ever produces numbers without meaning, since cpufreq governor behaviour can become a significant factor, especially with small accesses (different filesystems might result in different cpufreq scaling behaviour, so while this is interesting too it doesn't tell anything about 'filesystem performance')
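Switching to the performance governor before a run can be scripted through the standard cpufreq sysfs interface. A minimal Python sketch, assuming the usual /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor layout and root privileges; set_governor is a hypothetical helper, and its root parameter only exists so it can be pointed at another directory.

```python
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")


def set_governor(governor: str = "performance", root: Path = CPU_SYSFS) -> list:
    """Write `governor` to every online CPU's scaling_governor; return the files written."""
    changed = []
    # Offline CPUs have no cpufreq directory, so they are simply not matched
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        gov_file.write_text(governor + "\n")
        changed.append(gov_file)
    return changed
```

Run set_governor() as root before benchmarking, and call it again with the previous governor (e.g. "ondemand") afterwards if you want the board's normal behaviour back.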


Well, if that's the case, then looking at the rest of the numbers (with bigger filesizes) it seems like EXT4 performance is still better than BTRFS, at least in current Armbian 4.9.y kernel and with a real-life scenario of ondemand governor. Do you know if it is the opposite (BTRFS performing better than EXT4) with other kernels?

  On 5/10/2018 at 5:52 AM, tkaiser said:

Just another example that you always have to benchmark the benchmark first to be able to generate insights from it.


wouldn't it make sense to write a small bash script which takes care of 'sane' settings and generates a 'report' together with board name, filesystem used, running kernel etc.?

I remember you explained it to me once, and it seems you did it (at least) once more on CNX and now here again. I think in the long term you'd save some time by seeing quickly whether the posted 'benchmarking' was done with an appropriate board/setting or is just a 'collection of garbage'.. :) 



but please no off-topic discussions here


As soon as you've decided whether or not to write such a script, I'll delete this post to keep the thread clean.

Edited by chwe
  On 5/10/2018 at 11:18 AM, JMCC said:

Do you know if it is the opposite (BTRFS performing better than EXT4) with other kernels?



More or less irrelevant since btrfs is a modern filesystem utilizing modern concepts, e.g. 'checksumming' to ensure data integrity. This not only 'wastes' CPU cycles but also results in higher storage activity for the same tasks (since checksums have to be calculated, written, read and verified).


E.g. copy-on-write (CoW), which directly affects the performance of write patterns that are the same size as or smaller than the filesystem's block size (usually 4K or larger): every such write is in reality a 'read, modify, write' cycle, since already existing data needs to be read from disk, then the new data is merged in, then the modified block is written to a new location, and only afterwards is the old reference deleted. That's why btrfs and other CoW filesystems show horribly low performance in all benchmarks writing small chunks of data. Same when running database benchmarks on btrfs with defaults --> horrible performance as expected, since a CoW filesystem is nothing you want to put database storage on (and if you disable CoW in btrfs then checksumming is gone too, and you're better off using ext4 or XFS anyway).
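A deliberately simplified Python model of the read-modify-write cost described above: it counts the data bytes a CoW filesystem has to read and write for one overwrite of existing data. It assumes 4 KiB blocks and ignores metadata, checksums and caching, so it illustrates the principle rather than btrfs's actual behaviour; the function name is made up.

```python
from typing import Tuple


def cow_write_cost(offset: int, length: int, block_size: int = 4096) -> Tuple[int, int]:
    """(bytes read, bytes written) when overwriting `length` bytes at `offset`.

    Every touched block is written out whole to a new location; blocks that the
    update only partially covers must be read first so the old and new data can
    be merged (read-modify-write). Metadata and checksum updates are ignored.
    """
    if length <= 0:
        return (0, 0)
    end = offset + length
    bytes_read = 0
    bytes_written = 0
    for block in range(offset // block_size, (end - 1) // block_size + 1):
        start = block * block_size
        covered = min(end, start + block_size) - max(offset, start)
        if covered < block_size:
            bytes_read += block_size  # old contents needed to merge the update
        bytes_written += block_size  # whole block goes to a new location
    return bytes_read, bytes_written
```

For example, a 512-byte update straddling a block boundary, cow_write_cost(4000, 512), touches two blocks and costs 8 KiB read plus 8 KiB written for 512 bytes of payload. An aligned full 4 KiB write needs no read in this model; the metadata and checksum writes mentioned above come on top in reality.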


The (in)famous clickbait site dedicated to provide only numbers without meaning (Phoronix) periodically 'benchmarks' various filesystems inappropriately so if you're after numbers visit https://www.phoronix.com/scan.php?page=article&item=linux414-fs-compare and similar posts.


  On 5/10/2018 at 1:08 PM, chwe said:

wouldn't it make sense to write a small bash script which takes care of 'sane' settings and generates a 'report' together with board name, filesystem used, running kernel etc.?



Unfortunately not, since passive benchmarking ('fire and forget' mode) never works. It only provides numbers without meaning and nice looking graphs but zero insights. With storage benchmarks, as with every other benchmark, only 'Active benchmarking' works. And that requires some understanding, a lot of time and the will to throw away most of your work (95%+ of all benchmark results, since usually something goes wrong, you have to find out what and then repeat). Almost nobody does this.


On every benchmarked host, but especially on SBCs with their weak ARM CPU cores and limited resources, you always need to monitor various resources in parallel (htop, 'iostat 5', 'vmstat 5' and so on), switch at least to the performance governor, and take care of process and IRQ affinity (watching htop), especially on boards with big.LITTLE implementations. Otherwise you end up with the usual passive benchmarking result: Casual benchmarking: you benchmark A, but actually measure B, and conclude you've measured C.
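Process affinity can also be set from inside a benchmark wrapper instead of via taskset. A minimal Python sketch using the Linux-only os.sched_setaffinity; parse_cpu_list handles only simple taskset -c style lists ('4-5', '0,2-3', no stride syntax), the helper names are hypothetical, and which CPU numbers are the big cores is board specific (4-5 on RK3399, as in the taskset -c 4-5 call mentioned earlier in this thread).

```python
import os


def parse_cpu_list(spec: str) -> set:
    """Parse a simple taskset -c style list ('4-5', '0,2-3') into CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec: str) -> set:
    """Restrict this process to `spec` (Linux only). Programs it starts afterwards,
    e.g. iozone via subprocess, inherit the same CPU mask."""
    os.sched_setaffinity(0, parse_cpu_list(spec))
    return os.sched_getaffinity(0)
```

On an RK3399 board, calling pin_current_process("4-5") before launching the benchmark with subprocess.run keeps the whole run on the two big cores; IRQ affinity still has to be checked separately.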


Next problem: once you have generated numbers, you need to generate insights from them. Recently someone showed me this link as proof that bcache in Linux would be a great way to accelerate HDD access by using SSDs: http://www.accelcloud.com/2012/04/18/linux-flashcache-and-bcache-performance-testing/


True or not? What do the numbers tell? While almost everyone looking at those numbers and graphs will agree that bcache is great, in reality these benchmark numbers show exactly the opposite. But people ignore this since they prefer data over information and ignore what the numbers really tell them.


Back on topic (SBC storage and not filesystem performance): the only reasonable way to compare different boards is to use the same filesystem (ext4, since it's the most robust and not that prone to showing different performance depending on kernel version) and the performance governor, eliminate all background tasks that could negatively impact performance, and keep at least an eye on htop. If you see one CPU core there utilized at 100%, you know that you're running into a CPU bottleneck and have to take this into account: either by accepting that a certain SBC is simply too weak to deliver good storage performance since it's CPU bottlenecked, or by starting to improve settings, which is Armbian's and my 'Active Benchmarking' approach, with benchmarks then ending up in optimized settings --> there's a reason 'our' images sometimes perform even twice as fast on identical hardware as other SBC distros that don't care about what's relevant.
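htop shows this interactively, but the same 'one core at 100%' check can be computed from two /proc/stat snapshots. A minimal Python sketch, assuming the standard per-CPU line layout (user nice system idle iowait irq softirq steal ...); it counts idle plus iowait as idle time, and the helper names are hypothetical.

```python
def read_cpu_times(text: str) -> dict:
    """Map 'cpuN' -> (busy, total) jiffies from /proc/stat contents."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines (intr, ctxt, ...)
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:9]]  # user..steal; guest is already in user
        idle = values[3] + values[4]  # idle + iowait
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def utilization(before: dict, after: dict) -> dict:
    """Per-core busy percentage between two read_cpu_times() snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before[cpu]
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return result
```

Take one snapshot of /proc/stat before and one during or after a run; a single core sitting near 100.0 while the disk is far from its limits points to the CPU bottleneck described above.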

This topic is now closed to further replies.