tkaiser

Swap on SBC


This is some more research based on prior efforts.

 

The goal is to make more efficient use of available RAM. If the system runs low on memory, only two options are possible: either the kernel invokes the oom-killer to kill tasks and free memory (oom = out of memory), or it starts to swap.

 

Swap is a problem if it happens on slow media, and slow media is the usual situation on SBCs. 'Average' SD cards (not A1 rated) are slow as hell when it comes to random IO performance. So swapping is usually something that should be avoided. But... technology improves over time.

 

In Linux we're able to swap not only to physical storage but, for a few years now, also to compressed memory. If you want the details, simply do a web search for zram or check Wikipedia first.

 

Test setup is a NanoPC-T4 equipped with 4 GB RAM (RK3399 based, so a big.LITTLE design with 2xA72 and 4xA53). I crippled the board down to a quad-core A53 running at 800 MHz where I can easily switch between 4 GB RAM and lower numbers: adding 'mem=1110M maxcpus=4' to the kernel cmdline results in the A72 cores being inactive, the kernel using only 1 GB DRAM and, for whatever reason, cpufreq scaling not working, so the RK3399 is statically clocked at 808 MHz. All tests were done with RK's 4.4 kernel (4.4.152).

 

This test setup is meant as 'worst case possible'. A quad-core A53 at 800 MHz is more or less equivalent to a quad-core A7 running at ~1000-1100 MHz. So we're trying to test with the lower limit.

 

I used a compile job that requires up to 2.6 GB RAM to be built (based on this blog post). The task is to build ARM's Compute Library which involves swapping on systems with less than 3 GB memory. Let's have a look:

 

In the following I tried a couple of different scenarios: Swap on physical media and also two different zram algorithms:

 

  • w/o: no swapping happened since board booted with full 4GB RAM active
  • nvme: Transcend TS128GMTE110S SSD in M.2 slot, link is established at x4 Gen2
  • emmc: the 16GB ultra fast Samsung eMMC 5.1 on NanoPC-T4
  • usb2: Samsung EVO840 SSD in JMS567 disk enclosure, attached to USB2 port (UAS works)
  • usb3: Samsung EVO840 SSD in JMS567 disk enclosure, attached to USB3 port (UAS works)
  • hdd: Samsung HM500JI 2.5" HDD in JMS567 disk enclosure, attached to USB2 port (UAS works)
  • sd card: 'average' SanDisk 8 GB SD card (not A1 rated so horribly low random IO performance)
  • lzo: zram with lzo as compression algorithm
  • lz4: zram with lz4 as compression algorithm
     

And the numbers are:

          w/o    nvme     lzo     lz4    emmc    usb2    usb3     hdd    sd card
real    100m39  118m47  125m26  127m46  133m34  146m49  154m51  481m19   1151m21
user    389m48  415m38  405m39  402m52  415m38  415m29  407m18  346m28    342m49
sys      11m05   29m37   36m14   60m01   34m35   66m59   65m44   23m05    216m25

You need to look at the 1st row ('real'): that's the wall-clock time the whole job took. For more details consult the 'time' manual page.
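To make the relation between the three rows a bit more tangible, here is a minimal Python sketch (not part of the original measurements). It divides user+sys by real to estimate how many CPU cores were busy on average during each run. The numbers are copied from the table above; the names `TIMES`, `to_minutes` and `busy_cores` are made up for this illustration.

```python
# (real, user, sys) copied from the table above, format: <minutes>m<seconds>
TIMES = {
    "w/o":     ("100m39", "389m48", "11m05"),
    "nvme":    ("118m47", "415m38", "29m37"),
    "lzo":     ("125m26", "405m39", "36m14"),
    "lz4":     ("127m46", "402m52", "60m01"),
    "emmc":    ("133m34", "415m38", "34m35"),
    "usb2":    ("146m49", "415m29", "66m59"),
    "usb3":    ("154m51", "407m18", "65m44"),
    "hdd":     ("481m19", "346m28", "23m05"),
    "sd card": ("1151m21", "342m49", "216m25"),
}


def to_minutes(stamp):
    """Convert a value such as '100m39' to minutes as a float."""
    minutes, seconds = stamp.split("m")
    return int(minutes) + int(seconds) / 60


def busy_cores(real, user, sys_):
    """Average number of cores doing work: CPU time divided by wall-clock time."""
    return (to_minutes(user) + to_minutes(sys_)) / to_minutes(real)


for name, (real, user, sys_) in TIMES.items():
    print(f"{name:8s} {busy_cores(real, user, sys_):.2f} cores busy on average")
```

With 4 GB RAM almost all four cores are busy for the whole run, while with swap on HDD or SD card less than one core's worth of work gets done on average because the CPU mostly waits for storage.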

 

In other words: when limiting the RK3399 on NanoPC-T4 to just the four A53 cores running at 800 MHz, the compile job takes 100 minutes with 4 GB RAM. As soon as we limit the available RAM to 1 GB, swapping has to occur, so it gets interesting to see how efficient the various approaches are:

 

  • NVMe SSD is the fastest option. Performance drop only 18%. That's due to NVMe being a modern storage protocol suited for modern (multi-core) CPUs. Problem: there's no PCIe and therefore no NVMe on the majority of SBC
  • Zram with both lzo and lz4 algorithms performs more or less the same (interestingly lzo slightly faster)
  • Slightly slower: the fast Samsung eMMC 5.1
  • Surprisingly the EVO840 SSD connected via USB2 performs better than connected via USB3 (some thoughts on this)
  • Using a HDD for swap is BS (and has been BS for the last 4 decades, but we had no alternative until SSDs appeared). The compile job takes almost 5 times as long to complete since all HDDs suck at random IO
  • Using an average SD card for swap is just horrible. The job that finished within 100 minutes with 4 GB DRAM available took over 19 HOURS with swap on an average SD card (please note that today's usual A1 rated SD cards are orders of magnitude faster and easily outperform HDDs)
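The percentages in the list above follow directly from the 'real' row. A minimal Python sketch of that arithmetic, with the values copied from the table (the names `REAL`, `to_seconds` and `slowdown_percent` are made up for this illustration):

```python
# 'real' (wall-clock) times copied from the results table, format: <minutes>m<seconds>
REAL = {
    "w/o": "100m39", "nvme": "118m47", "lzo": "125m26", "lz4": "127m46",
    "emmc": "133m34", "usb2": "146m49", "usb3": "154m51",
    "hdd": "481m19", "sd card": "1151m21",
}


def to_seconds(stamp):
    """Convert a value such as '118m47' to seconds."""
    minutes, seconds = stamp.split("m")
    return int(minutes) * 60 + int(seconds)


def slowdown_percent(variant, baseline="w/o"):
    """How much longer (in percent) a variant took compared to the baseline."""
    return (to_seconds(REAL[variant]) / to_seconds(REAL[baseline]) - 1) * 100


for variant in REAL:
    if variant != "w/o":
        print(f"{variant:8s} +{slowdown_percent(variant):.0f}%")
```

This is only the ratio of wall-clock times of single runs, so small differences (lzo vs. lz4, for example) should not be over-interpreted.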

 

Summarizing: NVMe SSDs are no general option (since only available on some RK3399 boards). Swap on HDD or SD card is insane. Swap on USB connected SSDs performs ok-ish (~1.5 times slower) so the best option is to use compressed DRAM. We get a performance drop of just 25% at no additional cost. That's amazing.

 

The above numbers were 'worst case'. That's why I crippled the RK3399 to a slow performing quad-core A53. This should give you an idea how 'bad' zram might get on the slowest SBCs Armbian runs on (I know that there are still the boring Allwinner A20 boards around -- yep, they're too slow for this).

 

When I did all this boring test stuff I always recorded the environment using 'iostat 1800' (it reports every 30 minutes what really happened and shows in detail how much data has been transferred and on what the CPU cores spent their time). Quite interesting to compare %user, %sys and especially %iowait percentages:

 

Without swap:

real	100m39.355s
user	389m48.308s
sys	11m5.366s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          97.14    0.00    2.81    0.00    0.00    0.05
          98.11    0.00    1.89    0.00    0.00    0.00
          96.49    0.00    3.51    0.00    0.00    0.00
          33.63    0.00    1.17    0.00    0.00   65.20

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1           0.39         6.19        15.34      11136      27604
mmcblk1           0.12         0.01         7.08         24      12748
mmcblk1           0.29         0.04        29.12         76      52408
mmcblk1           0.15         0.16        17.47        280      31444







128 GB NVMe SSD (Transcend TS128GMTE110S)

real	118m47.028s
user	415m38.041s
sys	29m37.947s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          90.17    0.00    9.83    0.00    0.00    0.00
          89.09    0.00   10.65    0.24    0.00    0.02
          89.06    0.00   10.87    0.06    0.00    0.01
          79.83    0.00   13.75    0.37    0.00    6.05

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
nvme0n1        3531.51      4049.97     10076.08    7290020   18137140
nvme0n1        4389.70      6759.53     10799.27   12167156   19438688
nvme0n1        4196.89      7548.11      9239.46   13586596   16631036
nvme0n1        5397.18      7772.77     13815.96   13990984   24868736

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1           8.60       494.36        15.36     889860      27656
mmcblk1           5.43       300.62         5.71     541120      10276
mmcblk1           7.35       332.49        29.08     598480      52336
mmcblk1          11.18       587.00        20.49    1056608      36876






Samsung eMMC 5.1

real	133m34.405s
user	415m38.955s
sys	34m35.487s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          86.06    0.00    9.33    4.02    0.00    0.59
          82.91    0.00   11.93    4.61    0.00    0.54
          78.06    0.00   13.79    7.60    0.00    0.55
          79.67    0.00   12.85    6.67    0.00    0.81
          23.34    0.00    4.78    5.33    0.00   66.55

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1         885.85      3661.18      7399.21    6590168   13318660
mmcblk1        1525.72      5780.62      7345.17   10405124   13221300
mmcblk1        2074.55      7532.86      6865.80   13559216   12358516
mmcblk1        1465.59      5757.48      7218.02   10363516   12992516
mmcblk1         768.81      2683.68      3888.00    4830624    6998408





Samsung EVO840 USB3 (Class=Mass Storage, Driver=uas, 5000M)

real	154m51.541s
user	407m18.963s
sys	65m44.394s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          83.58    0.00   12.39    3.89    0.00    0.14
          75.49    0.00   16.36    7.95    0.00    0.20
          54.74    0.00   19.54   25.07    0.00    0.65
          74.57    0.00   16.33    8.83    0.00    0.27
          51.38    0.00   22.86   24.88    0.00    0.88
           5.81    0.00    1.01    0.57    0.00   92.60
          
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             899.99      3277.29      7391.12    5899448   13304748
sda            1771.87      4602.59      5641.83    8284336   10154892
sda            2859.21      7020.32      7443.85   12636572   13398932
sda            1627.26      4463.28      4558.92    8033988    8206140
sda            2986.07      8075.17     10204.91   14535304   18368832
sda             114.34       433.28       234.57     779896     422224

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1           8.76       449.59        14.97     809300      26944
mmcblk1           6.06       225.07         4.76     405108       8564
mmcblk1           6.22       212.92         4.06     383248       7304
mmcblk1           7.90       290.09        28.89     522164      52008
mmcblk1           8.65       373.69         4.33     672648       7796
mmcblk1           0.74        25.53        15.89      45956      28596







Samsung EVO840 USB2 (Class=Mass Storage, Driver=uas, 480M)

real	146m49.211s
user	415m29.511s
sys	66m59.827s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          82.26    0.00   11.60    5.52    0.00    0.63
          77.11    0.00   16.94    5.68    0.00    0.27
          67.59    0.00   19.14   12.92    0.00    0.35
          78.39    0.00   18.05    3.50    0.00    0.07
          44.69    0.00   13.18   15.43    0.00   26.70

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             875.49      3208.80      6997.15    5775896   12595008
sda            1791.04      4718.25      5574.88    8493692   10035784
sda            2491.66      6055.81      6341.38   10900164   11414168
sda            1844.43      5053.24      4258.27    9095928    7664972
sda            1854.93      4253.01      5503.57    7655424    9906428

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1           5.74       360.25        14.36     648456      25856
mmcblk1           3.96       176.13         4.72     317068       8504
mmcblk1           5.94       173.85         4.32     312916       7776
mmcblk1           6.75       182.92         7.72     329256      13888
mmcblk1           6.43       259.09        39.36     466356      70856







SAMSUNG HM500JI USB2 (Class=Mass Storage, Driver=uas, 480M)

real	481m19.903s
user	346m28.221s
sys	23m5.888s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          62.34    0.00    5.30   26.52    0.00    5.84
          47.05    0.00    3.66   45.03    0.00    4.27
          30.16    0.00    2.33   60.95    0.00    6.55
          37.39    0.00    3.19   52.27    0.00    7.15
          47.45    0.00    4.44   42.35    0.00    5.76
          16.43    0.00    1.47   52.25    0.00   29.85
           1.86    0.00    0.72   46.10    0.00   51.32
           2.40    0.00    0.70   54.83    0.00   42.08
           1.83    0.00    0.60   52.51    0.00   45.06
           2.38    0.00    0.61   54.73    0.00   42.28
           3.41    0.00    0.72   60.17    0.00   35.70
           2.12    0.00    0.60   53.32    0.00   43.97
           2.89    0.00    0.58   58.69    0.00   37.84
           3.39    0.00    0.68   60.29    0.00   35.63
           3.35    0.00    0.65   52.47    0.00   43.53
          24.89    0.00    0.51   23.60    0.00   51.00
           0.46    0.00    0.05    0.43    0.00   99.05

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             202.08      1103.73      2568.98    1987224    4625352
sda             210.18       823.92       759.15    1483160    1366572
sda             251.15       614.00      1118.09    1105060    2012324
sda             243.75       637.75       562.81    1147928    1013036
sda             242.72       535.27       757.41     963668    1363592
sda             179.62       584.52       240.07    1052216     432164
sda             183.93       595.70       468.54    1072096     843248
sda             196.65       569.87       352.33    1025772     634192
sda             185.96       533.97       285.93     961144     514676
sda             190.90       503.49       303.76     906288     546772
sda             189.79       479.81       467.16     863660     840892
sda             174.72       498.87       315.20     897960     567356
sda             178.52       462.67       278.10     832800     500588
sda             192.74       449.58       405.82     809244     730484
sda             183.37       497.22       350.15     894996     630268
sda              74.32       312.27        14.20     562084      25560
sda               2.92        12.27         0.00      22092          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1           5.04       331.77        14.07     597332      25332
mmcblk1           0.85        27.32         5.08      49172       9144
mmcblk1           0.47        45.07         1.68      81124       3032
mmcblk1           1.67        41.63         5.73      74924      10312
mmcblk1           3.30        79.95         5.89     143936      10612
mmcblk1           0.99        24.50         2.61      44108       4692
mmcblk1           0.06         8.91         0.48      16036        860
mmcblk1           0.11        10.18         1.02      18328       1828
mmcblk1           0.08        10.23         1.04      18416       1876
mmcblk1           0.03         4.24         0.98       7640       1764
mmcblk1           0.02         3.66         0.45       6580        812
mmcblk1           0.05         7.84         1.00      14120       1804
mmcblk1           0.02         2.26         0.45       4068        812
mmcblk1           0.03         4.56         0.98       8216       1764
mmcblk1           0.09        15.84         1.41      28520       2536
mmcblk1           0.69        23.72        20.68      42696      37220
mmcblk1           0.26         6.97        22.35      12544      40232







zram lz4 4 streams:

real	127m46.928s
user	402m52.389s
sys	60m1.737s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          84.16    0.00   15.81    0.00    0.00    0.02
          81.10    0.00   18.89    0.00    0.00    0.01
          76.31    0.00   23.68    0.00    0.00    0.01
          79.77    0.00   20.22    0.00    0.00    0.01
          16.56    0.00    8.59    0.01    0.00   74.85

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
zram1          1203.30      1641.51      3171.71    2954732    5709104
zram2          1202.68      1638.97      3171.76    2950160    5709196
zram3          1203.74      1643.14      3171.81    2957672    5709288
zram4          1202.88      1639.80      3171.70    2951664    5709096
zram1          1634.75      2491.59      4047.41    4485068    7285668
zram2          1632.13      2481.02      4047.48    4466040    7285792
zram3          1632.28      2481.82      4047.32    4467468    7285496
zram4          1633.93      2488.36      4047.37    4479248    7285592
zram1          2142.44      3778.25      4791.51    6800844    8624712
zram2          2141.56      3774.83      4791.39    6794700    8624500
zram3          2141.69      3775.36      4791.42    6795640    8624556
zram4          2142.77      3779.59      4791.50    6803260    8624692
zram1          1714.65      2936.11      3922.50    5285296    7060884
zram2          1713.89      2933.00      3922.56    5279696    7061000
zram3          1712.34      2926.74      3922.62    5268420    7061116
zram4          1714.12      2933.77      3922.69    5281088    7061232
zram1           755.08      1467.73      1552.57    2641884    2794600
zram2           756.14      1472.05      1552.53    2649652    2794528
zram3           754.55      1465.77      1552.42    2638356    2794328
zram4           755.22      1468.43      1552.46    2643152    2794400

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1          16.72      1025.73        15.19    1846324      27340
mmcblk1          17.52       957.67         6.43    1723876      11572
mmcblk1          21.79      1547.45        26.22    2785408      47188
mmcblk1          18.02       970.77         7.58    1747492      13652
mmcblk1          12.51       803.49        26.94    1446264      48500







zram lzo 4 streams:

real	125m26.180s
user	405m39.383s
sys	36m14.588s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          85.31    0.00   14.63    0.00    0.00    0.05
          82.56    0.00   17.44    0.00    0.00    0.00
          80.95    0.00   19.05    0.00    0.00    0.00
          79.52    0.00   20.47    0.00    0.00    0.01
          11.89    0.00    5.46    0.00    0.00   82.65

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
zram1          1174.74      1582.76      3116.19    2848960    5609136
zram2          1174.93      1583.45      3116.27    2850208    5609284
zram3          1175.11      1584.21      3116.24    2851580    5609232
zram4          1174.92      1583.52      3116.15    2850328    5609072
zram1          1588.69      2414.99      3939.76    4346988    7091564
zram2          1589.01      2416.06      3939.99    4348900    7091976
zram3          1588.51      2414.16      3939.86    4345492    7091748
zram4          1588.29      2413.34      3939.81    4344016    7091660
zram1          1816.51      3200.85      4065.19    5761560    7317376
zram2          1815.69      3197.45      4065.31    5755436    7317592
zram3          1816.19      3199.44      4065.32    5759020    7317620
zram4          1816.44      3200.38      4065.37    5760724    7317700
zram1          1823.10      3125.61      4166.79    5626100    7500224
zram2          1821.87      3120.63      4166.85    5617132    7500332
zram3          1822.11      3121.60      4166.83    5618876    7500288
zram4          1822.79      3124.31      4166.83    5623760    7500292
zram1           517.68       987.91      1082.82    1778232    1949072
zram2           517.41       986.80      1082.82    1776248    1949084
zram3           517.20       985.93      1082.87    1774680    1949168
zram4           517.13       985.70      1082.82    1774256    1949072

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1          12.29       782.37        13.75    1408268      24752
mmcblk1          10.81       635.53         6.22    1143952      11188
mmcblk1          14.39       921.31        26.93    1658368      48476
mmcblk1          14.46       857.51         7.68    1543516      13828
mmcblk1           5.61       351.26        20.07     632264      36120







zram lzo 1 stream:

real	124m52.403s
user	397m20.110s
sys	58m55.228s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          84.59    0.00   15.28    0.00    0.00    0.13
          81.48    0.00   18.35    0.00    0.00    0.18
          79.98    0.00   19.84    0.00    0.00    0.17
          76.64    0.00   22.98    0.00    0.00    0.38
          10.40    0.00    5.06    0.00    0.00   84.54

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
zram1          4558.09      6483.94     11748.41   11671812   21148436
zram1          6377.26     10079.11     15429.92   18142504   27774004
zram1          7050.71     12673.06     15529.80   22811500   27953632
zram1          7258.40     12565.52     16468.06   22617944   29642512
zram1          1644.30      3404.62      3172.57    6128356    5710652

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1          15.91       968.35        16.16    1743132      29092
mmcblk1          15.08       852.01         5.87    1533628      10568
mmcblk1          17.93      1064.32        26.88    1915772      48376
mmcblk1          18.86      1036.18         9.54    1865128      17168
mmcblk1           8.58       494.16        22.66     889484      40780






SD card:

real	1151m21.149s
user	342m49.658s
sys	216m25.202s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.12    0.00    1.60   80.44    0.00    6.83
           9.03    0.00    1.71   78.10    0.00   11.15
          10.69    0.00    1.42   82.05    0.00    5.84
          21.97    0.00    1.34   70.34    0.00    6.35
           9.90    0.00    1.97   69.94    0.00   18.18
           0.35    0.00    0.83   88.24    0.00   10.57
           2.14    0.00    1.50   81.17    0.00   15.20
           2.68    0.00    1.25   81.60    0.00   14.47
           2.87    0.00    1.10   84.92    0.00   11.11
           8.89    0.00    2.19   77.69    0.00   11.23
           3.54    0.00    1.47   84.79    0.00   10.21
          10.19    0.00    2.91   76.93    0.00    9.96
          10.22    0.00    3.20   78.63    0.00    7.96
           8.12    0.00    4.10   73.85    0.00   13.93
           2.93    0.00    9.44   68.68    0.00   18.95
           2.02    0.00    5.03   62.21    0.00   30.73
           2.30    0.00    6.99   69.24    0.00   21.47
           7.43    0.00    6.35   68.76    0.00   17.45
           6.63    0.00   15.24   61.14    0.00   16.99
           6.37    0.00   12.22   66.69    0.00   14.73
           7.44    0.00   13.48   64.41    0.00   14.67
           1.59    0.00    2.16   81.48    0.00   14.76
           8.89    0.00   13.72   64.28    0.00   13.11
           4.74    0.00    5.52   77.09    0.00   12.65
           5.57    0.00    9.39   72.62    0.00   12.42
           9.20    0.00   13.25   63.39    0.00   14.16
           7.95    0.00   12.44   65.51    0.00   14.11
          11.55    0.00   14.41   60.93    0.00   13.11
           9.60    0.00   15.77   62.18    0.00   12.44
           3.92    0.00   15.74   59.88    0.00   20.46
           5.34    0.00   19.68   55.40    0.00   19.58
           7.11    0.00   19.83   55.26    0.00   17.81
           6.17    0.00   16.61   57.38    0.00   19.84
           6.29    0.00   16.96   56.82    0.00   19.93
           6.26    0.00   17.32   55.47    0.00   20.95
           6.20    0.00    8.00   56.80    0.00   29.00
           9.91    0.00    7.30   50.51    0.00   32.28
          18.88    0.00    6.58   46.38    0.00   28.16
           7.03    0.00    0.15    0.43    0.00   92.39

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk2          76.61       450.41      1382.78     810772    2489124
mmcblk2         118.21       657.84      1485.70    1204804    2721000
mmcblk2          71.49       515.17      1415.50     955424    2625136
mmcblk2          35.76       296.81      1296.73     568028    2481664
mmcblk2         201.44       977.12       848.80    1840180    1598524
mmcblk2          29.22        16.47       110.05      28892     193032
mmcblk2          97.93       293.55       159.87     535056     291392
mmcblk2          78.06       206.14       127.74     369048     228684
mmcblk2          65.40       167.94       148.04     324444     286004
mmcblk2         135.49       394.42       180.17     685148     312972
mmcblk2          75.55       201.50       153.93     355436     271520
mmcblk2         143.00       401.61       227.92     749180     425180
mmcblk2         156.31       396.85       262.57     713496     472064
mmcblk2         139.27       240.70       347.53     435308     628516
mmcblk2         161.35       251.15       506.99     449260     906896
mmcblk2         239.78       528.38       476.98     949888     857484
mmcblk2         172.31       352.63       400.22     642808     729560
mmcblk2         178.92       468.90       311.74     844492     561436
mmcblk2         340.91       799.06       710.93    1439004    1280280
mmcblk2         277.69       683.66       547.90    1232240     987552
mmcblk2         321.37       796.36       631.85    1443872    1145604
mmcblk2          77.21       181.19       156.36     327988     283040
mmcblk2         317.58       788.05       607.13    1407312    1084220
mmcblk2         157.16       407.68       282.06     744680     515216
mmcblk2         198.91       439.60       443.84     787068     794652
mmcblk2         337.17       818.63       667.23    1474116    1201480
mmcblk2         332.78       796.49       659.56    1431248    1185196
mmcblk2         373.29       972.52       664.27    1750564    1195704
mmcblk2         335.74       778.88       707.79    1405296    1277036
mmcblk2         326.77       801.76       669.36    1445520    1206824
mmcblk2         357.16      1028.33       668.17    1845528    1199148
mmcblk2         328.49       891.48       669.25    1604684    1204672
mmcblk2         325.89       861.09       661.31    1553152    1192808
mmcblk2         358.29       981.11       661.84    1776640    1198484
mmcblk2         420.42      1245.85       659.41    2232780    1181772
mmcblk2         470.04      1186.00       723.31    2133452    1301144
mmcblk2         518.43      1405.45       696.22    2534760    1255640
mmcblk2         477.34      1259.47       666.39    2264684    1198256
mmcblk2          12.11        69.58        49.24     125248      88636
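For anyone wanting to compare the avg-cpu blocks above without eyeballing them, here is a minimal Python sketch that averages the columns of such a block. The two sample blocks are copied from the eMMC and 'zram lzo 4 streams' runs above; the names `COLUMNS`, `EMMC`, `ZRAM_LZO` and `mean_columns` are made up for this illustration. It is a plain unweighted mean over the reported intervals, and the last interval of each run is only partially filled by the job, so treat the result as a rough indicator.

```python
# Column order of the iostat 'avg-cpu' lines
COLUMNS = ("user", "nice", "system", "iowait", "steal", "idle")

# avg-cpu rows copied from the eMMC run
EMMC = """\
          86.06    0.00    9.33    4.02    0.00    0.59
          82.91    0.00   11.93    4.61    0.00    0.54
          78.06    0.00   13.79    7.60    0.00    0.55
          79.67    0.00   12.85    6.67    0.00    0.81
          23.34    0.00    4.78    5.33    0.00   66.55
"""

# avg-cpu rows copied from the 'zram lzo 4 streams' run
ZRAM_LZO = """\
          85.31    0.00   14.63    0.00    0.00    0.05
          82.56    0.00   17.44    0.00    0.00    0.00
          80.95    0.00   19.05    0.00    0.00    0.00
          79.52    0.00   20.47    0.00    0.00    0.01
          11.89    0.00    5.46    0.00    0.00   82.65
"""


def mean_columns(block):
    """Return the unweighted mean of every avg-cpu column in a pasted block."""
    rows = [[float(value) for value in line.split()]
            for line in block.splitlines() if line.strip()]
    return {name: sum(row[i] for row in rows) / len(rows)
            for i, name in enumerate(COLUMNS)}


for label, block in (("emmc", EMMC), ("zram lzo", ZRAM_LZO)):
    means = mean_columns(block)
    print(f"{label:9s} iowait {means['iowait']:.2f}%  system {means['system']:.2f}%")
```

The pattern matches what the raw output already suggests: with zram the cost of swapping shows up as %system (compression work), with physical storage it shows up as %iowait.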

 


Next test: RK3399 with unlocked performance (all 6 CPU cores active at usual clockspeeds: 1.5/2.0GHz)

          w/o     nvme     lzo/2    lzo/6    lz4/2    lz4/6
real     31m55    40m32    41m56    41m38    43m57    44m26
user    184m16   194m58   200m37   202m20   195m17   197m51
sys       6m04    16m17    25m02    23m14    40m59    42m15

Full test output:

 

Without Swap:

real	31m55.360s
user	184m16.317s
sys	6m3.999s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          96.78    0.00    3.11    0.00    0.00    0.11
          96.56    0.00    3.30    0.00    0.00    0.14
          11.57    0.00    0.42    0.00    0.00   88.01

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1           0.56         0.09        36.40         80      32758
mmcblk1           1.53         6.29        95.42       5664      85876
mmcblk1           0.14         0.26        29.38        232      26440





lzo 2 streams:

real	41m56.261s
user	200m36.964s
sys	25m2.247s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          79.01    0.00   20.70    0.04    0.00    0.25
          83.46    0.00   16.40    0.00    0.00    0.14
          61.40    0.00   17.19    0.01    0.00   21.40

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
zram1         10000.57     15515.19     24487.10   13963672   22038392
zram2          9994.63     15491.28     24487.23   13942156   22038508
zram1         10381.19     17556.42     23968.32   15800952   21571732
zram2         10378.70     17546.46     23968.32   15791988   21571732
zram1          9469.21     16759.32     21117.53   15083392   19005776
zram2          9455.55     16704.60     21117.59   15034136   19005828

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1          83.01      8366.31        33.08    7529680      29776
mmcblk1          38.71      2303.10        58.27    2072816      52444
mmcblk1          46.68      2686.10        50.05    2417489      45048






lzo 6 streams:

real	41m38.302s
user	202m20.016s
sys	23m14.408s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          83.18    0.00   16.71    0.00    0.00    0.11
          82.59    0.00   17.29    0.01    0.00    0.11
          59.98    0.00   16.90    0.01    0.00   23.11

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
zram1          3039.69      4346.14      7812.62    3911524    7031360
zram2          3040.84      4350.74      7812.63    3915668    7031364
zram3          3038.73      4342.47      7812.44    3908220    7031196
zram4          3039.70      4346.43      7812.39    3911784    7031148
zram5          3038.88      4342.94      7812.59    3908648    7031328
zram6          3040.02      4347.62      7812.47    3912860    7031220
zram1          3416.83      5791.84      7875.48    5212656    7087936
zram2          3417.75      5795.52      7875.49    5215968    7087944
zram3          3419.16      5801.10      7875.53    5220992    7087980
zram4          3417.88      5795.81      7875.70    5216232    7088132
zram5          3418.87      5799.86      7875.61    5219872    7088048
zram6          3419.90      5804.16      7875.44    5223740    7087896
zram1          3060.02      5409.21      6830.87    4868344    6147848
zram2          3059.34      5406.76      6830.61    4866136    6147616
zram3          3059.39      5406.96      6830.61    4866316    6147616
zram4          3060.38      5410.83      6830.70    4869800    6147696
zram5          3060.49      5411.10      6830.87    4870040    6147848
zram6          3059.70      5407.84      6830.96    4867112    6147932

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1          52.03      3429.48        32.36    3086532      29120
mmcblk1          37.18      2246.65        58.60    2021988      52740
mmcblk1          47.53      2735.41        52.83    2461900      47544






lz4 2 streams:

real	43m57.637s
user	195m17.556s
sys	40m59.904s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          77.08    0.00   22.16    0.03    0.00    0.73
          75.15    0.00   23.91    0.02    0.00    0.92
          65.67    0.00   25.35    0.03    0.00    8.94

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
zram1          8801.74     12553.56     22653.41   11298200   20388072
zram2          8800.08     12547.31     22653.00   11292580   20387704
zram1         11100.47     18802.53     25599.34   16922840   23040176
zram2         11099.44     18798.44     25599.32   16919156   23040156
zram1         10362.92     18539.87     22911.82   16685884   20620640
zram2         10355.76     18511.28     22911.76   16660152   20620584

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1          66.22      4788.56        31.21    4309700      28092
mmcblk1          57.70      3918.69        57.91    3526936      52124
mmcblk1          66.55      3865.90        57.75    3479308      51976






lz4 6 streams:

real	44m26.940s
user	197m51.586s
sys	42m15.525s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          77.44    0.00   22.18    0.02    0.00    0.35
          75.06    0.00   24.54    0.01    0.00    0.38
          68.29    0.00   26.81    0.02    0.00    4.88

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
zram1          2866.07      4112.82      7351.47    3701580    6616400
zram2          2866.50      4114.47      7351.54    3703064    6616464
zram3          2866.32      4113.75      7351.52    3702416    6616444
zram4          2868.42      4122.26      7351.44    3710072    6616372
zram5          2865.52      4110.65      7351.45    3699624    6616380
zram6          2867.04      4116.73      7351.44    3705100    6616368
zram1          3610.06      6068.57      8371.69    5461832    7534688
zram2          3608.80      6063.51      8371.70    5457276    7534700
zram3          3608.88      6063.82      8371.70    5457560    7534700
zram4          3612.11      6076.78      8371.65    5469228    7534652
zram5          3608.69      6062.87      8371.88    5456704    7534856
zram6          3609.15      6064.71      8371.88    5458364    7534856
zram1          3628.46      6460.98      8052.86    5814944    7247656
zram2          3626.32      6452.34      8052.93    5807168    7247720
zram3          3626.74      6454.27      8052.68    5808912    7247492
zram4          3627.31      6456.32      8052.91    5810752    7247700
zram5          3627.74      6458.25      8052.71    5812488    7247520
zram6          3629.89      6466.64      8052.92    5820044    7247708

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1          64.98      4413.62        30.87    3972300      27780
mmcblk1          56.31      3752.78        59.29    3377580      53364
mmcblk1          67.02      4039.19        58.18    3635312      52360





NVMe SSD:

real	40m32.985s
user	194m58.775s
sys	16m17.506s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          87.55    0.00   11.60    0.62    0.00    0.23
          85.85    0.00   11.29    2.42    0.00    0.45
          85.85    0.00   11.29    2.42    0.00    0.45

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
nvme0n1       10178.30     12957.70     27755.50   11662060   24980224
nvme0n1       12027.94     22808.95     25302.80   20528052   22772516
nvme0n1       12027.94     22808.95     25302.80   20528052   22772516

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk1          44.03      2983.29        31.68    2684992      28508
mmcblk1          22.29       930.41        17.25     837372      15524
mmcblk1          22.29       930.41        17.25     837372      15524

 

 

 

For obvious reasons I did not test the crappy variants again (HDD, SD card, USB attached anything). So we're only looking at performance without swap, swap on NVMe SSD and zram.

 

RK3399 when allowed to run at full speed finishes the same compile job in less than 32 minutes. Swap on NVMe SSD now increases the time by almost 30% (40m32 vs. 31m55). I also compared whether the count of zram devices makes a difference (still on RK's 4.4 kernel). lzo still outperforms lz4 (which is irritating since everyone tells you lz4 would be an improvement over lzo), but there is no clear answer about the count of zram devices. In fact the kernel uses 1-n streams to access each device, so with modern kernels even a single zram device should suffice since the kernel takes care of distributing the load across all the CPU cores.
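For anyone repeating such a comparison it helps to double-check which compressor a zram device really uses. The kernel lists the available algorithms in sysfs and marks the active one with square brackets. A minimal Python sketch, assuming the device is called zram0; the function names are made up for illustration:

```python
from pathlib import Path


def active_algorithm(comp_algorithm_text):
    """Return the algorithm marked with [brackets], or None if none is marked."""
    for token in comp_algorithm_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_zram_algorithm(device="zram0"):
    """Read the active compressor of a zram device (requires the zram module)."""
    path = Path("/sys/block") / device / "comp_algorithm"
    return active_algorithm(path.read_text())


if __name__ == "__main__":
    # Sample sysfs content, so the sketch also runs on machines without zram.
    print(active_algorithm("lzo [lz4] zstd"))
```

On a board with zram active, read_zram_algorithm("zram1") would report the compressor of the second device, so the same helper can be looped over all devices of a multi-device setup.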

5 hours ago, mindee said:

very helpful for someone who want to know about the NVME SSD real performance on RK3399 boards

 

In fact I bought cheap: I got the NVMe SSD for less than 40 bucks on sale. My small TS128GMTE110S has only 2 flash chips soldered on it. If the maximum (8) were present it would be much, much faster, since all modern SSD controllers make heavy use of parallelism (the more flash chips, the faster).

 

I did a quick iozone test with kernel 4.4 and the results don't look that great compared to an EVO 960, for example.

 

Transcend TS128GMTE110S ext4                                  random    random
              kB  reclen    write  rewrite    read    reread    read     write
          102400       4    81473   115499   145683   147298    40272    82992
          102400      16   179264   249123   293189   293318   111869   246077
          102400     512   578704   579090   829601   832798   658995   567873
          102400    1024   585086   577757   928864   935910   789642   565868
          102400   16384   527840   531048  1045632  1056965  1031390   546275
         2048000   16384   544678   549905  1064665  1064439  1039331   545381

 

 

But that's not relevant since the protocol makes the difference. NVMe was invented in this century, not in the last one like all the other storage protocols we use today. And this makes a real difference since NVMe was developed with efficient access to flash storage in mind. All the other protocols we might use (including SATA) were designed decades ago for way slower storage, and all of them bottleneck access to fast flash.

 

With swap on the NVMe SSD the maximum %iowait according to iostat monitoring was 0.37%. That's almost 70 times less than the up to 25.07% seen with USB3!
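The comparison can be redone from raw iostat output. A small Python sketch, assuming the usual avg-cpu column order (%user %nice %system %iowait %steal %idle); the helper names are hypothetical:

```python
def max_iowait(avg_cpu_rows):
    """Return the largest %iowait value found in iostat avg-cpu data rows.

    Column order is %user %nice %system %iowait %steal %idle,
    so %iowait is the fourth field of each row.
    """
    return max(float(row.split()[3]) for row in avg_cpu_rows)


def ratio(worse, better):
    """How many times larger `worse` is than `better`."""
    return worse / better


if __name__ == "__main__":
    # avg-cpu rows of the full-speed NVMe run posted earlier in this thread.
    rows = [
        "87.55    0.00   11.60    0.62    0.00    0.23",
        "85.85    0.00   11.29    2.42    0.00    0.45",
    ]
    print(max_iowait(rows))
    # Maxima quoted above for the 800 MHz runs: USB3 vs. NVMe.
    print(round(ratio(25.07, 0.37), 1))
```

The sample rows come from the later full-speed run, so their maximum differs from the 0.37% of the 800 MHz run. They are only there to show the parsing.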

On 9/10/2018 at 12:32 AM, tkaiser said:

Next test: RK3399 with unlocked performance (all 6 CPU cores active at usual clockspeeds: 1.5/2.0GHz)


          w/o     nvme     lzo/2    lzo/6    lz4/2    lz4/6
real     31m55    40m32    41m56    41m38    43m57    44m26
user    184m16   194m58   200m37   202m20   195m17   197m51
sys       6m04    16m17    25m02    23m14    40m59    42m15

 

 

All those tests I did before were done with Rockchip's 4.4 kernel.

 

Since stuff in the kernel improves over time, let's now test with brand new 4.19.0-rc1. I just did a quick build (with the default device tree that limits maximum cpufreq to 1.8 GHz on the big and 1.4 GHz on the little cores) and only tested performance without swapping and with zram based swap using the available algorithms (more recent kernels provide more compression algorithms to choose from):

          w/o      lzo      lz4      zstd    lz4hc
real     29m11    35m58    36m59    48m38    58m55
user    167m59   177m24   175m22   182m02   173m57
sys       5m32    21m10    22m59    69m35   123m46

 

Results:

  • More recent kernel --> better results. Even with lower clockspeeds (1.8/1.4 GHz vs. 2.0/1.5 GHz) the test with kernel 4.19 runs 8% faster. So at the same clockspeed this would result in ~10% better performance
  • Performance drop with zram/lzo compared to no swap was 31% with 4.4. With 4.19 it's just 23%. So the efficiency/performance of the zram implementation itself also improved a lot
  • Again lzo is slightly faster than lz4. Both zstd and lz4hc are no good candidates for this use case (but zstd is a great candidate for Armbian's new ramlog approach since it provides higher compression -- more on this later)

In other words: with mainline kernel it makes even more sense to swap to a compressed block device in RAM since performance further increased. With this specific use case (large compile job) the performance drop when running out of memory and the kernel starting to swap to zram is below the 25% margin which is just awesome :)
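The percentages can be recomputed from the 'real' rows of the two tables. A minimal Python sketch; the function and variable names are made up, the times are copied from the tables in this thread:

```python
def to_seconds(stamp):
    """Convert a 'real' time such as '31m55' into seconds."""
    minutes, seconds = stamp.split("m")
    return int(minutes) * 60 + int(seconds)


def slowdown_percent(baseline, measured):
    """Extra wall-clock time of `measured` relative to `baseline`, in percent."""
    base = to_seconds(baseline)
    return (to_seconds(measured) - base) / base * 100


# 'real' times copied from the 4.4 and 4.19 tables.
KERNEL_4_4 = {"w/o": "31m55", "nvme": "40m32", "lzo/2": "41m56",
              "lzo/6": "41m38", "lz4/2": "43m57", "lz4/6": "44m26"}
KERNEL_4_19 = {"w/o": "29m11", "lzo": "35m58", "lz4": "36m59",
               "zstd": "48m38", "lz4hc": "58m55"}

for kernel, table in (("4.4", KERNEL_4_4), ("4.19", KERNEL_4_19)):
    for setup, stamp in table.items():
        if setup != "w/o":
            extra = slowdown_percent(table["w/o"], stamp)
            print(f"{kernel:>4} {setup:<6} +{extra:.0f}%")
```

Comparing each kernel against its own no-swap baseline keeps the clockspeed difference between the two builds out of the picture.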

 


Little update: In the meantime I also tested with the really fast Samsung 16GB eMMC 5.1 on NanoPC-T4 (again crippled down to a quad-core A53 at 800 MHz). The board runs off the NVMe SSD; I mounted the eMMC as an ext4 partition, put ARM's ComputeLibrary install and the swapfile on it and fired up the test again. First post above is updated.

 

With 4 GB RAM and no swap the job takes 100:39 minutes, with swap on the fast NVMe SSD 118:47 and with swap on the eMMC just 133:34:

          w/o    nvme     lzo     lz4    emmc    usb2    usb3     hdd    sd card
real    100m39  118m47  125m26  127m46  133m34  146m49  154m51  481m19   1151m21

That's impressive. But this Samsung eMMC 5.1 on NanoPC-T4 (and also on ODROID-N1) is most probably the fastest eMMC we get on SBC today (see benchmark numbers). And still zram is faster, and we get 'more RAM' for free, without the wear that swap on flash media of course causes.

 

 


Interesting tests, especially the lzo vs lz4 outcomes. We found the same difference last year and went for lzo zram in the end because of the overall performance benefit and marginal difference in compression

 

Lzo was faster and less cpu intensive overall despite lz4 being better on paper (and from the opinions of almost every internet warrior). In our case the difference came from the compression overhead of lz4 being much higher than that of lzo, and while lz4 decompression was faster, it wasn't enough to claw back what it loses in compression time. Interestingly, when we ran the same tests on Intel CPUs (Core m3, 4.5W) instead of Arm64, the situation was reversed with lz4 coming out on top

 

Do you know which variant of lzo is being used by Armbian? We used 1x-1-15 for 4+ core Arm devices and 1x-1 for single / dual core devices

1 hour ago, botfap said:

Lzo was faster and less cpu intensive overall despite lz4 being better on paper (and from the opinions of almost every internet warrior)

 

Ok, so another time the same observation. Maybe we should switch back to lzo then already. I feared my test always using the same task is somewhat flawed. At least it's configurable as SWAP_ALGORITHM in /etc/default/armbian-zram-config. But starting with the best default is for sure a good thing prior to next major release when this stuff gets rolled out.
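To see which algorithm a given installation is configured for, the defaults file can simply be parsed. A rough Python sketch, assuming the file consists of plain KEY=value lines in shell style; the sample content is a made-up excerpt and not the shipped file, and the function name is hypothetical:

```python
def read_shell_defaults(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


# Hypothetical excerpt for illustration only.
SAMPLE = """\
# zram swap settings
SWAP_ALGORITHM=lzo
"""

if __name__ == "__main__":
    print(read_shell_defaults(SAMPLE).get("SWAP_ALGORITHM"))
```

On a real system the text would come from reading /etc/default/armbian-zram-config instead of SAMPLE. A commented-out SWAP_ALGORITHM line is ignored, so a missing key means the built-in default applies.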

 

1 hour ago, botfap said:

Do you know which variant of lzo is being used by Armbian?

 

Nope. I simply used default kernel settings (and only tested with Rockchip 4.4, mainline 4.14 on NanoPi Fire3 and 4.19 on RK3399). How to configure the specific algorithm?

 

Edit: another interesting observation: https://bugs.chromium.org/p/chromium/issues/detail?id=584437#c15 I really wonder whether the compression algorithms on ARM use NEON optimizations or not (the performance boost can be huge)

1 hour ago, tkaiser said:

 

Ok, so another time the same observation. Maybe we should switch back to lzo then already. I feared my test always using the same task is somewhat flawed. At least it's configurable as SWAP_ALGORITHM in /etc/default/armbian-zram-config. But starting with the best default is for sure a good thing prior to next major release when this stuff gets rolled out.

 

If I were building a modern x86 target then I would go lz4 without question but on Arm there seems to be a flip around and lzo comes out on top in our testing for both Arm7 and Arm8. I have no idea why this is, maybe Intel's vector instructions are more efficient, anyone have any idea?

 

1 hour ago, tkaiser said:

 

Nope. I simply used default kernel settings (and only tested with Rockchip 4.4, mainline 4.14 on NanoPi Fire3 and 4.19 on RK3399). How to configure the specific algorithm?

 

Just had a look at an Armbian build for a tinkerboard (rk4.4) I have on my desk. Default kernel lzo algo is 1x_1, which is the fastest but least efficient of the variants and probably the best default option for anything with 1GB+ RAM. For 256MB/512MB boards lz4 would probably offer 10-15% more in the way of commit-able RAM at the expense of significantly slower compression speed and higher cpu utilization

 

There was also support for lzo 1x_999, which has almost double the compression efficiency of lzo 1x_1 but takes twice as long to compress, making it worse than standard lz4 for zram use. There was no specific support for lzo 1x_1_15, which is primarily just a multi core optimization of the 1x_1 algo, but I think in newer kernels and lzo > 2.07 1x_1_15 is automatically used instead of 1x_1 when 4 cores or more are initialized

 

Your results suggest that you did the test with lzo 1x_1_15 because 1x_1 would have only used a single core and been slower than lz4 in theory

Share this post


Link to post
Share on other sites

Did some quick research after noticing your edit and wanting to verify my vector instruction suspicions. Neither lzo nor lz4 uses NEON vector instructions on Arm7 or 8, which is very far from ideal and explains why the performance of lzo and lz4 is much better on Intel than on Arm

 

 
