This is some more research based on prior efforts.
The goal is to make more efficient use of available RAM. If the system runs low on memory, only two options are possible: either the kernel invokes the oom-killer to kill tasks and free memory (oom --> out of memory), or it starts to swap.
Swap is a problem if it happens on slow media. 'Slow' media usually describes the situation on SBCs. 'Average' SD cards (not A1 rated) are slow as hell when it comes to random IO performance. So swapping is usually something that should be avoided. But... technology improves over time.
In Linux we're able to swap not only to physical storage but, for a few years now, also to compressed memory. If you want the details, simply do a web search for zram or check Wikipedia first.
Test setup is a NanoPC-T4 equipped with 4 GB RAM (RK3399 based, so a big.LITTLE design with 2xA72 and 4xA53). I crippled the board down to a quad-core A53 running at 800 MHz where I can easily switch between 4 GB RAM and lower numbers: adding 'mem=1110M maxcpus=4' to the kernel cmdline results in the A72 cores being inactive, the kernel only using 1 GB DRAM and, for whatever reason, cpufreq scaling not working, so the RK3399 is statically clocked at 808 MHz. All tests were done with RK's 4.4 kernel (4.4.152).
This test setup is meant as 'worst case possible'. A quad-core A53 at 800 MHz is more or less equivalent to a quad-core A7 running at ~1000-1100 MHz. So we're trying to test with the lower limit.
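If you want to double-check that such a cmdline change really took effect, here is a minimal Python sketch. It assumes the usual Linux interfaces /proc/meminfo and /sys/devices/system/cpu/online; the function names are made up for illustration and are not part of the original test procedure. Expect MemTotal to come out somewhat below the 'mem=' value since the kernel reserves some memory for itself.

```python
import re


def parse_mem_total_kib(meminfo_text):
    """Return MemTotal in KiB from the contents of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def parse_cpu_list(cpu_list_text):
    """Expand a kernel CPU list such as '0-3' or '0,2-3' into a list of ints."""
    cpus = []
    for part in cpu_list_text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def print_limits():
    """Read the live values; only meaningful when run on the Linux board itself."""
    with open("/proc/meminfo") as f:
        print("MemTotal: %d MiB" % (parse_mem_total_kib(f.read()) // 1024))
    with open("/sys/devices/system/cpu/online") as f:
        print("online CPUs:", parse_cpu_list(f.read()))
```

Calling print_limits() on the board after a reboot should then show roughly 1 GB of memory and, going by the description above, only the four A53 cores online.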
I used a compile job that needs up to 2.6 GB RAM (based on this blog post). The task is to build ARM's Compute Library, which involves swapping on systems with less than 3 GB memory.
In the following I tried a couple of different scenarios: Swap on physical media and also two different zram algorithms:
w/o: no swapping happened since board booted with full 4GB RAM active
nvme: Transcend TS128GMTE110S SSD in M.2 slot, link is established at x4 Gen2
emmc: the 16GB ultra fast Samsung eMMC 5.1 on NanoPC-T4
usb2: Samsung EVO840 SSD in JMS567 disk enclosure, attached to USB2 port (UAS works)
usb3: Samsung EVO840 SSD in JMS567 disk enclosure, attached to USB3 port (UAS works)
hdd: Samsung HM500JI 2.5" HDD in JMS567 disk enclosure, attached to USB2 port (UAS works)
sd card: 'average' SanDisk 8 GB SD card (not A1 rated so horribly low random IO performance)
lzo: zram with lzo as compression algorithm
lz4: zram with lz4 as compression algorithm
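To tell which of these scenarios is actually active on a running system, something like the following Python sketch can help. It assumes the standard /proc/swaps layout (one header line, then one line per swap device) and the zram sysfs file comp_algorithm, where the kernel marks the selected algorithm with square brackets. The function names are hypothetical and the device path /dev/zram0 is an assumption.

```python
def active_swap_devices(proc_swaps_text):
    """Return (device, used_kib, priority) tuples from /proc/swaps contents."""
    devices = []
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        name, _type, _size, used, priority = fields[:5]
        devices.append((name, int(used), int(priority)))
    return devices


def selected_zram_algorithm(comp_algorithm_text):
    """Return the algorithm marked with [brackets], or None if none is marked."""
    for token in comp_algorithm_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def print_swap_setup(zram_device="zram0"):
    """Read the live values; only meaningful on a Linux system with zram loaded."""
    with open("/proc/swaps") as f:
        print(active_swap_devices(f.read()))
    with open("/sys/block/%s/comp_algorithm" % zram_device) as f:
        print("zram algorithm:", selected_zram_algorithm(f.read()))
```

A device name starting with /dev/zram in the first function's result means swap goes to compressed memory, anything else is physical storage.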
And the numbers are:
time    w/o     nvme    lzo     lz4     emmc    usb2    usb3    hdd     sd card
real    100m39  118m47  125m26  127m46  133m34  146m49  154m51  481m19  1151m21
user    389m48  415m38  405m39  402m52  415m38  415m29  407m18  346m28  342m49
sys     11m05   29m37   36m14   60m01   34m35   66m59   65m44   23m05   216m25
You need to look at the 1st row: that's the time the whole job took. For more details consult the 'time' manual page.
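The other two rows are still useful: 'user' and 'sys' are CPU time summed over all cores, so dividing their sum by 'real' gives a rough figure for how many cores were busy on average. A minimal Python sketch of that arithmetic (helper names are made up for illustration, stamps are written as in the results here, e.g. '100m39'):

```python
def minutes(stamp):
    """Convert a stamp such as '100m39' (minutes, seconds) into minutes."""
    mins, secs = stamp.split("m")
    return int(mins) + int(secs) / 60.0


def busy_cores(real, user, system):
    """Average number of busy cores: (user + sys) CPU time over wall-clock time."""
    return (minutes(user) + minutes(system)) / minutes(real)


# Values taken from the results above
print("w/o:     %.2f" % busy_cores("100m39", "389m48", "11m05"))
print("hdd:     %.2f" % busy_cores("481m19", "346m28", "23m05"))
print("sd card: %.2f" % busy_cores("1151m21", "342m49", "216m25"))
```

Without swapping the four cores are almost fully loaded (close to 4.0), while with swap on HDD or SD card the figure drops below 1.0, so the CPU spends most of the time waiting for storage.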
In other words: When limiting the RK3399 on NanoPC-T4 to just the four A53 cores running at 800 MHz the compile job takes 100 minutes with 4 GB RAM. As soon as we limit the available RAM to 1 GB swapping has to occur so it gets interesting how efficient the various approaches are:
NVMe SSD is the fastest option. The performance drop is only 18%. That's due to NVMe being a modern storage protocol suited for modern (multi-core) CPUs. Problem: there's no PCIe and therefore no NVMe on the majority of SBCs
Zram with both lzo and lz4 algorithms performs more or less the same (interestingly lzo slightly faster)
Slightly slower: the fast Samsung eMMC 5.1
Surprisingly the EVO840 SSD connected via USB2 performs better than connected via USB3 (some thoughts on this)
Using a HDD for swap is BS (and has been BS for the last 4 decades but we had no alternative until SSDs appeared). The compile job takes almost 5 times longer to complete since all HDDs suck at random IO
Using an average SD card for swap is just horrible. The job that finished within 100 minutes with 4 GB DRAM available took over 19 HOURS with swap on an average SD card (please note that today's common A1 rated SD cards are orders of magnitude faster and easily outperform HDDs)
Summarizing: NVMe SSDs are no general option (since only available on some RK3399 boards). Swap on HDD or SD card is insane. Swap on USB connected SSDs performs ok-ish (~1.5 times slower) so the best option is to use compressed DRAM. We get a performance drop of just 25% at no additional cost. That's amazing.
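For anyone who wants to recheck the percentages, this is a small Python sketch that recomputes the slowdown of each scenario from the 'real' row relative to the run without swapping. The dictionary simply restates the measured values; the helper names are made up for illustration.

```python
# Wall-clock ('real') times from the results above, as minutes 'm' seconds
REAL = {
    "w/o": "100m39", "nvme": "118m47", "lzo": "125m26", "lz4": "127m46",
    "emmc": "133m34", "usb2": "146m49", "usb3": "154m51",
    "hdd": "481m19", "sd card": "1151m21",
}


def to_seconds(stamp):
    """Convert a stamp such as '118m47' into seconds."""
    mins, secs = stamp.split("m")
    return int(mins) * 60 + int(secs)


def slowdown_percent(label, baseline="w/o"):
    """How much longer (in percent) a scenario took compared to the baseline."""
    return (to_seconds(REAL[label]) / to_seconds(REAL[baseline]) - 1) * 100


for label in REAL:
    print("%-8s %7.1f%%" % (label, slowdown_percent(label)))
```

This gives roughly 18% for NVMe, 25% for zram/lzo, 27% for zram/lz4, and about 46% and 54% for the SSD on USB2 and USB3.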
The above numbers were 'worst case'. That's why I crippled the RK3399 to a slow performing quad-core A53. This gives an idea of how much worse zram might perform on the slowest SBCs Armbian runs on (I know that there are still the boring Allwinner A20 boards around -- yep, they're too slow for this).
When I did all this boring test stuff I always recorded the environment using 'iostat 1800' (reports every 30 minutes what really happened and shows in detail how much data has been transferred and what the CPU cores spent their time on). Quite interesting to compare %user, %sys and especially %iowait percentages: