In the meantime Igor added the device to the build system with our usual 'conservative' approach (for example downclocking DRAM on those small boards with a 16-bit DRAM config, since we don't want to fry them).
So let's test with an Armbian Xenial arm64 now (64-bit kernel 4.13.13, max cpufreq limited to 1008 MHz and clocking DRAM at 408 MHz): https://pastebin.com/2iYbFRhD
Test                      BSP        Armbian
std memcpy MB/s           887.9      634.8
std memset MB/s           2037.9     1553.0
7z comp 22                1288       1234
7z comp 23                1344       1279
7z decomp 22              3296       3329
7z decomp 23              3215       3317
sysbench 648 MHz (s)      14.4798    14.1447
sysbench 816 MHz (s)      11.4151    11.2191
sysbench 1008 MHz (s)     9.2395     9.0787
openssl speed aes         identical
The way lower DRAM clockspeed ruins the tinymembench numbers (and would most probably affect graphical applications that depend on memory bandwidth), but even in tests that are affected by lower memory bandwidth and higher latency (like 7-zip) the difference is negligible (in fact, with our kernel/settings 7-zip finishes for the first time without being OOM-killed). Debug output here: http://sprunge.us/KHCM
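To put rough numbers on "ruins" vs. "negligible", here is a quick Python sketch that computes the relative change of the Armbian figure against the BSP figure for some of the rows above. This is only arithmetic on the posted numbers and not part of any benchmark script; the names `results` and `relative_change` are made up for illustration.

```python
# Figures copied from the results above, as (BSP, Armbian) pairs.
# The dict and function names are hypothetical, chosen only for this sketch.
results = {
    "std memcpy MB/s": (887.9, 634.8),
    "std memset MB/s": (2037.9, 1553.0),
    "7z comp 22": (1288, 1234),
    "7z comp 23": (1344, 1279),
    "7z decomp 22": (3296, 3329),
    "7z decomp 23": (3215, 3317),
    "sysbench 1008 (s)": (9.2395, 9.0787),
}


def relative_change(bsp, armbian):
    """Percent change of the Armbian figure relative to the BSP figure."""
    return (armbian - bsp) / bsp * 100.0


changes = {name: relative_change(*pair) for name, pair in results.items()}

for name, pct in changes.items():
    # Higher is better for the MB/s and 7z rows; for sysbench the value
    # is an execution time in seconds, so a negative change is faster.
    print(f"{name:20s} {pct:+6.1f} %")
```

The memcpy/memset rows drop by roughly a quarter, while the 7-zip and sysbench rows all stay within about 5 percent in either direction.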
So once we have rebased the whole stuff to 4.14 within the next weeks and then allow H5 to clock up to 1200 MHz, some more performance improvements will follow.