mdel Posted September 21, 2016 Could someone remind me of the status of the H3 crypto engine, both hardware (capabilities, AES-NI?) and software (mainline or vanilla kernels)? I've been testing OpenVPN on an Amlogic S905 box (I still need to fix my Beelink X2 problems) and, as expected, I'm hitting a CPU bottleneck. I've been able to achieve around 60 Mbps (Blowfish-CBC, USB HDD, ext4), but because OpenVPN is single-threaded, the CPU load maxes out and limits OpenVPN's bandwidth. Unfortunately, as with most VPN providers, you can't tweak all parameters, so the tun-mtu size can't be changed and has to stay at the default value of 1500. I will perform some tests on my Beelink X2 or OPi PC, but I'd like to know what the benefits (if any) of using an H3 in an OpenVPN context would be. Thank you
vincele Posted September 21, 2016 You can see the status here, but it's not mainlined yet. Corentin told me he intends to upstream part of it in the near future. He would probably like to know if you test it...
zador.blood.stained Posted September 21, 2016 "Could someone remind me of the status of the H3 crypto engine, both hardware (capabilities, AES-NI?) and software (mainline or vanilla kernels)?" Hardware: WIP, not ready yet. Software: it should work if all necessary kernel options are enabled. This is NOT AES-NI; you can find more info here (including benchmarks for the A10/A20 SS): http://sunxi.montjoie.ovh/ "I've been able to achieve around 60 Mbps (Blowfish-CBC, USB HDD, ext4), but because OpenVPN is single-threaded, the CPU load maxes out and limits OpenVPN's bandwidth." And you won't get much higher speeds with hardware encryption (the speed may even be lower on H3, if I remember correctly). The main advantage of the sunxi SS is that it frees up the CPU. "I will perform some tests on my Beelink X2 or OPi PC, but I'd like to know what the benefits (if any) of using an H3 in an OpenVPN context would be." This depends on your expectations. Here I linked an article where an Orange Pi One was mounted inside a router to offload OpenVPN encryption/processing, and it was worth it compared to the performance of the router's own CPU before.
mdel Posted September 26, 2016 Thanks for your answers. I'm not sure I understand why using the crypto engine won't speed things up: does it run at a fixed clock that doesn't scale with the CPU speed? According to those benchmarks, you should get a 10-40% performance gain with the SS (A10/A20 tests?), though I'm not sure what's going on with the negative gains for SS DMA. I did read montjoie's page, but I don't quite understand the current state of things. I couldn't find where the current sun8i-ce driver is. The H3 sun8i-ce entry shows AES/DES/3DES-CBC as "ok"; can I test that with the current Armbian kernel? Judging by this log https://irclog.whitequark.org/linux-sunxi/2016-07-09, something should work with the BSP kernel. Anyway, I was able to improve performance a bit by tweaking the tun-mtu parameters; I now get around 70 Mbps. I had initially dismissed those tweaks because OpenVPN's connection log showed warnings about the server expecting other values, but it seems to work with the client parameters.
zador.blood.stained Posted September 26, 2016 "I'm not sure I understand why using the crypto engine won't speed things up: does it run at a fixed clock that doesn't scale with the CPU speed?" Yes. AES-NI is an instruction set extension of the CPU, while the Crypto Engine is a separate IP block with its own (fixed) clock. "According to those benchmarks, you should get a 10-40% performance gain with the SS (A10/A20 tests?), though I'm not sure what's going on with the negative gains for SS DMA." That depends on the algorithm and block size. "I did read montjoie's page, but I don't quite understand the current state of things. I couldn't find where the current sun8i-ce driver is. The H3 sun8i-ce entry shows AES/DES/3DES-CBC as 'ok'; can I test that with the current Armbian kernel? Judging by this log https://irclog.whitequark.org/linux-sunxi/2016-07-09, something should work with the BSP kernel." No idea. The mainline driver is probably not published yet, and if the BSP kernel has an SS driver for H3, it should work.
mdel Posted September 26, 2016 Did a simple benchmark on H3 and S905; both use Armbian 5.20 (the S905 image is not an official Armbian image, and that system is not really idle).
H3:
openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 2496591 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1359414 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 358603 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 90918 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 11410 aes-128-cbc's in 3.00s
OpenSSL 1.0.1t 3 May 2016
built on: Fri Sep 23 19:23:52 2016
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      13315.15k    29000.83k    30600.79k    31033.34k    31156.91k
S905:
openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 9116734 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 2825024 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 758392 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 193156 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 24278 aes-128-cbc's in 3.00s
OpenSSL 1.0.1t 3 May 2016
built on: Fri Sep 23 18:26:35 2016
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr)
compiler: gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      48622.58k    60267.18k    64716.12k    65930.58k    66295.13k
Now I will test that H3 in real-world conditions, but I must say that if it actually reflects those benchmarks, I'm going to be disappointed.
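The table values can be cross-checked against the "Doing ..." lines: openssl speed divides (count × block size) by the measured seconds and reports thousands of bytes per second. The minimal Python sketch below recomputes a table entry from one progress line and also converts it to Mbit/s so it can be compared with VPN link speeds; the regex and the function name are illustrative, not part of OpenSSL. By this arithmetic, the 8192-byte figures above are roughly 249 Mbit/s (H3) and 530 Mbit/s (S905) of raw single-core cipher throughput, which is not the same thing as OpenVPN throughput, since the VPN also has to pay for things like HMAC and packet I/O.

```python
import re

# Matches openssl speed progress lines such as
#   Doing aes-128-cbc for 3s on 16 size blocks: 2496591 aes-128-cbc's in 3.00s
# Some builds print "aes-128 cbc" with a space, so the name may contain spaces.
DOING_RE = re.compile(
    r"Doing (?P<alg>.+?) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) .+? in (?P<secs>[\d.]+)s"
)


def throughput(line):
    """Return (algorithm, block size, kB/s, Mbit/s) for one 'Doing ...' line.

    openssl speed reports "1000s of bytes per second", so kB here is 1000 bytes.
    """
    m = DOING_RE.search(line)
    if m is None:
        raise ValueError("not an openssl speed progress line: %r" % line)
    size = int(m.group("size"))
    bytes_per_s = int(m.group("count")) * size / float(m.group("secs"))
    return m.group("alg"), size, bytes_per_s / 1e3, bytes_per_s * 8 / 1e6


if __name__ == "__main__":
    # 8192-byte lines copied from the H3 and S905 runs above.
    for line in (
        "Doing aes-128-cbc for 3s on 8192 size blocks: 11410 aes-128-cbc's in 3.00s",
        "Doing aes-128-cbc for 3s on 8192 size blocks: 24278 aes-128-cbc's in 3.00s",
    ):
        alg, size, kb, mbit = throughput(line)
        print("%s %d bytes: %.2fk = %.1f Mbit/s" % (alg, size, kb, mbit))
```

Feeding it any "Doing ..." line from this thread should reproduce the corresponding table column to within rounding.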
tkaiser Posted September 26, 2016 Just a quick note: I asked a customer today and just got his answer. Lots of internals, so just the essence: they used individual OpenVPN instances on H3 devices for different site-to-site VPNs and used their own OpenSSL builds, since those perform better than the distro packages. Now they want to switch to IPsec/strongSwan instead (multithreaded and more performant). In the meantime they're evaluating Pine64 for the job and sent me a link to these numbers: https://github.com/libressl-portable/openbsd/issues/68 Small note: Pine64/Pine64+ will clock at 1152 MHz with default Armbian settings. It's possible to use 1296 MHz or even 1344 MHz, which should work flawlessly as long as only a single OpenVPN instance runs on the device (so the A64 will be busy on 2 cores at most). BTW: here's an OrangePi Plus 2E running Xenial on kernel 4.8-rc6 with OpenSSL 1.0.2g (distro package) and 1.1.1 (default build settings) at 1296 MHz (no cpufreq support in this kernel branch currently):
tk@orangepiplus2e:~$ /usr/bin/openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 4028081 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1161190 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 302715 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 76490 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 9590 aes-128-cbc's in 3.00s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      21483.10k    24772.05k    25831.68k    26108.59k    26187.09k
tk@orangepiplus2e:~$ /usr/local/src/openssl/apps/openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 3811820 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1130671 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 296340 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 75500 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 9516 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 4735 aes-128-cbc's in 3.00s
OpenSSL 1.1.1-dev xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -march=armv7-a -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      20329.71k    24120.98k    25287.68k    25770.67k    25985.02k    25859.41k
tkaiser Posted September 26, 2016 BTW: Had a short laugh: http://vpneveryone.ddns.net/vpn.blackbox/opi-ipsec-vpn-server.htm
mdel Posted September 27, 2016 Interesting link, but obviously $119.99 + $20 shipping is a fair price. And I didn't know hackers could pick passwords with their dark fingers, scary! Okay, I'm getting completely confused: what features does that R18 SoC have compared to an H3, besides being Cortex-A53 vs. Cortex-A7? This looks pretty impressive:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     104979.10k   152962.08k   178409.00k   185506.59k   188549.29k
Then I tried to understand what he's talking about, but I must admit it's a bit beyond me. What does he mean by "Are there any plans to backport the ARM optimizations for AES / SHA from OpenSSL"? What kind of optimizations is he talking about? Is it the same thing as the hardware encryption engine montjoie is working on, or something else?
tkaiser Posted September 27, 2016 "What kind of optimizations is he talking about? Is it the same thing as the hardware encryption engine montjoie is working on, or something else?" No idea, but at least it's worth a look. I googled a bit yesterday, and there was at least one problem years ago when Debian didn't use assembler optimizations for ARM in its OpenSSL packages. But I have too much other work to get into details (funnily also VPN stuff, but pfSense on x86, with me just clicking around clueless). In case you have a GitHub account, I would simply comment on the issue and ask for details. R18 is not used on Pine64; they chose A64 instead, which is the same SoC, just with a different chip ID and a different business unit at Allwinner responsible for it. They love to do market segmentation this way: R18/A64/H64 are more or less the same, and so are A83T/R58/H8, while H8 is limited to Android 4.4 by Allwinner and A83T is 'allowed' to get support for Android 5.1. Now Allwinner has decided to provide Android 6.0 for A64, but fortunately a few community members started putting Android sources in a repo, and now some talented people over at the pine64 forum have started porting Android 7 to A64. A64 vs. H3? They are pretty similar (many IP blocks are identical), and the most obvious difference is ARMv7 vs. ARMv8 (which might make a huge difference in some areas, since an optimized instruction set can be used). Another difference that is not that relevant in our context: A64 has PMIC and battery support; the H series does not.
mdel Posted September 28, 2016 Thanks for the info; the PINE A64+ looks like a cheap alternative in the GbE category. Doing more tests on my S905, I noticed something about those openssl "benchmarks":
openssl speed -evp bf-cbc aes-128-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      54843.79k    61923.57k    65516.71k    66136.41k    66311.51k
bf-cbc           35550.99k    44486.87k    47452.67k    48311.30k    48567.64k
I was using Blowfish on my system, and seeing those figures I decided to switch to aes-128-cbc, expecting better speeds or lower CPU load, but that doesn't seem to be the case. I get more or less the same (real-world) speed (65-70 Mbps) with both ciphers, but aes-128-cbc is apparently more taxing on the CPU. Also, aes-128 seems to generate a much larger overhead than Blowfish. So I was reading those tests as bandwidth capabilities, but they don't seem to scale to a real-world application. Still, I will try to test the H3 sun8i-ce driver when it becomes available.
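One cipher-dependent part of the per-packet overhead can be worked out by hand: AES uses 16-byte blocks and Blowfish 8-byte blocks, and CBC with PKCS#7 padding always rounds the ciphertext up to the next full block (adding 1 to block-size bytes). The minimal Python sketch below additionally assumes that one IV block travels with every packet, as in OpenVPN's CBC data channel as far as I know; HMAC, packet IDs and UDP/IP headers are deliberately left out. It only estimates these extra bytes and says nothing about the CPU load difference.

```python
AES_BLOCK = 16       # AES block size in bytes (same for 128/192/256-bit keys)
BLOWFISH_BLOCK = 8   # Blowfish block size in bytes


def cbc_padded_len(plaintext_len, block_size):
    """Ciphertext length with PKCS#7 padding: 1..block_size bytes are always added."""
    return (plaintext_len // block_size + 1) * block_size


def cbc_cipher_overhead(plaintext_len, block_size, explicit_iv=True):
    """Bytes the cipher adds to one packet: padding plus, optionally, one IV block.

    Assumption: one IV block is sent per packet (explicit_iv=True).
    HMAC, packet IDs and UDP/IP headers are not counted.
    """
    total = cbc_padded_len(plaintext_len, block_size)
    if explicit_iv:
        total += block_size
    return total - plaintext_len


if __name__ == "__main__":
    for payload in (100, 1400, 1408):
        print(payload,
              "AES:", cbc_cipher_overhead(payload, AES_BLOCK),
              "Blowfish:", cbc_cipher_overhead(payload, BLOWFISH_BLOCK))
```

For a 1400-byte payload this gives 24 bytes for AES versus 16 for Blowfish, so under these assumptions the block-size difference alone amounts to a few bytes per packet.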
vlad59 Posted September 28, 2016 IIRC, a while ago I ran some benchmarks with a hardware crypto engine on a Dockstar, and I was using:
openssl speed -elapsed -evp aes-128-cbc
The -elapsed option was essential for the comparison.
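Why -elapsed matters: without it, openssl speed divides by user CPU time (the S912 run later in this thread prints "You have chosen to measure elapsed time instead of user CPU time"). If an engine does the work while the process mostly waits, little CPU time accrues and the computed rate explodes. The Python sketch below imitates that with time.sleep() as a hypothetical stand-in for a hardware engine (no real crypto happens) and computes the rate both ways.

```python
import time


def fake_engine(n_requests, latency_s):
    """Hypothetical stand-in for a crypto engine: the caller just waits.

    time.sleep() blocks without consuming CPU time, much like a process that
    hands buffers to hardware and waits for the result. No crypto happens here.
    """
    for _ in range(n_requests):
        time.sleep(latency_s)


def measure(n_requests=20, block_size=8192, latency_s=0.002):
    """Return (bytes/s based on CPU time, bytes/s based on elapsed time)."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    fake_engine(n_requests, latency_s)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    processed = n_requests * block_size
    # A CPU-time delta can be 0 on coarse clocks; avoid dividing by zero.
    return processed / max(cpu, 1e-9), processed / wall


if __name__ == "__main__":
    by_cpu, by_wall = measure()
    print("rate using CPU time:     %.0f bytes/s" % by_cpu)
    print("rate using elapsed time: %.0f bytes/s" % by_wall)
```

The CPU-time rate comes out far higher than the elapsed-time rate even though the same amount of "work" was waited for, which is the kind of inflated number the thread describes for engine runs without -elapsed.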
tkaiser Posted September 28, 2016 "I ran some benchmarks with a hardware crypto engine on a Dockstar" The Seagate Dockstar is a Marvell Kirkwood, and there you're able to benefit from CESA through /dev/crypto (without it, performance would really suck). A small overview of those 'NAS and router SoCs' can be found here: https://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators
vlad59 Posted September 28, 2016 "The Seagate Dockstar is a Marvell Kirkwood, and there you're able to benefit from CESA through /dev/crypto (without it, performance would really suck). A small overview of those 'NAS and router SoCs' can be found here: https://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators" Exactly. SFTP performance was twice as good with CESA, but the NAND on both of my Dockstars died after running 24/7 for 2 years.
mdel Posted September 29, 2016 "The -elapsed option was essential for the comparison." Thanks. My numbers are more or less the same with -elapsed; maybe it doesn't affect the non-accelerated benchmarks as much.
mdel Posted November 18, 2016 montjoie has updated his crypto engine dev page and pointed out some bad news for those Allwinner SoCs, including the H5. But looking at his project table, it seems that AES-CBC for H3 is labeled "OK", so I'd like to test the performance on my VPN link. I've tried to build his sun8i-ce-experimental branch, but unfortunately I can't get it to boot on my OPi PC (kernel panic); most probably I didn't build it properly, as I'm not very experienced in building development trees. Maybe someone can point me to the proper steps to build that 4.9 source. Then, if the results are decent (I don't expect much, given my previous single-threaded OpenVPN tests), I'd like to know whether it would be possible to include that driver in an Armbian image, as I see there's no mainline image for the OPi PC at the moment. Thanks
tkaiser Posted November 18, 2016 "Maybe someone can point me to the proper steps to build that 4.9 source." When you use our build system, it's just a matter of adjusting the branch here. If you already use Armbian, you can speed things up by building only the kernel and u-boot (the .deb files will then appear below output/debs/): KERNEL_ONLY=yes
mdel Posted November 18, 2016 Awesome, I had no idea montjoie's git was already in the Armbian scripts. So I'll try to build an Armbian image directly and see how it works.
mdel Posted November 21, 2016 Okay, the build works, but I have a kernel config problem: when I simply change the dev kernel git source, it doesn't seem to use the git kernel config, maybe because I built a mainline (not dev) kernel first, I don't know. I've tried both a full image build and KERNEL_ONLY=yes. Anyway, my dev build has "# CONFIG_CRYPTO_DEV_SUN8I_SS is not set", so the driver is not there:
ls /lib/modules/4.9.0-sun8i/kernel/drivers/crypto/sunxi-ss/
sun4i-ss.ko
What's the Armbian way to set kernel options when using a dev source? I'll try fiddling with KERNEL_KEEP_CONFIG, but I'm not sure I understand its real purpose. Also, my benchmark with that 4.9 mainline kernel (Xenial image) seems to be somewhat worse than with the legacy kernel (Jessie). Any idea why that would be?
Armbian 5.24, kernel 4.9.0-sun8i, Xenial:
>openssl speed -elapsed -evp bf-cbc aes-128-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      23108.93k    24980.25k    25877.42k    26112.00k    26184.36k
bf-cbc           18900.00k    21908.86k    22817.79k    23052.63k    23123.29k
I will try building a mainline Jessie image to see if it gets back to the earlier benchmark figures.
zador.blood.stained Posted November 21, 2016 @mdel You should run the process with KERNEL_CONFIGURE=yes KERNEL_ONLY=yes and enable this option manually in the kernel configuration menu. Or you can try a full image build (without KERNEL_ONLY=yes).
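To check whether an option such as CONFIG_CRYPTO_DEV_SUN8I_SS actually made it into a built kernel, the kernel config format can be read directly: enabled options appear as CONFIG_NAME=y or =m, disabled ones as "# CONFIG_NAME is not set". A minimal Python sketch; the function names are illustrative, and where the config file lives on a given image is an assumption (Debian-style kernel packages usually install /boot/config-<version>).

```python
import re

# "CONFIG_FOO=y", "CONFIG_FOO=m", "CONFIG_FOO=\"string\"", ...
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
# "# CONFIG_FOO is not set"
NOT_SET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map each option to its value ('y', 'm', ...) or None if explicitly not set."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = NOT_SET_RE.match(line)
        if m:
            options[m.group(1)] = None
    return options


def describe(options, name):
    """Human-readable state of one option."""
    if name not in options:
        return "%s: not present in this config" % name
    value = options[name]
    if value is None:
        return "%s: is not set" % name
    kind = {"y": "built in", "m": "module"}.get(value, "value")
    return "%s: %s (%s)" % (name, value, kind)
```

On the board, something like `describe(parse_kconfig(open("/boot/config-4.9.0-sun8i").read()), "CONFIG_CRYPTO_DEV_SUN8I_SS")` would report the option's state, assuming the config file is installed at that (hypothetical) path.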
mdel Posted November 21, 2016 Okay, sorry, I completely missed that one =)
mdel Posted December 5, 2016 Well, I did manage to build that SS driver along with cryptodev, but I'm not sure the HW engine was ever put to work with my OpenSSL cryptodev build. The modules are loaded and show some usage when openssl runs, but the figures are almost exactly the same with or without the hardware engine, and the CPU load is always 100% on the single core used. Anyway, I've decided to stick with the Amlogic S905, which is definitely far more powerful than the H3 in the current software crypto environment. I've also tested my OPi PC2 (H5) with the Xunlong Xenial server image Ubuntu_Server_Xenial_PC2_V0_9_0.img:
>sudo openssl speed -elapsed -evp bf-cbc aes-128-cbc
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr)
compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      32461.71k    34913.11k    35852.12k    36085.08k    36162.22k
bf-cbc           26314.67k    31261.35k    32766.81k    33205.93k    33333.25k
I understand that the current OPi BSP image is junk, possibly with the CPU stuck at 1 GHz; indeed, the temperature from /sys/devices/virtual/thermal/cooling_device0/subsystem/thermal_zone0/temp only went from 41 to 45 while running the test (no heatsink). But those figures don't look very promising to me, and how flat they are across block sizes looks a bit odd. Still, I'll trust the benchmark for now and won't bother testing it live with a VPN link.
Considering montjoie's comments that the H5 crypto engine is the same as on H3/A64, with some parts buggy or removed from the datasheets, I would probably recommend staying away from Allwinner CPUs if any crypto-intensive use is expected. Maybe something else will come out in 2017, we'll see. In the meantime, I'm going back to my ODROID-C2, the best "cheap" board for intensive (USB) NAS / VPN use if you ask me.
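When the modules are loaded but the numbers don't move, one thing worth checking is which kernel implementation of cbc(aes) is preferred: /proc/crypto lists every registered implementation with its name, driver and priority, and the kernel crypto API generally picks the highest-priority driver for a given name. This only matters for requests that actually reach the kernel (cryptsetup, or OpenSSL through a cryptodev or AF_ALG engine); OpenSSL's built-in AES never goes through it. A minimal Python parser, with illustrative function names:

```python
def parse_proc_crypto(text):
    """Split /proc/crypto text into one dict per registered implementation."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():          # a blank line ends an entry
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def implementations(entries, name):
    """All implementations of `name`, highest priority first."""
    found = [e for e in entries if e.get("name") == name]
    return sorted(found, key=lambda e: int(e.get("priority", "0")), reverse=True)
```

On the board, `[e["driver"] for e in implementations(parse_proc_crypto(open("/proc/crypto").read()), "cbc(aes)")]` lists the candidate drivers, with the one the kernel should prefer first; if the SS/CE driver is missing or not first, the engine path is unlikely to be using it.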
lvmc Posted December 6, 2016 @mdel, here is the openssl test you requested in this thread, on an S912 / Beelink GT1 running Armbian_5.24_Amlogic-s905x_Ubuntu_xenial_3.14.29_desktop_20161125:
root@amlogic-s905x:~# openssl speed -elapsed -evp bf-cbc aes-128-cbc aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128 cbc for 3s on 16 size blocks: 9073379 aes-128 cbc's in 3.03s
Doing aes-128 cbc for 3s on 64 size blocks: 2438563 aes-128 cbc's in 3.02s
Doing aes-128 cbc for 3s on 256 size blocks: 625809 aes-128 cbc's in 3.02s
Doing aes-128 cbc for 3s on 1024 size blocks: 157495 aes-128 cbc's in 3.02s
Doing aes-128 cbc for 3s on 8192 size blocks: 19716 aes-128 cbc's in 3.02s
Doing aes-256 cbc for 3s on 16 size blocks: 7251936 aes-256 cbc's in 3.02s
Doing aes-256 cbc for 3s on 64 size blocks: 1891159 aes-256 cbc's in 3.02s
Doing aes-256 cbc for 3s on 256 size blocks: 482672 aes-256 cbc's in 3.02s
Doing aes-256 cbc for 3s on 1024 size blocks: 121272 aes-256 cbc's in 3.02s
Doing aes-256 cbc for 3s on 8192 size blocks: 15188 aes-256 cbc's in 3.02s
Doing bf-cbc for 3s on 16 size blocks: 7347258 bf-cbc's in 3.02s
Doing bf-cbc for 3s on 64 size blocks: 2182180 bf-cbc's in 3.02s
Doing bf-cbc for 3s on 256 size blocks: 571752 bf-cbc's in 3.02s
Doing bf-cbc for 3s on 1024 size blocks: 144898 bf-cbc's in 3.02s
Doing bf-cbc for 3s on 8192 size blocks: 18182 bf-cbc's in 3.02s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr)
compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      47912.23k    51678.16k    53048.71k    53402.28k    53481.28k
aes-256 cbc      38420.85k    40077.54k    40915.24k    41120.04k    41198.71k
bf-cbc           38925.87k    46244.87k    48466.39k    49130.98k    49320.18k
Note: this command uses only a single CPU core.
mdel Posted December 6, 2016 Thanks @lvmc, it's more or less the same as the S905 (a little lower for some reason). That's as expected, since the S912's A53 cores are clocked the same as the S905's.
zador.blood.stained Posted December 6, 2016 From this thread: https://forum.armbian.com/index.php/topic/2138-armbian-for-amlogic-s912/page-2 Anyway, I can't quite make sense of cryptsetup results compared to real-life OpenVPN, but "openssl speed" is quite accurate. I unloaded the marvell_cesa and mv_cesa modules and ran the benchmarks again. The openssl numbers didn't change (+/- 5%), but the cryptsetup numbers changed to these:
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b    35.0 MiB/s    38.1 MiB/s
     aes-cbc   256b    28.5 MiB/s    29.7 MiB/s
So the simple answer is: cryptsetup can use HW crypto acceleration, but openssl can't (at least current versions; AF_ALG should be supported in OpenSSL 1.1). And here comes a conclusion: if you have a fast crypto engine (with DMA and overall good kernel support), it doesn't matter if your CPU cores are slower on paper.
mdel Posted December 6, 2016 Yes, you need to build an AF_ALG engine and then configure OpenSSL to use it, or force it on the benchmark command line with -engine af_alg. "-elapsed" then becomes very important to get proper results; otherwise it will spit out crazy numbers, since it looks at the "CPU time" (which isn't being used..). I believe I used this one for my test: https://github.com/sarnold/af_alg But my understanding was that once the HW engine is used, the CPU load should drop significantly. Or are there still parts of the computation that will max out the CPU load? I don't understand my H3 results: I did get crazy numbers when using the engines without "-elapsed", but the CPU load always went to 100% on one core, and with "-elapsed" the results were within 1% of my non-accelerated tests. I also don't understand the difference between cryptodev and AF_ALG, or whether they have specific applications. I will keep a close eye on that Armada 3700 Kickstarter board; it seems you can only order the 1GB RAM version, which is the one I wanted, and then it's €45 + €27 shipping. For the moment, I'll get myself a $35 2GB/8GB S905 box with Gigabit and see how it compares to the Odroid C2.
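On the cryptodev vs. AF_ALG question: AF_ALG is the socket interface to the kernel crypto API that is part of mainline Linux, while cryptodev-linux is an out-of-tree module providing an OpenBSD-style /dev/crypto device. Both let user space hand buffers to whichever kernel driver is registered, which is how an OpenSSL engine can reach the SS/CE at all. Python's socket module exposes AF_ALG on Linux (Python 3.6+), so a minimal sketch of one AES-128-CBC encryption through the kernel looks like this. It is only a sketch: the function name is illustrative, it returns None wherever AF_ALG is unavailable, and it does no padding.

```python
import socket


def afalg_aes_cbc_encrypt(key, iv, plaintext):
    """Encrypt with the kernel's cbc(aes) via AF_ALG; return None if unavailable.

    The kernel does no padding, so len(plaintext) must be a multiple of 16.
    Whichever cbc(aes) driver the kernel prefers (software or hardware) is used.
    """
    if not hasattr(socket, "AF_ALG"):
        return None  # not Linux, or a Python build without AF_ALG
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("skcipher", "cbc(aes)"))
            alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, key)
            op, _ = alg.accept()
            with op:
                op.sendmsg_afalg([plaintext], op=socket.ALG_OP_ENCRYPT, iv=iv)
                return op.recv(len(plaintext))
    except OSError:
        # AF_ALG blocked (sandbox/container), algif_skcipher missing, etc.
        return None


if __name__ == "__main__":
    # NIST SP 800-38A, F.2.1 (CBC-AES128.Encrypt), first block
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    iv = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
    pt = bytes.fromhex("6bc1bee22e409f96e93d7e117393172a")
    ct = afalg_aes_cbc_encrypt(key, iv, pt)
    print(ct.hex() if ct is not None else "AF_ALG not available here")
```

The NIST test vector makes a quick sanity check that the kernel path works; which driver actually served the request is not visible from here and has to be checked separately (e.g. in /proc/crypto).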
zador.blood.stained Posted December 6, 2016 "But my understanding was that once the HW engine is used, the CPU load should drop significantly. Or are there still parts of the computation that will max out the CPU load? I don't understand my H3 results: I did get crazy numbers when using the engines without '-elapsed', but the CPU load always went to 100% on one core, and with '-elapsed' the results were within 1% of my non-accelerated tests." At least with cryptsetup there is a noticeable difference in CPU load with the CESA modules loaded vs. unloaded: 30% on a single core (HW encryption, ~100 MB/s) vs. 100% on a single core (SW encryption, ~35 MB/s).
markbirss Posted January 18, 2018 What is the current status of crypto support in Armbian? Is it still not stable enough to include?
tkaiser Posted January 18, 2018 27 minutes ago, markbirss said: "What is the current status of crypto support in Armbian?" There's nothing special. Simply stay away from all proprietary crypto modules and choose an ARMv8 SoC with support for ARM's crypto extensions. Reasons why: https://forum.armbian.com/topic/4583-rock64/?do=findComment&comment=37829
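Whether a board's cores expose ARM's crypto extensions can be read from /proc/cpuinfo: on arm64 Linux the "Features" line lists aes, pmull, sha1 and sha2 when they are present. A minimal Python sketch; it takes the file's text as input so it can also be tried on sample lines, and the function name is illustrative.

```python
# ARMv8 Crypto Extensions as named in the "Features" line of /proc/cpuinfo
# on arm64 Linux.
CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")


def crypto_extensions(cpuinfo_text):
    """Return {flag: present?} for the ARMv8 crypto flags in /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return {flag: flag in features for flag in CRYPTO_FLAGS}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(crypto_extensions(f.read()))
    except OSError:
        print("/proc/cpuinfo not readable on this system")
```

On a Cortex-A7 SoC such as the H3 all four flags come out False, since these extensions are an ARMv8 feature.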
markbirss Posted January 19, 2018 22 hours ago, tkaiser said: "There's nothing special. Simply stay away from all proprietary crypto modules and choose an ARMv8 SoC with support for ARM's crypto extensions. Reasons why: https://forum.armbian.com/topic/4583-rock64/?do=findComment&comment=37829" Ok, thank you @tkaiser. I also wanted to suggest adding the "time" command to the benchmarks to show more execution details:
time openssl speed -elapsed -evp aes-128-cbc
time openssl speed -elapsed -evp aes-192-cbc
time openssl speed -elapsed -evp aes-256-cbc