mdel

crypto engine (openvpn related, aes-ni)


Could someone remind me of the status of the H3 crypto engine, both hardware (capabilities, aes-ni ?) and software (mainline or vanilla kernels) ?

 

I've been testing openvpn on an amlogic s905 box (still need to fix my beelink x2 problems) and as expected i'm hitting a cpu bottleneck.

 

i've been able to achieve around 60Mbps (Blowfish-cbc, usb hdd, ext4) but openvpn being single threaded the cpu load maxes out thus limiting openvpn's bandwidth.

 

Unfortunately, as with most vpn providers, you can't tweak all parameters, so changing the tun-mtu size is not possible; it has to stay at the default value of 1500.

 

I will perform some tests on my beelink x2 or opi pc, but i'd like to know what would the benefits be (if any) of using an H3 in an openvpn context. 

 

thank you


Could someone remind me of the status of the H3 crypto engine, both hardware (capabilities, aes-ni ?) and software (mainline or vanilla kernels) ?

Hardware - WIP, not ready yet. Software - should work if all necessary kernel options are enabled. This is NOT AES-NI, you can find more info here (including benchmarks for A10/A20 SS): http://sunxi.montjoie.ovh/

 

i've been able to achieve around 60Mbps (Blowfish-cbc, usb hdd, ext4) but openvpn being single threaded the cpu load maxes out thus limiting openvpn's bandwidth.

And you won't get much higher speed with hardware encryption (speed can even be lower on H3, if I remember correctly). The main advantage of the sunxi SS (Security System) is that it leaves the CPU free.

 

I will perform some tests on my beelink x2 or opi pc, but i'd like to know what would the benefits be (if any) of using an H3 in an openvpn context. 

This depends on your expectations. Here I linked an article where Orange Pi One was mounted inside a router to offload OpenVPN encryption/processing, and it was worth it compared to performance on router's CPU before.


thx for your answers

 

i'm not sure i understand why using the crypto engine won't speed things up, does it run at a fixed clock and does not scale to the cpu speed ?

According to those benchmarks you should get +10-40% performance gain with SS (A10/A20 tests ?), not sure what happens with negative gains for SS DMA though..

 

i did read montjoie's page but i don't quite understand the current state of things. I did not see where the current sun8i-ce driver can be found.

H3 sun8i-ce shows AES/DES/3DES-CBC as "ok", can i test that with current Armbian kernel ?

 

Reading this log https://irclog.whitequark.org/linux-sunxi/2016-07-09, something should work with BSP kernel.

 

Anyways i was able to improve performance a bit by tweaking tun-mtu parameters, i now get around 70Mbps.

I first discarded those tweaks as openvpn's connection log showed warnings about server expecting other values, but it seems to work with the client parameters.


i'm not sure i understand why using the crypto engine won't speed things up, does it run at a fixed clock and does not scale to the cpu speed ?

Yes. AES-NI is a CPU instruction set extension, whereas the Crypto Engine is a separate IP block with its own (fixed) clock.

 

According to those benchmarks you should get +10-40% performance gain with SS (A10/A20 tests ?), not sure what happens with negative gains for SS DMA though..

Depending on algorithm and block size.

 

i did read montjoie's page but i don't quite understand the current state of things. I did not see where the current sun8i-ce driver can be found.

H3 sun8i-ce shows AES/DES/3DES-CBC as "ok", can i test that with current Armbian kernel ?

 

Reading this log https://irclog.whitequark.org/linux-sunxi/2016-07-09, something should work with BSP kernel.

No idea. Mainline driver is probably not published yet, and if BSP kernel has SS driver for H3, it should work.


Did a simple benchmark on H3 and S905, both use armbian 5.20 (s905 is not an official armbian image and is not really an idle system)

 

H3 :

openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 2496591 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1359414 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 358603 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 90918 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 11410 aes-128-cbc's in 3.00s
OpenSSL 1.0.1t  3 May 2016
built on: Fri Sep 23 19:23:52 2016
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      13315.15k    29000.83k    30600.79k    31033.34k    31156.91k 

S905 :

openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 9116734 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 2825024 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 758392 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 193156 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 24278 aes-128-cbc's in 3.00s
OpenSSL 1.0.1t  3 May 2016
built on: Fri Sep 23 18:26:35 2016
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      48622.58k    60267.18k    64716.12k    65930.58k    66295.13k 
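
For reference, the summary row can be reproduced from the raw counts in the two runs above: blocks processed times block size, divided by the run time and by 1000. A minimal Python sketch (the function name is only for illustration, it is not part of openssl):

```python
def kbytes_per_sec(blocks: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit `openssl speed` prints."""
    return blocks * block_size / seconds / 1000.0


# Raw counts copied from the "Doing aes-128-cbc for 3s ..." lines above.
h3_16 = kbytes_per_sec(2496591, 16, 3.00)
h3_8192 = kbytes_per_sec(11410, 8192, 3.00)
s905_8192 = kbytes_per_sec(24278, 8192, 3.00)

print(f"H3   16 bytes:   {h3_16:.2f}k")
print(f"H3   8192 bytes: {h3_8192:.2f}k")
print(f"S905 8192 bytes: {s905_8192:.2f}k")
print(f"S905 / H3 at 8192 bytes: {s905_8192 / h3_8192:.2f}x")
```

At 8192-byte blocks the S905 comes out at a bit more than twice the H3 figure.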

Now, i will test that H3 in real world conditions, but i must say that if real-world performance actually reflects those benchmarks, i'm going to be disappointed.


Just a quick note, asked a customer today and got his answer now. Many internals so just the essence: they used individual OpenVPN instances on H3 devices for different site-to-site VPNs and used their own OpenSSL builds since they show better performance than distro packages. Now they want to switch to IPSec/strongSwan instead (being multithreaded and more performant)

 

In the meantime they evaluate Pine64 for the job and sent me a link to these numbers: https://github.com/libressl-portable/openbsd/issues/68

 

Small note: Pine64/Pine64+ will clock at 1152 MHz with default Armbian settings. It's possible to use 1296 MHz or even 1344 MHz which should work flawlessly as long as a single OpenVPN instance will run on the device (so A64 will be busy on 2 cores maximum).

 

BTW: OrangePi Plus 2E running Xenial on kernel 4.8-rc6 with OpenSSL 1.0.2g (distro package) and 1.1.1 (default build settings) at 1296 MHz (no cpufreq support in this kernel branch currently):

 

 

tk@orangepiplus2e:~$ /usr/bin/openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 4028081 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1161190 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 302715 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 76490 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 9590 aes-128-cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      21483.10k    24772.05k    25831.68k    26108.59k    26187.09k



tk@orangepiplus2e:~$ /usr/local/src/openssl/apps/openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 3811820 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1130671 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 296340 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 75500 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 9516 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 4735 aes-128-cbc's in 3.00s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -march=armv7-a -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      20329.71k    24120.98k    25287.68k    25770.67k    25985.02k    25859.41k 

 

 


interesting link but obviously 119.99$ +20$ shipping is a fair price.

and i didn't know hackers could pick passwords with their dark fingers, scary !

 

okay i'm getting completely confused, what kind of features does that R18 soc have compared to an H3, besides being a Cortex-A53 vs Cortex-A7 ?

 

that looks pretty impressive :


The 'numbers' are in 1000s of bytes per second processed.
type           16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
aes-128 cbc  104979.10k  152962.08k  178409.00k  185506.59k  188549.29k

then i've tried to understand what he's talking about but it's a bit beyond me i must admit.

what does he mean by :

 

 

Are there any plans to backport the ARM optimizations for AES / SHA from OpenSSL

 

what kind of optimizations is he talking about, is it the same thing as the hardware encryption engine worked on by montjoie or something else ?


what kind of optimizations is he talking about, is it the same thing as the hardware encryption engine worked on by montjoie or something else ?

 

No idea but at least it's worth a look. I googled a bit yesterday and there was at least one problem years ago when Debian didn't use assembler optimizations for ARM in their OpenSSL packages. But I have too much other work to get into details (funnily also VPN stuff, but pfSense on x86 and me just clicking around cluelessly). If you have a github account, I would simply comment on the issue and ask for details ;)

 

R18 is not used on Pine64; they chose A64 instead, which is the same SoC with a different chip ID and a different business unit responsible for it at Allwinner. They love to do market segmentation this way: R18/A64/H64 are more or less the same, and so are A83T/R58/H8, although H8 is limited to Android 4.4 by Allwinner while A83T is 'allowed' to get support for Android 5.1. Now Allwinner has decided to provide Android 6.0 for A64, but fortunately a few community members started to put Android sources in a repo, and some talented guys over at the pine64 forum have started to port Android 7 to A64.

 

A64 vs H3? They are pretty similar (many IP blocks are identical) and the most obvious difference is ARMv7 vs. ARMv8 (which might make a huge difference in some areas since an optimized instruction set can be used). Another difference that is not that relevant in our context: A64 has PMIC and battery support, the H series does not.


thx for the info, the PINE A64+ looks like a cheap alternative in the GbE category.

 

doing more tests on my s905 i noticed something concerning those openssl "benchmarks" :

openssl speed -evp bf-cbc aes-128-cbc 
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      54843.79k    61923.57k    65516.71k    66136.41k    66311.51k
bf-cbc           35550.99k    44486.87k    47452.67k    48311.30k    48567.64k

i was using blowfish on my system and seeing those figures i decided to switch to aes-128-cbc, expecting better speeds or lower cpu load, but it doesn't seem to be the case.

I get more or less the same (realworld) speed (65-70 Mbps) with both encryptions but aes-128-cbc is apparently more taxing on the cpu.

Also aes-128 seems to generate a much larger overhead than blowfish.

 

So i was reading those tests as bandwidth capabilities but it doesn't seem to scale to realworld application.
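
To put those figures next to the real-world link speed, the table values (1000s of bytes per second) can be converted to Mbit/s. A rough Python sketch using the S905 8192-byte figures above and the upper end of the 65-70 Mbps seen over the vpn. It only converts units; it does not model whatever else openvpn does per packet (authentication, tun device copies and so on, which is only my guess at where the rest of the time goes):

```python
def kbytes_to_mbit(kbytes_per_sec: float) -> float:
    """Convert openssl's '1000s of bytes per second' to megabits per second."""
    return kbytes_per_sec * 1000 * 8 / 1_000_000


aes_mbit = kbytes_to_mbit(66311.51)  # aes-128 cbc, 8192-byte blocks
bf_mbit = kbytes_to_mbit(48567.64)   # bf-cbc, 8192-byte blocks
vpn_mbit = 70.0                      # upper end of the observed real-world speed

print(f"aes-128 cbc: {aes_mbit:.0f} Mbit/s, bf-cbc: {bf_mbit:.0f} Mbit/s")
print(f"vpn link reaches about {vpn_mbit / aes_mbit:.0%} of the raw aes figure")
```

So the raw cipher benchmark is several times higher than what the tunnel delivers, which fits the observation that the cipher alone does not decide the real-world speed.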

 

Still i will try to test H3 sun8i-ce driver when it becomes available.


IIRC, a while ago, I ran some benchmarks with the hardware crypto engine on a Dockstar and I was using :

openssl speed -elapsed -evp aes-128-cbc

the -elapsed was essential for the comparison


Seagate Dockstar is Marvell Kirkwood and there you're able to benefit from CESA through /dev/crypto (without it, performance would really suck). A small overview regarding those 'NAS and router SoCs' can be found here: https://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators

 

Exactly, SFTP performance was twice as good with CESA, but the NAND on both my Dockstars died after running 24/7 for 2 years.


the -elapsed was essential for the comparison

 

thx, my stats are more or less the same with -elapsed 

maybe it doesn't affect the non-accelerated benchmarks as much


montjoie has updated his crypto engine dev page and pointed out some bad news for those Allwinner socs, including the H5. But looking at his project table, AES-CBC for H3 is labeled "OK", so i'd like to test the performance on my vpn link.

 

I've tried to build his sun8i-ce-experimental branch but unfortunately i can't get it to boot on my opi pc (kernel panic), it's most probable that i didn't build it properly as i'm not quite experienced in building dev trees.

 

Maybe someone can point to the proper steps to build that 4.9 source.

 

Then, if the results are decent (i don't expect much given my previous single-threaded openvpn tests), i'd like to know if it would be possible to include that driver in an armbian image, as i see there's no mainline image for the opi pc at the moment.

 

thx


Maybe someone can point to the proper steps to build that 4.9 source.

 

When you use our build system then it's just adjusting the branch here. When you already use Armbian you can speed up things by only building kernel and u-boot (.deb files will appear below output/debs/ then): KERNEL_ONLY=yes


okay the build works but i have a kernel config problem

 

when simply changing the dev kernel git source, it doesn't seem to use the git kernel config, maybe because i did build a mainline (not dev) kernel first, i don't know.

 

i've tried with full image build and KERNEL_ONLY=yes

 

Anyways my dev build  has :

# CONFIG_CRYPTO_DEV_SUN8I_SS is not set

 

so the driver is not there :

ls /lib/modules/4.9.0-sun8i/kernel/drivers/crypto/sunxi-ss/ 

sun4i-ss.ko

 

what's the armbian way to set kernel options when using a dev source ?

 

i'll try fiddling with KERNEL_KEEP_CONFIG but i'm not sure i understand its real purpose.

 

Also my bench with that 4.9 mainline kernel (xenial image) seems to be somewhat worse than with the legacy kernel (jessie), any idea why that would be :

 

Armbian 5.24 kernel 4.9.0-sun8i xenial.

>openssl speed -elapsed -evp bf-cbc aes-128-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      23108.93k    24980.25k    25877.42k    26112.00k    26184.36k
bf-cbc           18900.00k    21908.86k    22817.79k    23052.63k    23123.29k

i will try building a mainline jessie image to see if it gets back to the earlier benchmarks.


well i did manage to build that ss driver along with cryptodev, but i'm not sure the hw engine was ever put to work with my openssl cryptodev build.

 

Modules are loaded, and show some usage when openssl runs, but the figures are almost exactly the same with or without the hardware engine, and the cpu load is always 100% on the single core used.

 

Anyways i've decided to stick with Amlogic s905 which is definitely far more powerful than H3 in the current software crypto environment.

 

I've also tested my Opi PC2 H5 with Xulong Xenial server image Ubuntu_Server_Xenial_PC2_V0_9_0.img

>sudo openssl speed -elapsed -evp bf-cbc aes-128-cbc  
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      32461.71k    34913.11k    35852.12k    36085.08k    36162.22k
bf-cbc           26314.67k    31261.35k    32766.81k    33205.93k    33333.25k

I understand that the current Opi BSP image is junk, possibly the cpu being stuck at 1GHz, and actually the temperature from /sys/devices/virtual/thermal/cooling_device0/subsystem/thermal_zone0/temp would only go from 41 to 45 while running the test (no heatsink)..

 

But those figures don't look very promising to me, and the steadiness of those figures across the various packet sizes looks a bit funny, but i'll trust the benchmark for now and won't bother testing that live with a vpn link.

 

Considering montjoie's comments on the H5 crypto engine being the same as H3/A64 with some buggy / removed parts from the datasheets i would probably recommend staying away from Allwinner CPU if any crypto intensive use is expected.

 

Maybe something else will come out of 2017, we'll see, in the meantime i'm going back to my ODROID-C2, best "cheap" board for intensive (usb) nas / vpn if you ask me.


@mdel, here is the openssl test you requested in this thread, with S912 / Beelink GT1 running Armbian_5.24_Amlogic-s905x_Ubuntu_xenial_3.14.29_desktop_20161125

root@amlogic-s905x:~# openssl speed -elapsed -evp bf-cbc aes-128-cbc aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128 cbc for 3s on 16 size blocks: 9073379 aes-128 cbc's in 3.03s
Doing aes-128 cbc for 3s on 64 size blocks: 2438563 aes-128 cbc's in 3.02s
Doing aes-128 cbc for 3s on 256 size blocks: 625809 aes-128 cbc's in 3.02s
Doing aes-128 cbc for 3s on 1024 size blocks: 157495 aes-128 cbc's in 3.02s
Doing aes-128 cbc for 3s on 8192 size blocks: 19716 aes-128 cbc's in 3.02s
Doing aes-256 cbc for 3s on 16 size blocks: 7251936 aes-256 cbc's in 3.02s
Doing aes-256 cbc for 3s on 64 size blocks: 1891159 aes-256 cbc's in 3.02s
Doing aes-256 cbc for 3s on 256 size blocks: 482672 aes-256 cbc's in 3.02s
Doing aes-256 cbc for 3s on 1024 size blocks: 121272 aes-256 cbc's in 3.02s
Doing aes-256 cbc for 3s on 8192 size blocks: 15188 aes-256 cbc's in 3.02s
Doing bf-cbc for 3s on 16 size blocks: 7347258 bf-cbc's in 3.02s
Doing bf-cbc for 3s on 64 size blocks: 2182180 bf-cbc's in 3.02s
Doing bf-cbc for 3s on 256 size blocks: 571752 bf-cbc's in 3.02s
Doing bf-cbc for 3s on 1024 size blocks: 144898 bf-cbc's in 3.02s
Doing bf-cbc for 3s on 8192 size blocks: 18182 bf-cbc's in 3.02s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr)
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      47912.23k    51678.16k    53048.71k    53402.28k    53481.28k
aes-256 cbc      38420.85k    40077.54k    40915.24k    41120.04k    41198.71k
bf-cbc           38925.87k    46244.87k    48466.39k    49130.98k    49320.18k

Note: this command is using a single CPU.


From this thread: https://forum.armbian.com/index.php/topic/2138-armbian-for-amlogic-s912/page-2

 

 

 

Anyways, i can't quite make sense of cryptsetup results compared to real life openvpn, but "openssl speed" is quite accurate.

I unloaded marvell_cesa and mv_cesa modules and ran benchmarks again.

openssl numbers didn't change (+/- 5%), but cryptsetup numbers changed into these:

#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b    35.0 MiB/s    38.1 MiB/s
     aes-cbc   256b    28.5 MiB/s    29.7 MiB/s

So the simple answer is - cryptsetup can use HW crypto acceleration but openssl can't (at least current versions, AF_ALG should be supported in OpenSSL 1.1)

And also here comes a conclusion - if you have a fast crypto engine (with DMA and overall good kernel support) it doesn't matter if your CPU cores are slower on paper


yes you need to build an AF_ALG engine then configure openssl to use it, or force it on the benchmark command line with -engine af_alg

"-elapsed" then becomes very important to get proper results, otherwise it will spit out crazy numbers because it looks at the "cpu time" (which is barely used..).
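
A small Python illustration of why that happens, under a loud assumption: the sleep below only stands in for a request handed off to a hardware engine (the process waits instead of computing), and the byte count is made up. Dividing the same amount of data by CPU time instead of wall-clock time gives a wildly inflated rate:

```python
import time


def simulated_offload(seconds: float) -> None:
    """Stand-in for a hardware crypto request: the process waits, it does not compute."""
    time.sleep(seconds)


bytes_processed = 8192 * 100  # hypothetical amount of data "encrypted"

cpu_start, wall_start = time.process_time(), time.perf_counter()
simulated_offload(0.2)
cpu_used = time.process_time() - cpu_start
wall_used = time.perf_counter() - wall_start

rate_wall = bytes_processed / wall_used
rate_cpu = bytes_processed / max(cpu_used, 1e-9)  # guard against a zero reading

print(f"per wall-clock second: {rate_wall / 1000:.0f}k")
print(f"per CPU second:        {rate_cpu / 1000:.0f}k")
```

The wall-clock figure is the one comparable to a software-only run, which is what "-elapsed" switches the benchmark to.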

 

i believe i used that one on my test : https://github.com/sarnold/af_alg

 

But my understanding was that once the hw engine is used, cpu load should drop significantly, or are there still parts of the computing that will max out cpu load ?

I don't understand my H3 results, i did get crazy numbers when using the engines and not adding "-elapsed", but the cpu load always went to 100% on one core, and then with "-elapsed" the results were within 1% of my non accelerated tests..

 

i don't understand what's the difference between cryptodev and af_alg or if they have specific applications.

 

i will keep a close eye on that Armada 3700 kickstarter board, it seems you can only order the 1GB ram version which was the one i wanted, then it's 45e+27e shipping.

 

For the moment i'll get myself a 35$ 2GB/8GB s905 box with Gigabit and see how it compares to the Odroid C2.


But my understanding was that once the hw engine is used, cpu load should drop significantly, or are there still parts of the computing that will max out cpu load ?

I don't understand my H3 results, i did get crazy numbers when using the engines and not adding "-elapsed", but the cpu load always went to 100% on one core, and then with "-elapsed" the results were within 1% of my non accelerated tests..

At least with cryptsetup there is a noticeable difference regarding CPU load with CESA modules loaded and unloaded: 30% on single core (HW encryption, ~100MB/s) vs 100% on single core (SW encryption, ~35MB/s)
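
As a back-of-the-envelope check on those cryptsetup numbers, throughput per percent of one core can be compared directly. A tiny Python sketch; the helper name is made up and the inputs are the approximate figures quoted above:

```python
def mb_per_cpu_percent(throughput_mb_s: float, cpu_percent: float) -> float:
    """Throughput delivered per percent of a single core."""
    return throughput_mb_s / cpu_percent


hw = mb_per_cpu_percent(100.0, 30.0)   # CESA modules loaded
sw = mb_per_cpu_percent(35.0, 100.0)   # CESA modules unloaded

print(f"HW: {hw:.2f} MB/s per CPU %, SW: {sw:.2f} MB/s per CPU %")
print(f"HW is about {hw / sw:.1f}x more efficient per CPU %")
```

With those rounded inputs the hardware path moves roughly nine to ten times more data per unit of CPU load.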

22 hours ago, tkaiser said:

 

There's nothing special. Simply stay away from all proprietary crypto modules and choose an ARMv8 SoC with support for ARM's crypto extensions. Reasons why: https://forum.armbian.com/topic/4583-rock64/?do=findComment&comment=37829 

 

Ok, thank you @tkaiser

 

I also wanted to suggest adding the "time" command to the benchmarks to show more execution details

 

time openssl speed -elapsed -evp aes-128-cbc

time openssl speed -elapsed -evp aes-192-cbc

time openssl speed -elapsed -evp aes-256-cbc
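
The same idea can be scripted. A rough Python sketch that runs each benchmark and reports real/user/sys the way "time" does; the helper names are made up, it assumes openssl is on the PATH, and the resource module is Unix-only:

```python
import resource
import subprocess
import time
from typing import Dict, List

CIPHERS = ["aes-128-cbc", "aes-192-cbc", "aes-256-cbc"]


def speed_command(cipher: str) -> List[str]:
    """Build the benchmark command line for one cipher."""
    return ["openssl", "speed", "-elapsed", "-evp", cipher]


def timed_run(cmd: List[str]) -> Dict[str, float]:
    """Run cmd and return wall-clock, user and system time of the child, in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }
```

Calling `timed_run(speed_command(c))` for each entry in `CIPHERS` gives one real/user/sys triple per cipher. A "user" value far below "real" would suggest the work is being done somewhere other than the CPU.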

 

 

