Posted

Hi guys,
we are working on a custom board based on the A20, similar to the Olinuxino MICRO. Our firmware is Buildroot based, with Cryptodev enabled and openssl built with -DHAVE_CRYPTODEV and -DUSE_CRYPTODEV_DIGESTS. I know that we should not expect a drastic performance gain, but I wonder whether this works at all, since I do not see any system-level (kernel) load while the user-space process still uses 100% of one CPU core. Yes, the openssl performance test reports somewhat better numbers after "modprobe cryptodev", but not as good as other people report. I suspect that something is missing. I do not see any interrupts with the number assigned to the crypto engine in the dts:

crypto: crypto-engine@1c15000 {
        compatible = "allwinner,sun7i-a20-crypto",
                     "allwinner,sun4i-a10-crypto";
        reg = <0x01c15000 0x1000>;
        interrupts = <GIC_SPI 86 IRQ_TYPE_LEVEL_HIGH>;
        clocks = <&ccu CLK_AHB_SS>, <&ccu CLK_SS>;
        clock-names = "ahb", "mod";
};


What should I see in /proc/interrupts? Interrupt number 86?

To check whether I had messed things up with Buildroot, I tried the same test on Armbian (mainline kernel). To my surprise, there is no cryptodev module at all. Is there any option to enable it as an additional package, or at least at compile time?

Regards,
A.

 

Posted

Answering my own question, thanks to Corentin Labbe from the sunxi Google group.
There are no interrupts from sun4i-a10-crypto because no DMA engine is involved in the process.

The only way to check whether the SS engine works is the tool from this patch: https://lkml.org/lkml/2018/1/11/711
Or you can just trust your /proc/crypto, /dev/crypto, openssl and the cryptodev benchmarking tools.

Here is my benchmark.
 

Without cryptodev:
# openssl speed -evp aes-128-cbc  -elapsed 
You have chosen to measure elapsed time instead of user CPU time. 
Doing aes-128-cbc for 3s on 16 size blocks: 3850431 aes-128-cbc's in 3.00s 
Doing aes-128-cbc for 3s on 64 size blocks: 1092615 aes-128-cbc's in 3.00s 
Doing aes-128-cbc for 3s on 256 size blocks: 284371 aes-128-cbc's in 3.00s 
Doing aes-128-cbc for 3s on 1024 size blocks: 71779 aes-128-cbc's in 3.00s 
Doing aes-128-cbc for 3s on 8192 size blocks: 8989 aes-128-cbc's in 3.00s 
OpenSSL 1.0.2o  27 Mar 2018 
built on: reproducible build, date unspecified 
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) 
idea(int) blowfish(ptr) 
compiler: arm-buildroot-linux-gnueabihf-gcc -I. -I.. -I../include 
-fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS 
-D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -D_LARGEFILE_SOURCE 
-D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64  -Os  -DHAVE_CRYPTODEV 
-DUSE_CRYPTODEV_DIGESTS -DHASH_MAX_LEN=64 -DOPENSSL_BN_ASM_MONT 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM 
-DGHASH_ASM -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m 
-DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM 
The 'numbers' are in 1000s of bytes per second processed. 
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes 
aes-128-cbc      20535.63k    23309.12k    24266.33k    24500.57k    24545.96k 

After modprobe cryptodev: 
# openssl speed -evp aes-128-cbc -engine cryptodev -elapsed 
engine "cryptodev" set. 
You have chosen to measure elapsed time instead of user CPU time. 
Doing aes-128-cbc for 3s on 16 size blocks: 300402 aes-128-cbc's in 3.00s 
Doing aes-128-cbc for 3s on 64 size blocks: 259638 aes-128-cbc's in 3.00s 
Doing aes-128-cbc for 3s on 256 size blocks: 168054 aes-128-cbc's in 3.00s 
Doing aes-128-cbc for 3s on 1024 size blocks: 69703 aes-128-cbc's in 3.00s 
Doing aes-128-cbc for 3s on 8192 size blocks: 10517 aes-128-cbc's in 3.00s 
OpenSSL 1.0.2o  27 Mar 2018 
built on: reproducible build, date unspecified 
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) 
idea(int) blowfish(ptr) 
compiler: arm-buildroot-linux-gnueabihf-gcc -I. -I.. -I../include 
-fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS 
-D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -D_LARGEFILE_SOURCE 
-D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64  -Os  -DHAVE_CRYPTODEV 
-DUSE_CRYPTODEV_DIGESTS -DHASH_MAX_LEN=64 -DOPENSSL_BN_ASM_MONT 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM 
-DGHASH_ASM -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m 
-DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM 
The 'numbers' are in 1000s of bytes per second processed. 
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes 
aes-128-cbc       1602.14k     5538.94k    14340.61k    23791.96k    28718.42k 

A gain only appears with 8K blocks.
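To put numbers on that, here is a minimal sketch that divides the cryptodev throughput by the software throughput for each block size. The figures are copied from the two openssl runs above; the function names `speedup_table` and `break_even` are just illustrative helpers, not part of any tool.

```python
# Throughput figures (in 1000s of bytes per second) copied from the two
# "openssl speed" runs above.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
SOFTWARE = [20535.63, 23309.12, 24266.33, 24500.57, 24545.96]
CRYPTODEV = [1602.14, 5538.94, 14340.61, 23791.96, 28718.42]


def speedup_table(sizes, baseline, accelerated):
    """Return (block_size, ratio) pairs; a ratio above 1 means cryptodev is faster."""
    return [(s, a / b) for s, b, a in zip(sizes, baseline, accelerated)]


def break_even(table):
    """Return the smallest block size whose ratio exceeds 1, or None if there is none."""
    for size, ratio in table:
        if ratio > 1.0:
            return size
    return None


if __name__ == "__main__":
    table = speedup_table(BLOCK_SIZES, SOFTWARE, CRYPTODEV)
    for size, ratio in table:
        print(f"{size:5d} bytes: {ratio:.2f}x")
    print("first block size with a gain:", break_even(table))
```

With these figures the ratio stays below 1 up to 1024-byte blocks and only crosses 1 at 8192 bytes, which matches the observation above.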

sun4i-ss seems to be loaded: 

# dmesg | grep sun4i-ss 
[    1.382460] sun4i-ss 1c15000.crypto-engine: Die ID 0
(btw, what does "Die" mean?  :)

 # cat /proc/crypto | grep sun4i-ss 
driver       : ecb-des3-sun4i-ss 
driver       : cbc-des3-sun4i-ss 
driver       : ecb-des-sun4i-ss 
driver       : cbc-des-sun4i-ss 
driver       : ecb-aes-sun4i-ss 
driver       : cbc-aes-sun4i-ss 
driver       : sha1-sun4i-ss 
driver       : md5-sun4i-ss 
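The same check can be scripted. This is a rough sketch that assumes the usual /proc/crypto layout (blank-line separated entries of "key : value" lines); `parse_proc_crypto` and `drivers_matching` are hypothetical helper names chosen for this example.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto-style text into one dict per algorithm entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def drivers_matching(entries, needle="sun4i-ss"):
    """Return the driver names that contain the given substring."""
    return [e["driver"] for e in entries if needle in e.get("driver", "")]
```

On the board itself this could be used as `drivers_matching(parse_proc_crypto(open("/proc/crypto").read()))`. An empty result would suggest the sun4i-ss driver did not register any algorithms.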

Cryptodev's speed test: 

# ./speed 
Testing NULL cipher: 
Encrypting in chunks of 512 bytes: done. 754.94 MB in 5.00 secs: 150.99 MB/sec 
Encrypting in chunks of 1024 bytes: done. 1.51 GB in 5.00 secs: 0.30 GB/sec 
Encrypting in chunks of 2048 bytes: done. 3.02 GB in 5.00 secs: 0.60 GB/sec 
Encrypting in chunks of 4096 bytes: done. 4.34 GB in 5.00 secs: 0.87 GB/sec 
Encrypting in chunks of 8192 bytes: done. 7.00 GB in 5.00 secs: 1.40 GB/sec 
Encrypting in chunks of 16384 bytes: done. 10.49 GB in 5.00 secs: 2.10 GB/sec 
Encrypting in chunks of 32768 bytes: done. 14.19 GB in 5.00 secs: 2.84 GB/sec 
Encrypting in chunks of 65536 bytes: done. 17.24 GB in 5.00 secs: 3.45 GB/sec 

Testing AES-128-CBC cipher: 
Encrypting in chunks of 512 bytes: done. 104.50 MB in 5.00 secs: 20.90 MB/sec 
Encrypting in chunks of 1024 bytes: done. 124.35 MB in 5.00 secs: 24.87 MB/sec 
Encrypting in chunks of 2048 bytes: done. 136.81 MB in 5.00 secs: 27.36 MB/sec 
Encrypting in chunks of 4096 bytes: done. 142.10 MB in 5.00 secs: 28.42 MB/sec 
Encrypting in chunks of 8192 bytes: done. 146.18 MB in 5.00 secs: 29.23 MB/sec 
Encrypting in chunks of 16384 bytes: done. 148.24 MB in 5.00 secs: 29.65 MB/sec 
Encrypting in chunks of 32768 bytes: done. 149.42 MB in 5.00 secs: 29.88 MB/sec 
Encrypting in chunks of 65536 bytes: done. 149.88 MB in 5.00 secs: 29.97 MB/sec 


 

Posted
49 minutes ago, Alexey Volkov said:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes 
aes-128-cbc      20535.63k    23309.12k    24266.33k    24500.57k    24545.96k 

After modprobe cryptodev: 


type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes 
aes-128-cbc       1602.14k     5538.94k    14340.61k    23791.96k    28718.42k 

 

 

As expected, initialization overhead negatively affects tiny data chunks. What would be interesting is how CPU utilization looked while executing the test. IMO the simplest idea is to run the latest version of sbc-bench. See also https://forum.armbian.com/topic/7763-benchmarking-cpus/?page=4&tab=comments#comment-59576

 

BTW: Why are you using an older OpenSSL version?

 

Posted
25 minutes ago, tkaiser said:

 

As expected, initialization overhead negatively affects tiny data chunks. What would be interesting is how CPU utilization looked while executing the test. IMO the simplest idea is to run the latest version of sbc-bench. See also https://forum.armbian.com/topic/7763-benchmarking-cpus/?page=4&tab=comments#comment-59576

 


I will give it a try, thanks. In both cases the CPU was at 100% on one core. It is PIO, unfortunately. What is a little strange is that there is no system-level load, only user-space load; I was expecting some of the load to move to the kernel with Cryptodev.
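For anyone who wants to check the user vs. system split for a single command rather than watching the whole system, here is a small sketch using Python's standard `resource` module. The helper name `cpu_split` is made up for this example, and it assumes a Unix-like system where `RUSAGE_CHILDREN` is available.

```python
import resource
import subprocess
import sys


def cpu_split(argv):
    """Run a command and return (user_seconds, system_seconds) it consumed."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all waited-for children,
    # so the difference is the cost of the command just run.
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)
```

For example, `cpu_split(["openssl", "speed", "-evp", "aes-128-cbc", "-engine", "cryptodev", "-elapsed"])` would return the two totals for the benchmark above; a large first value and a small second one would mean the time is being accounted as user time.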

 

30 minutes ago, tkaiser said:

 

BTW: Why are you using an older OpenSSL version?


Simply because there is no 1.1.0 version in Buildroot yet. I was able to produce a 1.1.0 build by fixing the makefile, but I'm not sure about missing patches relevant specifically to hardware acceleration.
 

Posted

At least 'sbc-bench m' (background monitoring) should always work. And yeah, on other platforms like Armada 38x cryptodev makes a huge difference wrt %usr vs. %sys. This is without cryptodev on Armada 385:

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
12:25:41: 1332MHz  1.00  50%   0%  49%   0%   0%   0%  74.2°C
12:25:51: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  75.1°C
12:26:01: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  76.5°C
12:26:11: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  74.2°C
12:26:21: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  74.6°C
12:26:31: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  78.0°C
12:26:41: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  77.0°C
12:26:51: 1332MHz  1.00  50%   0%  49%   0%   0%   0%  74.6°C
12:27:01: 1332MHz  1.00  50%   0%  49%   0%   0%   0%  78.5°C
12:27:11: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  77.5°C

 

Vs. with CESA/Cryptodev:

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
08:17:40: 1600MHz  1.07  33%  33%   0%   0%   0%   0%  54.0°C
08:17:50: 1600MHz  1.14  17%  16%   0%   0%   0%   0%  54.0°C
08:18:00: 1600MHz  1.12  31%  30%   0%   0%   0%   0%  54.0°C
08:18:10: 1600MHz  1.10  22%  21%   0%   0%   0%   0%  54.0°C
08:18:20: 1600MHz  1.16  24%  23%   0%   0%   0%   0%  54.0°C
08:18:30: 1600MHz  1.13  28%  27%   0%   0%   0%   0%  54.0°C
08:18:40: 1600MHz  1.11  20%  19%   0%   0%   0%   0%  54.0°C
08:18:50: 1600MHz  1.18  31%  30%   0%   0%   0%   0%  54.0°C
08:19:00: 1600MHz  1.15  15%  15%   0%   0%   0%   0%  54.0°C
08:19:10: 1600MHz  1.12  31%  30%   0%   0%   0%   0%  53.0°C
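To compare two such monitoring logs by a single number, the per-column averages can be computed with a few lines of Python. This is only a sketch that assumes the ten-column layout shown above; `parse_rows` and `average` are hypothetical helpers, not part of sbc-bench.

```python
def parse_rows(text):
    """Parse monitoring rows into dicts of integer percentages."""
    columns = ["cpu", "sys", "usr", "nice", "io", "irq"]
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 10 fields and start with a timestamp such as "12:25:41:".
        if len(fields) != 10 or not fields[0][0].isdigit():
            continue
        values = [int(f.rstrip("%")) for f in fields[3:9]]
        rows.append(dict(zip(columns, values)))
    return rows


def average(rows, column):
    """Return the mean of one percentage column over all parsed rows."""
    return sum(r[column] for r in rows) / len(rows)
```

Feeding each log to `parse_rows` and calling `average(rows, "sys")` and `average(rows, "usr")` gives the mean kernel and user share for that run.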

 

 

Edit: Oh man, 3rd time in a few days that the forum software ate text when inserting code blocks. I really hate the forum engine.
