Alexey Volkov
Posted August 6, 2018

Hi guys,

we are working on a custom board based on the A20, similar to the Olinuxino MICRO. Our firmware is Buildroot-based, with Cryptodev enabled and OpenSSL built with -DHAVE_CRYPTODEV and -DUSE_CRYPTODEV_DIGESTS.

I know that we should not expect a drastic performance gain, but I wonder whether this thing works at all, since I do not see any load at the system level while the user-space process still uses 100% of one CPU core. Yes, the openssl performance test reports somewhat better numbers after "modprobe cryptodev", but not as good as other people report. I suspect that something is missing.

I do not see any interrupts with the number assigned to the crypto engine in the dts:

    crypto: crypto-engine@1c15000 {
        compatible = "allwinner,sun7i-a20-crypto",
                     "allwinner,sun4i-a10-crypto";
        reg = <0x01c15000 0x1000>;
        interrupts = <GIC_SPI 86 IRQ_TYPE_LEVEL_HIGH>;
        clocks = <&ccu CLK_AHB_SS>, <&ccu CLK_SS>;
        clock-names = "ahb", "mod";
    };

What should I see in /proc/interrupts? Interrupt number 86?

To check whether I had messed things up with Buildroot, I tried to test it with Armbian (mainline kernel). And... to my surprise, you have no cryptodev module at all. Is there any option to enable it as an additional package, or at least at compile time?

Regards,
A.
Alexey Volkov (Author)
Posted August 8, 2018

Answering my own question, thanks to Corentin Labbe from the sunxi Google group.

There are no interrupts from sun4i-a10-crypto since there is no DMA engine involved in the process. The only way to check whether the SS engine works is the tool from this patch: https://lkml.org/lkml/2018/1/11/711

Or you can just trust your /proc/crypto, /dev/crypto, openssl and the cryptodev benchmarking tools.

Here is my benchmark. Without cryptodev:

    # openssl speed -evp aes-128-cbc -elapsed
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 3850431 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 1092615 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 284371 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 71779 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 8989 aes-128-cbc's in 3.00s
    OpenSSL 1.0.2o  27 Mar 2018
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: arm-buildroot-linux-gnueabihf-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -Os -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DHASH_MAX_LEN=64 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc      20535.63k    23309.12k    24266.33k    24500.57k    24545.96k

After modprobe cryptodev:

    # openssl speed -evp aes-128-cbc -engine cryptodev -elapsed
    engine "cryptodev" set.
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 300402 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 259638 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 168054 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 69703 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 10517 aes-128-cbc's in 3.00s
    OpenSSL 1.0.2o  27 Mar 2018
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: arm-buildroot-linux-gnueabihf-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -Os -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DHASH_MAX_LEN=64 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc       1602.14k     5538.94k    14340.61k    23791.96k    28718.42k

Some gain shows up only at 8K blocks.

sun4i-ss seems to be loaded:

    # dmesg | grep sun4i-ss
    [    1.382460] sun4i-ss 1c15000.crypto-engine: Die ID 0

(By the way, what does "Die" mean here?)

    # cat /proc/crypto | grep sun4i-ss
    driver       : ecb-des3-sun4i-ss
    driver       : cbc-des3-sun4i-ss
    driver       : ecb-des-sun4i-ss
    driver       : cbc-des-sun4i-ss
    driver       : ecb-aes-sun4i-ss
    driver       : cbc-aes-sun4i-ss
    driver       : sha1-sun4i-ss
    driver       : md5-sun4i-ss

Cryptodev's speed test:

    # ./speed
    Testing NULL cipher:
        Encrypting in chunks of 512 bytes: done. 754.94 MB in 5.00 secs: 150.99 MB/sec
        Encrypting in chunks of 1024 bytes: done. 1.51 GB in 5.00 secs: 0.30 GB/sec
        Encrypting in chunks of 2048 bytes: done. 3.02 GB in 5.00 secs: 0.60 GB/sec
        Encrypting in chunks of 4096 bytes: done. 4.34 GB in 5.00 secs: 0.87 GB/sec
        Encrypting in chunks of 8192 bytes: done. 7.00 GB in 5.00 secs: 1.40 GB/sec
        Encrypting in chunks of 16384 bytes: done. 10.49 GB in 5.00 secs: 2.10 GB/sec
        Encrypting in chunks of 32768 bytes: done. 14.19 GB in 5.00 secs: 2.84 GB/sec
        Encrypting in chunks of 65536 bytes: done. 17.24 GB in 5.00 secs: 3.45 GB/sec
    Testing AES-128-CBC cipher:
        Encrypting in chunks of 512 bytes: done. 104.50 MB in 5.00 secs: 20.90 MB/sec
        Encrypting in chunks of 1024 bytes: done. 124.35 MB in 5.00 secs: 24.87 MB/sec
        Encrypting in chunks of 2048 bytes: done. 136.81 MB in 5.00 secs: 27.36 MB/sec
        Encrypting in chunks of 4096 bytes: done. 142.10 MB in 5.00 secs: 28.42 MB/sec
        Encrypting in chunks of 8192 bytes: done. 146.18 MB in 5.00 secs: 29.23 MB/sec
        Encrypting in chunks of 16384 bytes: done. 148.24 MB in 5.00 secs: 29.65 MB/sec
        Encrypting in chunks of 32768 bytes: done. 149.42 MB in 5.00 secs: 29.88 MB/sec
        Encrypting in chunks of 65536 bytes: done. 149.88 MB in 5.00 secs: 29.97 MB/sec
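As a cross-check on the two summary tables, the throughput that openssl speed prints can be recomputed from the "Doing ... in 3.00s" lines: operations times block size, divided by elapsed seconds, divided by 1000. The sketch below does that with the counts copied from the two runs above and prints the cryptodev-to-software ratio per block size. The function and variable names are made up for illustration and are not part of OpenSSL.

```python
# Operation counts copied from the two `openssl speed` runs in this post.
# Keys are block sizes in bytes; values are operations completed in 3.00 s.
SOFTWARE = {16: 3850431, 64: 1092615, 256: 284371, 1024: 71779, 8192: 8989}
CRYPTODEV = {16: 300402, 64: 259638, 256: 168054, 1024: 69703, 8192: 10517}
ELAPSED_S = 3.00


def throughput_k(ops, block_size, elapsed_s=ELAPSED_S):
    """Return throughput in openssl's 'k' unit (1000s of bytes per second)."""
    return ops * block_size / elapsed_s / 1000.0


def speedup_table(baseline, accelerated):
    """Map block size -> accelerated/baseline throughput ratio."""
    return {
        size: throughput_k(accelerated[size], size) / throughput_k(baseline[size], size)
        for size in baseline
    }


if __name__ == "__main__":
    for size, ratio in speedup_table(SOFTWARE, CRYPTODEV).items():
        sw = throughput_k(SOFTWARE[size], size)
        hw = throughput_k(CRYPTODEV[size], size)
        print(f"{size:>5} B: software {sw:10.2f}k  cryptodev {hw:10.2f}k  ratio {ratio:.2f}")
```

With these counts the ratio stays below 1 for every block size up to 1024 bytes and only rises above 1 (about 1.17) at 8192 bytes, which is the "gain only at 8K blocks" observation in numbers.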
tkaiser
Posted August 8, 2018

49 minutes ago, Alexey Volkov said:
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> aes-128-cbc      20535.63k    23309.12k    24266.33k    24500.57k    24545.96k
>
> After modprobe cryptodev:
>
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> aes-128-cbc       1602.14k     5538.94k    14340.61k    23791.96k    28718.42k

As expected, initialization overhead negatively affects tiny data chunks. What would be interesting is how CPU utilization looked while executing the test. IMO the simplest idea is to run the latest version of sbc-bench. See also https://forum.armbian.com/topic/7763-benchmarking-cpus/?page=4&tab=comments#comment-59576

BTW: Why are you using an older OpenSSL version?
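To put a rough number on that per-call overhead, one can assume each call costs a fixed setup time plus a per-byte time, t(size) = overhead + size / bandwidth, and fit the two unknowns through the smallest and largest block sizes of the cryptodev row quoted above. The linear model is an assumption made for this sketch, not something documented for sun4i-ss, cryptodev or OpenSSL, and the function names are illustrative only.

```python
# Throughput row quoted above for the cryptodev run, in openssl's 'k' unit
# (1000s of bytes per second), keyed by block size in bytes.
CRYPTODEV_K = {16: 1602.14, 64: 5538.94, 256: 14340.61, 1024: 23791.96, 8192: 28718.42}


def seconds_per_call(block_size, throughput_k):
    """Average wall-clock time of one call that processes block_size bytes."""
    return block_size / (throughput_k * 1000.0)


def fit_overhead(table):
    """Fit t(size) = overhead + size / bandwidth through the smallest and
    largest block sizes. The linear cost model is an assumption of this
    sketch, not a documented property of the driver."""
    small, large = min(table), max(table)
    t_small = seconds_per_call(small, table[small])
    t_large = seconds_per_call(large, table[large])
    per_byte = (t_large - t_small) / (large - small)
    overhead = t_small - small * per_byte
    return overhead, 1.0 / per_byte  # seconds per call, bytes per second


if __name__ == "__main__":
    overhead, bandwidth = fit_overhead(CRYPTODEV_K)
    print(f"fixed cost per call:  {overhead * 1e6:.1f} us")
    print(f"asymptotic bandwidth: {bandwidth / 1e6:.1f} MB/s")
    for size, k in CRYPTODEV_K.items():
        predicted = overhead + size / bandwidth
        measured = seconds_per_call(size, k)
        print(f"{size:>5} B: measured {measured * 1e6:7.1f} us, model {predicted * 1e6:7.1f} us")
```

On the quoted figures this gives a fixed cost of a bit over 9 microseconds per call and an asymptotic rate of roughly 30 MB/s, so at 16-byte blocks almost all of the time per call is overhead rather than encryption.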
Alexey Volkov (Author)
Posted August 8, 2018

25 minutes ago, tkaiser said:
> As expected, initialization overhead negatively affects tiny data chunks. What would be interesting is how CPU utilization looked while executing the test. IMO the simplest idea is to run the latest version of sbc-bench. See also https://forum.armbian.com/topic/7763-benchmarking-cpus/?page=4&tab=comments#comment-59576

I will give it a try, thanks. In both cases the CPU was at 100% on one core. It is PIO, unfortunately. What is a little strange is that there is no load at the system level, only in user space; I was expecting some of the load to move into the kernel with Cryptodev.

30 minutes ago, tkaiser said:
> BTW: Why are you using an older OpenSSL version?

Simply because there is no 1.1.0 version in Buildroot yet. I was able to produce a 1.1.0 build by fixing the makefile, but I'm not sure about missing patches relevant specifically to hardware acceleration.
tkaiser
Posted August 8, 2018

At least 'sbc-bench m' (background monitoring) should always work. And yeah, on other platforms like Armada 38x, cryptodev makes a huge difference wrt %usr vs. %sys. This is without cryptodev on Armada 385:

    Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
    12:25:41: 1332MHz  1.00  50%   0%  49%   0%   0%   0%  74.2°C
    12:25:51: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  75.1°C
    12:26:01: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  76.5°C
    12:26:11: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  74.2°C
    12:26:21: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  74.6°C
    12:26:31: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  78.0°C
    12:26:41: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  77.0°C
    12:26:51: 1332MHz  1.00  50%   0%  49%   0%   0%   0%  74.6°C
    12:27:01: 1332MHz  1.00  50%   0%  49%   0%   0%   0%  78.5°C
    12:27:11: 1332MHz  1.00  50%   0%  50%   0%   0%   0%  77.5°C

Vs. with CESA/Cryptodev:

    Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
    08:17:40: 1600MHz  1.07  33%  33%   0%   0%   0%   0%  54.0°C
    08:17:50: 1600MHz  1.14  17%  16%   0%   0%   0%   0%  54.0°C
    08:18:00: 1600MHz  1.12  31%  30%   0%   0%   0%   0%  54.0°C
    08:18:10: 1600MHz  1.10  22%  21%   0%   0%   0%   0%  54.0°C
    08:18:20: 1600MHz  1.16  24%  23%   0%   0%   0%   0%  54.0°C
    08:18:30: 1600MHz  1.13  28%  27%   0%   0%   0%   0%  54.0°C
    08:18:40: 1600MHz  1.11  20%  19%   0%   0%   0%   0%  54.0°C
    08:18:50: 1600MHz  1.18  31%  30%   0%   0%   0%   0%  54.0°C
    08:19:00: 1600MHz  1.15  15%  15%   0%   0%   0%   0%  54.0°C
    08:19:10: 1600MHz  1.12  31%  30%   0%   0%   0%   0%  53.0°C

Edit: Oh man, 3rd time in a few days that the forum software ate text when inserting code blocks. I really hate the forum engine.
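On a minimal Buildroot image where sbc-bench may not be convenient to run, the same %usr versus %sys split can be approximated by sampling the aggregate "cpu" line of /proc/stat twice while the benchmark runs. The sketch below is not how sbc-bench computes its columns (that is not shown in this thread); it only assumes the standard /proc/stat column order (user, nice, system, idle, iowait, irq, softirq, steal), and all names in it are illustrative.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of jiffies."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # very old kernels expose fewer columns
    return dict(zip(FIELDS, values))


def cpu_split(before, after):
    """Return each field's share (in percent) of the CPU time between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat:
        return stat.readline()


if __name__ == "__main__":
    first = parse_cpu_line(read_cpu_line())
    time.sleep(10)  # run the benchmark in another terminal during this window
    second = parse_cpu_line(read_cpu_line())
    split = cpu_split(first, second)
    print(f"%usr {split['user']:.0f}  %sys {split['system']:.0f}  %idle {split['idle']:.0f}")
```

The percentages are relative to all cores together, so on a two-core SoC one fully busy core shows up as about 50%, as in the first table above. If the crypto work really moved into the kernel, the busy share would be expected to shift from %usr to %sys.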