BPI-R2 Board Bring-up

2 hours ago, malvcr said:

the MT7623 has 1 Gbps crypto engine for VPN, or 500 Mbps crypto engine suite (in case these are different things)

 

Keep in mind that you're playing early adopter. The BPi R2 is MTK's first 'open source' MT7623 device, so I would assume you need a lot of patience until such 'special' stuff is properly supported, especially given that basic stuff like cpufreq scaling still isn't working.


You are right ... priorities.  In fact, I would rather have the cpufreq type of things ready before the 'special' stuff.

 

So ... today.

 

1) It is not possible to have high throughput on sequential cryptography networking involving only the CPU.

2) Plain networking (no linear CPU processing involved) is OK.

3) It is not possible to play with the CPU frequency.

4) Disk is not tuned, and is limited by the PCIe lane bandwidth.  USB3 speed is not good at all.

 

Then ... for specialized programming "today"

 

1) To increase "protected" networking speed, some sort of multithreading/multiprocessing must be used (at least there are 4 cores), or the cryptography algorithms must be adjusted to let the CPU breathe.

2) JIT processing is not a good idea when dealing with lots of data.  Better to prepare what you can in advance, while the CPU is idle.

3) As the machine has 2 GB RAM, it is important to use that 'asset' to reduce I/O latency.

4) Don't use USB for disks.  Keep SATA and, "maybe", an extra adapter in the PCIe slot.
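
Point 1 in the list above can be tried with the benchmark tool itself, since `openssl speed` accepts a `-multi N` option that forks N parallel workers. A minimal sketch, assuming `openssl` and `nproc` are installed; the helper names `bench_cmd` and `run_bench` and the `DRY_RUN` variable are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: build an "openssl speed" command line that uses
# WORKERS parallel processes (-multi forks one process per worker).
bench_cmd() {
    cipher=$1
    workers=$2
    echo "openssl speed -elapsed -multi ${workers} -evp ${cipher}"
}

# Run the benchmark on all cores by default.
# With DRY_RUN=1 the command is only printed, not executed.
run_bench() {
    cipher=${1:-aes-128-cbc}
    workers=${2:-$(nproc)}
    cmd=$(bench_cmd "$cipher" "$workers")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}
```

Calling `run_bench aes-128-cbc 4` would show how far four CPU-only workers get; it says nothing about how a real VPN or sftp session would parallelize.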

4 minutes ago, malvcr said:

1) It is not possible to have high throughput on sequential cryptography networking involving only the CPU.

 

No idea if that's really true since the SoC is said to support proprietary MTK crypto extensions. So once MTK demonstrates how to use this stuff (in a reasonable way, you want this functionality available in a way that later kernel updates don't break everything) you could ask them for the output of

for i in 128 192 256; do openssl speed -elapsed -evp aes-${i}-cbc ; done

Also I don't agree on the conclusions wrt disk/USB performance since you can test for this stuff only with great performing devices (SSDs that are known to exceed 500 MB/s in every situation eg. Samsung Pro 850 with at least 256 GB -- the 128GB variant has lower write performance). Anyway: it was such a sh*t show getting any information about R2 since the 'sinovoip team bpi' guy actively blocked everything, it seems this still continues and playing early adopter is always just a great recipe to waste your own time... feel free to continue here but please don't expect further feedback :) 

4 minutes ago, malvcr said:

could be possible that thing is ready?

cryptodev is just a "bridge" between kernel and userspace (and it needs recompiling OpenSSL with cryptodev support to be of any use, AF_ALG is a better alternative AFAIK).

To check if hardware crypto is supported you need to check the device tree, kernel config and possibly /proc/crypto
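
The /proc/crypto part of that check can be sketched as follows. This assumes the usual /proc/crypto layout (blocks of `name`, `driver`, `module`, `priority` lines); the function name `crypto_impls` is made up for illustration, and which driver name the MTK engine registers is not stated in this thread:

```shell
#!/bin/sh
# List every registered implementation of one algorithm as
# "driver priority". The kernel prefers the highest priority, so a
# hardware driver should show up with a higher number than the
# generic software one.
crypto_impls() {
    algo=$1
    file=${2:-/proc/crypto}
    awk -v algo="$algo" '
        $1 == "name"     { name = $3 }
        $1 == "driver"   { driver = $3 }
        $1 == "priority" { if (name == algo) print driver, $3 }
    ' "$file"
}
```

Usage would be `crypto_impls 'cbc(aes)'`. For the kernel config, `zgrep CRYPTO_DEV /proc/config.gz` works only if the kernel was built with CONFIG_IKCONFIG_PROC.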

34 minutes ago, malvcr said:

could be possible that thing is ready?

Ask Gary? :)

 

Two questions (for him):

  1. What does openssl performance look like using the standard benchmark above?
  2. How do they want to move from PoC to production? It seems the current MTK vendor kernel does not receive frequent updates (still at 4.4.70). How should users cope with cryptodev if they need to compile the stuff themselves and out-of-tree, and will BPi then provide rebuilt openssl packages to make use of the engine?

 


I am writing to Gary ... but in the meantime, I built everything and the crypto driver really works.  These are the numbers:

 

Without the driver (standard openssl):

 

for i in 128 192 256; do openssl speed -elapsed -evp aes-${i}-cbc ; done

You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 4140500 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1196387 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 312026 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 78846 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 9886 aes-128-cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      22082.67k    25522.92k    26626.22k    26912.77k    26995.37k

You have chosen to measure elapsed time instead of user CPU time.
Doing aes-192-cbc for 3s on 16 size blocks: 3626398 aes-192-cbc's in 3.00s
Doing aes-192-cbc for 3s on 64 size blocks: 1028081 aes-192-cbc's in 3.00s
Doing aes-192-cbc for 3s on 256 size blocks: 266479 aes-192-cbc's in 3.00s
Doing aes-192-cbc for 3s on 1024 size blocks: 67186 aes-192-cbc's in 3.00s
Doing aes-192-cbc for 3s on 8192 size blocks: 8426 aes-192-cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-192-cbc      19340.79k    21932.39k    22739.54k    22932.82k    23008.60k

You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 3258679 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 910552 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 235055 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 59249 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 7422 aes-256-cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      17379.62k    19425.11k    20058.03k    20223.66k    20267.01k

With the driver (to do this I had to compile the kernel, cryptodev and openssl ... with some quirks here and there):

for i in 128 192 256; do ./openssl speed -elapsed -evp aes-${i}-cbc ; done

You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 97341 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 83631 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 74013 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 73826 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 45441 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 16384 size blocks: 31972 aes-128-cbc's in 3.00s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -march=armv7-a -Wa,--noexecstack

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc        519.15k     1784.13k     6315.78k    25199.27k   124499.22k   174609.75k

You have chosen to measure elapsed time instead of user CPU time.
Doing aes-192-cbc for 3s on 16 size blocks: 96073 aes-192-cbc's in 3.00s
Doing aes-192-cbc for 3s on 64 size blocks: 84094 aes-192-cbc's in 3.00s
Doing aes-192-cbc for 3s on 256 size blocks: 74714 aes-192-cbc's in 3.00s
Doing aes-192-cbc for 3s on 1024 size blocks: 74362 aes-192-cbc's in 3.00s
Doing aes-192-cbc for 3s on 8192 size blocks: 43467 aes-192-cbc's in 3.00s
Doing aes-192-cbc for 3s on 16384 size blocks: 29992 aes-192-cbc's in 3.00s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -march=armv7-a -Wa,--noexecstack

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-192-cbc        512.39k     1794.01k     6375.59k    25382.23k   118693.89k   163796.31k

You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 95306 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 84143 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 74296 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 73367 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 41361 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 28101 aes-256-cbc's in 3.00s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -march=armv7-a -Wa,--noexecstack

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc        508.30k     1795.05k     6339.93k    25042.60k   112943.10k   153468.93k

So ... with the R2 I "must" use the hardware help.

I will check AF_ALG ... but there were not enough hours today ;-)
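
For reference, the summary rows in the tables above follow directly from the "Doing ..." lines: operations times block size, divided by elapsed seconds, shown in thousands of bytes. A small sketch of that arithmetic (the function name `throughput_k` is made up for illustration):

```shell
#!/bin/sh
# throughput_k OPS BLOCK_SIZE SECONDS
# Prints throughput in 1000s of bytes per second, like openssl speed.
throughput_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2fk\n", n * size / secs / 1000 }'
}
```

`throughput_k 9886 8192 3.00` reproduces the CPU-only 26995.37k for aes-128-cbc, and `throughput_k 31972 16384 3.00` the 174609.75k measured with the driver.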

 


One detail: that hardware engine seems to work better with more than 1024 bytes.  Something to take into consideration (although I need to check the openssl compile parameters more carefully).
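
To make that crossover visible, the aes-128-cbc rows of the two tables above can be divided column by column. This is only a sketch: the helper names are made up, and the two runs used different OpenSSL builds (1.0.2g versus 1.1.1-dev), so the ratios are indicative rather than exact:

```shell
#!/bin/sh
# speedup SW_K HW_K -> hardware throughput divided by software throughput
speedup() {
    awk -v sw="$1" -v hw="$2" 'BEGIN { printf "%.2f\n", hw / sw }'
}

# Rows are "block_size:software_k:hardware_k", copied from the
# aes-128-cbc lines of the two benchmark tables in this thread.
compare_aes128() {
    for row in 16:22082.67:519.15 64:25522.92:1784.13 \
               256:26626.22:6315.78 1024:26912.77:25199.27 \
               8192:26995.37:124499.22; do
        size=${row%%:*}
        rest=${row#*:}
        echo "$size $(speedup "${rest%%:*}" "${rest#*:}")"
    done
}
```

A ratio below 1 means the CPU-only build was faster at that block size; with these numbers the engine only pulls ahead somewhere above 1024 bytes.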

 

4 hours ago, malvcr said:

that hardware engine seems to work better with more than 1024 bytes

Well, obviously the overhead trashes performance with small chunks. Comparing the numbers with another Cortex-A7 again indicates that the MTK SoC is clocked at just 1040 MHz (I would assume it's 1042). I added the numbers to our list: https://forum.armbian.com/index.php?/topic/4583-rock64/&do=findComment&comment=37829
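
The overhead can be estimated from the posted "Doing ..." lines by turning operation counts into an average time per call. A rough sketch (the function name `usec_per_call` is made up; attributing the fixed cost to the round trip through cryptodev and the engine is an interpretation, not something measured here):

```shell
#!/bin/sh
# usec_per_call OPS SECONDS -> average microseconds per operation
usec_per_call() {
    awk -v n="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", secs / n * 1000000 }'
}
```

For 16-byte blocks, `usec_per_call 97341 3.00` gives about 30.82 µs per call with the driver, against about 0.72 µs (`usec_per_call 4140500 3.00`) on the CPU alone, so small requests are dominated by a roughly constant per-call cost.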

 


I checked interrupts, on Rider.Lee's recommendation in the BPI forum ... and when using the extensions there are a lot of them ( openssl speed -evp aes-128-cbc ).

First run (with cryptodev) 
Total change for mtk-aes: 394682
Total change CPU : 2260 + 14616 + 14895 + 22625 = 54396

Second run (without cryptodev) 
Total change for mtk-aes: 0 
Total change CPU : 1710 + 4332 + 4860 + 4357 = 15259
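
Totals like these can be collected by summing the per-CPU columns of /proc/interrupts before and after a run. A minimal sketch, assuming the standard /proc/interrupts layout and that the engine's lines end in `mtk-aes` as in the totals above; the function name `irq_total` is made up for illustration:

```shell
#!/bin/sh
# irq_total NAME [FILE]
# Sum the per-CPU counters of every interrupt line whose last field
# is NAME. Prints 0 if no line matches.
irq_total() {
    name=$1
    file=${2:-/proc/interrupts}
    awk -v name="$name" '
        NR == 1 { ncpu = NF; next }   # header row: one column per CPU
        $NF == name {
            for (i = 2; i <= ncpu + 1; i++) total += $i
        }
        END { print total + 0 }
    ' "$file"
}
```

Typical use would be `before=$(irq_total mtk-aes)`, then the benchmark, then `after=$(irq_total mtk-aes)` and `echo $((after - before))`.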

I just repeated the test while watching the CPU.

In the first case (with cryptodev) the CPU reached at most 50%.

In the second case (without cryptodev) it went to 100% on one core.

 

It would be interesting to go directly to the kernel interface.  I don't know whether openssl, or even the way cryptodev does things, could be wasting some CPU.

 

My next test would be to replace the basic infrastructure to see how sftp behaves.  Previously I was only able to obtain 14 MB/s, but this was limited by the CPU.

2 hours ago, malvcr said:

My next test would be to replace the basic infrastructure to see how sftp behaves.  Previously I was only able to obtain 14 MB/s, but this was limited by the CPU.

For this you may need to recompile some stuff. Even in Debian Stretch openssh-server is still compiled against OpenSSL 1.0.x, and hardware acceleration (like cryptodev or AF_ALG) AFAIK requires OpenSSL 1.1 or newer. Not sure if you will need any special configuration to make libssl use HW crypto with different applications by default.
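
Which libcrypto a given sshd binary is linked against can be read from `ldd` output. A small sketch; the function name `libcrypto_soname` is made up, and the exact soname you see depends on the distribution:

```shell
#!/bin/sh
# libcrypto_soname: read `ldd` output on stdin and print the soname of
# the first libcrypto entry (for example libcrypto.so.1.1).
libcrypto_soname() {
    awk '$1 ~ /^libcrypto\.so/ { print $1; exit }'
}
```

Usage would be `ldd "$(command -v sshd)" | libcrypto_soname`. A 1.0.x soname means sshd would have to be rebuilt before any engine in a newer OpenSSL can be used.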


Not so easy  to do ...

 

Latest ssh version (cloned with git from official site):

 

checking OpenSSL library version... configure: error: OpenSSL >= 1.1.0 is not yet supported (have "10101000 (OpenSSL 1.1.1-dev  xx XXX xxxx)")

And, of course, when I replace OpenSSL many things break.
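
The "10101000" in that error is the hexadecimal OPENSSL_VERSION_NUMBER. For OpenSSL 1.x it is laid out as 0xMNNFFPPS (major, minor, fix, patch, status), which can be decoded with shell arithmetic. A sketch; the function names are made up for illustration:

```shell
#!/bin/sh
# decode_openssl_version HEX
# Layout for OpenSSL 1.x: 0xMNNFFPPS. Status 0 means a development
# build, status 15 (0xf) a release.
decode_openssl_version() {
    v=$((0x$1))
    echo "$(( (v >> 28) & 0xf )).$(( (v >> 20) & 0xff )).$(( (v >> 12) & 0xff )) patch=$(( (v >> 4) & 0xff )) status=$(( v & 0xf ))"
}

# True if the version number is 1.1.0 or newer, the range the
# configure check quoted above refuses.
is_at_least_1_1_0() {
    [ $((0x$1)) -ge $((0x10100000)) ]
}
```

`decode_openssl_version 10101000` prints `1.1.1 patch=0 status=0`, matching the 1.1.1-dev build used for the driver test.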

 

 

I will need to figure out a different approach.  There is a Fedora patch for 1.0, but I think making it work would take more effort than doing something from scratch (and it will be moot once openssh is updated for openssl 1.1.0).


I have been thinking of some combination such as OpenVPN (recompiled for cryptodev) with rsync or some kind of simple plain data transmission.  What I don't know is how large the overhead will be ... if it is very high then I will need to process big chunks of data myself.  But, in general, it is a good thing to test :-)
