malvcr

  1. BPI-R2 Board Bring-up

    It is good to know that the R2 is being taken seriously. This machine has important strengths and, although there are very good alternatives, it has a place in the market.

    By now I have sent an 825-megabyte Ubuntu ISO file maybe 100 times from one R2 to another, and between an R2 and a Mac Mini, testing different configurations (while I build the system I will use the R2 for). Here are some numbers that could be useful (AES, 256-bit):

        without /dev/crypto   100% CPU max   real 0m53.189s   user 0m44.210s   sys 0m8.030s
        with /dev/crypto       75% CPU max   real 0m27.015s   user 0m2.290s    sys 0m17.750s

    Looking at this test alone, it is clear that with the cryptodev driver active (and with an openssl compiled for it) the machine processes the data faster. Without the driver, the CPU is also pinned at 100%.

    I wanted to find a way to exercise that remaining 25%, so I wrote a multithreaded program that receives the data (running on one R2) and ran 4 parallel sets of openssl plus sending from the other R2. The overall throughput for the whole bundle is around 45.8 MB/s, much higher than the roughly 17 MB/s I get with a single similar session.

    The issue is that the final speed cannot be calculated from the crypto engine alone. A final test would need software designed for this: when I encrypt with openssl and then send the file over Ethernet, I have to write the encrypted file to disk and read it back, and the disk is a key factor in the overall transmission speed. An extra write is really heavy here. So, if I want to see a wonderful speed without sacrificing the machine, the disk must speed up. The final numbers for secure transmission of data must involve all the key factors: CRYPTO+DISK+NET+CPU.

    In general, though, I think it is good enough for my purposes. When I have a better software platform to test everything together (without punishing any of the factors), I will come back and show my numbers.
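As a rough cross-check of the timings in this post, the wall-clock figures can be turned into effective throughput. This is only a minimal sketch: it assumes the 825-megabyte file means 825 × 10^6 bytes (the post does not say whether MB or MiB is meant), and the helper names are made up for illustration.

```python
import re

FILE_MB = 825  # assumption: decimal megabytes, as the post does not specify

def parse_time(text):
    """Convert a `time`-style value such as '0m53.189s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if m is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

# Figures copied from the post.
runs = {
    "without /dev/crypto": {"real": "0m53.189s", "user": "0m44.210s", "sys": "0m8.030s"},
    "with /dev/crypto": {"real": "0m27.015s", "user": "0m2.290s", "sys": "0m17.750s"},
}

def summarize(run):
    """Return effective MB/s and total CPU seconds (user + sys) for one run."""
    real = parse_time(run["real"])
    cpu = parse_time(run["user"]) + parse_time(run["sys"])
    return {"mb_per_s": FILE_MB / real, "cpu_seconds": cpu}

for name, run in runs.items():
    s = summarize(run)
    print(f"{name}: {s['mb_per_s']:.1f} MB/s, {s['cpu_seconds']:.2f} s of CPU time")
```

Under that assumption the wall-clock time roughly halves with the driver, and most of the CPU time moves from user to sys, which would be consistent with the work shifting into the kernel driver.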
  2. BPI-R2 Board Bring-up

    Making some tests: rsync uses ssh or rsh (which for some reason is worse), while openssl with cryptodev is fast on big files. There is no rcp (it was replaced by scp, which is part of the ssh suite), so ssh sits in many important places in a regular distribution. I still have to run more precise tests, but sending a file with ssh took around 3 minutes, whereas encrypting it with openssl (AES-256) took around 37 seconds and transmitting it with a simple perl script around 12 seconds. I know I can do better by moving to C or C++ and saving one disk read in the processing phase. But this shows that it is possible to get more out of this machine without using the "normal" options.
  3. BPI-R2 Board Bring-up

    I have been thinking about some combination such as OpenVPN (recompiled for cryptodev) with rsync, or some kind of simple plain data transmission. What I don't know is how large the overhead will be; if it is very high, then I will need to process big chunks of data myself. But, in general, it is a good thing to test :-)
  4. BPI-R2 Board Bring-up

    Not so easy to do. With the latest ssh version (cloned with git from the official site):

        checking OpenSSL library version... configure: error: OpenSSL >= 1.1.0 is not yet supported (have "10101000 (OpenSSL 1.1.1-dev xx XXX xxxx)").

    And, of course, when I replace the OpenSSL many things break. I will need to figure out a different approach. There is a Fedora patch for 1.0, but I think making it work involves more effort than doing something from scratch (and it will become unnecessary once openssh is updated for openssl 1.1.0).
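The "10101000" in that configure message is the OpenSSL version number in hexadecimal. A small sketch of how it decodes, assuming the 0xMNNFFPPS layout that OpenSSL 1.x uses for OPENSSL_VERSION_NUMBER (major, minor, fix, patch, status); the function names are made up for illustration.

```python
def decode_openssl_version(hex_text):
    """Split an OpenSSL 1.x version number (layout 0xMNNFFPPS) into its fields."""
    n = int(hex_text, 16)
    return {
        "major": (n >> 28) & 0xF,
        "minor": (n >> 20) & 0xFF,
        "fix": (n >> 12) & 0xFF,
        "patch": (n >> 4) & 0xFF,
        "status": n & 0xF,  # 0 = development build, 0xF = release
    }

def is_at_least_1_1_0(hex_text):
    """True when the version is 1.1.0 or newer, the case the configure check rejects."""
    return int(hex_text, 16) >= 0x10100000

print(decode_openssl_version("10101000"))
print(is_at_least_1_1_0("10101000"))
```

Decoded this way, the value reads as 1.1.1 with a development status, which matches the "1.1.1-dev" string in the same message.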
  5. BPI-R2 Board Bring-up

    I checked interrupts, on Rider.Lee's recommendation in the BPI forum, and when using the extensions there is a large number of them (openssl speed -evp aes-128-cbc).

    First run (with cryptodev)
        Total change for mtk-aes: 394682
        Total change CPU: 2260 + 14616 + 14895 + 22625 = 54396

    Second run (without cryptodev)
        Total change for mtk-aes: 0
        Total change CPU: 1710 + 4332 + 4860 + 4357 = 15259

    I have just repeated the test while watching the CPU. In the first case (with cryptodev) the CPU reached at most 50%. In the second case (without cryptodev) it went to 100% (one core).

    It would be interesting to go directly to the kernel interface. I don't know whether openssl, or even the way cryptodev does things, could be wasting some CPU. My next test will be to replace the basic infrastructure and see how sftp behaves. Previously I was only able to obtain 14 MB/s, but that was limited by the CPU.
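One way to get "total change" figures like these is to snapshot /proc/interrupts before and after the benchmark and subtract. The sketch below is an assumption about how that could be done, not the procedure actually used in the post; the function names and the two sample snapshots are invented for illustration.

```python
def irq_totals(snapshot):
    """Sum the per-CPU counters of each /proc/interrupts row, keyed by its description."""
    lines = snapshot.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Rows such as "Err:" carry no description, so fall back to the row name.
        label = " ".join(fields[len(counts):]) or irq.strip()
        totals[label] = totals.get(label, 0) + sum(counts)
    return totals

def irq_delta(before, after):
    """Per-description increase in interrupt count between two snapshots."""
    b, a = irq_totals(before), irq_totals(after)
    return {label: a[label] - b.get(label, 0) for label in a}

def change_for(delta, name):
    """Total increase over all rows whose description mentions `name`."""
    return sum(value for label, value in delta.items() if name in label)

# Hypothetical two-CPU snapshots, only to show the expected input shape.
BEFORE = """
           CPU0       CPU1
 16:        100        200     GIC  29 Edge      arch_timer
 97:          0          0     GIC 114 Level     mtk-aes
"""
AFTER = """
           CPU0       CPU1
 16:        150        260     GIC  29 Edge      arch_timer
 97:         40          2     GIC 114 Level     mtk-aes
"""

print(change_for(irq_delta(BEFORE, AFTER), "mtk-aes"))
```

On a real board the two snapshots would come from reading /proc/interrupts immediately before and after the openssl speed run.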
  6. BPI-R2 Board Bring-up

    A detail: the hardware engine seems to work better with blocks larger than 1024 bytes. Something to take into consideration (although I need to check the openssl compile parameters more carefully).
  7. BPI-R2 Board Bring-up

    I am writing to Gary, but in the meantime I built everything and the crypto driver really works. These are the numbers.

    Without the driver (standard openssl):

        for i in 128 192 256; do openssl speed -elapsed -evp aes-${i}-cbc ; done

        You have chosen to measure elapsed time instead of user CPU time.
        Doing aes-128-cbc for 3s on 16 size blocks: 4140500 aes-128-cbc's in 3.00s
        Doing aes-128-cbc for 3s on 64 size blocks: 1196387 aes-128-cbc's in 3.00s
        Doing aes-128-cbc for 3s on 256 size blocks: 312026 aes-128-cbc's in 3.00s
        Doing aes-128-cbc for 3s on 1024 size blocks: 78846 aes-128-cbc's in 3.00s
        Doing aes-128-cbc for 3s on 8192 size blocks: 9886 aes-128-cbc's in 3.00s
        OpenSSL 1.0.2g 1 Mar 2016
        built on: reproducible build, date unspecified
        options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
        compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
        The 'numbers' are in 1000s of bytes per second processed.
        type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
        aes-128-cbc     22082.67k    25522.92k    26626.22k    26912.77k    26995.37k

        You have chosen to measure elapsed time instead of user CPU time.
        Doing aes-192-cbc for 3s on 16 size blocks: 3626398 aes-192-cbc's in 3.00s
        Doing aes-192-cbc for 3s on 64 size blocks: 1028081 aes-192-cbc's in 3.00s
        Doing aes-192-cbc for 3s on 256 size blocks: 266479 aes-192-cbc's in 3.00s
        Doing aes-192-cbc for 3s on 1024 size blocks: 67186 aes-192-cbc's in 3.00s
        Doing aes-192-cbc for 3s on 8192 size blocks: 8426 aes-192-cbc's in 3.00s
        OpenSSL 1.0.2g 1 Mar 2016
        built on: reproducible build, date unspecified
        options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
        compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
        The 'numbers' are in 1000s of bytes per second processed.
        type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
        aes-192-cbc     19340.79k    21932.39k    22739.54k    22932.82k    23008.60k

        You have chosen to measure elapsed time instead of user CPU time.
        Doing aes-256-cbc for 3s on 16 size blocks: 3258679 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 64 size blocks: 910552 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 256 size blocks: 235055 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 1024 size blocks: 59249 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 8192 size blocks: 7422 aes-256-cbc's in 3.00s
        OpenSSL 1.0.2g 1 Mar 2016
        built on: reproducible build, date unspecified
        options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
        compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
        The 'numbers' are in 1000s of bytes per second processed.
        type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
        aes-256-cbc     17379.62k    19425.11k    20058.03k    20223.66k    20267.01k

    With the driver (to do this I had to compile the kernel, cryptodev and openssl, with some quirks here and there):

        for i in 128 192 256; do ./openssl speed -elapsed -evp aes-${i}-cbc ; done

        You have chosen to measure elapsed time instead of user CPU time.
        Doing aes-128-cbc for 3s on 16 size blocks: 97341 aes-128-cbc's in 3.00s
        Doing aes-128-cbc for 3s on 64 size blocks: 83631 aes-128-cbc's in 3.00s
        Doing aes-128-cbc for 3s on 256 size blocks: 74013 aes-128-cbc's in 3.00s
        Doing aes-128-cbc for 3s on 1024 size blocks: 73826 aes-128-cbc's in 3.00s
        Doing aes-128-cbc for 3s on 8192 size blocks: 45441 aes-128-cbc's in 2.99s
        Doing aes-128-cbc for 3s on 16384 size blocks: 31972 aes-128-cbc's in 3.00s
        OpenSSL 1.1.1-dev xx XXX xxxx
        built on: reproducible build, date unspecified
        options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr)
        compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -march=armv7-a -Wa,--noexecstack
        The 'numbers' are in 1000s of bytes per second processed.
        type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
        aes-128-cbc       519.15k     1784.13k     6315.78k    25199.27k   124499.22k   174609.75k

        You have chosen to measure elapsed time instead of user CPU time.
        Doing aes-192-cbc for 3s on 16 size blocks: 96073 aes-192-cbc's in 3.00s
        Doing aes-192-cbc for 3s on 64 size blocks: 84094 aes-192-cbc's in 3.00s
        Doing aes-192-cbc for 3s on 256 size blocks: 74714 aes-192-cbc's in 3.00s
        Doing aes-192-cbc for 3s on 1024 size blocks: 74362 aes-192-cbc's in 3.00s
        Doing aes-192-cbc for 3s on 8192 size blocks: 43467 aes-192-cbc's in 3.00s
        Doing aes-192-cbc for 3s on 16384 size blocks: 29992 aes-192-cbc's in 3.00s
        OpenSSL 1.1.1-dev xx XXX xxxx
        built on: reproducible build, date unspecified
        options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr)
        compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -march=armv7-a -Wa,--noexecstack
        The 'numbers' are in 1000s of bytes per second processed.
        type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
        aes-192-cbc       512.39k     1794.01k     6375.59k    25382.23k   118693.89k   163796.31k

        You have chosen to measure elapsed time instead of user CPU time.
        Doing aes-256-cbc for 3s on 16 size blocks: 95306 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 64 size blocks: 84143 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 256 size blocks: 74296 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 1024 size blocks: 73367 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 8192 size blocks: 41361 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 16384 size blocks: 28101 aes-256-cbc's in 3.00s
        OpenSSL 1.1.1-dev xx XXX xxxx
        built on: reproducible build, date unspecified
        options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr)
        compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -march=armv7-a -Wa,--noexecstack
        The 'numbers' are in 1000s of bytes per second processed.
        type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
        aes-256-cbc       508.30k     1795.05k     6339.93k    25042.60k   112943.10k   153468.93k

    So, with the R2 I "must" use the hardware help. I will check AF_ALG, but there were not enough hours today ;-)
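The summary rows follow directly from the operation counts: operations × block size ÷ elapsed seconds ÷ 1000. The sketch below reproduces two of the reported aes-128-cbc figures and looks for the first block size at which the hardware path overtakes the software one. The numbers are copied from the runs above; the function names are made up for illustration.

```python
def kbytes_per_second(operations, block_size, seconds):
    """Throughput in 1000s of bytes per second, the unit openssl speed reports."""
    return operations * block_size / seconds / 1000

# aes-128-cbc summary rows copied from the post (block size -> 1000s of bytes/s).
software = {16: 22082.67, 64: 25522.92, 256: 26626.22, 1024: 26912.77, 8192: 26995.37}
hardware = {16: 519.15, 64: 1784.13, 256: 6315.78, 1024: 25199.27,
            8192: 124499.22, 16384: 174609.75}

def first_size_where_hardware_wins(sw, hw):
    """Smallest block size measured in both runs where hardware is faster."""
    for size in sorted(set(sw) & set(hw)):
        if hw[size] > sw[size]:
            return size
    return None

print(round(kbytes_per_second(4140500, 16, 3.00), 2))   # software, 16-byte blocks
print(round(kbytes_per_second(31972, 16384, 3.00), 2))  # hardware, 16384-byte blocks
print(first_size_where_hardware_wins(software, hardware))
```

With the driver, the number of operations per second stays in a narrow band up to 1024-byte blocks (about 97k down to 74k per 3 s), which would be consistent with a fixed per-request cost dominating small blocks.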
  8. BPI-R2 Board Bring-up

    I will try what Gary Wang described here (http://forum.banana-pi.org/t/is-it-possible-to-have-the-crypto-extensions-working/4034). Could that thing be ready? If I have good news I will share it :-)
  9. BPI-R2 Board Bring-up

    You are right ... priorities. In fact, I would rather have cpufreq-type things ready than the "special" stuff.

    So, today:
    1) It is not possible to get high throughput on sequential encrypted networking using only the CPU.
    2) Plain networking (no linear CPU processing involved) is OK.
    3) It is not possible to play with the CPU frequency.
    4) The disk is not tuned, and has the PCIe-lane bandwidth limitation. USB3 speed is not good at all.

    Then, for specialized programming "today":
    1) To increase "protected" networking speed, some sort of multithreading/multiprocessing must be used (at least there are 4 cores), or the cryptography algorithms must be adjusted to let the CPU breathe.
    2) JIT processing is not a good idea when dealing with lots of data. Better to prepare what you can in advance, while the CPU is idle.
    3) As the machine has 2 GB of RAM, it is important to use that "asset" to reduce I/O latency.
    4) Don't use USB for disks. Keep SATA and, "maybe", an extra adapter in the PCIe slot.
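The first programming point (spreading the work over the 4 cores) can be sketched as splitting the data into independent segments and handing each to a worker. This is only an illustration under stated assumptions: the per-segment work here is a SHA-256 hash standing in for the real encryption step, and all names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Cut data into at most `parts` contiguous segments of near-equal size."""
    step = max(1, -(-len(data) // parts))  # ceiling division, never zero
    return [data[i:i + step] for i in range(0, len(data), step)]

def process(segment):
    # Stand-in for the real per-segment work (for example, encrypting it).
    return hashlib.sha256(segment).hexdigest()

def process_parallel(data, workers=4):
    """Process each segment in its own worker and return results in order."""
    segments = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, segments))

payload = bytes(range(256)) * 4096  # 1 MiB of in-memory test data
print(len(process_parallel(payload)))
```

Threads are enough for this stand-in because hashlib releases the interpreter lock while hashing large buffers; a real encryption step would need a library that behaves the same way, or separate processes.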
  10. BPI-R2 Board Bring-up

    Yes, the problem with my networking test was a CPU bottleneck, not the networking equipment. With a simply stupid perl client/server test (only sending static in-memory data from one side, with no verification), I was able to obtain 135 MB/s. Of course, the only thing this number shows is that the device really is a gigabit one; it does not mean I can obtain that throughput in real-life scenarios. But if I use it, I must understand all its limitations and possibilities.

    Thanks for the crypto references; I still need to do many things and organize many "possibilities", so that material is really important for me.

    So if I use the R2, I can't expect to obtain the highest throughput. Checking around, the MT7623 has a 1 Gbps crypto engine for VPN, or a 500 Mbps crypto engine suite (in case these are different things). What is that and how is it used? No idea. I sent another message asking whether this could be made available for the R2 (again, this is a message for Mediatek, because it could be a license issue).

    I was making my own secure communications protocol. I will tune it for SBC machines with all this information at hand, as I see that SSH is not so well suited to these environments.
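A raw test of the kind described (static in-memory data pushed over TCP, no verification, no disk) might look like the sketch below. The original test was written in perl and is not shown in the post, so this Python version is an assumption about its shape; it runs sender and receiver on the loopback interface, and all names are made up.

```python
import socket
import threading
import time

CHUNK = bytes(64 * 1024)  # static in-memory payload, never touches the disk

def _sink(server, result):
    """Accept one connection and count the bytes received until it closes."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(1 << 16)
            if not data:
                break
            total += len(data)
    result["received"] = total

def measure(total_bytes, host="127.0.0.1"):
    """Send total_bytes over one TCP connection; return (bytes received, seconds)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(len(CHUNK), total_bytes - sent)
            client.sendall(CHUNK[:n])
            sent += n
    receiver.join()  # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    server.close()
    return result["received"], elapsed

if __name__ == "__main__":
    received, seconds = measure(50_000_000)
    print(f"{received / seconds / 1e6:.1f} MB/s")
```

To measure a real link, the receiver would run on one board and the sender on the other; on loopback the figure only reflects CPU and memory speed.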
  11. BPI-R2 Board Bring-up

    As I am developing for these platforms, that GCC version "detail" is extremely important. Also, when using sftp it is not possible to separate the CPU throughput from the network throughput, as the information must be encrypted. And there is the particular issue of the machine used. When we develop an application for everybody (like the ones in the Ubuntu distribution), it is not possible to design "for the platform"; this is particularly true when using standard sources that must be multi-platform capable. But if I am working "for the platform", then I can define particular capabilities to improve the final result even with constraints.

    Right now I am not at home. Tonight I will test the raw networking capacity. In fact, that Ubuntu has some useless things applied, such as a Docker infrastructure, that could be interfering. Let me arrive with scissors and a broom to leave it as naked as possible.

    About the BPI forum: my intention is to catch somebody from MTK. Also, there is something interesting: there are two forums there, one in English and a different one in Chinese (it is not a translation, it is a different forum). I will ask my wife (she is Chinese) to see what is written there. Google Translate is not so good with that language.
  12. BPI-R2 Board Bring-up

    0.4.12 on all of them, although (of course) I have no idea how they compiled the program for each platform, in particular for the R2.

    Something I found half an hour ago: when sending a file with sftp, the top transfer speed is 14 MB/s. This is not a conclusive test (I need to make it more precise and use the right tools), but this looks like Fast Ethernet speed, not Gigabit. Just in case: I am using two R2 machines, both sides using RAM disks, with a 1-foot cable between them (no extra equipment involved). I also posted a question in the BPI forum to see if there is more information about this subject.
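For reference, the 14 MB/s figure can be put next to the nominal link rates. A small sketch, assuming MB here means 10^6 bytes and ignoring protocol overhead; the constant and function names are illustrative only.

```python
FAST_ETHERNET_MBITS = 100   # nominal Fast Ethernet line rate
GIGABIT_MBITS = 1000        # nominal Gigabit Ethernet line rate

def mbytes_to_mbits(mbytes_per_second):
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_second * 8

observed = mbytes_to_mbits(14)
print(observed, "Mbit/s observed")
print("share of gigabit line rate:", observed / GIGABIT_MBITS)
print("exceeds Fast Ethernet line rate:", observed > FAST_ETHERNET_MBITS)
```

Under those assumptions the observed rate sits slightly above what a 100 Mbit/s link could carry at all, which would point to a bottleneck other than link negotiation.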
  13. BPI-R2 Board Bring-up

    sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4

        Maximum prime number checked in CPU test: 20000

        Test execution summary:
            total time:                          135.3196s
            total number of events:              10000
            total time taken by event execution: 541.1813
            per-request statistics:
                 min:                                 53.93ms
                 avg:                                 54.12ms
                 max:                                127.76ms
                 approx.  95 percentile:              54.71ms

        Threads fairness:
            events (avg/stddev):           2500.0000/1.87
            execution time (avg/stddev):   135.2953/0.02

    Just for comparison, I ran the same test on an OPI-Zero (Armbian, kernel 4.13.5) and an RPI-3 (Raspbian, kernel 4.9.35).

        OPI  cpuinfo_max_freq : 1200000
        RPI  cpuinfo_max_freq : 1200000
        R2   -- no data -- Mediatek information specifies 1.3 GHz

              Total      Taken by     ----- Per-Req-Statistics (ms) -----   ------------ Threads fairness ------------
              time (s)   Event Exec   Min     Avg     Max      Approx(95p)  Events(avg/stddev)  Exec time (avg/stddev)
        OPI   121.8403   487.2788     46.72   48.73   237.59   54.89        2500.0000/1.73      121.8197/0.01
        RPI   128.1032   512.2482     47.69   51.22   145.66   60.57        2500.0000/3.67      128.0620/0.01
        R2    135.3196   541.1813     53.93   54.12   127.76   54.71        2500.0000/1.87      135.2953/0.02

    Thanks for the SERDES and PCIe clarification.
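The three result rows can be compared more directly by normalizing against one board. A minimal sketch using the figures from the post, with the OPI chosen arbitrarily as the baseline; the dictionary layout and function names are assumptions made for illustration.

```python
EVENTS = 10000  # total number of events in each run (4 threads x 2500)

# Values copied from the comparison in the post.
results = {
    "OPI": {"total_s": 121.8403, "event_exec_s": 487.2788},
    "RPI": {"total_s": 128.1032, "event_exec_s": 512.2482},
    "R2": {"total_s": 135.3196, "event_exec_s": 541.1813},
}

def avg_request_ms(row):
    """Average time per event in milliseconds, from the summed execution time."""
    return row["event_exec_s"] / EVENTS * 1000

def relative_time(name, baseline="OPI"):
    """Total run time relative to the baseline board (1.0 = same speed)."""
    return results[name]["total_s"] / results[baseline]["total_s"]

for name, row in results.items():
    print(f"{name}: avg {avg_request_ms(row):.2f} ms, {relative_time(name):.3f}x OPI time")
```

The computed averages agree with the reported avg column, and on this test the R2 takes roughly 11% longer than the OPI-Zero.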
  14. BPI-R2 Board Bring-up

    I am very comfortable with Armbian (a very nice distribution, congratulations to such a splendid team) and with Raspbian for the Raspberry Pi world (very closed, but it just works). I did not want to touch LEDE because I feel it is not oriented to what I need, but I will spend some hours seeing that for myself.

    If somebody asks me today what the R2 is, I will say that it is a router platform with storage and processing capabilities. It is neither a storage-oriented device with many Ethernet ports nor a supercomputer, and if I understand that and the definition fits my needs, then it is OK.

    Anyway, there is the ClearFog platform, and the others referenced here. I really was not paying attention to all the possible options. I do not have enough time for much research because I need to design and create things in a short time, but it is important to have a clear picture of the current SBC landscape, "every month". Something similar happens in the PC world, but there the base is very stable and the operating systems are very uniform; on the ARM side things are very wild (which is refreshing, but complicated to manage).

    I agree: the R2 can improve, but it is not possible to do what Orange does with its machines (make one image and forget the machine). It is important to improve and improve and improve the software base. If I can help discover problems while I assemble solutions, I will discuss them as much as my duties permit. The board must also evolve (not a radical change like the one from R1 to R2; maybe an evolutionary one from R2 to R3 some day). Manuals are not written in stone.
  15. BPI-R2 Board Bring-up

    A38X ClearFog

    I think I understand this board now.
    - It is not an integrated board but a "carrier"-based one. You purchase the carrier board and the SOM (System on Module) to make it work as you expect. It uses only one lane for the mSATA interface, the same as the R2. Then why the difference in speed? It is necessary to add an mSATA-to-SATA adapter to connect standard SATA drives, or to use mSATA drives. And U-Boot must be modified to allow the PCIe slot to work as mSATA.
    - SolidRun Armada SOM A388 with eMMC: $69
    - With ClearFog Base Carrier: $129
    - With ClearFog Pro Carrier: $189

    HummingBoard Edge

    Similar scenario to the ClearFog (using a SOM).
    - Only has 1 Ethernet port
    - Quad 1 GHz NXP i.MX6 version with 2 GB RAM and 8 GB eMMC: $191
    - Needs an M.2-to-SATA adapter
    - No USB 3.0 (only 2.0)

    For a multi-Ethernet scenario with storage, the HummingBoard is limited by the lack of multiple native RJ45 ports and by USB 2.0 (where a second Ethernet adapter could be attached). When only two Ethernet ports are needed, the ClearFog with the Base Carrier seems to be enough. Its cost is around 50% higher than the R2, and the Pro is around 100% higher.

    ROCK64

    Not yet available for purchase (does not ship until November 3 if purchased on October 16; Pine has a history of delays).
    - $60.89 without shipping (2 GB version + 16 GB eMMC + USB-SATA cable); still lacks a secondary Ethernet port.
    - It only has one USB3 port, so the bandwidth must be shared between SATA and any secondary Ethernet adapter.

    ESPRESSObin

    - $49 on Amazon
    - Dual core
    - Three Gigabit Ethernet ports (1 WAN, 2 LAN)
    - Independent SATA interface with its own power supply
    - Proper 12V barrel power connector
    - Mini PCIe
    - Has a place to add eMMC, but it seems it must be soldered on

    --

    The R2 still has a place in terms of its price/availability/performance ratio.

    If there were an ESPRESSObin with eMMC included, quad core and 2 GB, maybe it would cost around $80. In that case, it would be a better option than the R2.

    The ROCK64 lacks the interfaces to provide good bandwidth in the multi-Ethernet scenario.

    It is not possible to have a HummingBoard with multiple Gigabit Ethernet connections.

    The Armada SOM is dual core. Could it be possible to use the HummingBoard SOM with the ClearFog carrier?

    The most important thing is to determine whether software improvements can work around the problems detected with the R2 and unlock its full performance capacity. In my particular case, although performance has some importance, it is not the main driver for choosing one product or another. The processing unit is more important (hence 4 cores would be better than 2), together with the integrated eMMC (I can't deal with soldering these tiny things) and the availability of ports for connecting devices.

    Today none of them is a perfect option (taking all factors into consideration). I am sure that in 6 months this will be a completely different world.