tkaiser

sbc-bench


On 10/5/2018 at 6:07 AM, malvcr said:

These are the numbers without the "elapsed" parameter.
 


Without AFALG

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      19776.55k    23565.55k    24981.25k    25360.04k    25556.85k    25471.66k

With AFALG

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc       8008.18k    27638.99k   251372.80k  1167974.40k         infk  4513792.00k

 

I am not sure, but it seems that afalg is not available in stock Armbian (I checked this on a supported M2+). On SBCs it seems important to have this available in order to use the machine's potential.

 

ALG and crypto blocks can be a bit complicated...

 

Based on your numbers - what moves more bits?

 

ARM does...

SOFTWARE

Doing aes-256-cbc for 3s on 16 size blocks: 3708058 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1104719 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 292752 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 74300 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 9329 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 4662 aes-256-cbc's in 3.00s

ALG

Doing aes-256-cbc for 3s on 16 size blocks: 129499 aes-256-cbc's in 2.95s
Doing aes-256-cbc for 3s on 64 size blocks: 115145 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 78540 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 34189 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 5404 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 2756 aes-256-cbc's in 3.00s

 

That being said, ALG is likely lower CPU usage overall, but running on ARM in your case is the right choice....
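To make the "what moves more bits" comparison concrete, here is a small sketch that turns the two `openssl speed` logs above into bytes per second (operations x block size / elapsed time). The regular expression and the function name `throughput` are my own illustration, not part of openssl; only the log lines are taken from the output above.

```python
import re

# Matches lines such as:
# "Doing aes-256-cbc for 3s on 16 size blocks: 3708058 aes-256-cbc's in 3.00s"
LINE_RE = re.compile(
    r"Doing \S+ for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) \S+ in (?P<secs>[\d.]+)s"
)

def throughput(log):
    """Return {block size: bytes per second} for every 'Doing ...' line."""
    result = {}
    for m in LINE_RE.finditer(log):
        size = int(m["size"])
        result[size] = int(m["count"]) * size / float(m["secs"])
    return result

SOFTWARE = """
Doing aes-256-cbc for 3s on 16 size blocks: 3708058 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1104719 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 292752 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 74300 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 9329 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 4662 aes-256-cbc's in 3.00s
"""

ALG = """
Doing aes-256-cbc for 3s on 16 size blocks: 129499 aes-256-cbc's in 2.95s
Doing aes-256-cbc for 3s on 64 size blocks: 115145 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 78540 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 34189 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 5404 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 2756 aes-256-cbc's in 3.00s
"""

sw = throughput(SOFTWARE)
alg = throughput(ALG)
for size in sorted(sw):
    # openssl speed reports in units of 1000 bytes per second ("k")
    print(f"{size:>6} bytes: software {sw[size] / 1000:10.2f}k   ALG {alg[size] / 1000:10.2f}k")
```

With these particular logs the software path comes out ahead at every block size, which is the point being made above.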

 

 

 

On 10/5/2018 at 3:07 PM, malvcr said:

I am configuring another BPI-R2 machine, and I was checking the benchmarks.  For this I am using a 4.14.71 with Ubuntu.  The numbers are not better than my previous attempts

Did you ever try cryptodev on the R2? I think you had to compile openssl as well (as far as I know, there are some issues with openssl's cryptodev implementation, but honestly I never cared). Just out of curiosity... :P I only gave it one shot back then, because the board had bigger issues than crypto in those days.

 

 

Quote

Did you ever try cryptodev on the R2? I think you had to compile openssl as well (as far as I know, there are some issues with openssl's cryptodev implementation, but honestly I never cared). Just out of curiosity... I only gave it one shot back then, because the board had bigger issues than crypto in those days.

 

Interesting ... I thought my reply had not been stored, or had been deleted. Anyway ...

 

I tested cryptodev with the BPI 4.4.70 some time ago (I have some posts about the R2 in the BPI forum). It was faster, but what I didn't like is that cryptodev is not accepted as an official kernel module. This is why I have been working with AF_ALG. Also, although openssl works, it is a very big piece of code that could hide some "issues" for my security-centered work (this has already happened with openssl). This is also why I stopped using Botan, a very complete ciphering platform; I prefer something embedded in the kernel, or my own small and light framework.

 

In fact, I am not 100% satisfied with AF_ALG. It is a very artificial interface (with terrible documentation) and it doesn't work very well for small block sizes. I was trying to mimic the openssl "speed" benchmark while encrypting with very big block sizes, but the openssl "encrypt" option doesn't let me use them. In that case, I prefer not to use openssl and to run my own tests with my own code.

 

Let me see if I can put together a minimal AF_ALG test and basic ciphering tool to share, with source code clear enough to play with.
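For readers who want to poke at AF_ALG before such a tool exists, here is a minimal sketch using Python's standard `socket` module, which exposes AF_ALG on Linux. This is not the tool mentioned above. The function name `afalg_aes_cbc_encrypt` is made up for illustration, and the sketch assumes the kernel provides the `cbc(aes)` skcipher; it returns `None` when AF_ALG is not usable.

```python
import os
import socket

def afalg_aes_cbc_encrypt(key, iv, plaintext):
    """Encrypt with the kernel's cbc(aes) via AF_ALG.

    Returns the ciphertext, or None if AF_ALG or the algorithm is unavailable.
    """
    if len(plaintext) % 16:
        raise ValueError("plaintext must be a multiple of the 16-byte AES block")
    if not hasattr(socket, "AF_ALG"):
        return None  # not Linux, or Python built without AF_ALG support
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            # Select the transform type and algorithm, then set the key
            alg.bind(("skcipher", "cbc(aes)"))
            alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, key)
            # accept() returns the operation socket used for the actual data
            op, _ = alg.accept()
            with op:
                op.sendmsg_afalg([plaintext], op=socket.ALG_OP_ENCRYPT, iv=iv)
                return op.recv(len(plaintext))
    except OSError:
        return None  # AF_ALG disabled, algorithm missing, or no permission

key = os.urandom(32)   # AES-256
iv = os.urandom(16)
ciphertext = afalg_aes_cbc_encrypt(key, iv, bytes(4096))
print("AF_ALG unavailable" if ciphertext is None else f"{len(ciphertext)} bytes encrypted")
```

Every request goes through `sendmsg` and `recv`, so each call pays a fixed system call cost regardless of payload, which fits the observation that small block sizes perform poorly.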


@tkaiser

Hi Thomas. I'm planning to make a video about the use, uselessness and problems of benchmarking SBCs.

Today I got a message from a subscriber about the Odroid H2, where he claimed the XU4 was a very slow SBC. His claim was backed up by "benchmarks" from ExplainingComputers. Here are those results.
[Image: benchmark results from the ExplainingComputers video]

Video ExplainingComputers :  Six SBC Benchmark: ODROID XU4, ROCKPro64 & More!

Here the ROCK64 seems to outperform the XU4. We all know that's not right.

If it is OK with you, I would use sbc-bench and Blender to show the difference in results between different platforms, kernels and settings on the same SBC. I am thinking of the M4: Lubuntu xenial armhf vs Lubuntu bionic armhf vs Lubuntu bionic arm64, and Armbian stretch vs bionic. Those differences are big.
And the Raspberry Pi 3B+ with and without RAM overclock, to show the importance of RAM + CPU.

I would cover several subjects, including:

 

Problems with benchmarks and solutions

SBC-Bench (how does it work, what does it do)
Differences between different cortexes A7, A53, A72, ... (I'll need to do a lot of homework for that, if you could elaborate on it, please do)

Importance of RAM speed with CPU speed, and other parameters

Cheating manufacturers (Amlogic with C2, Raspberry Pi with 3B+, any others I should mention?)

Conclusion...

So with this I ask your permission to use sbc-bench, and to quote you from the readme.md and the explanations and insights in the results.md file. If you would like me to mention something, please tell me. Or, if you want, you could record an audio/video file in your own words to add to the video (just a thought).

Did you start a draft for the "Interpreting results" part yet?

I'll be busy for at least another week gathering information. When done I'll share my results, and I'll say what I'm going to use from your texts in the video.
Sorry for the long post.
Greetings.
 

2 hours ago, NicoD said:

Here the ROCK64 seems to outperform the XU4. We all know that's not right.

If it is OK with you, I would use sbc-bench and Blender to show the difference in results between different platforms, kernels and settings on the same SBC. I am thinking of the M4: Lubuntu xenial armhf vs Lubuntu bionic armhf vs Lubuntu bionic arm64, and Armbian stretch vs bionic. Those differences are big.
And the Raspberry Pi 3B+ with and without RAM overclock, to show the importance of RAM + CPU.

I would cover several subjects, including:

 

Problems with benchmarks and solutions

SBC-Bench (how does it work, what does it do)
Differences between different cortexes A7, A53, A72, ... (I'll need to do a lot of homework for that, if you could elaborate on it, please do)

Importance of RAM speed with CPU speed, and other parameters

Cheating manufacturers (Amlogic with C2, Raspberry Pi with 3B+, any others I should mention?)

 

Gah - watched the video - and a lot of problems across the board (pardon the pun).

 

Different kernels, built with different versions of GCC, userland (for example, Raspbian userland is all ARMv6 with exception of the kernel for the A7/A53 boards)....

 

(I wouldn't have included any of the Pis in the set of boards being evaluated, because of the userland. <soapbox> Nothing against Pis in general: one must appreciate that 35M+ boards means they're doing something right, and they've spawned an entire HW/SW ecosystem around their platform. That ecosystem has in turn made affordable ARM boards available for hobbyists, makers, and developers. Before the Pi, if one wanted to do development around ARM, boards were expensive and SW support was largely limited to the vendor BSP. These days it's a lot more open - not perfect, but much better than it was.</soapbox>)

 

Rock64 vs Odroid XU4 - quad A53 vs A7/A15 big.LITTLE. big.LITTLE is a challenge for the scheduler, and depending on the BSP from the OEM it's easy to get wrong, with threads landing on the less preferred core. This is an issue even on Android, where much work has been done outside of the mainline kernels (I know ARM and Qualcomm have done a lot of research there, but much of that has not been pushed back to mainline).

 

In my experience, with supported boards (for me this is Tinker and NanoPi NEO), Armbian is generally faster than the vendor's images - and that's doing Byte-Unixbench, which is discounted because it is compiler sensitive - that being said, it's still a useful tool when comparing apples to apples (e.g. tweaking settings on the same OS/Platform, but comparing Platform A to Platform B, one has to take the results with a grain of salt)

 

I haven't found a lot of evidence of cheating by any of the SBC vendors - it's really hard to do with FOSS, compared to Android, where cheating has occurred with certain OEMs and specific benchmark APKs - Android has enough hooks to enable this kind of cheating in any event.

 

sbc-bench, in my humble opinion, is a good benchmark for supported boards - as long as the boards being compared are all on the same version of Armbian - and this is made clear in the script comments. Please review the script on github; @tkaiser has been pushing updates, so if one has cloned the repo, it's worthwhile to do a git pull to get the latest revision.

 

To answer your question about the different versions of Cortex...

 

Small cores - A7, A53 - are the low-power cores focused on efficiency

Big cores - A15, A12 (A17), A72 - are the higher-performance cores

 

Think of it like Atom (Small Core) vs Core i3/i5/i7 (Big Core) - even at the same clock, the big core is going to get more work done, but perhaps at the cost of heat, so thermal solution needs to be considered.

6 hours ago, NicoD said:

So with this I ask your permission to use sbc-bench, and to quote you from the readme.md and the explanations and insights in the results.md file. If you would like me to mention something, please tell me. Or, if you want, you could record an audio/video file in your own words to add to the video (just a thought).

Did you start a draft for the "Interpreting results" part yet?

 

Up to @tkaiser for results on sbc-bench...

 

working on an addition - byte-unixbench and sorting out things... removing some gcc over optimizations, looking at threads...

 

https://github.com/sfx2000/byte-unixbench

 

It's a better bench than sysbench, and portable... Doing a -c 1 -i 1 and -c 4 -i 1 keeps things short - however - letting it run thru pushes heat/throttles...

 

UnixBench is interesting from a system perspective...

 

RPI3 B Plus vs Tinker....

 

Tinker is 15 pounds of power in a 5 pound sack - RPi3 B+ is a CPU that can do better than it does with Raspbian....

 

Tinkerboard - Cortex-A12/A17
------------------------------------------------------------------------
Benchmark Run: Sat Oct 20 2018 17:02:37 - 17:31:22
4 CPUs in system; running 1 parallel copy of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    8709974.2    746.4
Double-Precision Whetstone                       55.0       1031.4    187.5
Execl Throughput                                 43.0       1095.7    254.8
File Copy 1024 bufsize 2000 maxblocks          3960.0      91960.7    232.2
File Copy 256 bufsize 500 maxblocks            1655.0      26583.4    160.6
File Copy 4096 bufsize 8000 maxblocks          5800.0     246267.0    424.6
Pipe Throughput                               12440.0     149851.8    120.5
Pipe-based Context Switching                   4000.0      25850.9     64.6
Process Creation                                126.0       2429.0    192.8
Shell Scripts (1 concurrent)                     42.4       2061.9    486.3
Shell Scripts (8 concurrent)                      6.0        432.0    720.1
System Call Overhead                          15000.0     442992.8    295.3
                                                                   ========
System Benchmarks Index Score                                         258.2

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   13538575.0   1160.1
Double-Precision Whetstone                       55.0       1982.4    360.4
Execl Throughput                                 43.0       1752.7    407.6
File Copy 1024 bufsize 2000 maxblocks          3960.0      87122.4    220.0
File Copy 256 bufsize 500 maxblocks            1655.0      22948.6    138.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     281302.7    485.0
Pipe Throughput                               12440.0     321233.1    258.2
Pipe-based Context Switching                   4000.0      40012.9    100.0
Process Creation                                126.0       3820.3    303.2
Shell Scripts (1 concurrent)                     42.4       3399.0    801.7
Shell Scripts (8 concurrent)                      6.0        433.6    722.7
System Call Overhead                          15000.0     952658.0    635.1
                                                                   ========
System Benchmarks Index Score                                         373.1

 

Rpi 3B+ - Cortex-A53 - VCOS/ThreadX
------------------------------------------------------------------------
Benchmark Run: Sat Oct 20 2018 17:02:32 - 17:30:38
4 CPUs in system; running 1 parallel copy of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    4324740.1    370.6
Double-Precision Whetstone                       55.0        957.4    174.1
Execl Throughput                                 43.0        908.8    211.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     140312.9    354.3
File Copy 256 bufsize 500 maxblocks            1655.0      40618.4    245.4
File Copy 4096 bufsize 8000 maxblocks          5800.0     353296.2    609.1
Pipe Throughput                               12440.0     280908.2    225.8
Pipe-based Context Switching                   4000.0      50734.2    126.8
Process Creation                                126.0       2212.2    175.6
Shell Scripts (1 concurrent)                     42.4       1780.5    419.9
Shell Scripts (8 concurrent)                      6.0        575.7    959.5
System Call Overhead                          15000.0     594784.0    396.5
                                                                   ========
System Benchmarks Index Score                                         302.2

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   17082008.4   1463.8
Double-Precision Whetstone                       55.0       3803.4    691.5
Execl Throughput                                 43.0       2240.8    521.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     228921.9    578.1
File Copy 256 bufsize 500 maxblocks            1655.0      62777.0    379.3
File Copy 4096 bufsize 8000 maxblocks          5800.0     578721.9    997.8
Pipe Throughput                               12440.0    1112342.2    894.2
Pipe-based Context Switching                   4000.0      98478.8    246.2
Process Creation                                126.0       4789.7    380.1
Shell Scripts (1 concurrent)                     42.4       4464.7   1053.0
Shell Scripts (8 concurrent)                      6.0        589.0    981.7
System Call Overhead                          15000.0    2289227.2   1526.2
                                                                   ========
System Benchmarks Index Score                                         705.6
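For anyone reading these tables for the first time, here is a short sketch of how the columns relate. It assumes what the posted numbers are consistent with: each INDEX is RESULT / BASELINE x 10, and the overall score is the geometric mean of the per-test indices. The helper names `index` and `overall_score` are mine, not UnixBench's; the data is the first Tinkerboard table above.

```python
import math

def index(baseline, result):
    """Per-test index: result relative to the baseline system, scaled by 10."""
    return result / baseline * 10.0

def overall_score(indices):
    """Geometric mean of the per-test indices."""
    return math.exp(sum(math.log(i) for i in indices) / len(indices))

# INDEX column of the first Tinkerboard table (1 parallel copy)
TINKER_SINGLE = [
    746.4, 187.5, 254.8, 232.2, 160.6, 424.6,
    120.5, 64.6, 192.8, 486.3, 720.1, 295.3,
]

# Dhrystone row: BASELINE 116700.0, RESULT 8709974.2
print(round(index(116700.0, 8709974.2), 1))
# Should land close to the reported System Benchmarks Index Score of 258.2
print(round(overall_score(TINKER_SINGLE), 1))
```

Because the score is a geometric mean, one weak test (such as the 64.6 for pipe-based context switching) pulls the total down more than a plain average would suggest, so it is worth reading the individual rows as well as the final score.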
