
[sysbench] quick cpu perf [AMD64 Phenom2, Pine64, XU4, A20] & issue with aarch64 FriendlyARM Nexell SoC


wildcat_paris


I am a little surprised by the results of the Phenom2 965 vs. Pine64 vs. XU4 with this command:

time sysbench --test=cpu --cpu-max-prime=20000 --num-threads=8 run
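To compare boards without reading each log by eye, the headline figure can be pulled out of the output. This is a minimal POSIX sh sketch. It assumes the sysbench 0.4.12 summary layout posted in this thread, where a `total time:` line has a value ending in `s`. The function name `sysbench_total_time` is hypothetical.

```shell
#!/bin/sh
# sysbench_total_time: read sysbench output on stdin and print the
# "total time" value in seconds, without the trailing "s".
# Assumes the 0.4.12 layout quoted in this thread; other versions may differ.
sysbench_total_time() {
    awk '/total time:/ { sub(/s$/, "", $3); print $3; exit }'
}

# Example (on a machine with sysbench 0.4.x installed):
#   sysbench --test=cpu --cpu-max-prime=20000 --num-threads=8 run | sysbench_total_time
```

The `total time:` pattern includes the colon so that it does not match the `total time taken by event execution` line.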

Phenom2 965 AMD64 (4 cores) (Ubuntu Trusty kernel 4.2 x86_64 SSE2/3) => 6.8390s

gr@gr ~ $ time sysbench --test=cpu --cpu-max-prime=20000 --num-threads=8 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          6.8390s
    total number of events:              10000
    total time taken by event execution: 54.6398
    per-request statistics:
         min:                                  2.69ms
         avg:                                  5.46ms
         max:                                 57.81ms
         approx.  95 percentile:              14.69ms

Threads fairness:
    events (avg/stddev):           1250.0000/40.40
    execution time (avg/stddev):   6.8300/0.01


real    0m6.845s
user    0m26.932s
sys     0m0.008s

gr@gr ~ $ uname -a
Linux gr 4.2.0-37-generic #43~14.04.1-Ubuntu SMP Wed May 18 17:25:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
gr@gr ~ $ cat /proc/version
Linux version 4.2.0-37-generic (buildd@lgw01-20) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) ) #43~14.04.1-Ubuntu SMP Wed May 18 17:25:51 UTC 2016
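The shell's `user` time sums CPU time over all cores. Dividing it by `real` time therefore gives the average number of busy cores, which is a quick check that every core was used. Below is a sketch with a hypothetical helper `busy_cores`, which takes both values in seconds.

```shell
#!/bin/sh
# busy_cores REAL USER: average number of CPUs kept busy during a run,
# i.e. user CPU time divided by wall-clock time (both in seconds).
busy_cores() {
    awk -v real="$1" -v user="$2" 'BEGIN { printf "%.2f\n", user / real }'
}

busy_cores 6.845 26.932   # Phenom II 965 run above
```

For the Phenom II this prints 3.93, which is close to its 4 cores. The Pine64 run (8.002 s real, 31.320 s user) gives 3.91.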

Pine64 (arm64) 4 cores (Pine64 based - aarch64 NEON) => 7.9824s

ubuntu@pine64:~$ time sysbench --test=cpu --cpu-max-prime=20000 --num-threads=8 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          7.9824s
    total number of events:              10000
    total time taken by event execution: 63.7509
    per-request statistics:
         min:                                  3.17ms
         avg:                                  6.38ms
         max:                                 33.21ms
         approx.  95 percentile:              15.68ms

Threads fairness:
    events (avg/stddev):           1250.0000/6.67
    execution time (avg/stddev):   7.9689/0.01


real    0m8.002s
user    0m31.320s
sys     0m0.000s

ubuntu@pine64:~$ uname -a
Linux pine64 3.10.101-0-pine64-longsleep #39 SMP PREEMPT Sat May 7 12:39:25 CEST 2016 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@pine64:~$ zcat /proc/config.gz | grep NEON
CONFIG_KERNEL_MODE_NEON=y
CONFIG_CRYPTO_AES_ARM64_NEON_BLK=y
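The `zcat /proc/config.gz | grep NEON` check can be turned into a yes/no test for a single option. This is a sketch with a hypothetical helper `kernel_has_option`. It relies on `/proc/config.gz`, which exists only when the kernel was built with `CONFIG_IKCONFIG_PROC`. It only reports how the kernel was configured. It does not tell you whether the sysbench binary itself was compiled to use NEON.

```shell
#!/bin/sh
# kernel_has_option NAME: succeed if NAME is set to y or m in the kernel
# config text read on stdin.
kernel_has_option() {
    grep -q "^$1=[ym]\$"
}

# Usage on a live system (requires CONFIG_IKCONFIG_PROC):
#   zcat /proc/config.gz | kernel_has_option CONFIG_KERNEL_MODE_NEON && echo "kernel-mode NEON enabled"
```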

XU4 (4+4 cores) (Armbian armv7l NEON) => 45.4731s

gr@odroidxu4:~$ time sysbench --test=cpu --cpu-max-prime=20000 --num-threads=8 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          45.4731s
    total number of events:              10000
    total time taken by event execution: 363.5303
    per-request statistics:
         min:                                 22.21ms
         avg:                                 36.35ms
         max:                                246.75ms
         approx.  95 percentile:              55.20ms

Threads fairness:
    events (avg/stddev):           1250.0000/426.03
    execution time (avg/stddev):   45.4413/0.01


real    0m45.530s
user    5m58.405s
sys     0m0.135s

gr@odroidxu4:~$ zcat /proc/config.gz | grep NEON
CONFIG_NEON=y
gr@odroidxu4:~$ uname -a
Linux odroidxu4 3.10.101-odroidxu4 #3 SMP PREEMPT Mon May 23 22:45:55 CEST 2016 armv7l armv7l armv7l GNU/Linux
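The XU4's `user` time is printed in minutes, which makes it easy to misread. 5m58.405s is 358.405 s of CPU time within 45.530 s of wall time, so on average about 7.87 CPUs were busy. That means the LITTLE cores were working alongside the big ones. The events stddev is also much larger here (426.03, versus 40.40 and 6.67 on the two 4-core machines). This likely reflects the two core types finishing events at different speeds.

Below is a sketch for converting the shell's `time` format, with a hypothetical helper `time_to_seconds` that expects the `XmY.Zs` form shown above.

```shell
#!/bin/sh
# time_to_seconds VALUE: convert a value like "5m58.405s" (as printed by
# the shell's `time` keyword) to seconds. Expects the XmY.Zs form.
time_to_seconds() {
    echo "$1" | awk -Fm '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

real=$(time_to_seconds 0m45.530s)
user=$(time_to_seconds 5m58.405s)
# Average number of busy CPUs = user time / wall-clock time.
awk -v r="$real" -v u="$user" 'BEGIN { printf "%.2f\n", u / r }'
```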

Lamobo-R1 A20 2 cores (Armbian custom - armv7l NEON) => 442.9593s

[gr@bpi:~] $ time sysbench --test=cpu --cpu-max-prime=20000 --num-threads=8 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          442.9593s
    total number of events:              10000
    total time taken by event execution: 3542.8610
    per-request statistics:
         min:                                137.58ms
         avg:                                354.29ms
         max:                               1404.13ms
         approx.  95 percentile:             427.10ms

Threads fairness:
    events (avg/stddev):           1250.0000/17.83
    execution time (avg/stddev):   442.8576/0.04


real    7m22.989s
user    13m8.570s
sys     0m49.780s

[gr@bpi:~] $ zcat /proc/config.gz | grep NEON
CONFIG_NEON=y
CONFIG_KERNEL_MODE_NEON=y
CONFIG_CRYPTO_SHA1_ARM_NEON=m
[gr@bpi:~] $ uname -a
Linux bpi 4.6.2-sunxi #1 SMP Sun Jun 12 21:59:49 CEST 2016 armv7l armv7l armv7l GNU/Linux
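Normalising the four `total time` results posted above to the Phenom II makes the spread explicit. This is a sketch that uses only those figures, with a hypothetical helper `relative_times`. It does plain arithmetic and says nothing about why the boards differ.

```shell
#!/bin/sh
# relative_times: read "NAME SECONDS" lines on stdin and print each run
# time relative to the first line (lower is faster).
relative_times() {
    awk 'NR == 1 { base = $2 }
         { printf "%-14s %9.4fs  x%.2f\n", $1, $2, $2 / base }'
}

# sysbench "total time" results posted above, in seconds.
printf '%s\n' \
    "Phenom2-965 6.8390" \
    "Pine64 7.9824" \
    "XU4 45.4731" \
    "Lamobo-R1-A20 442.9593" | relative_times
```

The printed ratios are 1.00, 1.17, 6.65 and 64.77.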

Any idea where I am going wrong, please?

Edited by wildcat_paris
updated data with ARCH & NEON

any idea where I am wrong, please?

You're using sysbench, which calculates prime numbers; on some CPUs it runs with optimized engines (e.g. NEON/ARMv8), on others not. Sysbench cannot be used to compare different architectures (unless your job is to calculate prime numbers, in which case it might matter to you since you get your prime numbers in less time). Do a simple Google search for:

sysbench kitchen sink site:armbian.com
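Looking at what a single event does helps explain why this test is so architecture-sensitive. The sketch below assumes that sysbench 0.4's CPU test is plain trial division: for each c from 3 up to `--cpu-max-prime`, it tests divisors up to √c. That is how its source is generally described. The code is an illustrative sh re-implementation, not sysbench's code, and it is far too slow for timing. The function name `count_primes` is hypothetical.

The inner loop consists almost entirely of integer modulo operations. Suppose, as assumed here, that sysbench's C code uses 64-bit integers for this loop. An armv7l build must then do every 64-bit division in a library routine, because 32-bit ARM has no 64-bit divide instruction, while aarch64 and x86_64 do it in hardware. That alone could produce a large gap unrelated to general CPU performance.

```shell
#!/bin/sh
# count_primes MAX: count primes c with 3 <= c < MAX by trial division,
# mirroring (by assumption) the loop one sysbench 0.4 CPU event runs.
# Uses l*l <= c instead of l <= sqrt(c); the two are equivalent here.
count_primes() {
    max=$1
    n=0
    c=3
    while [ "$c" -lt "$max" ]; do
        l=2
        prime=1
        while [ $((l * l)) -le "$c" ]; do
            if [ $((c % l)) -eq 0 ]; then
                prime=0
                break
            fi
            l=$((l + 1))
        done
        n=$((n + prime))
        c=$((c + 1))
    done
    echo "$n"
}

count_primes 100   # prints 24 (25 primes below 100, minus 2)
```

Calling `count_primes 20000` corresponds to one event. The logs in this thread show sysbench 0.4.12 running 10000 such events (`total number of events: 10000`), shared across `--num-threads` threads.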

i7-4790S CPU (4 cores / 8 threads) => 3.2s

This should be at least comparable to the AMD :D :P

time sysbench --test=cpu --cpu-max-prime=20000 --num-threads=8 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          3.1960s
    total number of events:              10000
    total time taken by event execution: 25.5333
    per-request statistics:
         min:                                  2.22ms
         avg:                                  2.55ms
         max:                                 27.11ms
         approx.  95 percentile:               2.52ms

Threads fairness:
    events (avg/stddev):           1250.0000/51.15
    execution time (avg/stddev):   3.1917/0.00


real    0m3.198s
user    0m25.048s
sys     0m0.000s

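Every run in this thread uses a fixed `--num-threads=8`, whatever the core count. Sysbench appears to time each request from start to finish, so threads waiting for a CPU likely inflate the per-request statistics. With 8 threads on the 4-core Phenom II and Pine64, the average request (5.46 ms, 6.38 ms) is about twice the minimum (2.69 ms, 3.17 ms). On the 8-thread i7 it stays close (2.55 ms vs 2.22 ms).

Matching the thread count to the number of online CPUs keeps per-request numbers easier to compare. The sketch below only prints the command. `sysbench_cpu_cmd` is a hypothetical helper, and `nproc` comes from GNU coreutils.

```shell
#!/bin/sh
# sysbench_cpu_cmd THREADS: print the sysbench 0.4.12 CPU test command
# used in this thread, with --num-threads set to THREADS.
sysbench_cpu_cmd() {
    echo "sysbench --test=cpu --cpu-max-prime=20000 --num-threads=$1 run"
}

# One thread per online CPU (nproc is from GNU coreutils):
sysbench_cpu_cmd "$(nproc)"
```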


FYI, NanoPC-T3 result:



time sysbench --test=cpu --cpu-max-prime=20000 --num-threads=8 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          57.1916s
    total number of events:              10000
    total time taken by event execution: 457.3030
    per-request statistics:
         min:                                 45.43ms
         avg:                                 45.73ms
         max:                                166.89ms
         approx.  95 percentile:              45.73ms

Threads fairness:
    events (avg/stddev):           1250.0000/1.00
    execution time (avg/stddev):   57.1629/0.01


real    0m57.223s
user    7m36.520s
sys     0m0.068s


Yes, this one is nice! This is the result of an octa-core 64-bit Cortex-A53 design running an old, boring 32-bit kernel that does not make use of the ARMv8 instruction set! Just like the RPi 3. Or the NanoPi M3 ;)

What does the output of the following commands look like?

uname -a
cat /proc/version
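`uname -m` reports the architecture the running kernel was built for, not what the CPU is capable of. The Pine64's `aarch64` means a 64-bit ARMv8 kernel. `armv7l` on a Cortex-A53 board means a 32-bit ARM kernel. Below is a sketch with a hypothetical helper `describe_machine` that covers only the values seen in this thread.

```shell
#!/bin/sh
# describe_machine MACHINE: classify a `uname -m` value seen in this thread.
# uname -m reports the running kernel's architecture, not the CPU's maximum.
describe_machine() {
    case "$1" in
        aarch64|x86_64) echo "64-bit kernel" ;;
        armv7l)         echo "32-bit ARM kernel" ;;
        *)              echo "unknown: $1" ;;
    esac
}

describe_machine "$(uname -m)"
# The userland word size can differ; `getconf LONG_BIT` prints 32 or 64.
```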

FYI tkaiser

uname -a
Linux dns02.kometch.local 3.4.39-s5p6818 #2 SMP PREEMPT Fri May 20 15:51:46 HKT 2016 armv7l GNU/Linux
cat /proc/version
Linux version 3.4.39-s5p6818 (root@jensen) (gcc version 4.9.3 (ctng-1.21.0-229g-FA) ) #2 SMP PREEMPT Fri May 20 15:51:46 HKT 2016
Edited by wildcat_paris
typo it is tkaiser, not tkai-tsar :)

Hello Tk,

sysbench kitchen sink site:armbian.com

Neither Google nor DuckDuckGo gives anything interesting.

Edit: updated the first post with ARCH & NEON (+ distro).

Edited by wildcat_paris
edit ARCH + NEON

@kometchtech


uname -a
Linux dns02.kometch.local 3.4.39-s5p6818 #2 SMP PREEMPT Fri May 20 15:51:46 HKT 2016 armv7l GNU/Linux
ubuntu@pine64:~$ uname -a
Linux pine64 3.10.101-0-pine64-longsleep #39 SMP PREEMPT Sat May 7 12:39:25 CEST 2016 aarch64 aarch64 aarch64 GNU/Linux

=> aarch64 vs. armv7l


@kometchtech

@wildcat_paris

Thanks for the Info!

Or better still you recompiled in aarch64...
Only official, there is that way.

Linux dns02.kometch.local 3.4.39-s5p6818 => armv7l

As you know how to compile the kernel, you could install the aarch64 cross compiler and see whether an arm64 kernel works (or whether the recipe only works on aarch32 because of a bootloader limitation???)

Well, it is Samsung based... they recently moved forward with the ARMv7 XU4 on kernel 4.6-4.7 beta, so Samsung OpenSource can work for their customers (that is good).

Edit 1:

ubuntu@pine64:~$ dpkg -l "*gcc*" | grep arm64
ii  gcc                4:5.3.1-1ubuntu1  arm64        GNU C compiler
ii  gcc-5              5.3.1-14ubuntu2.1 arm64        GNU C compiler
ii  gcc-5-base:arm64   5.3.1-14ubuntu2.1 arm64        GCC, the GNU Compiler Collection (base package)
ii  gcc-6-base:arm64   6.0.1-0ubuntu1    arm64        GCC, the GNU Compiler Collection (base package)
ii  libgcc-5-dev:arm64 5.3.1-14ubuntu2.1 arm64        GCC support library (development files)
ii  libgcc1:arm64      1:6.0.1-0ubuntu1  arm64        GCC support library

On my VM used to build Armbian:

gr@server1404:~$ dpkg -l "*gcc*" | grep arm64
ii  gcc-aarch64-linux-gnu                 4:4.8.2-1                               amd64        The GNU C compiler for arm64 architecture
ii  libgcc-4.8-dev-arm64-cross            4.8.4-2ubuntu1~14.04.1cross0.11.2       all          GCC support library (development files)
ii  libgcc1-arm64-cross                   1:4.8.4-2ubuntu1~14.04.1cross0.11.2     all          GCC support library

gr@server1404:~$ dpkg -l "*cpp*" | grep arm64
ii  cpp-aarch64-linux-gnu                 4:4.8.2-1                               amd64        The GNU C preprocessor (cpp) for arm64 architecture


Edited by wildcat_paris
edit #1

@kometchtech

If you manage to install the cross compiler for aarch64:

As you are using kernel 3.4.x, gcc 4.8 or 4.9 is probably the right choice (newer gcc versions may give you compilation errors).

CROSS_COMPILE=aarch64-linux-gnu- ARCH=arm64 make clean defconfig
CROSS_COMPILE=aarch64-linux-gnu- ARCH=arm64 make -j4 Image

If you cannot find a PPA or a cross-compiler toolchain provided by your distro, you can still download the toolchain from Linaro (older versions):

https://www.linaro.org/downloads/historic/

But you may still face an issue if the bootloader cannot load an aarch64 kernel.

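Before running the two `make` commands above, it can save time to confirm that the compiler named by `CROSS_COMPILE` is actually installed, since the kernel build invokes `$(CROSS_COMPILE)gcc`. Below is a sketch with a hypothetical helper `require_cross_gcc`, which is not part of the kernel tree.

```shell
#!/bin/sh
# require_cross_gcc PREFIX: succeed if "${PREFIX}gcc" is on PATH.
# The kernel Makefile builds its compiler name as $(CROSS_COMPILE)gcc.
require_cross_gcc() {
    if command -v "${1}gcc" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing ${1}gcc: install a cross toolchain first" >&2
    return 1
}

# Usage before the build commands:
#   require_cross_gcc aarch64-linux-gnu- &&
#       CROSS_COMPILE=aarch64-linux-gnu- ARCH=arm64 make clean defconfig
```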

Thanks, @wildcat_paris

As shown below, it does not seem to compile:
# CROSS_COMPILE=aarch64-linux-gnu- ARCH=arm64 make clean defconfig
Makefile:568: /home/linux-3.4.y/arch/arm64/Makefile: No such file or directory
make[1]: *** No rule to make target `/home/linux-3.4.y/arch/arm64/Makefile'.  Stop.
make: *** [clean] Error 2
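The error means the source tree has no `arch/arm64` directory at all. No toolchain or set of make flags can produce a 64-bit kernel from such a tree, and mainline Linux only gained its arm64 port in 3.7. A quick pre-flight check for this is sketched below, using a hypothetical helper `tree_supports_arm64`.

```shell
#!/bin/sh
# tree_supports_arm64 DIR: succeed if the kernel source tree in DIR has
# an arm64 architecture port (arch/arm64/Makefile).
tree_supports_arm64() {
    [ -f "$1/arch/arm64/Makefile" ]
}

# Example with the tree from the error above:
#   tree_supports_arm64 /home/linux-3.4.y || echo "no arm64 port in this tree"
```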

Well, it is Samsung based... they recently moved forward with the ARMv7 XU4 on kernel 4.6-4.7 beta, so Samsung OpenSource can work for their customers (that is good).

Nope, it is NOT Samsung (even if everybody copy-pastes and writes just that). This SoC is from Nexell instead: http://www.nexell.co.kr/chi/pro/pro04.html

A SoC for the Chinese tablet/smartphone market: many cores, low performance, crappy software. So everything Samsung does for their SoCs is absolutely irrelevant for Nexell SoCs, which will remain on a 32-bit 3.4.39 kernel forever (and there is even the claim that this is really a 2.6.x kernel: "Specifically, this is a Linux-3.4 kernel that looks more like a Linux-2.6.28 platform port that was forward-ported.")


@Tk,

OK, thank you so much.

I had already read the comment about the "Linux 2.6-like porting".

It seems this company (Nexell) just misleads users/sellers by using Samsung-like and NXP-Semiconductors-like naming; see http://www.nexell.co.kr/eng/pro/pro03.html :angry:

I wonder why NXP and Samsung are not starting a lawsuit and getting rid of their "CEO" http://www.nexell.co.kr/eng/com/com02.html :ph34r: (unless Nexell is a low-cost venture of NXP and Samsung).


@kometchtech

So apparently arm64 cannot be compiled from the following source.

Do you have a solution?

https://github.com/friendlyarm/linux-3.4.y

Sorry, I guess there is nothing we can do, as there is no aarch64 (arm64) architecture in this kernel.

Fortunately I ordered a FriendlyARM NanoPi M1 with the Allwinner SoC as a cheaper RPi2 (note: I had to ask FriendlyARM to send me the board 2 weeks after purchase; fortunately the $10 shipping is "registered mail" I can easily track). Let's see how good the board is. :unsure:


...has been reduced to $5 maybe due to some public poking :)

Thanks TK, that may explain the delay :) but I hope that for $5 it is still "registered mail" you can track via 17track/afterpost and also with the French Post in real time.

French Post (France-to-France) is so expensive that $10 for international tracking is not that much :rolleyes:

Edit: yes, it is now $5 (€4.44).

Edited by wildcat_paris
$5

either google or duckduckgo give "nothing" interesting.

Post #6.

 but I hope that for $5 it is still "registered mail" you can track via 17track/afterpost and also with the French Post in real time.

What do you plan to do with the M1? Using it with a 5MP camera module? I ask since this is still the only reason I can see to buy a NanoPi. To be honest: apart from the camera module, the NanoPi M1 with 1GB DRAM is almost identical to the Orange Pi PC, but the latter is faster (due to a better voltage regulator and better overall heat dissipation, which limits throttling) and costs less even when shipping is considered. I still feel I'm overlooking something.

BTW: I have had some (pretty bad) experiences with the French Post. My daughter moved to Montpellier last year and sending her parcels was always 'adventure time' ;)


In the meantime I found a second use case: using the NanoPi M1 as an Ethernet dongle doing fancy network stuff (for example, handing it out to others to force them to connect to their local network only through the H3 Ethernet dongle, ensuring an uncompromised VPN endpoint).

The NanoPi M1 could be powered reliably from a computer's USB port, provided a short, low-resistance USB cable is used and some settings are adjusted so that consumption never exceeds 2.5W, which is possible by, for example, limiting the CPU clock speed to 600 MHz (a USB port should provide 500mA at 5V, i.e. 2.5W).

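One way to apply the 600 MHz cap is the Linux cpufreq sysfs interface, where `scaling_max_freq` is expressed in kHz. This sketch assumes a kernel with cpufreq support, root privileges, and that all cores share cpu0's cpufreq policy, which is worth checking on the board. `set_max_freq_mhz` is a hypothetical helper. Armbian may manage this limit through its own configuration instead, which is not shown here.

```shell
#!/bin/sh
# set_max_freq_mhz MHZ [FILE]: write a CPU frequency cap to cpufreq's
# scaling_max_freq, which expects kHz. FILE defaults to cpu0's policy.
set_max_freq_mhz() {
    file=${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq}
    echo $(( $1 * 1000 )) > "$file"
}

# As root on the board:
#   set_max_freq_mhz 600
```

The optional second argument lets the function be tried on a scratch file before touching sysfs.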
Why the NanoPi M1 instead of the small Oranges? Because the latter would need a little hardware tweak to power on when connected to a computer's USB port through their OTG port: http://blog.atx.name/orange-pi-pc-first-impressions/


@Tk

Believe me, you will buy a NanoPi M1 soon :wub:

The Ethernet dongle is a nice idea.

BTW, I can use the NanoPi M1 to move the Pi-hole DNS (https://pi-hole.net/) off my Lamobo-R1 router (I am waiting for my Turris Omnia).

*** As I wrote, I guess I will be using the M1 as an RPi2 "replacement", maybe with the realtime kernel patch, since the Armbian "lib" build tool makes that possible.

Because I have recently learned how to burn out a fake Arduino Uno copy.
I have replaced my burnt board with 2x SainSmart UNO and learnt how to avoid burning another "ONE" (projects include a battery charger with LCD and other DIY projects).

But for my avalanche-based multi-module RNG + RNG data testing (ent + dieharder + NIST), I need a cheaper board I can "burn" in case I make a mistake with the GPIO pins. OK, I now have voltage converters, modern Schmitt triggers (low power, 3.3V) and the know-how to avoid burning GPIO pins :ph34r: but just in case, a spare is always useful.

So NanoPi M1 + Armbian looks like a good choice. RPi2 and RPi3 are *very* expensive boards. Now that Farnell has been bought by a Swiss company, Raspi prices will probably rise even more.
