
Benchmarking CPUs


pbies


On the XU4 I got this. Just wanted to show you. The other boards don't have any problems.
 

odroid@gamestation-turbo:~$ sudo /usr/local/bin/sbc-bench.sh
[sudo] password for odroid:
WARNING: this tool is meant to run only on Debian Stretch or Ubuntu Bionic.

Installing needed tools. This may take some time...wc: /sys/devices/platform/exynos5-devfreq-int/devfreq/exynos5-devfreq-int/time_in_state: Is a directory
head: error reading ‘/sys/devices/platform/exynos5-devfreq-int/devfreq/exynos5-devfreq-int/time_in_state’: Is a directory
wc: /sys/devices/platform/exynos5-devfreq-mif/devfreq/exynos5-devfreq-mif/time_in_state: Is a directory
head: error reading ‘/sys/devices/platform/exynos5-devfreq-mif/devfreq/exynos5-devfreq-mif/time_in_state’: Is a directory
Done.
Executing tinymembench. This will take a long time... Done.
Executing 7-zip benchmark. This will take a long time.../usr/local/bin/sbc-bench.sh: line 272: 28257 Killed taskset -c 0 "${SevenZip}" b >> ${TempLog}
/usr/local/bin/sbc-bench.sh: line 272: 3463 Killed taskset -c $(( ${CPUCores} - 1 )) "${SevenZip}" b >> ${TempLog}
/usr/local/bin/sbc-bench.sh: line 272: 10070 Killed "${SevenZip}" b >> ${TempLog}
/usr/local/bin/sbc-bench.sh: line 272: 12363 Killed "${SevenZip}" b >> ${TempLog}
Done.
Executing OpenSSL benchmark. This will take a long time... Done.

ATTENTION: Throttling occured on CPU cluster 11800000. Check the uploaded log for details.

ATTENTION: Throttling occured on CPU cluster 0. Check the uploaded log for details.

ATTENTION: Throttling occured on CPU cluster 4. Check the uploaded log for details.

wc: /sys/devices/platform/exynos5-devfreq-int/devfreq/exynos5-devfreq-int/time_in_state: Is a directory
head: error reading ‘/sys/devices/platform/exynos5-devfreq-int/devfreq/exynos5-devfreq-int/time_in_state’: Is a directory
wc: /sys/devices/platform/exynos5-devfreq-mif/devfreq/exynos5-devfreq-mif/time_in_state: Is a directory
head: error reading ‘/sys/devices/platform/exynos5-devfreq-mif/devfreq/exynos5-devfreq-mif/time_in_state’: Is a directory
Memory performance (big.LITTLE cores measured individually):
memcpy: 339.7 MB/s (3.3%)
memset: 755.6 MB/s (2.8%)
memcpy: 2203.7 MB/s
memset: 4796.8 MB/s (3.5%)

7-zip total scores (three consecutive runs): 6733

OpenSSL results (big.LITTLE cores measured individually):
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 26810.76k 32876.07k 34800.21k 35271.00k 35154.60k
aes-128-cbc 74115.00k 84873.98k 89295.36k 90379.61k 90753.71k
aes-192-cbc 23935.85k 28036.29k 29783.47k 30124.71k 30283.09k
aes-192-cbc 65578.05k 73527.42k 76904.36k 77431.81k 78047.91k
aes-256-cbc 21616.54k 25164.07k 26276.86k 26534.23k 26656.77k
aes-256-cbc 58102.18k 64692.16k 67346.69k 67993.26k 68192.94k

Full results uploaded to http://ix.io/1ixL. Please check the log for anomalies (e.g. swapping
or throttling happenend) and otherwise share this URL.

 


I fixed the XU4 warnings and added cpuminer support (the most demanding benchmark, since it makes use of heavy NEON optimizations). From now on, please only use the new version from GitHub:

wget https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
/bin/bash ./sbc-bench.sh neon

or

wget -O /usr/local/bin/sbc-bench.sh https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
chmod 755 /usr/local/bin/sbc-bench.sh
/usr/local/bin/sbc-bench.sh neon

I'll comment on your results soon; there's quite a lot of interesting stuff in there (e.g. your FriendlyELEC image being armhf instead of arm64, and so on).
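Side note on the 'Is a directory' warnings in the log above: on that XU4 kernel the devfreq time_in_state entries are directories, so running wc/head on them fails. A minimal sketch of one defensive pattern (not necessarily how sbc-bench was actually fixed; the function name and the glob are assumptions based on the paths in the log) is to skip anything that isn't a regular file:

```bash
#!/bin/bash
# Sketch: print devfreq time_in_state statistics, skipping paths that are
# not regular files (on the XU4 kernel above, time_in_state is a directory).
# Function name and glob are hypothetical, derived from the paths in the log.
show_devfreq_stats() {
    local base=${1:-/sys/devices/platform}
    local f
    for f in "$base"/*/devfreq/*/time_in_state; do
        [ -f "$f" ] || continue    # skips directories and an unmatched glob
        echo "== $f"
        head -n 20 "$f"
    done
}
```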


On 7/27/2018 at 8:42 PM, NicoD said:

My results so far

 

I used some of your results over here: https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md

 

But the numbers are partially problematic since the distros were too old (only Debian Stretch or Ubuntu Bionic provide a reliable environment), some strange combinations occurred (e.g. FriendlyELEC's image for the NanoPC T3+ being armhf and not arm64) and the kernels were rather old too.

 

If time permits, it would be great if you could check out the most recent version (v0.4) of sbc-bench and then try it on the following boards with the following images:

Repeating the tests on the RPi 3 B+ would also be great (once with a fan, once without, and once with a crappy Micro USB cable to show undervoltage with ThreadX clocking down to 600 MHz).


2 minutes ago, tkaiser said:

In case time permits it would be great if you check out most recent v0.4 version of sbc-bench and then try it on the following boards with the following images:

OK, I'm on it. Now doing the RPi 3B+ with the new script and no cooling. After that I'll do the same with a crappy PSU.

Indeed, on some I used old distros. I'll do them again with newer ones. I also have a Tinker Board, BPi M2-Zero, OPi Plus 2, ... that I'll do too. It'll take me some time. Cheers


 

@tkaiser Every time I get this message with the RPi 3B+. It takes between 3 and 7 minutes before it starts the benchmarks.

"System too busy for benchmarking: 17:45:11 up 1 min,  1 user,  load average: 0.60, 0.18, 0.06"

First results.

RPi 3B+ No OC No fan RPi PSU http://ix.io/1iGM
RPi 3B+ No OC No fan Crappy 2.5A PSU Undervoltage when maxed out in 7zip http://ix.io/1iH0
RPi 3B+ OC 1570 No Fan Rasp PSU http://ix.io/1iGz


RPi 3B+ No OC No fan 2A PSU (not a bad one)  http://ix.io/1iHb

RPi 3B+ No OC With Fan+heatsink RPi PSU  http://ix.io/1iHr

RPi 3B+ OC 1570 With fan/heatsink RPi PSU http://ix.io/1iHA

 

RPi 3B No OC No fan with heatsink RPi PSU http://ix.io/1iHI

RPi 3B No OC With fan/heatsink RPi PSU http://ix.io/1iHV

 

Raspbian Stretch 2018-03-13 No OC No Fan http://ix.io/1iI5  <--- @tkaiser Very interesting ! ! ! You are right ;)

 

 

More to come on the RPi. Also RPi 3B with good and bad PSU. Tomorrow I'm going to make the video about it, so I need as much info as possible.
 


1 hour ago, NicoD said:

It takes between 3 and 7 minutes before it starts the benchmarks.

"System too busy for benchmarking: 17:45:11 up 1 min,  1 user,  load average: 0.60, 0.18, 0.06"

 

This is intentional, to prevent high background activity from ruining benchmark results. The reality out there is that users fire up their SBC after some time to run benchmarks, and then the 'unattended-upgrades' task runs in the background, keeping one CPU core at 100% while installing outstanding updates. If a benchmark started now, the scores would be 'mysteriously' lower than they should be. The web is full of such BS numbers.
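A busy check like this can be sketched in a few lines of bash. The threshold (0.1), the 10-second polling interval and the function names are assumptions for illustration, not sbc-bench's actual values:

```bash
#!/bin/bash
# Sketch of a "wait until the system is idle" gate before benchmarking.
# Threshold, interval and function names are hypothetical.

# Succeeds if the given 1-minute load average is below the threshold.
# awk does the floating point comparison that bash arithmetic can't.
is_idle() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 < max + 0) }'
}

wait_for_idle() {
    local max=${1:-0.1} load1 rest
    while :; do
        read -r load1 rest < /proc/loadavg    # first field: 1-minute average
        is_idle "$load1" "$max" && return 0
        echo "System too busy for benchmarking: $(uptime)"
        sleep 10
    done
}
```

Calling `wait_for_idle` before the first benchmark would keep printing the 'too busy' line until background jobs such as unattended-upgrades have settled down.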

 

So this is a rather simple mechanism to prevent this. My recommendation would be to reboot in between runs to better spot whether swapping occurred while running the benchmarks (after a reboot you can simply compare both 'Swap' lines in the detailed output).
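If you want to check for swapping without reading through the detailed output, a small helper can compute swap usage from /proc/meminfo right after the reboot and again after the run. This is a sketch; the function name is hypothetical:

```bash
#!/bin/bash
# Sketch: swap currently in use (SwapTotal - SwapFree, in kB) from
# /proc/meminfo, or from a file given as the first argument.
swap_used_kb() {
    awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }' \
        "${1:-/proc/meminfo}"
}
```

For example, store `before=$(swap_used_kb)` right after booting and `after=$(swap_used_kb)` once the benchmarks have finished; a clearly larger second value suggests swapping happened during the run.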

 

To elaborate on the RPi situation:

  1. The Linux kernel has no idea at which clockspeed the CPU cores are running. Linux' cpufreq driver talks via a mailbox interface to the primary OS (ThreadX, a closed source RTOS controlling the RPi hardware), and the RPi Trading folks decided to let the ThreadX counterpart lie all the time. That's why Linux reports running at 1200 MHz, 1400 MHz or, with your overclocking experiments, even 1570 MHz while in reality the ARM cores are either throttled (heat based) or 'frequency capped' (undervoltage based: when input voltage drops below 4.63V, ThreadX downclocks this and that and cuts supply voltage to the CPU cores).
  2. To get an idea of the clockspeed the ARM cores are running at right now, we always need to execute 'vcgencmd measure_clock arm' ('vcgencmd' translates to 'VideoCore generate command'). But we would need to execute this every millisecond to really know whether throttling or frequency capping is happening right now. Impossible.
  3. In the past we could at least query the 'firmware' running on the VC4 to tell the truth. The weapon of choice is again 'vcgencmd': when we execute 'vcgencmd get_throttled' we get an obscure hex value that we can convert into binary, and then we can read 19 bits of which the first 3 and the last 3 are somewhat documented (see the excerpt from your undervoltage log below).
  4. Things have changed: with the most recent ThreadX update, RPi Trading employees decided to cheat a little bit more. Now when the SoC's temperature exceeds 60°C, silent throttling happens on the RPi 3 B+. They decrease the supply voltage to the CPU cores and limit the max cpufreq to 1200 MHz but do not set the respective 'throttling' bits, so we can no longer rely on 'vcgencmd get_throttled'. Details: https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=217056#p1334921
Querying ThreadX on RPi for thermal or undervoltage issues:

1010000000000000101
|||             |||_ under-voltage
|||             ||_ currently throttled
|||             |_ arm frequency capped
|||_ under-voltage has occurred since last reboot
||_ throttling has occurred since last reboot
|_ arm frequency capped has occurred since last reboot
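Converting the hex value by hand is error-prone, so here is a sketch of a decoder. It always renders bits 18..0 as a zero-padded 19-digit string, so a leading zero can never get lost and the digits line up with a 19-column legend like the one above. The labels follow the Raspberry Pi documentation for get_throttled, which (as far as I know) puts 'arm frequency capped' on bit 1 and 'currently throttled' on bit 2 (and likewise bits 17/18), i.e. the other way round than the legend above, and also defines soft temperature limit bits 3 and 19; please double-check against the docs for your firmware. The function name is hypothetical:

```bash
#!/bin/bash
# Sketch: decode the value printed by 'vcgencmd get_throttled',
# e.g. "throttled=0x50005". Bit labels per the Raspberry Pi docs.
decode_throttled() {
    local raw=${1#throttled=}   # strip an optional "throttled=" prefix
    local val=$(( raw ))        # bash arithmetic understands the 0x prefix
    local bin="" i
    # Render bits 18..0 zero-padded (bit 19 is decoded below but not shown here)
    for (( i = 18; i >= 0; i-- )); do
        bin+=$(( (val >> i) & 1 ))
    done
    echo "$bin"
    (( val & 0x1 ))     && echo "under-voltage detected"
    (( val & 0x2 ))     && echo "arm frequency capped"
    (( val & 0x4 ))     && echo "currently throttled"
    (( val & 0x8 ))     && echo "soft temperature limit active"
    (( val & 0x10000 )) && echo "under-voltage has occurred"
    (( val & 0x20000 )) && echo "arm frequency capping has occurred"
    (( val & 0x40000 )) && echo "throttling has occurred"
    (( val & 0x80000 )) && echo "soft temperature limit has occurred"
    return 0
}
```

Usage on a Pi would be `decode_throttled "$(vcgencmd get_throttled)"`; the value 1010000000000000101 from the log above corresponds to 0x50005.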

 


42 minutes ago, tkaiser said:

RPi Trading employees decided to cheat a little bit more.

I'm currently downloading Raspbian Stretch from 13/03/2018, the release date of the 3B+.
I'll do the same on that, so we'll be able to compare them.
I'll keep posting my results.
I found the bad 2.5A PSU very interesting. My 2A PSU also did very well, so I can show it isn't the amperage but the voltage that's important. Everyone always says: buy a 2.5A or 3A one...
Thanks for all the info. I'm learning every minute. Cheers.
PS: Check the post above for more results. I'll keep them all there to save room in this thread. RPi 3B coming now.

 

@tkaiser

Check the 3B without fan vs. the 3B+ without fan. The 3B outperforms the 3B+ until it overheats past 80°C.


2 hours ago, NicoD said:

RPi 3B No OC No fan with heatsink RPi PSU http://ix.io/1iHI

 

OMG: They managed to screw up their proprietary ThreadX crap again.

 

The log shows throttling happening but when querying 'vcgencmd get_throttled' we get a wrong answer with latest ThreadX release:

Querying ThreadX on RPi for thermal or undervoltage issues:

100000000000000010
|||             |||_ under-voltage
|||             ||_ currently throttled
|||             |_ arm frequency capped
|||_ under-voltage has occurred since last reboot
||_ throttling has occurred since last reboot
|_ arm frequency capped has occurred since last reboot

They return '100000000000000010' (18 bits) but it should read '0100000000000000010' (19 bits and shifted by one position). The closer you get to the Raspberry Pi the more ignorance and stupidity is involved :(

 

And yeah, results as expected. Their latest 'incremental update' (the RPi 3 B+) was faster than its predecessor only for a few months. They recently and happily ruined performance on the RPi 3 B+ with their latest ThreadX update, which starts to throttle already at 60°C to mask instability issues a few RPi users suffer from.


12 hours ago, NicoD said:

Raspbian Stretch 2018-03-13 No OC No Fan http://ix.io/1iI5  <--- @tkaiser Very interesting ! ! !

 

I added some of your numbers and insights to https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md. IMO we're done with RPi measurements, since the most important lesson learned is: we always need to check the firmware version first when talking about performance. The majority of performance-relevant stuff on the RPi happens in the closed source domain on the VideoCore.

 

On the other hand, the average RPi user is not affected, since they're pretty clueless and not interested in details anyway. All published RPi 3B+ benchmarks were made in March, April and May; only afterwards was the first firmware that trashed performance released (4800f08a139d6ca1c5ecbee345ea6682e2160881 from Jun 7 2018). Now the boards in reality run as slow as or even slower than the RPi 3B predecessor, but as usual no one takes notice and users are happy since they 'bought the faster board'. It's all about feelings and not reality :)
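Since the firmware version matters this much, it seems worth recording it next to every result. A sketch that pulls the commit hash out of 'vcgencmd version' output; the assumed layout (a line starting with 'version <hash>') is what Raspbian firmware typically prints, but treat it as an assumption. The function name is hypothetical:

```bash
#!/bin/bash
# Sketch: extract the firmware commit hash from 'vcgencmd version' output
# read on stdin (assumes a line of the form "version <hash> ...").
firmware_hash() {
    awk '$1 == "version" { print $2; exit }'
}
```

Usage: `vcgencmd version | firmware_hash`, which should print 4800f08a139d6ca1c5ecbee345ea6682e2160881 on the Jun 7 2018 firmware mentioned above.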

 

@NicoD in case you want to visualize those 'cheating' effects in a video, a nice way would be to install RPi-Monitor * and display its output in a browser window, with 'sbc-bench.sh m' running in a terminal window next to it while executing the benchmarks. To my knowledge all performance monitoring solutions for the RPi rely on the Linux way of things (querying /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq), which only shows the fake clockspeeds and not the real ones.
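If RPi-Monitor isn't at hand, a rough terminal alternative is to print both views side by side. This is a sketch assuming vcgencmd is installed, that 'measure_clock arm' prints a line like frequency(48)=600000000, and that reading cpuinfo_cur_freq may require root; the function names are hypothetical. Sampling once per second is far too coarse to catch every throttling event (as explained above), it only makes the discrepancy visible:

```bash
#!/bin/bash
# Sketch: compare the clock ThreadX reports with what Linux cpufreq claims.

# "frequency(48)=600000000" -> 600 (MHz)
hz_line_to_mhz() {
    local hz=${1#*=}
    echo $(( hz / 1000000 ))
}

monitor_clocks() {
    local cpufreq=/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
    local temp=/sys/class/thermal/thermal_zone0/temp   # millidegrees Celsius
    while sleep 1; do
        printf 'ThreadX: %4s MHz   cpufreq: %4s MHz   %s°C\n' \
            "$(hz_line_to_mhz "$(vcgencmd measure_clock arm)")" \
            "$(( $(cat "$cpufreq") / 1000 ))" \
            "$(( $(cat "$temp") / 1000 ))"
    done
}
```

Running `sudo bash -c '. ./monitor.sh; monitor_clocks'` (file name hypothetical) in one terminal while sbc-bench runs in another should show the two clock columns drifting apart once throttling kicks in.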

 

 

https://rpi-experiences.blogspot.com/p/rpi-monitor.html


On 7/27/2018 at 8:42 PM, NicoD said:

khadas vim 2 max http://ix.io/1ixi

 

@numbqq from Khadas executed sbc-bench on a Vim 2 with Bionic and kernel 4.17: https://forum.khadas.com/t/cpu-frequency-up-to-2ghz/2010/24?u=tkaiser

 

I added your and his numbers plus some potential insights to Results.md. So there's no need to re-test the Vim 2 (maybe only again with the latest sbc-bench version, since the detailed results provide a lot more info. But I would still prefer Bionic/Stretch when testing with kernel 4.9 again :) )


@tkaiser

I've finished my video. Here it is.


I had to keep it simple. As always, I forgot to mention a lot of information. I don't write a script, to save time, but that also shows in the (non)quality of my work.
Thanks for all the help and info.
Now I can begin with the other sbc's. You'll hear from me.
Cheers
 


22 hours ago, tkaiser said:

ODROID-XU4: ... http://com.odroid.com/sigong/blog/blog_list.php?bid=198 (would be interesting whether crypto performance differs since HK's kernel should make use of XU4's proprietary crypto engine)

Since I had this image installed on eMMC and the board was ready to use, I ran a benchmark, and it produced results roughly similar to those already present in the table.

Log: http://ix.io/1iLy


 

@tkaiser
I can't do anything about that link on the RPi forum; it always gets shortened. But if you click it, it sends you to the right page.

NanoPC-T3+ http://ix.io/1iRu
Something is clearly wrong. It's stuck at 400 MHz. Before the benchmarks it tests all frequencies. Does it set them back to the original afterwards?
It's with the Armbian image you asked me to use.

My first try gave this a couple of times:

"line 419 : command not found"

This was the result.
http://ix.io/1iR8


14 minutes ago, NicoD said:

Something is clearly wrong. It's stuck on 400Mhz.

 

Yep. Funnily, this kernel reports /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies in descending order, so after checking the cpufreq OPPs the lowest frequency gets set. Please check out the latest release; I hope this and that fixed it.
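Picking the maximum independently of list order avoids relying on how the kernel sorts scaling_available_frequencies. A sketch (function name hypothetical):

```bash
#!/bin/bash
# Sketch: highest entry of scaling_available_frequencies, whether the kernel
# lists the OPPs in ascending or descending order.
max_available_freq() {
    local f=${1:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies}
    tr -s ' ' '\n' < "$f" | grep -v '^$' | sort -n | tail -n 1
}
```

Writing its result to scaling_max_freq (as root) could then restore the highest OPP after the frequency checks, regardless of list order.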


59 minutes ago, tkaiser said:

 

Yep. Funnily this kernel reports /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies in descending order so after checking the cpufreq OPPs the lowest frequency is set. Please check out latest release, I hope this and that fixed it. 

http://ix.io/1iRJ NanoPC-T3+
The changes did the job, but indeed there's no space left. I just checked with GParted; it shows that the partition sda1 is 59GB.
No idea if I'm doing this right.

http://ix.io/1iR0 Armbianmonitor results


I'll install the Image again.


14 minutes ago, NicoD said:

http://ix.io/1iRJ NanoPC-T3+
The changes did the work

 

Thank you!

 

I added your results. Overall better performance, but it's really interesting to see Xenial/armhf outperforming everything else again in AES crypto performance with small data chunks.

 

Eagerly waiting for more results (from other boards), since we're starting to get some understanding of why common benchmark results are so weird.


I'm surprised to see that the NanoPC-T3+ seems to do better than the XU4 in multicore, and in almost everything else...
It does outperform the XU4 in Blender too.


Here are the scores from zador.blood.stained's XU4:

 

On 7/30/2018 at 7:00 PM, zador.blood.stained said:

Since I had this image installed on eMMC and the board was ready to use, I ran a benchmark, and it produced results roughly similar to those already present in the table.

Log: http://ix.io/1iLy

http://ix.io/1ixL Old bench xu4 with Jessie

VS

NanoPC T3+

1 hour ago, NicoD said:

 

Also T3+ vs. T4 is interesting. T3+ outperforms in all multicore tasks.
@tkaiser
Could you do the bench with the T4 again at 2 GHz/1.5 GHz?


Here are the results for the Odroid C2: http://ix.io/1iSh

All results as expected when it's not overclocked.
Cheers


6 hours ago, NicoD said:

seems to do better

 

There is no 'better' :) It's still about generating insights and not numbers. The octa-core Nexell SoC with appropriate cooling is able to run certain demanding tasks that make use of all CPU cores in parallel faster than other SoCs. But that's it. The Samsung SoC on the XU4 shows way better single-threaded performance on a big core, so most typical SBC tasks perform a lot better on the XU4 instead.

 

6 hours ago, NicoD said:

Could you do the bench with the T4 again with 2Ghz/1.5Ghz?

 

Sure but why? Some numbers will improve by 4 percent and that's it.


7 hours ago, NicoD said:

T3+ outperforms in all Multicore tasks

 

Yep. But to get a better understanding of some stuff, it would be highly appreciated if you could run the following test on the T3+, making the SoC quad-core:

for i in 7 6 5 4 ; do
    echo 0 >/sys/devices/system/cpu/cpu${i}/online
done
sbc-bench.sh neon

This will take CPU cores 4-7 offline and might provide insights into how different A53 cores perform with the cpuminer stuff (which is considered optional for a reason). For example, we see that S905X and RK3328 are outperformed here by the S905 (without X) but have no idea yet why: https://forum.armbian.com/topic/7819-sbc-bench/?do=findComment&comment=59185
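To undo the change without rebooting, the cores can be set online again. A sketch that does this and then counts the online CPUs from /sys/devices/system/cpu/online, which holds a range list such as 0-3 or 0-3,6 (function names hypothetical; writing to sysfs needs root):

```bash
#!/bin/bash
# Sketch: bring CPUs 4-7 back online and report how many cores are online.

# Count CPUs in a kernel cpu list string, e.g. "0-3,6" -> 5
count_cpu_list() {
    local n=0 part
    local IFS=,                  # split the list on commas
    for part in $1; do
        if [[ $part == *-* ]]; then
            n=$(( n + ${part#*-} - ${part%-*} + 1 ))   # range "a-b"
        else
            n=$(( n + 1 ))                             # single CPU
        fi
    done
    echo "$n"
}

restore_cores() {
    local i
    for i in 4 5 6 7; do
        echo 1 > /sys/devices/system/cpu/cpu${i}/online
    done
    echo "online CPUs: $(count_cpu_list "$(cat /sys/devices/system/cpu/online)")"
}
```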

 

The same test with S912/Vim2 (running Debian Stretch arm64, otherwise it's useless) would be interesting too...

 


12 hours ago, tkaiser said:

for i in 7 6 5 4 ; do
    echo 0 >/sys/devices/system/cpu/cpu${i}/online
done

When I do that, it reboots the SBC.
Then I can't do that again (no such file), so I just ran sbc-bench, but there's no difference. I don't know.
http://ix.io/1iVP
I'll do the Odroid XU4 with Ubuntu 18.04 from Odroid now.
What else do you want? Tinker Board? Khadas? OPi Plus 2? BPi 0? RPi Zero?


4 minutes ago, NicoD said:

I'll do the Odroid XU4 with Ubuntu 18.04 from Odroid now.

 

Hmm... not really needed since @zador.blood.stained already tested with exactly the same image (I was curious whether the OpenSSL distro package can make use of the crypto engine with Hardkernel's image, but to no avail). If XU4 again, then our https://dl.armbian.com/odroidxu4/Debian_stretch_next.7z would be interesting, but it would most probably give exactly the same numbers as already collected.

 

Vim2 would be interesting with Debian Stretch and a 4.9 kernel (no idea whether that's available somewhere). Do you also have the S905X Vim?

 

Testing 32-bit boards with A7 cores IMO isn't needed. Maybe the OPi Plus 2 with mainline kernel...


57 minutes ago, tkaiser said:

I was curious whether openssl distro package can make use of the crypto engine with Hardkernel's image

Speaking of OpenSSL, I wonder if we are actually collecting useful data and not usual "numbers without meaning", as the performance may depend on several factors:

  • OpenSSL version (versions lower than 1.1.0 don't support AF_ALG)
  • AF_ALG actually enabled in the kernel
  • Kernel crypto options (kernel mode NEON and NEON optimized AES)

EDIT: not sure if OpenSSL uses AF_ALG by default, but pretty sure that cryptsetup does
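The kernel side of this list can be checked quickly. A sketch that reads a kernel config on stdin and reports a selection of AF_ALG, kernel-mode NEON and accelerated AES options; the option list is my selection (names as used in 4.x kernels) and not exhaustive, and /proc/config.gz only exists with CONFIG_IKCONFIG_PROC, otherwise many distros ship /boot/config-$(uname -r). Function names are hypothetical:

```bash
#!/bin/bash
# Sketch: report which crypto-related kernel options are enabled.
crypto_opts='CONFIG_CRYPTO_USER_API_SKCIPHER CONFIG_CRYPTO_USER_API_HASH
CONFIG_KERNEL_MODE_NEON CONFIG_CRYPTO_AES_ARM_BS CONFIG_CRYPTO_AES_ARM_CE
CONFIG_CRYPTO_AES_ARM64_CE CONFIG_CRYPTO_AES_ARM64_CE_BLK
CONFIG_CRYPTO_AES_ARM64_NEON_BLK'

# Print the running kernel's config, if it can be found.
kernel_config() {
    if [ -r /proc/config.gz ]; then
        zcat /proc/config.gz
    elif [ -r "/boot/config-$(uname -r)" ]; then
        cat "/boot/config-$(uname -r)"
    fi
}

# Reads a config on stdin; prints "OPTION=value" or "OPTION not set".
check_crypto_opts() {
    local cfg opt
    cfg=$(cat)
    for opt in $crypto_opts; do
        if grep -q "^${opt}=" <<< "$cfg"; then
            grep "^${opt}=" <<< "$cfg"
        else
            echo "$opt not set"
        fi
    done
}
```

Usage: `kernel_config | check_crypto_opts`, run once per kernel we maintain.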

 

 

57 minutes ago, tkaiser said:

Hmm... not really needed since @zador.blood.stained already tested with exactly same image

Though for me cpuminer silently failed to build even though I added the "neon" command line parameter. Maybe providing precompiled (and statically linked) cpuminer binaries for armhf and arm64 is a good idea to reduce the number of installed dependencies?


25 minutes ago, zador.blood.stained said:

Speaking of OpenSSL, I wonder if we are actually collecting useful data and not usual "numbers without meaning", as the performance may depend on several factors:

  • OpenSSL version (versions lower than 1.1.0 don't support AF_ALG)
  • AF_ALG actually enabled in the kernel
  • Kernel crypto options (kernel mode NEON and NEON optimized AES)

 

Well, identifying such stuff is IMO part of the journey. One of my goals is to collect several numbers for the same hardware to be able to educate users that they never just 'benchmark the hardware' but always also 'software and settings'. One side effect of such projects is that we could walk through all the kernels we maintain to consolidate crypto options, and then be able to demonstrate how important such stuff is (by re-running sbc-bench and listing one board multiple times).

 

I just checked the output for OpenSSL versions:

  • Ubuntu Xenial: OpenSSL 1.0.2g 1 Mar 2016
  • Debian Stretch: OpenSSL (version 1.1.0f, built on 25 May 2017)
  • Ubuntu Bionic: OpenSSL (version 1.1.0g, built on 2 Nov 2017)

Funnily enough, if ARMv8 Crypto Extensions are available, OpenSSL numbers with smaller data chunks are higher with 1.0.2g than with 1.1.0g (see the NanoPC T3+ numbers or those for the Vim2). Did you already check whether all our kernels have the respective AF_ALG and NEON switches enabled?
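Since the OpenSSL version clearly affects the numbers, it might help to record it with each result and flag versions older than 1.1.0 (which, per the list above, lack AF_ALG support). A sketch using GNU sort -V for the comparison; 'openssl version' typically prints something like "OpenSSL 1.1.0f  25 May 2017". Function names are hypothetical:

```bash
#!/bin/bash
# Sketch: compare OpenSSL versions (requires GNU sort -V).

# True if version $1 >= version $2
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Second field of 'openssl version', e.g. "1.1.0f"
openssl_version() {
    openssl version | awk '{ print $2 }'
}
```

Usage: `version_ge "$(openssl_version)" 1.1.0 || echo "OpenSSL older than 1.1.0, no AF_ALG"`.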

 

39 minutes ago, zador.blood.stained said:

Though for me cpuminer silently failed to build even though I added the "neon" command line parameter

 

Yes, that can be observed with armhf userland. I didn't care that much since, at least with Cortex-A7, cpuminer/NEON performance is so low that it's close to misleading to list the numbers. Wrt precompiled binaries... I'm currently building GCC 7.3 on Stretch just to let sbc-bench run again, this time with GCC 7.3 building the cpuminer binary, since for whatever reason we currently get way better cpuminer scores when running with Bionic, and I want to see whether it's maybe just the GCC version difference (6.3 vs. 7.3).


11 hours ago, tkaiser said:

Since currently we've for whatever reasons way better cpuminer scores when running with Bionic and I want to see whether it's maybe just GCC version difference (6.3 vs. 7.3)

 

In the meantime I built GCC 8.2 on Stretch on the NanoPC T4. Cpuminer now scores 10.27 kH/s on Debian Stretch when built with GCC 8.2 vs. 8.24 kH/s when built with Stretch's GCC 6.3.
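For reference, that is a gain of roughly 25 %. A one-line awk helper (name hypothetical) for such comparisons:

```bash
#!/bin/bash
# Relative gain of a new score over an old one, in percent.
pct_gain() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f%%\n", (new / old - 1) * 100 }'
}

pct_gain 10.27 8.24   # cpuminer kH/s: GCC 8.2 vs. GCC 6.3 on Stretch
```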

 

This is the kind of stuff I want to outline: how cheap it can be to get better performance simply by caring about software :)

 

To build a new GCC version on the machine I followed this recipe (the last line is important since cpuminer needs to be rebuilt afterwards):

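GCCVer="7.3.0" # replace with "8.2.0" when wanting to use this version
cd /usr/local/src
# fetch and unpack the GCC sources
wget https://ftp.gnu.org/gnu/gcc/gcc-${GCCVer}/gcc-${GCCVer}.tar.xz
tar xf gcc-${GCCVer}.tar.xz && rm gcc-${GCCVer}.tar.xz
# GCC should be built in a directory separate from its source tree
mkdir -p build
cd gcc-${GCCVer}
# download GMP, MPFR, MPC and ISL into the source tree
./contrib/download_prerequisites
cd ../build
../gcc-${GCCVer}/configure
# use all CPU cores for the (long) build
make -j $(grep -c '^processor' /proc/cpuinfo)
# installs below /usr/local (the default prefix)
make install
# on arm64 the new runtime libraries (libstdc++ etc.) end up in /usr/local/lib64
echo "/usr/local/lib64" >/etc/ld.so.conf.d/usrLocalLib64.conf
ldconfig
gcc --version
# remove the old cpuminer build so sbc-bench rebuilds it with the new GCC
[ -d /usr/local/src/cpuminer-multi ] && rm -rf /usr/local/src/cpuminer-multi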

(I always did the first steps on an RK3399 board, then transferred the build directory to an RK3328 board and executed the final steps there, starting with 'make install'; twice as fast)

Edited by tkaiser
Added GCC recipe
