Posted

@malvcr or @chwe: are you able to provide 'sbc-bench neon' output for BPi R2?

 

Anyone here with a RPi 3 B+ able to run 'sbc-bench neon' with this board one time allowing for throttling (pretty easy since operation without fan should be sufficient) and one time allowing for undervoltage/frequency capping (using any crappy Micro USB cable combined with a 2A USB wall wart)? @NicoD maybe?

 

A screenshot from a putty session showing the output would be great...

Posted

Latest commit added @wtarreau's great mhz tool to the reports to spot strange things happening (especially on Raspberry Pi and with Amlogic SoCs):

 

Example output from a Rock64:

Checking cpufreq OPP:

Cpufreq OPP:  408    Measured: 400.981/400.762/400.857
Cpufreq OPP:  600    Measured: 592.858/592.944/592.672
Cpufreq OPP:  816    Measured: 808.872/808.932/809.031
Cpufreq OPP: 1008    Measured: 1000.598/1000.816/1000.416
Cpufreq OPP: 1200    Measured: 1193.027/1192.765/1193.027
Cpufreq OPP: 1296    Measured: 1288.983/1285.487/1288.694
Cpufreq OPP: 1392    Measured: 1385.218/1384.623/1384.995

And from a RockPro64:

Checking cpufreq OPP for cpu0-cpu3:

Cpufreq OPP:  408    Measured: 406.192/406.314/406.319
Cpufreq OPP:  600    Measured: 598.053/598.195/598.344
Cpufreq OPP:  816    Measured: 814.302/814.292/814.001
Cpufreq OPP: 1008    Measured: 1006.214/1006.239/1006.214
Cpufreq OPP: 1200    Measured: 1197.827/1198.355/1198.369
Cpufreq OPP: 1416    Measured: 1414.209/1414.286/1414.023

Checking cpufreq OPP for cpu4-cpu5:

Cpufreq OPP:  408    Measured: 406.563/406.592/406.636
Cpufreq OPP:  600    Measured: 598.649/598.310/598.581
Cpufreq OPP:  816    Measured: 814.583/815.065/814.663
Cpufreq OPP: 1008    Measured: 1006.509/1006.558/1006.570
Cpufreq OPP: 1200    Measured: 1198.494/1198.564/1198.591
Cpufreq OPP: 1416    Measured: 1414.612/1414.596/1414.534
Cpufreq OPP: 1608    Measured: 1606.477/1606.577/1606.677
Cpufreq OPP: 1800    Measured: 1798.487/1798.587/1798.627

 

These checks are done twice: At the start of the benchmark when the system is idle and again directly after the most demanding test has finished and the CPUs are heated up to the max (7-zip or cpuminer based on 'sbc-bench' vs. 'sbc-bench neon').
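One way to read such a report is to express every measured clock as a deviation from the advertised OPP. The following is only a sketch and not part of sbc-bench: the function name opp_deviation is made up, and it assumes the 'Cpufreq OPP: N    Measured: a/b/c' line format shown in the examples above.

```sh
# Sketch only: opp_deviation is a hypothetical helper, not sbc-bench code.
# Reads report lines such as
#   Cpufreq OPP: 1800    Measured: 1798.487/1798.587/1798.627
# and prints the OPP, the mean of the measurements and the deviation in percent.
opp_deviation() {
    awk '/^Cpufreq OPP:/ {
        opp = $3
        n = split($5, m, "/")            # the individual measurements
        sum = 0
        for (i = 1; i <= n; i++) sum += m[i]
        mean = sum / n
        printf "%d MHz: measured %.1f MHz (%+.2f%%)\n", opp, mean, (mean - opp) * 100 / opp
    }'
}
```

With a report saved to a file (report.txt is just a placeholder name) this would be called as 'opp_deviation < report.txt'. Small negative deviations like the ones in the Rock64 and RockPro64 listings are harmless; large ones are what to look out for.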

 

Results made with RPi 3, 3+ and Vim2 might look really funny then :)

Posted

R2 1.2 without the neon parameter (and with some services running) ...  

This machine is running, together with a BPI-M2+ H2+, inside a plastic box with a big server-grade fan.  The benchmark can't check the temperature :-( ... 

 

The machine has two Toshiba 2GB hard disks in a RAID-1 configuration.
 

Memory performance:
memcpy: 1502.1 MB/s (0.2%)
memset: 3799.8 MB/s 

7-zip total scores (3 consecutive runs): 2391,2545,2565

OpenSSL results:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      27627.19k    31935.34k    33210.62k    33593.34k    33737.39k
aes-128-cbc      27584.11k    31872.77k    33287.17k    33596.07k    33712.81k
aes-192-cbc      24180.63k    27425.83k    28424.28k    28639.23k    28781.23k
aes-192-cbc      24145.71k    27322.43k    28412.07k    28659.37k    28699.31k
aes-256-cbc      21727.44k    24301.10k    25082.11k    25248.43k    25346.05k
aes-256-cbc      21694.57k    24263.27k    25082.11k    25247.74k    25294.17k

Full results uploaded to http://ix.io/1iGw. Please check the log for anomalies (e.g. swapping
or throttling happenend) and otherwise share this URL.


R2 1.2 with the neon parameter and those services off


 

Memory performance:
memcpy: 1504.7 MB/s 
memset: 3802.0 MB/s 

7-zip total scores (3 consecutive runs): 2626,2601,2569

OpenSSL results:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      27524.84k    31910.27k    33292.54k    33607.00k    33759.23k
aes-128-cbc      27591.76k    31838.85k    33295.27k    33685.50k    33783.81k
aes-192-cbc      24112.55k    27433.26k    28457.56k    28686.34k    28778.50k
aes-192-cbc      24179.14k    27386.11k    28461.23k    28730.37k    28805.80k
aes-256-cbc      21641.58k    24290.86k    25089.37k    25263.45k    25335.13k
aes-256-cbc      21703.41k    24254.40k    25069.99k    25299.97k    25354.24k

Full results uploaded to http://ix.io/1iGV. Please check the log for anomalies (e.g. swapping
or throttling happenend) and otherwise share this URL.

Is that "neon" a parameter? ... I really can't see an important difference between both runs.  Anyway, you have two sets of numbers to deal with.


Another day I can check with the R2 1.1 version ... it's just that the machine is not in working order right now :-) ...  also, I will do it without RAID-1 and using an SSD instead of regular hard disks.

 

Posted
20 hours ago, NicoD said:

 

FYI: 1st link got somehow corrupted and points now to https://github.com/ThomasKaiser/sbc-ben (404)

 

@TonyMac32: are you able to provide 'sbc-bench neon' numbers for Le Potato and the Tinkerboard if time permits (using Stretch or Bionic)? And in case we have working DVFS with H5 and a board gets close to 1.4 GHz (NanoPi K1 Plus? OrangePi Prime?), numbers from it would also be nice.

Posted

Le Potato:  http://ix.io/1iSQ

(1.408 GHz max speed when 1.512 commanded)

 

Tinker: http://ix.io/1iSX

(Claiming throttling, but I think it's because we add OC OPPs and limit it to 1.8 GHz by default)

 

NanoPi K2 (S905 has different blob than S905X): http://ix.io/1iT1

(I need to build a 4.18 and see if that makes some of the differences go away. )

 

The Potato and the K2 had extremely different results. I need to re-run the K2 with a 4.18 kernel and see if that changes things, since Meson64 mainline is still in active development, whereas RK3288 is more or less stable.

 

As for memory access on Meson64, it seems the Potato wins by a good margin:
 

Potato: 	standard memcpy : 1812.9 MB/s (0.2%) standard memset : 5731.6 MB/s

K2: 		standard memcpy : 1655.6 MB/s (0.3%) standard memset : 3871.9 MB/s

C2: 		standard memcpy : 1424.9 MB/s (0.2%) standard memset : 2600.6 MB/s

... But then miner showed something odd:

 

K2:		Total: 4.62 kH/s

C2:		Total: 4.63 kH/s

Potato:		Total: 3.93 kH/s  <---- wtf?!?!

 

Posted
1 hour ago, TonyMac32 said:

Tinker: http://ix.io/1iSX

(Claiming throttling, but I think it's because we add OC OPPs and limit it to 1.8 GHz by default)

 

Yep, my code gets confused by those overclock OPPs but the detailed results show clearly that execution happened all the time at the '1800 MHz' OPP that is 1730 MHz in reality (how I love all this firmware cheating ;) )

 

Wrt NanoPi K2 vs. Le Potato: It's S905 (no ARMv8 Crypto Extensions) vs. S905X (there Amlogic licensed the stuff). So numbers as expected, simply compare with ODROID-C2. But it's obvious that only Hardkernel got a BS free BLOB from Amlogic whereas FriendlyELEC has to use the cheating BLOB faking cpufreqs above 1480 MHz.

 

The cpuminer numbers for Le Potato are in sync with another A53 running at 1400 MHz (see Rock64 and Renegade based on RK3328 -- we can only compare numbers that were generated with exactly identical GCC/distro version: Stretch arm64). What's surprising is that S905 outperforms S905X and RK3328 here (maybe related to L1/L2 cache sizes or something like this)

 

I added the new numbers to https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md -- thank you!

Posted
4 hours ago, tkaiser said:

What's surprising is that S905 outperforms S905X and RK3328 here (maybe related to L1/L2 cache sizes or something like this)

 

@TonyMac32 do you have a /sys/devices/system/cpu/cpu0/cache node on those platforms? It seems support for reporting ARM cache info on 64-bit systems was added a long time ago but needs 'next-level-cache' DT nodes.

 

In the meantime I consolidated kernel settings so that all our kernels are able to be queried for throttling. And starting with sbc-bench v0.5 cache reporting (if available) is enabled.
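For anyone who wants to check the cache node by hand, here is a minimal sketch. It is not the code sbc-bench uses; show_cache_info is a made-up name, and it relies only on the standard level, type and size files the kernel exposes below each index directory. The size file is treated as optional on the assumption that not every platform fills it in.

```sh
# Sketch only: show_cache_info is a hypothetical helper, not sbc-bench code.
# Prints level, type and size for every cache index exposed via sysfs.
# The optional argument (default: cpu0's cache directory) exists for testing.
show_cache_info() {
    base=${1:-/sys/devices/system/cpu/cpu0/cache}
    for idx in "$base"/index*; do
        [ -d "$idx" ] || continue                    # no cache node at all
        level=$(cat "$idx/level" 2>/dev/null)
        ctype=$(cat "$idx/type" 2>/dev/null)
        size=$(cat "$idx/size" 2>/dev/null)          # assumed to be optional
        echo "L${level:-?} ${ctype:-?}: ${size:-size not reported}"
    done
}
```

Called without arguments it inspects cpu0; on big.LITTLE designs the same call with another CPU's cache directory as argument would show the other cluster.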

Posted
On 7/30/2018 at 9:42 PM, NicoD said:

 

LOL, jamesh's excuse for the RPi always showing only faked cpufreq readings is really funny ('Due to using upstream code for CPU frequency reporting...'). Here his colleague Phil is explaining the real cause (their mailbox interface simply returning requested instead of real values): https://github.com/raspberrypi/linux/issues/2512#issuecomment-382703153

Posted
1 hour ago, tkaiser said:

LOL, jamesh's excuse for the RPi always showing only faked cpufreq readings is really funny

Yeah. I don't like his answers. He tries to minimize it, and tries to change the subject from "not giving true clocks" to "we did it for the good of the users". Well, then inform your users...
It doesn't seem like they are willing to change anything. It's their choice.
I contacted ETA-Prime and told him about it. He seemed very interested. I hope he'll make a video, then their forum will explode with unhappy comments.
I'll stay polite. I know I'd have the whole RPi fanboy club after me if I didn't.
 

I'm amazed you stayed out of it :lol:

I'll start doing some more sbc-bench's.

Posted
24 minutes ago, NicoD said:

I'll start doing some more sbc-bench's.

 

Fine. But please always grab the latest version again. I continually try to improve things, resulting in better reporting.

 

To stay focused on sbc-bench -- such cheating behaviour on some platforms like Raspberries or Amlogic SBC other than ODROID-C2 is the reason it needs detailed monitoring about what's going on. Something that should be addressed in sbc-bench from the very beginning.

Posted
12 hours ago, tkaiser said:

But it's obvious that only Hardkernel got a BS free BLOB from Amlogic whereas FriendlyELEC has to use the cheating BLOB faking cpufreqs above 1480 MHz.

I need to verify this, remember I mainlined the K2 U-boot with the generic Amlogic gxb blob.  I'll see what I get with the friendlyelec blob, but the repo looks like the garden variety stuff to me.

Posted
On 8/1/2018 at 6:23 AM, tkaiser said:

@TonyMac32 do you have a /sys/devices/system/cpu/cpu0/cache node on those platforms?

I missed this.  Yes, there is a node on the Amlogic devices.

Posted
Checking cpufreq OPP... Done.
Executing tinymembench.  This will take a long time...

image.png.78bd5aee08230739aabc14f90a8f40eb.png

 

Well, no luck, apparently nothing about size is in the node.  What you can see is what is in the fliers: data/instruction cache per CPU and a unified level 2. 

Posted
1 hour ago, TonyMac32 said:

what you can see is what is in the fliers, data/instruction cache per cpu and a unified level 2

 

Hmm...

  • S905: L1: '32KB instruction cache and 32KB data cache' + '512KB Unified L2 cache'
  • S905X: L1: 32KB I/D-Cache + 'large unified L2 cache'

Well, 32KB for each or for both and 512KB vs. 'large'. If something that has been called '512KB' in the past now is called 'large' it's safe to assume that 'large' is less than 512KB?

 

Just as a reference:

  • A64: L1: '32KB L1 Instruction cache and 32KB L1 Data cache' + '512KB L2 cache'
  • RK3328: L1 I cache size: 32K, L1 D cache size: 32K, L2 cache size: 256K
  • RK3399: 'Integrated 48KB L1 instruction cache, 32KB L1 data cache for each Cortex-A72', 'Integrated 32KB L1 instruction cache, 32KB L1 data cache for each Cortex-A53' and '1024KB unified L2 Cache for big cluster, 512KB unified L2 Cache for little cluster' (Rockchip_RK3399TRM_V1.4_Part1-20170408.pdf is even more verbose)
  • S5P6818: L1: '32 Kbyte I-Cache, 32 Kbyte D-Cache' + L2: '1 Mbyte Shared Cache'
Posted
2 minutes ago, tkaiser said:

Well, 32KB for each or for both and 512KB vs. 'large'. If something that has been called '512KB' in the past now is called 'large' it's safe to assume that 'large' is less than 512KB?

 

I saw that as well looking through the available docs.  If the performance seems on par with rk3328, then perhaps that's a plausible enough answer.

Posted
1 hour ago, TonyMac32 said:

If the performance seems on par with rk3328, then perhaps that's a plausible enough answer.

 

I just checked also against RK3399 with A72 cores killed. Below 7-zip MIPS and cpuminer kH/s (all binaries built with GCC 6.3):


RK3328:

  • Renegade @ 1380 MHz: 3710, 3.92 kH/s (faster memory than Rock64)
  • Rock64 @ 1380 MHz: 3610, 3.85 kH/s

RK3399 (with A72 cores killed):

  • NanoPC T4 @ 1415 MHz: 3920, 4.54 kH/s (way faster memory than the other boards)

S905:

  • NanoPi K2 @ 1480 MHz: 3850, 4.61 kH/s (slightly higher cpufreq)
  • ODROID-C2 @ 1530 MHz: 3870, 4.63 kH/s (slightly higher cpufreq)

S905X:

  • Le Potato @ 1410 MHz: 3780, 3.92 kH/s (faster memory than Rock64)

 

Unfortunately testing on NanoPi Fire3 with just 4 CPU cores is not possible since when trying to kill CPU cores the system deadlocks. Might need changed kernel config?

root@nanopifire3:~# zgrep -i hotplug /proc/config.gz 
CONFIG_HOTPLUG_CPU=y
# CONFIG_ARM_DYNAMIC_CLUSTER_HOTPLUG is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set

cpuminer allows setting CPU affinity but this doesn't work since it still fires up 8 threads, and results are then lower than expected (3.24 kH/s):

taskset -c 0-3 ./cpuminer --benchmark --cpu-priority=2 --cpu-affinity=0x15
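As a side note on the mask itself: assuming the affinity value is a plain bit mask where bit N stands for CPU N (an assumption about cpuminer, not verified here), 0x15 is binary 10101 and therefore names CPUs 0, 2 and 4, while a mask for CPUs 0-3 would be 0xf. Whether this has anything to do with the low score above is not established. The two helper names below are made up for this sketch.

```sh
# Sketch only: hypothetical helpers to convert between a hex affinity mask
# and the CPU numbers it selects (assumption: bit N set = CPU N allowed).
mask_to_cpus() {
    mask=$(( $1 ))                      # accepts input like 0x15
    cpu=0
    list=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            list="$list${list:+,}$cpu"  # append with a comma once list is non-empty
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$list"
}

# $1 = first CPU, $2 = last CPU of a contiguous range
cpus_to_mask() {
    printf '0x%x\n' $(( ( (1 << ($2 - $1 + 1)) - 1 ) << $1 ))
}
```

'mask_to_cpus 0x15' prints 0,2,4 and 'cpus_to_mask 0 3' prints 0xf.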

 

Posted

Hi,

 

I saw you tested your Up board based on atom z8300, so I tested my x5 based on z8350: http://ix.io/1lwg

However, I have only Manjaro installed on it and neither cpufreq nor the SoC temperature is reported, so maybe these numbers are not relevant for you.

 

Posted
13 hours ago, jeanrhum said:

I saw you tested your Up board based on atom z8300, so I tested my x5 based on z8350: http://ix.io/1lwg

 

That's pretty interesting. Can you please provide the output of the following:

uname -a
/usr/local/src/mhz/mhz 3 1000000
find /sys -name cpufreq

Is it possible to easily rebuild the kernel on Manjaro? All the cpufreq stuff is missing (most probably not for a reason -- we consolidated this just recently in Armbian)

Posted

uname -a gives:

Linux x5 4.17.18-1-MANJARO #1 SMP PREEMPT Wed Aug 22 21:51:13 UTC 2018 x86_64 GNU/Linux

/usr/local/src/mhz/mhz 3 1000000 gives:

count=807053 us50=21105 us250=105599 diff=84494 cpu_MHz=1910.320 tsc50=30391272 tsc250=152061768 diff=150 rdtsc_MHz=1439.990
count=807053 us50=21134 us250=106725 diff=85591 cpu_MHz=1885.836 tsc50=30432150 tsc250=153683442 diff=152 rdtsc_MHz=1440.003
count=807053 us50=21113 us250=105892 diff=84779 cpu_MHz=1903.898 tsc50=30403566 tsc250=152484984 diff=151 rdtsc_MHz=1439.996

and sudo find /sys -name cpufreq gives:

/sys/devices/system/cpu/cpu3/cpufreq
/sys/devices/system/cpu/cpu1/cpufreq
/sys/devices/system/cpu/cpufreq
/sys/devices/system/cpu/cpu2/cpufreq
/sys/devices/system/cpu/cpu0/cpufreq
/sys/module/cpufreq

I made a big upgrade yesterday after running your script, but I was already with a 4.17 kernel.

If I have time, I will look at the kernel and cpufreq stuff. During execution of sbc-bench, there were multiple errors about cpufreq referring to line 144 (if I remember well).
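The three cpu_MHz values in the mhz output above can be averaged with a few lines of awk. This is only a sketch with a made-up function name; it assumes the key=value output format shown in this post and ignores the rdtsc_MHz field.

```sh
# Sketch only: mhz_average is a hypothetical helper.
# Averages the cpu_MHz=... fields found in the output of the mhz tool.
mhz_average() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^cpu_MHz=/) {
                sub(/^cpu_MHz=/, "", $i)     # keep only the number
                sum += $i
                n++
            }
    }
    END { if (n) printf "%.1f\n", sum / n }'
}
```

Usage would be '/usr/local/src/mhz/mhz 3 1000000 | mhz_average'. For the three lines above this comes out at roughly 1900 MHz, while rdtsc_MHz stays at about 1440.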

Posted
11 minutes ago, jeanrhum said:

If I have time, I will look at the kernel and cpufreq stuff. During execution of sbc-bench, there were multiple errors about cpufreq referring to line 144 (if I remember well).

 

After the latest commit the output should be much better on Arch/Manjaro and localized systems. In case you want to try it again, please install the sysstat package first so that the iostat utility is available (always helpful):

pacman -Syu sysstat

 

Posted

Latest sbc-bench version 0.6 implements a new 'thermal testing' mode:

root@nanopct4:/home/tk# sbc-bench.sh -h
Usage: sbc-bench.sh [-c] [-h] [-m] [-t $degree] [-T $degree]

############################################################################

 Use sbc-bench.sh for the following tasks:

 sbc-bench.sh invoked without arguments runs a standard benchmark
 sbc-bench.sh -c also executes cpuminer test (NEON/SSE)
 sbc-bench.sh -h displays this help text
 sbc-bench.sh -m [$seconds] provides CLI monitoring (5 sec default interval)
 sbc-bench.sh -t [$degree] runs thermal test waiting to cool down to this value
 sbc-bench.sh -T [$degree] runs thermal test heating up to this value

############################################################################

root@nanopct4:/home/tk# 

This mode makes only use of cpuminer and is only useful to measure thermal effects (looking at temperatures, cpufreq statistics and relative performance differences) but provides no general benchmarking functionality other than displaying thermal efficiency of

  • physical aspects of heat dissipation (heatsink, heatsink + fan, nothing, airflow, enclosure or not, whatever)
  • efficiency of throttling settings (bad settings --> bad performance once throttling occurs)

Both -t and -T need an integer value as second argument: e.g. '-t 50' or '-T 70'. In the former case sbc-bench ensures that SoC temperature falls below 50°C prior to starting the test, the latter case will result in the tool heating up to 70°C first and then starting the test.

 

Both modes should be combined to generate a reproducible environment. E.g.

sbc-bench.sh -T 70 ; sbc-bench.sh -t 50

This will result in a first test heating up the SoC to 70°C, then testing, then let the second run wait to cool down to 50°C before 2nd test starts. This way it's ensured that several tests run under almost identical conditions.
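What 'waiting to cool down to this value' amounts to can be sketched in a few lines. This is not how sbc-bench implements it; read_temp and wait_until_below are made-up names, and thermal_zone0 as the default source is an assumption that will not fit every board.

```sh
# Sketch only, not sbc-bench's implementation: read_temp and wait_until_below
# are hypothetical helpers. Linux thermal zones report millidegrees Celsius.
read_temp() {
    raw=$(cat "${1:-/sys/class/thermal/thermal_zone0/temp}")
    echo $(( ${raw:-0} / 1000 ))
}

# $1 = limit in degrees Celsius
# $2 = temperature file (optional, assumed default: thermal_zone0)
# $3 = poll interval in seconds (optional, default 5)
wait_until_below() {
    while [ "$(read_temp "$2")" -ge "$1" ]; do
        sleep "${3:-5}"
    done
}
```

'wait_until_below 50' would then block until the zone reports less than 50°C, which is the precondition '-t 50' establishes before the test starts.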

 

If you want to skip pre-heating or waiting for the cool-down, simply use -T with a very low value, e.g. 'sbc-bench.sh -T 20'.

 

As an example, here is a comparison of NanoPC-T4 with stock heatsink: 1st test with FriendlyELEC's blue thermal pad between RK3399 and heatsink, 2nd test with the thermal pad replaced by a copper shim with thermal paste (test duration is ~302 seconds).

 

NanoPC-T4_Copper_Shim.jpg

 

 

With thermal pad:

1992 MHz:  100.79 sec
1800 MHz:    0.02 sec
1608 MHz:       0 sec
1416 MHz:  169.13 sec
1200 MHz:   32.26 sec

vs. copper shim:

1992 MHz:  136.00 sec
1800 MHz:       0 sec
1608 MHz:    0.02 sec
1416 MHz:  159.06 sec
1200 MHz:    7.19 sec

Copper shim + thermal paste performs better but not that much in this mode (somewhat simulating the board being enclosed and idling at around 50°C).
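To put a single number on 'better but not that much', the two time-per-OPP tables can be reduced to a time-weighted average clock. The helper below is only a sketch with a made-up name; it assumes the 'NNNN MHz:  SS.SS sec' line format from the tables above.

```sh
# Sketch only: avg_clock is a hypothetical helper, not sbc-bench code.
# Computes the time-weighted average clock from lines like
#   1992 MHz:  100.79 sec
avg_clock() {
    awk '$2 == "MHz:" {
        total    += $3            # seconds spent on this OPP
        weighted += $1 * $3       # MHz multiplied by seconds
    }
    END {
        if (total > 0)
            printf "%.0f MHz average over %.1f sec\n", weighted / total, total
    }'
}
```

Fed with the two tables above, this works out to roughly 1585 MHz with the thermal pad vs. roughly 1670 MHz with the copper shim, i.e. about 5% more.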

 

And it's also obvious that there's something seriously wrong with our throttling settings since the 1800 and 1608 MHz cpufreq OPPs are skipped. If those were used, performance in throttled state would be a little bit better. That's what the '-T' switch is about: operating the board under conditions where thermal trip points and such stuff can be tested efficiently.

 

Posted

Now the same exercise with RockPro64. First test is with the board lying flat on a table with the tall heatsink using Pine's gray thermal pad (BTW: same height as FriendlyELEC's).

 

RockPro64_Copper_Shim.jpg

 

I used ayufan's latest release image (still using 1.8/1.4 GHz max) and 'sbc-bench.sh -T 70 ; sbc-bench.sh -t 50'

16:39:42: 1800/1416MHz  6.11 100%   0%  99%   0%   0%   0%  83.3°C
16:39:54: 1800/1416MHz  6.09 100%   0%  99%   0%   0%   0%  83.9°C
16:40:06: 1800/1416MHz  6.08 100%   0%  99%   0%   0%   0%  82.8°C
16:40:18: 1800/1416MHz  6.13 100%   0%  99%   0%   0%   0%  84.4°C
16:40:30: 1800/1416MHz  6.18 100%   0%  99%   0%   0%   0%  83.9°C

Throttling statistics (time spent on each cpufreq OPP) for CPUs 4-5:

1800 MHz:  300.11 sec
1608 MHz:    0.14 sec
1416 MHz:       0 sec

Just a little throttling occurred at the end of the benchmark (thermal trip point in DT defined as 85°C)

 

Next test with thermal pad replaced by copper shim of same height and board still lying on the table:

17:11:54: 1800/1416MHz  6.12 100%   0%  99%   0%   0%   0%  70.6°C
17:12:06: 1800/1416MHz  6.10 100%   0%  99%   0%   0%   0%  71.1°C
17:12:18: 1800/1416MHz  6.09 100%   0%  99%   0%   0%   0%  72.2°C
17:12:30: 1800/1416MHz  6.13 100%   0%  99%   0%   0%   0%  73.9°C
17:12:42: 1800/1416MHz  6.11 100%   0%  99%   0%   0%   0%  72.2°C

No throttling and max temperature reported as 74°C.

 

Now bringing the board in an upright position trying to increase ventilation around the board:

17:54:53: 1800/1416MHz  6.09 100%   0%  99%   0%   0%   0%  70.0°C
17:55:05: 1800/1416MHz  6.21 100%   0%  99%   0%   0%   0%  70.0°C
17:55:17: 1800/1416MHz  6.17 100%   0%  99%   0%   0%   0%  69.4°C
17:55:29: 1800/1416MHz  6.20 100%   0%  99%   0%   0%   0%  71.1°C
17:55:42: 1800/1416MHz  6.17 100%   0%  99%   0%   0%   0%  71.7°C

No throttling and max temperature reported as 72°C

 

By simply replacing the thermal pad with a copper shim + thermal compound and otherwise identical environmental conditions (lying flat on a surface and ambient temperature around ~26°C) we get 11°C less and no throttling at all even with demanding workloads.
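The peak temperature can also be pulled straight out of monitoring lines like the ones quoted above. A minimal sketch with a made-up function name, assuming the temperature is always the last field and ends in °C:

```sh
# Sketch only: max_temp is a hypothetical helper, not sbc-bench code.
# Prints the highest temperature found in the last field of monitoring lines
# such as "16:40:18: 1800/1416MHz  6.13 100%   0%  99%   0%   0%   0%  84.4°C".
max_temp() {
    awk '$NF ~ /^[0-9]/ {
        t = $NF + 0                        # numeric prefix of e.g. "84.4°C"
        if (!seen || t > max) { max = t; seen = 1 }
    }
    END { if (seen) printf "%.1f\n", max }'
}
```

On the five sample lines shown for each run this gives 84.4 for the thermal pad and 73.9 for the copper shim with the board lying flat, in line with the roughly 11°C stated above.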

 

@TLLim and @mindee: IMO that's something to consider for the next batch of boards and heatsinks: try to do without those inefficient thermal pads and allow for direct contact between heatsink and SoC with thermal paste. Without a copper shim in between, heat transfer should be improved even more.

 

Almost forgot: Of course thermal pads also have an advantage: they stick pretty well to both heatsink and SoC and therefore are the better choice if the board is exposed to vibrations compared to a copper shim 'held in place' with a little bit of thermal paste. On the other hand I also have thermal glue somewhere around so that would be the next test (with NanoPi M4 once the board arrives).

Posted

New result with 0.6 version: http://ix.io/1lBq

 

On Manjaro /sys/.../cpufreq does not contain cpuinfo_cur_freq, only scaling_cur_freq. I modified line 164 of your script and I will report the new results soon.

 

Edit: here are the new results with cpufreq values: http://ix.io/1lBy

There are no temperature values since acpi thermal_zone0 is always equal to 0. Zone2 seems more relevant, but I'm not sure.
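The cpuinfo_cur_freq vs. scaling_cur_freq difference mentioned above can be handled with a simple fallback. This is only a sketch of the idea, not the change made to sbc-bench; current_khz is a made-up name, and the order of preference is an assumption.

```sh
# Sketch only: current_khz is a hypothetical helper, not the actual sbc-bench fix.
# Prints the current clockspeed in kHz from whichever sysfs file is readable.
# The optional argument (default: cpu0's cpufreq directory) exists for testing.
current_khz() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    for f in cpuinfo_cur_freq scaling_cur_freq; do
        if [ -r "$dir/$f" ]; then
            cat "$dir/$f"
            return 0
        fi
    done
    return 1                               # neither file available
}
```

The -r test also covers systems where cpuinfo_cur_freq exists but can only be read by root.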

Posted
39 minutes ago, jeanrhum said:

There are no temperature values since acpi thermal_zone0 is always equal to 0. Zone2 seems more relevant, but I'm not sure

 

Yes, on my UP Board I also had to adjust thermal source by hand (6 different sysfs nodes, I symlinked the most appropriate to /etc/armbianmonitor/datasources/soctemp).
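To find the most appropriate node, it helps to list all thermal zones with their type and current reading first. A minimal sketch follows; list_thermal_zones is a made-up name, and the base directory argument exists only so the function can be tried against a test directory.

```sh
# Sketch only: list_thermal_zones is a hypothetical helper.
# Prints name, type and temperature (converted from millidegrees) per zone.
list_thermal_zones() {
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        ztype=$(cat "$zone/type" 2>/dev/null)
        raw=$(cat "$zone/temp" 2>/dev/null) || continue   # some zones fail to read
        echo "${zone##*/} (${ztype:-unknown}): $(( ${raw:-0} / 1000 )) °C"
    done
}
```

Once a plausible zone is identified, something like 'ln -sf /sys/class/thermal/thermal_zone2/temp /etc/armbianmonitor/datasources/soctemp' would point the monitoring at it. Zone 2 is only an example here, and linking the temp file itself is an assumption about what the datasource expects.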

 

Thank you for the output! Interesting results: your Atom is allowed to clock at up to 1920 MHz but if all 4 cores are busy max cpufreq gets reduced to 1680 MHz.

 

What kind of device is this exactly?

Posted

@tkaiser thanks for the test results and advice. What are the thickness and dimensions of your copper shim? If you have the spec or a link, that would be super.  We will explore converting to your suggested method.

Posted
10 minutes ago, TLLim said:

What is your copper shim thickness and dimension

 

20x20x1mm. Ordered them 18 months ago on Aliexpress for 2 bucks (5 pieces) but the link is dead. Anyway: I don't think such a copper shim is a good solution for end users. A heatsink that can be attached directly to the SoC is better.

 

Will try again with my next RK3399 board with thermal glue between heatsink and copper shim and normal thermal paste between shim and SoC. Currently I fear a bit the shim could move when vibrations occur.

This topic is now closed to further replies.