
tkaiser

Members
  • Posts: 5462
  • Joined

Reputation Activity

  1. Like
    tkaiser got a reaction from lanefu in sbc-bench   
    Hi all,
     
    sbc-bench is now on Github: https://github.com/ThomasKaiser/sbc-bench
     
    I'll link from the README there to this thread for further discussion about the tool and proper benchmark methodology.
  2. Like
    tkaiser got a reaction from Igor_K in AML-S905X-CC (Le Potato) vs Odroid C2   
    Aliexpress and FriendlyELEC? Usually not a good idea unless you want to pay more for no reason: https://aliexpress.com/item/NanoPi-K2-Development-Board-Quad-core-Cortex-A53-1-5GHz-WiFi-Bluetooth-USB-Cable-RC100-Remote/32813030887.html vs. 
    https://www.friendlyarm.com/index.php?route=product/product&path=69&product_id=186
     
    Those Amlogic SoCs support AFBC (ARM Frame Buffer Compression), so 'raw' memory bandwidth is not that much of an issue. Another comparison between S905 and S905X: http://www.stane1983.com/index.php/2017/08/18/some-thoughts-on-amlogic-part-2/
     
    In case it's not known already: https://fosdem.org/2018/schedule/event/kodi/attachments/slides/2166/export/events/attachments/kodi/slides/2166/FOSDEM_Presentation_2018___Lukas_Rusak.pdf
  3. Like
    tkaiser reacted to Icenowy in Trying to compile Pine H64   
    Thanks to help from Jernej, the magenta display issue is solved in U-Boot commit 79405999d7ee43f830825751b200d739b53f20f5 ("video: sunxi: de2/3: clear all BLD address space").
  4. Like
    tkaiser reacted to gprovost in Benchmarking CPUs   
    @zador.blood.stained I don't think any distro OpenSSL package is built with hardware engine support.
    Also, even if an engine is installed, OpenSSL doesn't use it by default; you need to configure it in openssl.cnf.
    But you're right about cryptsetup (dm-crypt): it uses AF_ALG by default. I was wondering why there was such a large delta between my 'cryptsetup benchmark' and 'openssl speed' results on Helios4.
     
    I just did a test by compiling openssl-1.1.1-pre8 with AF_ALG engine support (... enable-engine enable-afalgeng ...), and here are the benchmark results on Helios4:
     
    $> openssl speed -evp aes-xxx-cbc -engine afalg -elapsed
    type         16 bytes    64 bytes    256 bytes   1024 bytes  8192 bytes  16384 bytes
    aes-128-cbc  745.71k     3018.47k    11270.23k   36220.25k   90355.03k   101094.74k
    aes-256-cbc  739.49k     2964.93k    11085.23k   34178.05k   82597.21k   90461.53k
    The difference is quite interesting: with AF_ALG it performs much better on bigger block sizes but poorly on very small ones.
     
    $> openssl speed -evp aes-xxx-cbc -elapsed
    type         16 bytes    64 bytes    256 bytes   1024 bytes  8192 bytes  16384 bytes
    aes-128-cbc  44795.07k   55274.84k   59076.27k   59920.04k   59719.68k   59353.77k
    aes-256-cbc  34264.93k   40524.42k   42168.92k   42496.68k   42535.59k   42500.10k
    System : Linux helios4 4.14.57-mvebu #2 SMP Tue Jul 24 08:29:55 UTC 2018 armv7l GNU/Linux
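    To make the crossover between the two aes-128-cbc runs easier to see, here is a minimal sketch that divides the AF_ALG throughput by the software throughput for each block size. The numbers are copied from the output above; 'openssl speed' reports them in 1000s of bytes per second, so the units cancel. The helper name afalg_ratio is hypothetical and not part of OpenSSL or sbc-bench.

```shell
# Hypothetical helper: ratio of AF_ALG to software throughput per block size.
# Values are the aes-128-cbc rows from the two 'openssl speed' runs above,
# without the trailing 'k' (both runs are in 1000s of bytes per second).
afalg_ratio() {
  awk '
    BEGIN {
      n = split("16 64 256 1024 8192 16384", size, " ")
      split("745.71 3018.47 11270.23 36220.25 90355.03 101094.74", afalg, " ")
      split("44795.07 55274.84 59076.27 59920.04 59719.68 59353.77", soft, " ")
      for (i = 1; i <= n; i++) {
        # a ratio above 1.00 means the AF_ALG engine is faster at this size
        printf "%6d bytes: %.2f\n", size[i], afalg[i] / soft[i]
      }
    }'
}

afalg_ratio
```

    With these numbers the ratio stays below 1 up to 1024-byte blocks (0.02 at 16 bytes, 0.60 at 1024 bytes) and exceeds 1 from 8192 bytes on (1.51, then 1.70 at 16384 bytes), which matches the observation above.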
     
    Supposedly cryptodev performs even better, but I think the Crypto API's AF_ALG interface is more elegant and easier to set up.
     
    Once I have a cleaner install of AF_ALG (or cryptodev), I will run sbc-bench and send you (@tkaiser) the output.
  5. Like
    tkaiser got a reaction from NicoD in Benchmarking CPUs   
    That's great, since the purpose of this test is to give an overall estimate of how well the relevant board would perform when doing 'server stuff'. If you click directly on the 7-zip link here https://github.com/ThomasKaiser/sbc-bench#7-zip you'll find an explanation of what's going on at the bottom of that page:
     
     
    In other words: as expected.
     
    Your use case with Blender is something entirely different, and 7-zip scores are useless for it, primarily because Blender involves floating-point work while 7-zip focuses on integer performance and low memory latency. As always, it's about the use case.
     
    If we look closely at the other results, we see that the S905, for example, has an advantage with cpuminer over the rather similar A53 SoCs S905X and RK3328 (which perform nearly identically with 7-zip, for example). Maybe whatever causes cpuminer's better scores is also responsible for better Blender results on the S905 compared to other A53 SoCs? It takes a different benchmark and a lot of cross-testing with the real application to get an idea of how to reliably test for your use case.
  6. Like
    tkaiser got a reaction from gounthar in rk3288 or rk3328 which is fastest?   
    Impossible to answer unless you tell us your use case.
     
     
    https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md
     
    When the RK3288 claims to run at 1.8 GHz it's actually below 1730 MHz, and until recently all Linux OS images for RK3328 were limited to 1.3 GHz. We changed this only recently in Armbian (nightlies) and enabled the '1.4 GHz OPP' (1380 MHz in reality) by default, while with ayufan images you need to enable the higher cpufreq OPP yourself.
     
    So usually it's 1726 vs. 1286 MHz, which doesn't matter that much, since:
      • as you pointed out, the RK3288 uses high-end ARM cores while the RK3328 relies on slow A53 cores
      • single-threaded peak performance of the RK3288 at '1.8 GHz' is ~1550 7-zip MIPS, while the RK3328 scores ~1000 at '1.4 GHz'
      • sustained CPU performance of the RK3288 without a huge heatsink (or fan) will drop to RK3328 levels or below pretty fast
      • the RK3288 generates way more heat
    It's always about 'use case first': for a 'Desktop Linux' totally different performance metrics matter than for the 'NAS use case' or when the board should serve as a VPN endpoint.
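    As a rough, back-of-the-envelope illustration of the 'high-end cores' point, here is a sketch that divides the single-threaded 7-zip scores quoted above by the real clock speeds. Pairing ~1550 MIPS with 1726 MHz and ~1000 MIPS with 1380 MHz (the real speed of the '1.4 GHz OPP') is an assumption, and the scores are approximate, so treat the result as a ballpark figure only. The helper name mips_per_mhz is hypothetical.

```shell
# Hypothetical helper: 7-zip MIPS per MHz as a crude per-clock comparison.
# Arguments: label, approximate single-threaded 7-zip MIPS, real clock in MHz.
mips_per_mhz() {
  awk -v label="$1" -v mips="$2" -v mhz="$3" \
    'BEGIN { printf "%s: %.2f MIPS/MHz\n", label, mips / mhz }'
}

mips_per_mhz RK3288 1550 1726   # '1.8 GHz' is ~1726 MHz in reality
mips_per_mhz RK3328 1000 1380   # '1.4 GHz OPP' is 1380 MHz in reality
```

    This gives roughly 0.90 vs. 0.72 MIPS/MHz, i.e. about a quarter more work per clock on the RK3288 in this single-threaded test. It says nothing about sustained performance once the RK3288 starts to throttle.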
  7. Like
    tkaiser reacted to wtarreau in Benchmarking CPUs   
    What you can do is increase the 2nd argument: it's the number of loops you want to run. At 1000 you can lose some precision. I tend to use 100000 on medium-power boards like NanoPis. On the Clearfog at 2 GHz, "mhz 3 100000" takes 150 ms, which may be too long for your use case. It reports 1999 MHz. With 1000 it shows slightly larger variation (1996 to 2000). Well, 10000 is probably OK. I picked up bad habits on x86, where intel_pstate takes a while to kick in.
     
    Maybe you should always use both a small and a large count in your tests. This would make it easier to spot automatic frequency adjustment: in that case the larger count would report a significantly higher frequency, because part of the loop would run at a higher frequency. Just an idea.
     
    Or maybe you should have two distinct tools, "sbc-bench" and "sbc-diag". The former would report values measured over short periods, and the latter would run deeper tests to try to figure out what's wrong when the first values look suspicious.
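    The small-count/large-count idea could be scripted along these lines. This is a minimal sketch that assumes you already have two MHz readings (for example from "mhz 3 1000" and "mhz 3 100000"); it does not parse mhz output. The helper name check_freq_ramp and the 5% threshold are hypothetical choices, not part of mhz or sbc-bench.

```shell
# Hypothetical helper: compare a clock reading taken with a small loop count
# against one taken with a large loop count. If the large-count reading is
# more than 5% higher (arbitrary threshold), part of the long run probably
# happened at a higher clock, i.e. the governor ramped the frequency up.
check_freq_ramp() {
  small_mhz=$1
  large_mhz=$2
  if awk -v s="$small_mhz" -v l="$large_mhz" 'BEGIN { exit !(l > s * 1.05) }'; then
    echo "large count reads ${large_mhz} MHz vs ${small_mhz} MHz: frequency ramp-up suspected"
  else
    echo "readings consistent (${small_mhz} vs ${large_mhz} MHz)"
  fi
}

check_freq_ramp 1996 1999
```

    With the Clearfog numbers above (1996 and 1999 MHz) the readings are consistent, as expected for a board already running at a fixed 2 GHz.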
     
  8. Like
    tkaiser got a reaction from TonyMac32 in Benchmarking CPUs   
    In the meantime I built GCC 8.2 on Stretch on the NanoPC T4. Cpuminer now scores 10.27 kH/s on Debian Stretch when built with GCC 8.2 vs. 8.24 kH/s when built with Stretch's GCC 6.3.
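    For scale, the gain from the newer compiler works out to roughly a quarter. A trivial sketch of the arithmetic (the helper name gain_percent is hypothetical):

```shell
# Hypothetical helper: relative gain of a new score over an old one, in percent.
# Arguments: new score, old score (same unit, here cpuminer kH/s).
gain_percent() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", (a - b) / b * 100 }'
}

gain_percent 10.27 8.24   # GCC 8.2 build vs. Stretch's GCC 6.3 build
```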
     
    This is the kind of thing I want to highlight: how cheaply you can get better performance simply by caring about software.
     
    To build a new GCC version on the machine I followed this recipe (the last line is important since cpuminer needs to be rebuilt afterwards):
    GCCVer="7.3.0" # replace with "8.2.0" when wanting to use this version
    cd /usr/local/src
    wget https://ftp.gnu.org/gnu/gcc/gcc-${GCCVer}/gcc-${GCCVer}.tar.xz
    tar xf gcc-${GCCVer}.tar.xz && rm gcc-${GCCVer}.tar.xz
    mkdir build
    cd gcc-${GCCVer}
    ./contrib/download_prerequisites   # fetches the GMP, MPFR, MPC (and ISL) sources
    cd ../build
    ../gcc-${GCCVer}/configure
    make -j $(grep -c '^processor' /proc/cpuinfo)   # one job per CPU core
    make install   # installs below /usr/local, needs root
    echo "/usr/local/lib64" >/etc/ld.so.conf.d/usrLocalLib64.conf   # let the loader find the new runtime libs
    ldconfig
    gcc --version
    [ -d /usr/local/src/cpuminer-multi ] && rm -rf /usr/local/src/cpuminer-multi   # so cpuminer gets rebuilt with the new GCC
    (I always did the first steps on an RK3399 board, then transferred the build directory to an RK3328 board and executed the final steps, starting with 'make install', there: twice as fast.)
  9. Like
    tkaiser reacted to NicoD in Benchmarking CPUs   
    OPi+2 Armbian Stretch http://ix.io/1iX4

     
  10. Like
    tkaiser got a reaction from WarHawk_AVG in How to set up an Orange Pi Zero with Armbian as a USB Network Card   
    This is kernel functionality. For the H3 legacy kernel we had to add a mini patch (included in our kernel by default for more than 2 years now): https://forum.armbian.com/topic/1417-g_ether-driver-h3-device-as-ethernet-dongle/
     
    With the mainline kernel it should also 'just work': see the comments here: https://github.com/armbian/build/issues/538
     
     
    You just need to make sure to disable the g_serial module, since only one USB gadget module can be active at a time and by default we provide a serial console on the Micro USB port. More info can be found via a Google site search for 'g_ether'.
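    A minimal sketch of the switch, assuming the gadget module is loaded from a module list such as /etc/modules (check where your image actually loads g_serial; it may be done differently) and GNU sed. The helper name use_g_ether is hypothetical.

```shell
# Hypothetical helper: in a module list file (one module name per line,
# like /etc/modules), drop g_serial and make sure g_ether is listed instead,
# since only one USB gadget module can be active at a time.
# Assumes GNU sed for 'sed -i'.
use_g_ether() {
  modules_file=${1:-/etc/modules}
  # delete a line consisting only of "g_serial"
  sed -i '/^g_serial$/d' "$modules_file"
  # append g_ether unless it is already listed
  grep -qx 'g_ether' "$modules_file" || echo 'g_ether' >>"$modules_file"
}
```

    To switch without a reboot, 'modprobe -r g_serial && modprobe g_ether' as root should do; if a serial console (a getty on the gadget's serial port) is still using g_serial, removing the module may fail until that is stopped.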
  11. Like
    tkaiser reacted to zador.blood.stained in Benchmarking CPUs   
    I thought more about this, also ran openssl and cryptsetup through strace and checked the OpenSSL build configuration in Ubuntu.
    Stock Ubuntu (and most likely Debian) OpenSSL uses userspace crypto. So if CPU instructions (NEON, ARMv8 CE) are available it should use them, but it won't use HW engines like sun4i-ss or CESA. At least we get comparable numbers this way, as long as we don't compare OpenSSL 1.0.x results directly with 1.1.x.
    This means the AES numbers in the table will not reflect performance in some real-world scenarios that use in-kernel crypto (like disk and filesystem encryption). But people will still use your results table to compare boards, so IMO it's worth adding a note for boards where HW crypto engines are available. ARMv8 CE is not a crypto engine: its numbers depend on CPU performance and should be affected by throttling, unlike e.g. CESA, which uses a fixed clock.
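    To see which in-kernel AES implementations (a HW engine driver like sun4i-ss or CESA, or a CPU-instruction based one) are registered on a given board, /proc/crypto can be queried. A minimal sketch; the helper name list_cbc_aes_drivers is hypothetical and the exact driver names vary by kernel and board.

```shell
# Hypothetical helper: list the kernel drivers registered for cbc(aes)
# together with their priority. In-kernel users such as dm-crypt or AF_ALG
# get the highest-priority implementation; stock userspace OpenSSL (without
# an engine like afalg) does not use any of these.
list_cbc_aes_drivers() {
  awk -F' *: *' '
    $1 == "name"   { name = $2 }
    $1 == "driver" { driver = $2 }
    $1 == "priority" && name == "cbc(aes)" { print driver, $2 }
  ' "${1:-/proc/crypto}"
}
```

    Usage: run list_cbc_aes_drivers without arguments to read /proc/crypto on the board itself.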
  12. Like
    tkaiser reacted to NicoD in Benchmarking CPUs   
    No, the Vim2 Max. Now the Odroid XU4 with Armbian.
  13. Like
    tkaiser got a reaction from NicoD in Benchmarking CPUs   
    Hmm... not really needed since @zador.blood.stained already tested with exactly the same image (I was curious whether the OpenSSL distro package can make use of the crypto engine with Hardkernel's image, but to no avail). If XU4 again, then our https://dl.armbian.com/odroidxu4/Debian_stretch_next.7z would be interesting, but it would most probably produce exactly the same numbers as already collected.
     
    Vim2 would be interesting with Debian Stretch and a 4.9 kernel (no idea whether that's available somewhere). Do you also have the S905X Vim?
     
    Testing 32-bit boards with A7 cores IMO isn't needed. Maybe the OPi Plus 2 with mainline kernel...
  14. Like
    tkaiser got a reaction from Ahmet Emin Koçal in Formatting Armbian without rebooting   
    The only way is via USB and FEL mode: https://github.com/zador-blood-stained/fel-mass-storage
  15. Like
    tkaiser got a reaction from NicoD in sbc-bench   
    LOL, jamesh's excuse for the RPi always showing only faked cpufreq readings is really funny ('Due to using upstream code for CPU frequency reporting...'). Here his colleague Phil explains the real cause (their mailbox interface simply returns requested instead of real values): https://github.com/raspberrypi/linux/issues/2512#issuecomment-382703153
  16. Like
    tkaiser reacted to TonyMac32 in sbc-bench   
    Le Potato:  http://ix.io/1iSQ
    (1.408 GHz max speed when 1.512 GHz is requested)
     
    Tinker: http://ix.io/1iSX
    (It claims throttling, but I think that's because we add overclocking OPPs and limit it to 1.8 GHz by default)
     
    NanoPi K2 (S905 has a different blob than S905X): http://ix.io/1iT1
    (I need to build a 4.18 kernel and see if that makes some of the differences go away.)
     
    The Potato and the K2 had extremely different results. I need to re-run the K2 with a 4.18 kernel and see if that changes things, since Meson64 mainline is still under active development, whereas RK3288 is more or less stable.
     
    As for memory access on Meson64, the Potato seems to win by a good margin:
     
    Potato: standard memcpy : 1812.9 MB/s (0.2%)   standard memset : 5731.6 MB/s
    K2:     standard memcpy : 1655.6 MB/s (0.3%)   standard memset : 3871.9 MB/s
    C2:     standard memcpy : 1424.9 MB/s (0.2%)   standard memset : 2600.6 MB/s
    ...
    But then cpuminer showed something odd:
     
    K2:     Total: 4.62 kH/s
    C2:     Total: 4.63 kH/s
    Potato: Total: 3.93 kH/s   <---- wtf?!?!
  17. Like
    tkaiser reacted to NicoD in Benchmarking CPUs   
    I'm surprised to see that the NanoPC-T3+ seems to do better than the XU4 in multicore. And almost everything else...
    It does outperform the XU4 in Blender too.

    Here are the scores from zador.blood.stained's XU4:
     
    http://ix.io/1ixL (old bench, XU4 with Jessie)
    VS
    NanoPC T3+
     
    T3+ vs. T4 is also interesting: the T3+ outperforms it in all multicore tasks.
    @tkaiser
    Could you run the bench on the T4 again at 2 GHz/1.5 GHz?


    Here are the results for the Odroid C2: http://ix.io/1iSh
    All results as expected when it's not overclocked.
    Cheers
  18. Like
    tkaiser reacted to NicoD in Benchmarking CPUs   
    http://ix.io/1iRJ NanoPC-T3+
    The changes did the trick, but indeed there's no space left. I just checked with GParted: it shows that partition sda1 is 59 GB.
    No idea if I'm doing this right.
    http://ix.io/1iR0 Armbianmonitor results

    I'll install the image again.
  19. Like
    tkaiser reacted to Icenowy in Trying to compile Pine H64   
    1.16 V is beyond the recommended operating range, so I didn't add it for now.
     
    BTW, the H6 comes with two official DVFS tables, selected by "speed bin"; maybe this needs to be implemented. ("Speed bin" info is in the SID.)
  20. Like
    tkaiser reacted to zador.blood.stained in Benchmarking CPUs   
    Since I had this image installed on eMMC and the board was ready to use, I ran a benchmark; it produced results roughly similar to those already in the table.
    Log: http://ix.io/1iLy
  21. Like
    tkaiser reacted to NicoD in Benchmarking CPUs   
    @tkaiser
    I've finished my video. Here it is.

    I had to keep it simple. As always, I forgot to mention a lot of information. I don't write a script, to save time, but that also shows in the (lack of) quality of my work.
    Thanks for all the help and info.
    Now I can begin with the other sbc's. You'll hear from me.
    Cheers
     
  22. Like
    tkaiser got a reaction from devman in Benchmarking CPUs   
    @numbqq from Khadas executed sbc-bench on a Vim 2 with Bionic and kernel 4.17: https://forum.khadas.com/t/cpu-frequency-up-to-2ghz/2010/24?u=tkaiser
     
    I added your numbers and his, plus some potential insights, to Results.md. So there's no need to re-test with the Vim 2 (except maybe with the latest sbc-bench version, since the detailed results provide a lot more info). But I would still prefer Bionic/Stretch when testing with kernel 4.9 again.
  23. Like
    tkaiser got a reaction from NicoD in Benchmarking CPUs   
    @numbqq from Khadas executed sbc-bench on a Vim 2 with Bionic and kernel 4.17: https://forum.khadas.com/t/cpu-frequency-up-to-2ghz/2010/24?u=tkaiser
     
    I added your and his numbers + some potential insights to Results.md. So no need to re-test with Vim 2 (maybe only again with latest sbc-bench version since detailed results provide a lot more info. But I would still prefer Bionic/Stretch when testing with kernel 4.9 again  )
  24. Like
    tkaiser got a reaction from NicoD in Benchmarking CPUs   
    I added some of your numbers and insights to https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md. IMO we're done with RPi measurements, since the most important lesson learned is: we always need to check the firmware version first when talking about performance. The majority of performance-relevant stuff on the RPi happens in the closed-source domain on the VideoCore.
     
    On the other hand, the average RPi user is not affected, being pretty clueless and not interested in details anyway. All published RPi 3B+ benchmarks were made in March, April and May; only afterwards was the first firmware that trashed performance released (4800f08a139d6ca1c5ecbee345ea6682e2160881 from Jun 7 2018). Now the boards actually run as slow as or even slower than their RPi 3B predecessor, but as usual no one takes notice and users are happy since they 'bought the faster board': it's all about feelings, not reality.
     
    @NicoD in case you want to visualize those 'cheating' effects in a video, a nice way would be to install RPi-Monitor* and show its output in a browser window while running 'sbc-bench.sh m' in a terminal window next to it during the benchmarks. To my knowledge all performance monitoring solutions for the RPi rely on the Linux way of doing things (querying /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq), which only shows the fake clock speeds, not the real ones.
     
     
    * https://rpi-experiences.blogspot.com/p/rpi-monitor.html
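    For a quick check without RPi-Monitor, here is a minimal sketch that puts the kernel's cpufreq value next to the firmware's own measurement from 'vcgencmd measure_clock arm', assuming that measurement reflects the actual ARM clock. The helper name compare_rpi_clock is hypothetical, and reading cpuinfo_cur_freq may require root.

```shell
# Hypothetical helper: compare the clock the kernel's cpufreq interface reports
# (in kHz) with what the VideoCore firmware measures via vcgencmd (in Hz).
# Possible usage on a Raspberry Pi (assumption about the environment):
#   compare_rpi_clock "$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq)" \
#                     "$(vcgencmd measure_clock arm)"
compare_rpi_clock() {
  reported_khz=$1
  # vcgencmd prints something like "frequency(48)=600000000"; keep the part after '='
  measured_hz=${2#*=}
  awk -v r="$reported_khz" -v m="$measured_hz" 'BEGIN {
    printf "cpufreq reports %d MHz, firmware measures %d MHz\n", r / 1000, m / 1000000
  }'
}
```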
  25. Like
    tkaiser reacted to vgjdujdtcfhmrtsjy in Banana Pi R64   
    .