
tkaiser

Members
  • Posts: 5462

Everything posted by tkaiser

  1. Great! So buyers of this board know where to look and whom to ask for support. With Armbian the situation is a bit different: if we started to support a new SoC family, we would need to play with all the relevant kernel variants flying around to solve problems like IRQ affinity settings (since we hate hardware running slower than possible) or thermal monitoring (since we hate the idea of users getting into trouble because their devices overheat). The Banana wiki mentions a kernel 4.4 and a 4.14 (no idea whether that's just the result of their copy&paste monkey doing silly things or whether it reflects SoC reality). Then there's mainline support. Just walking through 3 different kernel families to get an idea of what's working, what's not and what needs fixes wastes more than a whole day.
  2. RockPro64

    I believe it's the latter. And if I fired up 2 iperf3 tasks at the same time, I would most probably use 'taskset -c 4' and 'taskset -c 5' to pin them to different big cores, since I don't believe the little cores are fast enough. IRQ collisions can ruin benchmark numbers (and real-world performance too). Unlike iperf, iperf3 also shows dropped packets every 2 seconds, so you get the idea a bit faster that something's wrong. IRQ affinity with such fast cards might be a big issue. Ayufan chose to use these settings for both Rock64 and RockPro64: https://github.com/ayufan-rock64/linux-package/blob/d31494d0b9afafa5a98ef0f4d94dcdad611d45e3/root/usr/local/sbin/rock64_fix_performance.sh#L30-L35 (PCIe ends up on a little core while onboard Ethernet and USB3 land on a big core).

    With Armbian we're not entirely sure how to deal with IRQ affinity on RK3399 since user configurations differ. Quoting myself: 'most probably the best idea is to switch from static IRQ affinity set at boot by armbian-hardware-optimization to a daemon that analyzes the IRQ situation every minute and then dynamically adopts the best strategy'. Anyway: playing with these fast cards is an interesting example of why an Active Benchmarking approach is needed.
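To spot IRQ collisions while pinning iperf3 with taskset, one rough approach is to diff two /proc/interrupts snapshots and look at which cores actually service the device interrupts. This is a minimal bash sketch, not what armbian-hardware-optimization does; it assumes the usual /proc/interrupts layout (a header line with one CPUn column per core, then numbered IRQ lines with one count per core), and the function name is hypothetical:

```bash
#!/bin/bash
# Compare two /proc/interrupts snapshots and print, per CPU, how many
# device interrupts (numbered IRQ lines only) it serviced in between.
irq_delta_per_cpu() {
    awk 'FNR == 1 { ncpu = NF; f++; next }   # header line: CPU0 CPU1 ...
         $1 ~ /^[0-9]+:$/ {                  # skip IPI, Err and similar lines
             for (i = 2; i <= ncpu + 1; i++) count[f, i - 2] += $i
         }
         END {
             for (c = 0; c < ncpu; c++) print "CPU" c, count[2, c] - count[1, c]
         }' "$1" "$2"
}
```

For example, run 'cp /proc/interrupts /tmp/irq.before; sleep 60; cp /proc/interrupts /tmp/irq.after; irq_delta_per_cpu /tmp/irq.before /tmp/irq.after' while 'taskset -c 4 iperf3 -c <server>' is running in another shell. A core whose delta dwarfs the others is a candidate for moving IRQs away from it.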
  3. OMG: They managed to screw up their proprietary ThreadX crap again. The log shows throttling happening, but when querying 'vcgencmd get_throttled' we get a wrong answer with the latest ThreadX release:

    Querying ThreadX on RPi for thermal or undervoltage issues:
    100000000000000010
    |||             |||_ under-voltage
    |||             ||_ currently throttled
    |||             |_ arm frequency capped
    |||_ under-voltage has occurred since last reboot
    ||_ throttling has occurred since last reboot
    |_ arm frequency capped has occurred since last reboot

They return '100000000000000010' (18 bits) but it should read '0100000000000000010' (19 bits, i.e. shifted by one position). The closer you get to the Raspberry Pi, the more ignorance and stupidity is involved.

And yeah, results as expected. Their latest 'incremental update' (the RPi 3 B+) was faster than its predecessor for only a few months. They recently happily ruined performance on the RPi 3 B+ with their latest ThreadX update, which starts to throttle already at 60°C to mask instability issues a few RPi users suffer from.
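Whatever produces the shorter string, the bits can only be read reliably against the legend if the value is always zero-padded to 19 digits first; otherwise every column is off by one. A minimal bash sketch with hypothetical helper names: it accepts the raw 'throttled=0x...' output, always emits 19 characters (bit 18 leftmost, bit 0 rightmost) and lists the set bit numbers. What each bit means is taken from the legend, not decoded here:

```bash
#!/bin/bash
# Turn the value printed by 'vcgencmd get_throttled' ("throttled=0x50005"
# or just "0x50005") into a 19-character binary string, bit 18 leftmost.
# Zero-padding keeps every bit in the column the legend expects.
throttled_to_binary() {
    local value=$(( ${1#throttled=} )) bits="" i
    for (( i = 18; i >= 0; i-- )); do
        bits+=$(( (value >> i) & 1 ))
    done
    echo "$bits"
}

# List the numbers of all set bits (bit 0 = rightmost character), lowest first.
throttled_set_bits() {
    local value=$(( ${1#throttled=} )) set_bits=() i
    for (( i = 0; i <= 18; i++ )); do
        if (( (value >> i) & 1 )); then
            set_bits+=("$i")
        fi
    done
    echo "${set_bits[*]}"
}
```

Usage: 'throttled_to_binary "$(vcgencmd get_throttled)"'. A value of 0x20002 then yields the 19-character '0100000000000000010' instead of the 18-character variant.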
  4. Yes, that's then the only available node, also labeled correctly but obviously returning a bogus value (0°C). So it needs a kernel fix and can't be fixed in my script.

Wrt your special OpenSSL version: if your binary is at a special location, simply adjust the PATH variable. If it's at /usr/local/foo/bar/, then please try

    export PATH=/usr/local/foo/bar/:$PATH
    sbc-bench.sh

Curious about the numbers. You already reported some last year, but back then the numbers were way lower, especially with small data chunks.
  5. This is intentional, to prevent high background activity from destroying benchmark results. The reality out there is that users fire up an SBC that hasn't been used for a while to run benchmarks, and then the 'unattended-upgrades' task runs in the background, utilizing one CPU core at 100% while installing outstanding updates. If a benchmark started now, the scores would be 'mysteriously' lower than they should be. The web is full of such BS numbers. So this is a rather simple mechanism to prevent this.

My recommendation would be a reboot in between, to better spot whether swapping occurred while running the benchmarks (after a reboot you can simply compare both 'Swap' lines in the detailed output).

To elaborate on the RPi situation: the Linux kernel has no idea at which clockspeed the CPU cores are running. Linux' cpufreq driver talks via a mailbox interface to the primary OS (ThreadX, a closed source RTOS controlling the RPi hardware), and the RPi Trading folks decided to let the ThreadX counterpart lie all the time. That's why Linux reports running at 1200 MHz, 1400 MHz or, with your overclocking experiments, even 1570 MHz, while in reality the ARM cores are either throttled (heat based) or 'frequency capped' (undervoltage based: when input voltage drops below 4.63V, ThreadX downclocks this and that and cuts the supply voltage to the CPU cores).

To get an idea at which clockspeed the ARM cores are running at the moment, we always need to execute 'vcgencmd measure_clock arm' ('vcgencmd' translates to 'VideoCore generate command'). But we would need to execute this every millisecond to really know whether throttling or frequency capping is happening right now. Impossible.

In the past we could at least ask the 'firmware' running on VC4 to tell the truth.
The weapon of choice is again called 'vcgencmd': when executing 'vcgencmd get_throttled' we get an obscure hex value we can convert into binary, and then we can read 19 bits, of which the first 3 and the last 3 are somewhat documented (see the excerpt from your undervoltage log below).

Things have changed, and with the most recent ThreadX update RPi Trading employees decided to cheat a little bit more. Now when the SoC's temperature exceeds 60°C, silent throttling happens on the RPi 3 B+. They decrease the supply voltage to the CPU cores and limit max cpufreq to 1200 MHz but do not set the respective 'throttling' bits, so we cannot rely on 'vcgencmd get_throttled' any more. Details: https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=217056#p1334921

    Querying ThreadX on RPi for thermal or undervoltage issues:
    1010000000000000101
    |||             |||_ under-voltage
    |||             ||_ currently throttled
    |||             |_ arm frequency capped
    |||_ under-voltage has occurred since last reboot
    ||_ throttling has occurred since last reboot
    |_ arm frequency capped has occurred since last reboot
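The 'don't start while background activity is high' mechanism from the beginning of this post can be sketched by polling the 1-minute load average in /proc/loadavg. This is a minimal bash sketch, not the actual sbc-bench code; the threshold, poll interval and poll count are hypothetical parameters:

```bash
#!/bin/bash
# Print the 1-minute load average (first field of /proc/loadavg).
load1() {
    cut -d ' ' -f 1 "${1:-/proc/loadavg}"
}

# Wait until the 1-minute load average drops below $1.
# $2: seconds between polls, $3: max number of polls, $4: loadavg file.
# Returns 0 once idle, 1 if the system stayed busy for all polls.
wait_for_idle() {
    local threshold=$1 interval=${2:-10} polls=${3:-30} file=${4:-/proc/loadavg} i
    for (( i = 0; i < polls; i++ )); do
        # awk does the floating point comparison that bash can't do
        if awk -v l="$(load1 "$file")" -v t="$threshold" 'BEGIN { exit !(l < t) }'; then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}
```

For example, 'wait_for_idle 0.1 10 30 || echo "still busy (unattended-upgrades?)"' before starting the first benchmark.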
  6. Thanks a bunch for the results. No need to re-test since everything's fine ('neon' or not seems to make no difference on armhf; most probably compilation breaks). The benchmark was running at 1300 MHz all the time and this value is true:

    Cpufreq OPP: 1300 Measured: 1298.347/1297.092/1298.054

Also no need to re-test with different storage since the benchmark does not access the disk (only if swapping occurred). See the '%io' column: no %iowait, so everything's fine.

The wrong thermal readout is most probably due to my script choosing the wrong thermal source (never dealt with a MediaTek SoC/kernel so far). Can you run the following, please:

    find /sys -name "thermal_zone*" | while read -r ; do
        echo "${REPLY}: $(cat "${REPLY}/type") $(cat "${REPLY}/temp")"
    done
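A slightly extended variant of that loop, as a hedged sketch: it reads the /sys/class/thermal/thermal_zone* entries (so each zone typically shows up once, instead of also via its /sys/devices/... path), skips zones whose files can't be read, and converts the millidegree values to °C. The function name is hypothetical, and the base directory parameter exists only so the function can be pointed at a test tree:

```bash
#!/bin/bash
# List thermal zones as "<zone>: <type> <temp>°C".
# Assumes the usual sysfs layout where 'temp' holds millidegrees Celsius.
list_thermal_zones() {
    local base=${1:-/sys/class/thermal} zone type temp
    for zone in "$base"/thermal_zone*; do
        # Skip an unexpanded glob (no zones) and zones with unreadable files
        [ -r "$zone/type" ] && [ -r "$zone/temp" ] || continue
        type=$(cat "$zone/type")
        # Some drivers return an error when 'temp' is read; skip those zones
        temp=$(cat "$zone/temp" 2>/dev/null) || continue
        printf '%s: %s %s°C\n' "${zone##*/}" "$type" \
            "$(awk -v t="$temp" 'BEGIN { printf "%.1f", t / 1000 }')"
    done
}
```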
  7. I used some of your results over here: https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md

But the numbers are partially problematic since the distros were too old (only Debian Stretch or Ubuntu Bionic provide a reliable environment), sometimes strange combinations occurred (e.g. FriendlyELEC's image for NanoPC T3+ being armhf and not arm64) and the kernels were rather old too. In case time permits, it would be great if you could check out the most recent v0.4 version of sbc-bench and then try it on the following boards with the following images:

    • NanoPC T3+: https://dl.armbian.com/nanopct3plus/Debian_stretch_next.7z
    • ODROID-C2: https://dl.armbian.com/odroidc2/archive/Armbian_5.54_Odroidc2_Debian_stretch_next_4.17.9.7z
    • ODROID-XU4: https://dl.armbian.com/odroidxu4/archive/Armbian_5.45_Odroidxu4_Debian_stretch_next_4.9.99.7z and http://com.odroid.com/sigong/blog/blog_list.php?bid=198 (would be interesting whether crypto performance differs since HK's kernel should make use of XU4's proprietary crypto engine)

Repeating the tests on the RPi 3 B+ would also be great (once with fan, once without, and once with a crappy Micro USB cable to show undervoltage with ThreadX clocking down to 600 MHz).
  8. Personally I have no experience with LXD so far, so I would suggest trying it out yourself (and using the most recent distro versions, that's Ubuntu Bionic or Debian Stretch). Debian/Ubuntu care about the kernel only on x86/amd64 platforms; on ARM the kernel is provided by the image creator (or Armbian in this case; many other image creators also rely on our kernel repositories). Jessie is EOL and should be updated to Stretch, especially on arm64 since that platform doesn't receive any security updates any more (armhf still receives them):
  9. The latest commit added @wtarreau's great mhz tool to the reports to spot strange things happening (especially on Raspberry Pi and with Amlogic SoCs). Example output from a Rock64:

    Checking cpufreq OPP:
    Cpufreq OPP: 408 Measured: 400.981/400.762/400.857
    Cpufreq OPP: 600 Measured: 592.858/592.944/592.672
    Cpufreq OPP: 816 Measured: 808.872/808.932/809.031
    Cpufreq OPP: 1008 Measured: 1000.598/1000.816/1000.416
    Cpufreq OPP: 1200 Measured: 1193.027/1192.765/1193.027
    Cpufreq OPP: 1296 Measured: 1288.983/1285.487/1288.694
    Cpufreq OPP: 1392 Measured: 1385.218/1384.623/1384.995

And from a RockPro64:

    Checking cpufreq OPP for cpu0-cpu3:
    Cpufreq OPP: 408 Measured: 406.192/406.314/406.319
    Cpufreq OPP: 600 Measured: 598.053/598.195/598.344
    Cpufreq OPP: 816 Measured: 814.302/814.292/814.001
    Cpufreq OPP: 1008 Measured: 1006.214/1006.239/1006.214
    Cpufreq OPP: 1200 Measured: 1197.827/1198.355/1198.369
    Cpufreq OPP: 1416 Measured: 1414.209/1414.286/1414.023
    Checking cpufreq OPP for cpu4-cpu5:
    Cpufreq OPP: 408 Measured: 406.563/406.592/406.636
    Cpufreq OPP: 600 Measured: 598.649/598.310/598.581
    Cpufreq OPP: 816 Measured: 814.583/815.065/814.663
    Cpufreq OPP: 1008 Measured: 1006.509/1006.558/1006.570
    Cpufreq OPP: 1200 Measured: 1198.494/1198.564/1198.591
    Cpufreq OPP: 1416 Measured: 1414.612/1414.596/1414.534
    Cpufreq OPP: 1608 Measured: 1606.477/1606.577/1606.677
    Cpufreq OPP: 1800 Measured: 1798.487/1798.587/1798.627

These checks are done twice: at the start of the benchmark when the system is idle, and again directly after the most demanding test has finished and the CPUs are heated up to the max (7-zip or cpuminer, depending on 'sbc-bench' vs. 'sbc-bench neon'). Results made with RPi 3, 3+ and Vim2 might look really funny then.
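To make such tables easier to scan, the average of the three measurements can be compared against each OPP automatically. A minimal bash/awk sketch, assuming one 'Cpufreq OPP: <MHz> Measured: a/b/c' entry per line as in the report above; the function name and the 5% default threshold for flagging a suspicious deviation are hypothetical, not part of sbc-bench:

```bash
#!/bin/bash
# Read "Cpufreq OPP: <MHz> Measured: a/b/c" lines on stdin and print the
# average measured clock and its deviation from the OPP. Deviations larger
# than $1 percent (default 5, an arbitrary choice) are flagged.
check_opp() {
    awk -v t="${1:-5}" '
        /Cpufreq OPP:/ && /Measured:/ {
            for (i = 1; i < NF; i++) {
                if ($i == "OPP:") opp = $(i + 1)
                if ($i == "Measured:") meas = $(i + 1)
            }
            n = split(meas, m, "/")
            sum = 0
            for (j = 1; j <= n; j++) sum += m[j]
            avg = sum / n
            dev = (avg - opp) / opp * 100
            flag = (dev > t || dev < -t) ? " suspicious" : ""
            printf "%d MHz -> %.1f MHz (%.1f%%)%s\n", opp, avg, dev, flag
        }'
}
```

For example, 'grep "Cpufreq OPP" report.txt | check_opp 3' (report.txt being a saved sbc-bench report). A board whose sysfs claims one clockspeed while the cores run at another shows up with a large negative deviation.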
  10. RockPro64

    Great! Hope you'll provide performance numbers for both the Intel and the Mellanox here. Pine folks currently plan on adding a Tehuti based 10GbE card to their shop (less than 100 bucks IIRC), and also on RockPro64 based clustering with 24 and 48 units (in 2RU).
  11. As part of Armbian? Are you kidding? The Banana folks still do not provide 'documentation' or 'information'; they still allow their copy&paste monkey to assemble random words and numbers into silly nonsense. Look at http://wiki.banana-pi.org/Banana_Pi_BPI-R64 (archived version). What's available on the mPCIe socket? USB? PCIe? How large is the board? '148 mm × 100.5mm', which is somewhat different from the '120mm ×100mm' they list there (archived version). They're not even able to tell anyone something as simple as the dimensions of their boards. They are not able to improve, for whatever strange reasons. They simply don't give a shit about anything that would be important for externals to support their hardware. Nothing has changed within the last 3 years. It's unbelievable.

Next problem: supporting a new SoC requires tons of effort. Working on a SoC that is only used on one specific board is somewhat moronic, especially if the board maker has no interest in supporting external communities (providing accurate documentation is the most basic requirement). Let's have a look at SinoVoip's SoC choices:

    • Allwinner A83T: used on BPi M3 and not a single other SBC out there
    • Allwinner A33: used on BPi M2 'Magic' and just one other board (Olimex A33-OLinuXino)
    • Allwinner R40: used on BPi M2 'Ultra' and not a single other SBC out there
    • Allwinner V40: used on BPi M2 'Berry' and not a single other SBC out there
    • MediaTek MT7622: used on BPi R64 and not a single other SBC out there
    • MediaTek MT7623: used on BPi R2 and not a single other SBC out there
    • Realtek RTD1296: used on BPi W2 and not a single other SBC out there

They seem to be obsessed with the idea of choosing exotic SoCs to ensure software support will be crappy. Why join this club? Their SoC choices require tons of effort for no reason.
  12. @malvcr or @chwe: are you able to provide 'sbc-bench neon' output for the BPi R2? Is anyone here with an RPi 3 B+ able to run 'sbc-bench neon' on this board once allowing for throttling (pretty easy, since operation without a fan should be sufficient) and once allowing for undervoltage/frequency capping (using any crappy Micro USB cable combined with a 2A USB wall wart)? @NicoD maybe? A screenshot from a PuTTY session showing the output would be great...
  13. In fact I strongly disagree. Hardware-based RAID was always a problem (the RAID controller being a proprietary single point of failure), and primitive RAID-1 is just a joke today: https://forum.openmediavault.org/index.php/Thread/18637-Home-NAS-build-FS-info/?postID=146935#post146935

Wrt R64... you know SinoVoip is just choosing random SoCs, then designing a board around them and thinking the job is done? I've no idea why they're doing so instead of focusing on SoCs they already support. But anyway: at least this approach (always choosing a new SoC for every new board they throw out) ensures initial software support will be crappy. Seems they think that would be something desirable. Maybe the SoC can only cope with 1 GB RAM max since it's targeted at 'routers, home automation gateways, wireless audio and storage'? Who cares...
  14. Hi all, sbc-bench is now on Github: https://github.com/ThomasKaiser/sbc-bench

I'll link from the README there to this thread for further discussion about the tool and proper benchmark methodology.
  15. I fixed the XU4 warnings and added cpuminer support (the most demanding benchmark since it makes use of heavy NEON optimizations). From now on please only use the new version from Github:

    wget https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
    /bin/bash ./sbc-bench.sh neon

or

    wget -O /usr/local/bin/sbc-bench.sh https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
    chmod 755 /usr/local/bin/sbc-bench.sh
    /usr/local/bin/sbc-bench.sh neon

I'll comment on your results soon; there's quite some interesting stuff contained (e.g. your FriendlyELEC image being armhf instead of arm64 and so on).
  16. RockPro64

    For testing purposes I always just grab a bionic or stretch minimal image from ayufan, then look for the latest mainline kernel available and install it:

        apt-cache search kernel | grep -- '-4\.1'
        apt install linux-image-4.18.0-rc5-1048-ayufan-g69e417fe38cf
        reboot

    A mainline kernel was a requirement to use PCIe in the past, but according to the release notes that might have changed recently (by disabling SDIO). So you might want to test with both 4.4 and mainline...

    Edit: Don't forget to always check /proc/interrupts for the PCIe related IRQs and assign them to one of the big cores.
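A hedged bash sketch of that last step: find the IRQ numbers whose /proc/interrupts line mentions PCIe and write a single-CPU mask to /proc/irq/<n>/smp_affinity. It assumes you run it as root and that the kernel lets you move those IRQs (some can't be moved, and the write then fails); the helper names are hypothetical:

```bash
#!/bin/bash
# Print IRQ numbers whose /proc/interrupts line matches a pattern (case-insensitive).
irqs_matching() {
    local pattern=$1 file=${2:-/proc/interrupts}
    # Only numbered IRQ lines ("230:") are considered; IPI/Err lines are skipped.
    awk -v pat="$pattern" '$1 ~ /^[0-9]+:$/ && tolower($0) ~ tolower(pat) {
        sub(/:$/, "", $1); print $1 }' "$file"
}

# Hex bitmask for smp_affinity with only CPU $1 set (cpu4 -> 10, cpu5 -> 20).
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Pin all IRQs matching $1 to CPU $2 (needs root).
pin_irqs_to_cpu() {
    local pattern=$1 cpu=$2 mask irq
    mask=$(cpu_mask "$cpu")
    for irq in $(irqs_matching "$pattern"); do
        echo "$mask" > "/proc/irq/$irq/smp_affinity" \
            || echo "IRQ $irq: could not change affinity" >&2
    done
}
```

On RK3399, 'pin_irqs_to_cpu pcie 4' would move the PCIe IRQs to cpu4, one of the two big cores. Afterwards check /proc/interrupts again to see whether the counts really grow in the CPU4 column.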
  17. I pushed the script to Github. From now on the (persistent) URL is: https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh (no need to convert CRLF to LF any more)
  18. I've seen temporary failures as well when accessing http://ix.io. Guess I have to rework the upload routine to try even more times (currently the upload is tried twice and then given up). A simple check would be

    echo Hello world. | curl -F 'f:1=<-' ix.io
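Trying the upload more often boils down to a retry loop. A minimal bash sketch with a hypothetical 'retry' helper; since a report piped in via '<-' is consumed on the first attempt, the usage below uploads from a file instead ('f:1=<file' makes curl read the field content from that file):

```bash
#!/bin/bash
# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 as soon as one attempt succeeds, 1 if all attempts fail.
retry() {
    local attempts=$1 delay=$2 n
    shift 2
    for (( n = 1; n <= attempts; n++ )); do
        if "$@"; then
            return 0
        fi
        if (( n < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}
```

Usage, with /tmp/sbc-bench.log as a hypothetical report file: 'retry 5 3 curl -sf -F "f:1=</tmp/sbc-bench.log" ix.io'. The '-f' switch makes curl return an error on HTTP failures, so these count as failed attempts too.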
  19. Please use https://pastebin.com/raw/CXtt28y1 instead, and there's no need for dos2unix since this can simply be achieved by stripping out the CR characters ('\015') using tr as shown above. Vim2 will be very interesting since strange things happen there. Really curious about the results. NanoPC T4 results are above (but with conservative settings limiting the CPU cores to 1.8/1.4 GHz instead of the 2.0/1.5 GHz we'll use later), and NanoPC-T3+ will be more or less the same as NanoPi Fire3 since it's the same SoC but a different amount (and maybe type) of DRAM. And yeah, XU4 is also interesting.
  20. Interesting, thanks. Still, throttling happened since you hit the 60°C threshold multiple times; see the 7-zip results from 3 consecutive runs, 3907, 3629, 3549 (declining), and

    11:34:15: 1570/1200MHz 3.57  83%   1%  82%    0%  0%   0% 60.1°C 1.3312V
    11:35:00: 1570/1200MHz 3.90  80%   1%  79%    0%  0%   0% 59.6°C 1.3312V

You would need to add 'temp_soft_limit=70' to /boot/config.txt and reboot to get back the 'old behaviour', so that silent throttling starts at 70°C as it did prior to 'Jul 3 2018 14:15:46' (that's the timestamp of the latest ThreadX update, which destroys performance on every RPi 3+ around).
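Adding the setting by hand works fine; if you script it, it's worth making the edit idempotent so repeated runs don't append duplicate lines. A minimal sketch with a hypothetical helper, assuming plain key=value lines as used in /boot/config.txt, keys without regex special characters, and GNU sed for '-i':

```bash
#!/bin/bash
# Set key=value in a config.txt-style file: replace an existing
# (uncommented) "key=" line, or append one if there is none.
set_config_option() {
    local file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s/^${key}=.*/${key}=${value}/" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

As root: 'set_config_option /boot/config.txt temp_soft_limit 70', then reboot.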
  21. In the meantime I improved the script in some areas (especially the throttling/undervoltage warnings). To get rid of CRLF it's as easy as

    wget -O - https://pastebin.com/raw/CXtt28y1 | tr -d "\015" >/usr/local/bin/sbc-bench.sh
    chmod 755 /usr/local/bin/sbc-bench.sh
    sudo /usr/local/bin/sbc-bench.sh

Already tested with a bunch of boards (always with active cooling, to test for stuff like kernel differences or architecture):

    • Raspberry Pi 2, kernel 4.14, default RPi settings, not throttled: http://ix.io/1ivw
    • NanoPC T4, kernel 4.17, preliminary settings, not throttled: http://ix.io/1ivB
    • RockPro64, kernel 4.18, ayufan/arm64 settings, not throttled: http://ix.io/1iw5
    • RockPro64, kernel 4.4, ayufan/arm64 settings, not throttled: http://ix.io/1ivR
    • NanoPi Fire3, kernel 4.14, Armbian settings, not throttled, zram swapping: http://ix.io/1ivC
    • Clearfog Pro, kernel 4.14, Armbian settings, not throttled: http://ix.io/1ivE
    • Rock64, kernel 4.4, Armbian settings, not throttled: http://ix.io/1ivG
    • Rock64, kernel 4.4, ayufan/armhf settings, also 1392 MHz, not throttled: http://ix.io/1iwz

The interesting findings are as follows:

    • When comparing the RK3399 boards (NanoPC T4 and RockPro64), the kernel version makes a huge difference wrt memory bandwidth/latency, which also results in different 7-zip scores.
    • arm64 vs. armhf (Rock64) is not that much of an issue. The armhf binary is slightly slower, but on the other hand an armhf userland can cope with less available physical memory.
    • NanoPi Fire3 has 8 CPU cores but just 1 GB DRAM, which results in a big problem with almost all workloads that would benefit from 'as many CPU cores as possible'. As a result swapping happens. With recent Armbian that's not much of a problem since we switched from SD card based emergency swap to zram, which works pretty well. But when running sbc-bench with a different distro relying on swap, numbers might be much lower since storage becomes the bottleneck (TBC).
I'll push the script plus explanations to Github over the weekend and create a dedicated thread for the tool.
  22. No, please. The 'rant' highlights what's problematic when benchmarking boards: the existence of an operating system the average user doesn't know about (since they're told ThreadX is just some 'firmware'), how updates of this primary OS can affect performance behaviour, and how hard it is to monitor this stuff to get a clue why performance differs when running this OS image vs. that one (especially keeping in mind that those secondary Linux operating systems pull in updates for the primary OS that change the whole system's behaviour). Let's keep this stuff collected here. When I soon start some sort of tutorial on how to benchmark correctly, I will reference some posts here.
  23. To sell more of these devices and make nice profits? The average RPi user is pretty clueless, so all that's needed to sell a new 'incremental update' is mentioning that it's faster. In fact the 3 B+ was a little bit faster for some months (1.4 GHz vs. 1.2 GHz, and the way better PCB design plus heatspreader resulted in higher sustained performance). Now that all the benchmarks are published, they silently reverted the higher performance, since everything demanding that would need the 1.4 GHz will now easily trigger the 60°C throttling threshold. But hey, RPi users won't notice since /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq shows only bogus numbers. Same when undervoltage occurs. In such a situation (input voltage dropping below 4.65V, which happens very very very often with Raspberries not using their 'special PSU' but standard Micro USB gear) the ARM cores are immediately downclocked to 600 MHz while /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq happily lies about 1400 MHz on the 3 B+ or 1200 MHz on the 3 B.

BTW: The VC4 is a 2010 design and nothing except the exchanged ARM cores has changed. They have nothing else. If they switched to a new SoC, backwards compatibility wouldn't exist any more. I wouldn't be surprised if we see a last incremental update in 2019 (then using an eMMC socket, implementing SDR104 mode for faster SD card access and Wi-Fi with 2x2 MIMO and real antennas) and in 2020 they simply say 'game over'.
  24. They silently changed this a few weeks ago with a new ThreadX release (the primary OS they call 'firmware'). BTW: I added monitoring of ThreadX settings (search for config.txt in the http://ix.io/1isD output) and saw that you didn't use any overclock settings. Might be interesting whether tuning of DRAM settings now also gets reverted when the SoC temperature exceeds 60°C.
  25. Thank you! So the new RPi 3 B+ with the latest updates applied silently downclocks even when just boring tinymembench is running:

    System health while running tinymembench:
    Time      fake/real    load %cpu %sys %usr %nice %io %irq    CPU   VCore
    16:56:39: 1400/1400MHz 0.28  14%   0%  12%    0%  0%   0% 60.1°C 1.3250V
    16:57:39: 1400/1200MHz 0.67  21%   0%  21%    0%  0%   0% 62.8°C 1.2313V

Would be funny to repeat the test, this time with the fan active, since I believe the RPi clowns not only downclock the CPU cores but most probably also GPU, VPU and DRAM. A second test with fan should clarify. Edit: already provided by @NicoD in the meantime.

Upstream Ubuntu armhf packages are built with another GCC version and different compiler switches (for ARMv7), while Raspbian builds everything for ARMv6 to support their single-core boards too. But Raspbian uses more aggressive compiler switches, so some code (e.g. the funny sysbench joke) performs better with the Raspbian ARMv6 binary than with an upstream Debian or Ubuntu armhf package: see the sysbench pseudo-benchmark numbers made with my OMV images for RPi (using an Armbian armhf userland combined with the proprietary RPi stuff): https://forum.armbian.com/topic/1748-sbc-consumptionperformance-comparisons/?page=2

Way more important, and what everyone ignores: the Raspberry Pi is NOT an ARM SBC like all the other boards we're using. It's a VideoCore IV (VC4) SBC with some crappily integrated ARM cores. The VC4 is the primary CPU and runs a closed source RTOS called ThreadX that fully controls the hardware. The ARM cores are just guest processors (called 'third class citizens' by the lady who tried to develop an open source replacement for the proprietary ThreadX stuff) and are only able to run a secondary OS like e.g. Linux, which doesn't even have a clue at which clockspeeds it's running.

4 weeks ago the RPi clowns decided to publish a new ThreadX release which contains a significant change: as soon as the SoC temperature exceeds 60°C on the RPi 3 B+, some subsystems are silently downclocked. Since they're cheating, you can't notice that by querying the usual sysfs node. In the past it was possible to spot this cheating with 'vcgencmd get_throttled', which reported throttling (and also frequency capping and undervoltage) since the last reboot. Now they cheat even more, and with this first clock reduction from 1.4 GHz to 1.2 GHz the relevant throttling bit is no longer set. In other words: 4 weeks ago the vast majority of RPi 3 B+ out there were a bit faster than after applying the latest updates.

The closed source main OS ThreadX is available to us only as BLOBs living on the FAT partition below /boot (on RPi OS images it's the 'raspberrypi-bootloader' package pulled in from archive.raspberrypi.org). This is a typical 'commit' (exchanged BLOBs no one outside RPi Trading and Broadcom can look into): https://github.com/raspberrypi/firmware/commit/0bef3cb16d600292d4185796cc042fd564bc694d

The whole hardware initialization as well as everything that's performance relevant happens in ThreadX, and the ways to monitor what's really happening from the secondary OS (Linux) are crippled (since the mailbox driver is cheating and reporting fantasy clockspeeds). So on this VC4 platform it's even more important to permanently monitor what's happening as well as possible, since benchmarking without checking what's really happening only generates numbers without meaning.

TL;DR: A few weeks ago the RPi clowns decided to trash the performance of all RPi 3 B+ out there to address the instability problems some board owners suffer from. The problem as well as a workaround to get back the old behaviour are described here: https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=217056#p1335342
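Since the sysfs value can't be trusted on this platform, a crude way to monitor permanently is to log it next to what ThreadX reports via 'vcgencmd measure_clock arm', similar to the fake/real column in the tinymembench excerpt. A minimal sketch under stated assumptions: the vcgencmd output looks like 'frequency(N)=<Hz>', cpuinfo_cur_freq holds kHz (reading it usually requires root), and the default one-second interval can only catch coarse changes, not millisecond-level throttling. The function names are hypothetical:

```bash
#!/bin/bash
# Convert 'vcgencmd measure_clock arm' output, assumed to look like
# "frequency(48)=600000000", into whole MHz.
arm_clock_mhz() {
    local hz=${1#*=}
    echo $(( hz / 1000000 ))
}

# Print "HH:MM:SS: <sysfs>/<ThreadX>MHz" every $1 seconds until interrupted.
sample_arm_clocks() {
    local interval=${1:-1}
    local sysfs=/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
    local khz
    while true; do
        khz=$(cat "$sysfs")
        printf '%s: %s/%sMHz\n' "$(date +%H:%M:%S)" "$(( khz / 1000 ))" \
            "$(arm_clock_mhz "$(vcgencmd measure_clock arm)")"
        sleep "$interval"
    done
}
```

Running 'sudo bash -c "...; sample_arm_clocks 1"' in a second session while a benchmark executes shows lines like '1400/600MHz' whenever the two sources disagree.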