tkaiser Posted August 28, 2016

We have had initial support for Pine64/Pine64+ in our repository for a long time but have not released any official images yet. Since this will change soon, here is a sneak preview of what to expect.

Hardware related issues:

Please don't blame Armbian for the few design flaws Pine64 and Pine64+ show:

- These boards use Micro USB for DC-IN, which is the worst possible decision. Most USB cables have a resistance way too high and are responsible for severe voltage drops when consumption increases, the tiny Micro USB contacts also have a pretty high contact resistance, and the maximum amperage for this connector is limited to 1.8A by the USB specs. So in case you want to do heavy stuff, immediately look into the linux-sunxi wiki page for Pine64 to get the idea how to use the pins on the so-called Euler connector to power the board more reliably. If you think about buying a Pine now, consider ordering their PSU too since its cable resistance shouldn't be a problem (this should also apply to the Micro USB cables they sell).
- The only LED on this board is a power LED that lights up immediately when power is provided. Pre-production samples had a green LED; on the normal batches this has been replaced with a red one. So there's no way for an OS image to provide user feedback (activate an LED when u-boot or the kernel boots), and the red light has often been interpreted as 'something is wrong'.
- USB: you find 2 USB type A receptacles on the board but only one is a true USB host port; the other/upper one is A64's USB OTG port exposed not as Mini/Micro USB (with ID pin to be able to switch roles) but as a normal type A port. Expect performance to be lower on this port. I've also never been able to do disk benchmarking on the upper port but that might have changed in the meantime (I only have a pre-production developer sample here). Please note also that the maximum amperage available on the USB port is 650mA, so connecting bus-powered USB disks might already exceed this -- be prepared to use a powered USB hub in between.
- A64 is prone to overheating but unfortunately the Pine64 folks do not sell the board with an effective heatsink by default (compare with ODROID-C1+ or ODROID-C2 for example to see what it looks like when the vendor cares about heat dissipation). They promised to provide a good heatsink as an option but at least I'm not able to find one in their online store. A heatsink is mandatory if you plan to run this device constantly with high loads, otherwise throttling will occur (when we tested an unrealistically heavy workload without a heatsink -- cpuburn-a53 -- A64 had to throttle down to as low as 600 MHz; for some numbers see the IRC log from a while ago).
- Not a real hardware issue but a problem anyway: the HDMI driver in Allwinner's BSP does not negotiate any display output with a lot of displays that are connected with an HDMI <--> DVI converter or use non-common resolutions. Better do not expect any display output if your display is neither connected directly using HDMI nor capable of 1080p (we can't do anything here since Allwinner's driver uses closed source blobs and no documentation or code with a useable license exists).
- On a couple of Gbit equipped Pine64+ boards users report that they're not able to negotiate Gbit Ethernet reliably and have to force the connection to Fast Ethernet (since we know that the RTL8211E PHY used on the boards needs an additional ~350 mW when negotiating a Gbit Ethernet connection this might be related to power problems, or maybe different PHY batches, or something else).
This has been confirmed in the meantime to be a hardware issue.

Now combine Micro USB encouraging users to combine this SBC with crappy phone chargers, 'smart' hubs/chargers that only provide 500mA since Pine64 isn't able to ask for more, and crappy USB cables leading to voltage drops (all sorts of power related issues 'by design' due to the crappy Micro USB connector) with the lack of a custom LED able to provide user feedback while booting and the inability to use a lot of displays, and you might already get what a support nightmare this device is.

The only reliable DOA detection method without a serial console is to ensure you have a working SD card (test it before, using either F3 or H2testw as outlined in our docs), then check the download integrity of the Armbian image (again see the documentation), then ensure you burn the image correctly to SD card (see docs), insert the SD card, power on the board and wait 20 seconds. If the LEDs on the Ethernet jack then start to flash randomly at least the kernel boots, and after waiting an additional 2 minutes you'll be able to log in via SSH or serial console (for the latter better choose the EXP header over the Euler connector -- reason here).
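On a Linux host these checks boil down to just a few commands; a rough sketch (file and device names are placeholders, adjust them to your situation):

f3write /media/sdcard && f3read /media/sdcard              # test the SD card itself (f3 is the Linux counterpart of H2testw)
sha256sum Armbian_whatever.img                              # compare against the checksum published on the download page
dd if=Armbian_whatever.img of=/dev/sdX bs=1M conv=fsync     # write the image -- triple-check the device name first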
Anyway: in case you run into booting or stability problems with Armbian on Pine64/Pine64+, be assured that it's not an Armbian issue. You are running into one of the problems above, therefore please try to resolve them on your own and send your complaints to the Pine64 forum and not ours: http://forum.pine64.org/forumdisplay.php?fid=21 (really, we don't do hardware and these issues are all related to hardware design decisions).

Expectations:

The Pine64 folks did a great job raising expectations to the maximum. They advertised this board as the 'first $15 64-Bit Single Board Super Computer', promised an average consumption of just 2.5W, the SoC remaining at 32°C and a few other weird things while they already knew that reality differs a lot (the journey started here last Dec). Pine64 is not a 'Super Computer' but most probably the slowest 64-bit ARM board around, due to A64 being limited regarding maximum cpufreq and overheating issues (the 40nm process being responsible for that) and the lack of fast IO interconnections (only one real USB 2.0 host port present, no eMMC option possible, no SD card implementation using the faster modes). If you then combine the high expectations with a rather clueless kickstarter crowd (many of them not even getting that they did not buy products but backed a project) and the hardware flaws, it's pretty obvious why their forums are full of complaints and why they receive so many boards as being DOA that work flawlessly in reality.

So why bring Armbian to Pine64? Because for some (headless) use cases these boards are really nice and also cheap, A64 support is progressing nicely thanks to our awesome linux-sunxi community, and a few more A64 devices will be available soon.

What do you get with Armbian on Pine64? User experience will not be much different compared to longsleep's minimal Ubuntu image. If you prefer Debian then at least you can be assured that our images do not contain bad settings and silly bugs like the ones from the official Pine64 downloads section (since they fiddle around manually with their OS images, for example all Pine boards running these have the same MAC address by default, which will cause network troubles if you've more than one board in the same collision domain).

We use the same thermal/throttling settings as OS images based on longsleep's kernel (since we helped developing them back in March), we use the same BSP kernel (patched by Zador up to the most recent version back in May) and share a few more similarities since our modifications were sent back to longsleep, so all OS images for Pine64 might benefit from them.

Differences: you don't need to execute longsleep's various platform scripts since kernel and u-boot updates are done using the usual apt-get upgrade mechanism in Armbian. You also don't need (and should not use) scripts like pine64_tune_network.sh since they decrease network performance with Armbian (stay with our defaults unless you're an expert). A few more tweaks might result in better performance, and at least by using Armbian you get the usual Armbian experience with some additional tools at the usual location, automatic fs resize on first boot and so on. We already provide a vanilla image currently based on kernel 4.7 but that's stuff for developers only, see below.

Performance with legacy Armbian image:

'Out of the box' CPU performance with A64 is not that great unless you are able to benefit from the new CPU features: A64 uses Cortex-A53 CPU cores that feature 64-bit capabilities (which are not that interesting since A64 devices are limited to 2 GB DRAM anyway at the moment) but, more interestingly, the ARMv8 instruction set can be used, which might increase performance a lot when software is compiled for this platform.

Best example: the commonly mis-used sysbench cpu test. When running an ARMv6 'optimized' sysbench binary on an ARMv8 CPU, performance will be 15 times slower than necessary (applies to the RPi 3 or the upcoming Banana Pi M64 when used with their OS images). But as soon as ARMv8 optimized code is used A64 can really shine in some areas. I used the default sysbench contained in Ubuntu Xenial's arm64 version, tried it with --cpu-max-prime=20000 and got less than 8 seconds execution time (an RPi 3 running Raspbian has the faster CPU cores but there it will take 120 seconds -- just due to different compiler switches!). Then I tried whether I could optimize performance by building sysbench from source using export AM_CFLAGS="-march=armv8-a -mtune=cortex-a53" and got 11 seconds execution time. So optimized code led to a huge performance loss? Not really: I had checked out sysbench version 0.5 by accident and there, for whatever reasons, execution with ARMv8 optimization (or in general) takes longer (great! benchmark version influences execution time, so one more reason to never trust sysbench numbers found on the net!). Using the '0.4' branch at version 0.4.12 I got an execution time of less than 7.5 seconds, which is a 10 percent increase in performance for free just by using appropriate compiler flags:

root@pine64:/# /usr/bin/sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4
sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000

Test execution summary:
    total time:                          7.9788s
    total number of events:              10000
    total time taken by event execution: 31.8939
    per-request statistics:
         min:                                  3.17ms
         avg:                                  3.19ms
         max:                                  8.74ms
         approx. 95 percentile:                3.19ms

Threads fairness:
    events (avg/stddev):           2500.0000/3.54
    execution time (avg/stddev):   7.9735/0.00

root@pine64:/# /usr/local/src/sysbench-0.4.12/sysbench/sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4
sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4
Random number generator seed is 0 and will be ignored

Doing CPU performance benchmark

Primer numbers limit: 20000

Threads started!
Done.

General statistics:
    total time:                          7.4608s
    total number of events:              10000
    total time taken by event execution: 29.8223
    response time:
         min:                                  2.96ms
         avg:                                  2.98ms
         max:                                  8.51ms
         approx. 95 percentile:                2.99ms

Threads fairness:
    events (avg/stddev):           2500.0000/3.67
    execution time (avg/stddev):   7.4556/0.00

root@pine64:/# /usr/local/src/sysbench/sysbench/sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4
sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4
Random number generator seed is 0 and will be ignored

Prime numbers limit: 20000

Initializing worker threads...

Threads started!

General statistics:
    total time:                          11.0451s
    total number of events:              10000
    total time taken by event execution: 44.1223s
    response time:
         min:                                  4.38ms
         avg:                                  4.41ms
         max:                                  27.34ms
         approx. 95 percentile:                4.41ms

Threads fairness:
    events (avg/stddev):           2500.0000/6.36
    execution time (avg/stddev):   11.0306/0.01

Another great example of how using CPU features or not (NEON in this case) influences performance, and of 'benchmarking gone wrong' numbers, are Linpack's MFLOPS scores. By choosing the package your distro provides instead of using one that makes use of your CPU's features you lose a lot of performance, ruin every performance per watt ratio and behave somewhat strangely.

Someone sent me Linpack MFLOPS numbers generated with Debian Jessie, which is known for horribly conservative compiler settings when building packages -- if you switch your distro from Jessie to Ubuntu Xenial for example you get a 30 percent improvement in sysbench numbers; yeah, that's the 'benchmark' we already laughed at above. With Jessie's/Raspbian's hpcc package, Pine64+ gets a score of 1625 MFLOPS and RPi 3 just 1035. So is Pine64 1.6 times faster than RPi 3? Nope, that's just 'benchmarking gone wrong' since these numbers are the result of a joke: using tools for 'High Performance Computing' with standard settings (no one interested in HPC would ever do that). By using the correct Linpack version that makes use of NEON optimizations on both CPUs we end up with 3400 MFLOPS (Pine64 at 1.3 GHz) vs 3600 MFLOPS (RPi 3 at 1.2 GHz). So if we're talking about this use case (HPC -- high performance computing) RPi 3 easily outperforms A64 (please keep in mind that the 3400 MFLOPS I got are the result of overclocking/overvolting at 1296 MHz; Pine64 is limited to 1152 MHz by default so we're talking about 3000 MFLOPS for A64 vs. 3600 MFLOPS for RPi 3's SoC). So it's not Pine64 being 1.6 times faster but RPi 3 being more suited for Linpack numbers, and this type of benchmark only shows how wrong it is to use distro packages that are built using conservative settings (which is a must if the distro wants to support a wide range of different SoCs!)
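For reference, building sysbench from source with ARMv8 flags boils down to something like the following. Repository URL, branch name and configure options here are assumptions from memory rather than a tested recipe, so adjust as needed:

apt-get install build-essential automake libtool git
git clone https://github.com/akopytov/sysbench
cd sysbench && git checkout 0.4       # the 0.4 branch contains version 0.4.12
export AM_CFLAGS="-march=armv8-a -mtune=cortex-a53"
./autogen.sh && ./configure --without-mysql && make
sysbench/sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4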
Anyway: it's obvious that in case you want to use Pine64 for number crunching or performance stuff in general, evaluating whether compiling packages from source might improve performance is a great idea (at least it's obvious that from a performance point of view using an ARMv6 distro with ARMv8 SoCs is stupid -- reality with Raspbian running on RPi 3 and BPi M64). ARMv8 also provides crypto extensions that might be used with OpenSSL for example. I didn't look into it yet, but maybe huge performance gains are possible when using a Pine64 as an HTTPS enabled web server or VPN endpoint, just like we've already seen with sysbench.

Network performance:

Pine64+ combines the SoC internal GbE MAC implementation (the same as in H3 and A83T SoCs from Allwinner) with an external RTL8211E PHY as used on most GbE capable SBC. Default iperf performance with Armbian/Xenial: 900+ Mbits/sec in both directions (920/940 Mbits/sec), so no need for further tuning (please read through this explanation here why blindly trusting in iperf numbers is always stupid and why it's neither necessary nor useful to further tune network settings to get better iperf numbers).

root@armbian:/var/git/Armbian# iperf3 -c 192.168.83.64
Connecting to host 192.168.83.64, port 5201
[  4] local 192.168.83.115 port 60392 connected to 192.168.83.64 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   112 MBytes   938 Mbits/sec    0    356 KBytes
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec    0    376 KBytes
[  4]   2.00-3.00   sec   112 MBytes   943 Mbits/sec    0    376 KBytes
[  4]   3.00-4.00   sec   112 MBytes   941 Mbits/sec    0    376 KBytes
[  4]   4.00-5.00   sec   112 MBytes   938 Mbits/sec    0    376 KBytes
[  4]   5.00-6.00   sec   113 MBytes   947 Mbits/sec    0    376 KBytes
[  4]   6.00-7.00   sec   112 MBytes   940 Mbits/sec    0    395 KBytes
[  4]   7.00-8.00   sec   112 MBytes   942 Mbits/sec    0    395 KBytes
[  4]   8.00-9.00   sec   112 MBytes   942 Mbits/sec    0    395 KBytes
[  4]   9.00-10.00  sec   112 MBytes   942 Mbits/sec    0    395 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0   sender
[  4]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec        receiver

root@pine64:~# iperf3 -c 192.168.83.115
Connecting to host 192.168.83.115, port 5201
[  4] local 192.168.83.64 port 39363 connected to 192.168.83.115 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   114 MBytes   954 Mbits/sec    0   1.05 MBytes
[  4]   1.00-2.00   sec   110 MBytes   922 Mbits/sec    0   1.24 MBytes
[  4]   2.00-3.01   sec   110 MBytes   918 Mbits/sec    0   1.24 MBytes
[  4]   3.01-4.00   sec   109 MBytes   917 Mbits/sec    0   1.24 MBytes
[  4]   4.00-5.01   sec   110 MBytes   918 Mbits/sec    0   1.24 MBytes
[  4]   5.01-6.01   sec   110 MBytes   923 Mbits/sec    0   1.24 MBytes
[  4]   6.01-7.00   sec   109 MBytes   918 Mbits/sec    0   1.24 MBytes
[  4]   7.00-8.00   sec   110 MBytes   923 Mbits/sec    0   1.24 MBytes
[  4]   8.00-9.00   sec   109 MBytes   912 Mbits/sec    0   1.24 MBytes
[  4]   9.00-10.00  sec   110 MBytes   923 Mbits/sec    0   1.24 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.07 GBytes   923 Mbits/sec    0   sender
[  4]   0.00-10.00  sec  1.07 GBytes   920 Mbits/sec        receiver

Please keep in mind that for yet unknown reasons a couple of Pine64+ boards are reported to not work reliably at Gbit Ethernet speeds. Please also keep in mind how settings might matter: if you run a standard iperf test in 'passive benchmarking' mode you might get throughput numbers 200-250 Mbits/sec lower than ours, maybe just due to a wrong cpufreq governor.
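Checking which governor an image uses only takes a second and should be the first step before drawing any conclusions from iperf numbers. A quick sketch using the standard cpufreq sysfs paths (switching requires root and is only temporary until the next boot):

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
echo interactive > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # for a test run only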
Ethernet throughput scales almost linearly with CPU clockspeed on most cheap ARM SoCs (our only known exception is Solid-Run's Clearfog, which uses a SoC optimized for IO and network throughput), so by using the ondemand governor with wrong/default settings for example you ensure that an idle SBC will only slowly increase its clockspeed when you start your iperf test. This is Armbian after switching from the interactive to the ondemand governor, now staying below 700 Mbits/sec just due to adjusting the CPU clockspeed too slowly:

root@pine64:~# iperf3 -c 192.168.83.115
Connecting to host 192.168.83.115, port 5201
[  4] local 192.168.83.64 port 39365 connected to 192.168.83.115 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.02   sec  47.9 MBytes   395 Mbits/sec    1   99.0 KBytes
[  4]   1.02-2.02   sec  55.0 MBytes   459 Mbits/sec    0    132 KBytes
[  4]   2.02-3.01   sec  60.3 MBytes   511 Mbits/sec    0    151 KBytes
[  4]   3.01-4.01   sec  91.2 MBytes   769 Mbits/sec    0    170 KBytes
[  4]   4.01-5.01   sec  96.2 MBytes   804 Mbits/sec    0    182 KBytes
[  4]   5.01-6.01   sec  96.2 MBytes   806 Mbits/sec    0    191 KBytes
[  4]   6.01-7.01   sec  96.2 MBytes   808 Mbits/sec    0    195 KBytes
[  4]   7.01-8.01   sec  96.2 MBytes   808 Mbits/sec    0    197 KBytes
[  4]   8.01-9.00   sec  95.0 MBytes   805 Mbits/sec    0    198 KBytes
[  4]   9.00-10.00  sec  97.5 MBytes   815 Mbits/sec    0    199 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   832 MBytes   698 Mbits/sec    1   sender
[  4]   0.00-10.00  sec   832 MBytes   698 Mbits/sec        receiver

The other stuff that normally gets 'benchmarked' is not worth mentioning/testing, so just a few quick notes:

- A64 shows the same SDIO limitation as most other SoCs, limiting sequential transfer speeds to/from SD card to ~23MB/s (do the math yourself: SDIO with 4 bit @ 50 MHz minus some overhead is 23 MB/s) -- fortunately that's rather uninteresting since random IO matters on SBCs, and there it's your choice between crappy cards that horribly suck or following our recommendations and choosing a really fast card. But Pine64 can not use the faster eMMC interface, so if you really need high IO bandwidth and high IOPS better choose a different device.
- USB is USB 2.0, so expect ~35MB/s with the BSP kernel and ~40MB/s with mainline kernel and UASP capable disk enclosures for individual USB connections (UASP + mainline kernel might show high random IO numbers if used together with an SSD!)
- HW accelerated video decoding is already possible (see here for the codec matrix) and the situation with HW accelerated video encoding looks promising too: http://forum.armbian.com/index.php/topic/1855-ffmpeg-with-cedrus-h264-hw-encoder-a64-cmos-camera/
- In case one is interested in performance testing on SBCs, monitoring what's happening is mandatory. Currently our armbianmonitor tool does not install the necessary templates on A64, so my script to install this stuff on A64 should still be used: http://kaiser-edv.de/tmp/4U4tkD/install-rpi-monitor-for-a64.sh (read the script's header for how to install)

Performance with vanilla Armbian image:

Not interesting at all at the time of this writing: while Pine64 happily boots mainline u-boot/kernel it's way too early to do tests in this area. Currently there's no access to the AXP803 PMIC from the mainline kernel, so not even VDD_CPUX voltage regulation works, and as a result cpufreq scaling is also not working and the SoC is clocked pretty conservatively. Since most performance relevant stuff running on cheap ARM SoCs depends on (switching as fast as possible to) high CPU clockspeeds, benchmarking is absolutely useless now.
You should also keep in mind that many core features still don't work with the mainline kernel, so this is really stuff for developers (who normally prefer their own way of booting their freshly compiled kernels). So please don't expect that much from vanilla images for A64 boards right now; better choose the legacy variant.

The future?

A few more A64 boards are announced or already available as dev samples, for example the aforementioned BPi M64 (possible advantages over Pine64: sane DC-IN, real USB OTG, more USB host ports behind an internal USB hub, eMMC available and custom LEDs able to provide user feedback; everything else is more or less the same as the 2 GB Pine64+) or Olimex working on both an SBC and an A64 based laptop. And then Xunlong announced 2 new SBCs based on Allwinner's H5. H5 (product brief) seems to be A64's bigger sibling, providing video/GPU enhancements, 3 true USB host ports in addition to one USB OTG (just like H3 where we can use all 4 USB ports that do not have to share bandwidth), integrating a Fast Ethernet PHY (just like H3) but lacking PMIC support (again just like H3, so no mobile usage, no battery support out of the box, and it gets interesting how VDD_CPUX voltage regulation will work there -- maybe 'just like H3' again). Since A64 shares many/most IP blocks with H3 and A83T from Allwinner I still hope that H5 will be just a mixture of A64 and H3 and we will get full support based on what we now have for these 2 other SoCs pretty fast. But that's 100 percent speculation at this moment.

Update regarding longsleep's pine64_tune_network.sh script: benchmark results don't automatically get worse when applying the tweaks from his script, but the result variation gets huge (730 - 950 Mbits/sec; exceeding 940 Mbits/sec is already an indication that buffers are involved):

root@pine64:/home/tk# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.83.115, port 50002
[  5] local 192.168.83.76 port 5201 connected to 192.168.83.115 port 50004
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  90.7 MBytes   759 Mbits/sec
[  5]   1.00-2.00   sec  92.2 MBytes   774 Mbits/sec
[  5]   2.00-3.00   sec  92.5 MBytes   776 Mbits/sec
[  5]   3.00-4.00   sec  92.5 MBytes   776 Mbits/sec
[  5]   4.00-5.00   sec  92.6 MBytes   777 Mbits/sec
[  5]   5.00-6.00   sec   106 MBytes   889 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec
[  5]   7.00-8.00   sec   111 MBytes   927 Mbits/sec
[  5]   8.00-9.00   sec   101 MBytes   847 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec
[  5]  10.00-10.02  sec  1.66 MBytes   931 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.02  sec  1007 MBytes   843 Mbits/sec    9   sender
[  5]   0.00-10.02  sec  1004 MBytes   841 Mbits/sec        receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.83.115, port 50006
[  5] local 192.168.83.76 port 5201 connected to 192.168.83.115 port 50008
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  88.4 MBytes   740 Mbits/sec
[  5]   1.00-2.00   sec  91.9 MBytes   771 Mbits/sec
[  5]   2.00-3.00   sec   109 MBytes   918 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec
[  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   942 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
[  5]  10.00-10.02  sec  1.89 MBytes   928 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.02  sec  1.05 GBytes   904 Mbits/sec    0   sender
[  5]   0.00-10.02  sec  1.05 GBytes   902 Mbits/sec        receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.83.115, port 50010
[  5] local 192.168.83.76 port 5201 connected to 192.168.83.115 port 50012
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  87.7 MBytes   734 Mbits/sec
[  5]   1.00-2.00   sec  92.1 MBytes   773 Mbits/sec
[  5]   2.00-3.00   sec  92.2 MBytes   773 Mbits/sec
[  5]   3.00-4.00   sec  92.1 MBytes   773 Mbits/sec
[  5]   4.00-5.00   sec  92.1 MBytes   773 Mbits/sec
[  5]   5.00-6.00   sec   102 MBytes   859 Mbits/sec
[  5]   6.00-7.00   sec  93.1 MBytes   781 Mbits/sec
[  5]   7.00-8.00   sec  92.1 MBytes   773 Mbits/sec
[  5]   8.00-9.00   sec  92.1 MBytes   773 Mbits/sec
[  5]   9.00-10.00  sec  94.9 MBytes   796 Mbits/sec
[  5]  10.00-10.02  sec  1.62 MBytes   740 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.02  sec   936 MBytes   783 Mbits/sec    0   sender
[  5]   0.00-10.02  sec   933 MBytes   781 Mbits/sec        receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.83.115, port 50014
[  5] local 192.168.83.76 port 5201 connected to 192.168.83.115 port 50016
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  87.1 MBytes   729 Mbits/sec
[  5]   1.00-2.00   sec  92.1 MBytes   774 Mbits/sec
[  5]   2.00-3.00   sec  91.8 MBytes   769 Mbits/sec
[  5]   3.00-4.00   sec  90.4 MBytes   759 Mbits/sec
[  5]   4.00-5.00   sec  96.4 MBytes   808 Mbits/sec
[  5]   5.00-6.00   sec  90.2 MBytes   758 Mbits/sec
[  5]   6.00-7.00   sec   113 MBytes   951 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   942 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec
[  5]  10.00-10.02  sec  1.83 MBytes   932 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.02  sec  1003 MBytes   840 Mbits/sec    0   sender
[  5]   0.00-10.02  sec  1000 MBytes   837 Mbits/sec        receiver

So better enjoy the defaults unless you really know what you're doing, since network performance tuning works in different directions: stuff that might increase throughput might negatively affect latency and vice versa. So if you start to tune, tune for your specific use case!
tkaiser (Author) Posted August 29, 2016

A little update regarding 'network performance' since people wrote me that the results with our Armbian/Xenial image look too good to be true. So back to the basics: most benchmarks you find somewhere on the internet regarding SBC performance show numbers without meaning since they were made in 'passive benchmarking' mode without taking care of what's important.

What's different on SBCs compared to real servers (be it x86, Sparc, MIPS, ARMv8)?

- Network and IO performance on cheap ARM SoCs is affected by the current CPU clockspeed (that's different on 'real' servers).
- A benchmarking tool that adds to CPU utilization on a real server in a negligible fashion might max out CPU resources on a weak ARM SoC, so the tool used for benchmarking might bastardize the performance numbers itself. Also, when acting single-threaded it might in reality test CPU and not network -- true for iperf in most operation modes.
- The OS distribution might use horribly wrong settings (cpufreq governor, IRQ distribution and so on), might contain mechanisms that are counterproductive (screensavers that start after periods of inactivity and influence performance numbers massively) and might be optimized for different use cases (for example, on any ARM SoC around, CPU and GPU cores share access to DRAM, so we already know that by disabling GPU/HDMI on headless servers we automagically improve performance due to less consumption/temperature leading to better throttling behaviour under load, and more memory bandwidth leading to higher throughput numbers for some tasks).

Since we already know that common tools like iperf are CPU intensive, let's try out the lower and upper clockspeed on Pine64 to get an idea how a wrong cpufreq governor might influence results (switching way too slowly from lower to upper clockspeeds when starting short benchmark executions) and how bottlenecked by CPU iperf is anyway. Results as follows (please remember: this is always the same hardware and test setup, the only real difference is the OS image used!):

                                   TX / RX
Armbian Xenial @ 480 MHz:     630 / 940 Mbits/sec
Armbian Jessie @ 480 MHz:     620 / 600 Mbits/sec
pine64.pro Jessie @ 480 MHz:  410 / 595 Mbits/sec

                                   TX / RX
Armbian Xenial @ 1152 MHz:    920 / 940 Mbits/sec
Armbian Jessie @ 1152 MHz:    920 / 810 Mbits/sec
pine64.pro Jessie @ 1152 MHz: 740 / 770 Mbits/sec

What does this mean? Obviously the OS image matters. Someone wanting to benchmark Pine64+ and relying on the OS images from their official download location gets 500 Mbits/sec on average, while someone choosing our Xenial image gets 930 Mbits/sec on average. So let's switch from passive to active benchmarking mode (monitoring the benchmark itself) and check what's different:

- When using Xenial's iperf the tool acts single-threaded only in one direction (TX); with the iperf version in Jessie it's single-threaded in both directions. So by using Jessie you ensure that CPU clockspeed hampers network throughput in both directions (iperf maxing out one CPU core at 100%), while with Xenial that happens only in TX direction. With Xenial you would also see the full 940 Mbits/sec in both directions by adjusting the maximum cpufreq from 1152 to 1200 MHz.
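'Active benchmarking' doesn't need fancy tooling; the simplest variant is watching clockspeed and load in a second shell while iperf runs. A minimal sketch (the sysfs path is the standard cpufreq location, and armbianmonitor's -m mode does the same more comfortably):

# shell 1: the benchmark
iperf3 -c 192.168.83.115 -t 60
# shell 2: sample the current clockspeed once per second while the test runs
while true; do cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq; sleep 1; done
# or simply
armbianmonitor -m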
- The performance numbers also show that compiler switches do not affect iperf performance that much when comparing Xenial with Jessie (with other tools like sysbench you get a whopping 30 percent better numbers on Xenial compared to Jessie).
- The differences between Armbian Jessie and the one from pine64.pro are: my Jessie image is headless while pine64.pro runs a desktop, and more importantly the pine64.pro OS image uses the wrong cpufreq governor (ondemand instead of interactive), which affects standard iperf/iperf3 test execution using the 10s defaults. When using longer test executions the benchmark numbers improve, but this is obviously the wrong way to handle the problem -- switching to interactive is the real solution.
- With iperf in single-threaded mode you will also see performance differences caused by CPU affinity. If for whatever reason the kernel decides to assign the iperf task to cpu0, performance numbers might differ compared to running on cpu1 (depends on IRQ distribution). So to really get a clue what's going on you have to monitor IRQ distribution (/proc/interrupts) and assign the benchmark tool to a specific core using taskset, to get an idea whether tool and IRQ handling on the same core improve performance or not (since then you could decide whether improving IRQ distribution is worth a try).
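To make the affinity experiment above concrete, the whole thing is just a few commands (interface name and core numbers are examples, adjust to what /proc/interrupts shows on your board):

grep -i eth /proc/interrupts                      # which core services the Ethernet IRQs?
taskset -c 0 iperf3 -c 192.168.83.115 -t 60       # pin iperf3 to cpu0 ...
taskset -c 1 iperf3 -c 192.168.83.115 -t 60       # ... then compare with a run pinned to cpu1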
What do these numbers tell us?

- iperf/iperf3 are both not able to measure network throughput reliably on SBCs since they are too CPU bound (and as used by most people, with default window sizes, the results are useless anyway).
- On the other hand that means the actual CPU clockspeed matters a lot, and therefore choosing an inappropriate cpufreq governor results in worse performance (simple solution: switch to performance or interactive with Allwinner's BSP kernel, or schedutil on mainline kernel 4.7 or later). Also inappropriate heat dissipation leads to benchmarks showing lower network performance, so in case you cramped your Pine64 into a tiny enclosure without using a heatsink you ensure that performance sucks.
- iperf seems to perform better when using Xenial since it behaves differently there (maybe caused by Jessie compiling distro packages with GCC 4.9 while Xenial uses GCC 5.4 instead). So you get better performance numbers by switching from Jessie to Xenial, but it's important to understand that this ONLY affects meaningless benchmark numbers! Real world workloads behave differently. Don't trust any benchmark blindly.
- iperf when used with default settings (that's 10 seconds test execution here) might show weird/random numbers since we have to deal with a phenomenon called TX/RX delay, and the chosen cpufreq scaling governor adds to the random behaviour. With iperf you should always test 5 times with 10 seconds, then 1 x 60 seconds and 1 x 300 seconds, since then you immediately get the idea why ondemand is wrong for this workload and how the other phenomenon might influence results. Use iperf3 also, since it outputs 1 second statistics and might report different numbers (so you learn at least not to blindly trust benchmark tools!)
- Most important lesson: try to understand how these iperf/iperf3 numbers correlate with reality and then measure/monitor real world workloads. If you let Pine64 run as a web server for example with Jessie from pine64.pro, then with light workloads Pine64 will be magnitudes slower than when using Armbian (remaining at 480 MHz vs. jumping to 1152 MHz when needed)... or switch to the interactive governor with the pine64.pro image. The numbers above do not tell you this real difference for this specific use case; it's always important to try to understand what a benchmark actually tells you.

We at Armbian should ask ourselves whether we will provide desktop images for Pine64 at all (please, please not -- let's prevent our forum from getting flooded with all the HDMI and DVI issues and 'why does firefox not play youtube videos?' and crap like that) and whether we make our CLI images 'true headless' (disabling GPU/HDMI like we did with H3 boards, since both SoCs are pretty identical in this regard and I would assume that this helps a lot with specific server workloads due to more memory bandwidth, more useable RAM and better throttling behaviour).
eli Posted August 29, 2016

I really like your POV. I think it would be very interesting to benchmark MPI on a Pine64 cluster, after all it's the only GbE device on the market that costs $19.
tkaiser (Author) Posted August 29, 2016

I think it would be very interesting to benchmark MPI on a Pine64 cluster, after all it's the only GbE device on the market that costs $19.

Well, MPI is an implementation (needing GbE network tuning in another direction compared to the usual moronic iperf tests: MPI needs low latency!) and not a specific benchmark, but you're right. This is one of the few use cases where Pine64+ could shine when used with optimized software. Compare the Linpack results above or those from cluster setups where unoptimized software has been used. A not yet published 'Cluster Deathmatch: Raspberry Pi3 vs NanoPC-T3 vs Orange Pi Plus2E vs PINE A64+' article for example shows really weird numbers if you compare what happens when you use the correct compiler switches (and HPC is only about this since you get results magnitudes better with the same consumption figures, and especially the performance per watt ratio increases a lot):

           Debian Jessie hpcc   NEON/hpcc
Pine64+    7731 MFLOPS          15000 MFLOPS
Plus 2E    - MFLOPS             8650 MFLOPS
NanoPC     27740 MFLOPS         62500 MFLOPS
RPi 3      3402 MFLOPS          18000 MFLOPS

So if we take the 15 GFLOPS of a 5 node Pine64+ cluster into account it gets interesting. But on the other hand the result listed there for NanoPC-T3 already indicates that NanoPi M3, using the very same SoC, might be the better choice. It's GbE capable, has twice as many Cortex-A53 cores, but suffers from a missing ARMv8 optimized distro/kernel just like RPi 3. But using the correct compiler switches and ARMv8/NEON optimizations maybe the 27.6 GFLOPS above from Jessie's hpcc (ARMv7!) binary will be twice as much in reality (M3 still costs less than $40 if you add the necessary heatsink).

Update: I tested on the M3 as can be seen here. With my setup, only using a laughable 5V/2A PSU connected to the power pins and a rather inefficient fan, I'm not able to fully unlock the performance potential of this board. When using a more beefy PSU and better cooling, an SBC cluster made out of 5 NanoPi M3 should get a total 62 GFLOPS score using an (ARMv7) optimized Linpack version (confirmed in the meantime). Therefore, adding all costs and comparing the 15 GFLOPS of a 5 node Pine64+ cluster with the 62 GFLOPS a NanoPi M3 cluster is able to achieve, it's easy to decide (against Pine64+).

Anyway, Pine64+ can be booted through FEL using non-standard USB-type-A-to-type-A cables or from cheap SPI flash modules, so building a cluster out of a few Pine64+ might not even require adding SD cards to the setup, which will further reduce costs. And if one knows how to test individual clockspeed reliability it can be overclocked up to 1296 MHz without issues (but that requires large heatsinks and a lot of airflow!). Armbian, since it's not a distro but a convenient build system, would be a perfect basis for such experiments since a few tweaks are enough to produce custom OS images that contain all the necessary cluster stuff already. See the customization and NFS boot options.
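If someone really wanted to try this, the build system's customization hook would be the natural place to bake the cluster bits into the image. A minimal sketch, assuming the userpatches/customize-image.sh mechanism and standard Debian/Ubuntu package names (both are assumptions here, not a tested recipe):

#!/bin/bash
# userpatches/customize-image.sh -- executed inside the image chroot at build time
apt-get -y update
apt-get -y install openmpi-bin libopenmpi-dev nfs-common
# add whatever else every cluster node needs here (SSH keys, /etc/hosts entries, ...)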
eli Posted August 29, 2016

Wow, that's an awesome comparison -- too bad they used 'Debian Jessie 8.5, kernel 3.10.102-2-pine64-longsleep (base image, supplied by Pine64.Pro)' for the PineA64+s and not Ubuntu. BTW is there an ETA for Armbian on NanoPi M3?
tkaiser (Author) Posted August 29, 2016

Wow, that's an awesome comparison

Well, I don't agree, since running standard distro packages on an HPC cluster is nothing you want to do. I just tested with an optimized Linpack on NanoPi M3 and based on that, with a better PSU and better heat dissipation, the 27740 MFLOPS for NanoPC-T3 would be 60000 MFLOPS in reality (please note that you get this increase in performance for free just by optimizing software, in this case dropping distro packages and using the version that will be used when it's about HPC -- the 'HP' in 'HPC' is there for a reason, it's all about high performance computing and not producing meaningless numbers).

Armbian on NanoPi M3? No idea whether that will ever happen. FriendlyARM boards receive excellent software support from their vendor (especially if you compare with other vendors that sell SBC containing 'Pi' in their name!) so these are the last boards where an Armbian port would be 'necessary'. Also FA has a nice ecosystem around their SBCs (displays that are recognized already in u-boot and fully supported, the Matrix add-ons) so we would really have a lot to do to provide something similar. But looking through FA's github resources it shouldn't be that hard to provide at least true headless distros for M3 and M2 (which should work flawlessly on NanoPC T3/T2 too). BTW: since this gets off-topic here, let's continue the discussion in the NanoPi M3 thread if needed.
HarfTarf Posted August 31, 2016

Thanks for all your work on this, I'm going to try out the images tomorrow!
tkaiser (Author) Posted August 31, 2016

Nice update on the A64 laptop from scratch: https://olimex.wordpress.com/2016/08/31/teres-i-diy-open-source-hardware-hackers-laptop-update/

Combined with Armbian's A64 release it might be the first headless laptop in the world. Just kidding: Olimex will be using one of A64's LCD interfaces with a converter chip for eDP (embedded DisplayPort) to drive the laptop's LCD, so the 'exotic' resolution of 1366x768 won't be a problem (unlike with HDMI, where 1366x768 is currently not possible and most probably won't be anytime soon or ever).
mark.dunn Posted August 31, 2016

Are you sure it's running from mainline? uname -r gives 4.7.0. It looks like it is from apritzel's git.
zador.blood.stained Posted August 31, 2016

Yes, this is @apritzel's branch, but it's essentially the mainline kernel with some extra patches on top.
tkaiser (Author) Posted September 1, 2016

Now the fun with A64 begins: Olimex posted an update on A64-OLinuXino-eMMC: https://olimex.wordpress.com/2016/09/01/a64-olinuxino-emmc-rev-b-oshw-64-bit-arm-development-board-prototypes-are-testing/

We get bootable SPI flash on the lower PCB side (and now I really believe we get the same with Orange Pi PC 2 and 3), eMMC with fast modes and voltage switching, most probably a cost-down variant without eMMC (otherwise adding SPI flash would not make that much sense), less power hungry DDR3L, a Gigabit Ethernet PHY available in industrial temperature range and a few more tweaks.

From a software point of view, SPI flash booting and eMMC with voltage switching need some work, and maybe tweaks for the different GbE PHY. But apart from that we should already be there...
tkaiser (Author) Posted September 2, 2016

Just a small reminder: end users should not use vanilla images on any A64 board now since expectations won't match reality (too much stuff still not working). And you should also keep in mind that support for Armbian releases only happens here in the 'other boards' forum and not in the Pine64 forum. For whatever reasons the Pine64 folks enabled a member of team 'Dunning-Kruger' to act there as a moderator, who not only actively prevents resolving the long known GbE issue with some Pine64+ boards but also constantly mis-uses his moderator role.

I added the remark 'Armbian support in Armbian forum only (possible). There no moderators are constantly deleting/editing others posts' to my last post over there: http://forum.pine64.org/showthread.php?tid=2078&pid=19019#pid19019

Again, the most important part of the post was deleted by this person and Pine64 forum account N° 32 banned by N° 1323. It seems every product gets the 'supporters' it deserves.
zador.blood.stained Posted September 2, 2016

We get bootable SPI flash on the lower PCB side (and now I really believe we get the same with Orange Pi PC 2 and 3), eMMC with fast modes and voltage switching, most probably a cost-down variant without eMMC (otherwise adding SPI flash would not make that much sense), less power hungry DDR3L, a Gigabit Ethernet PHY available in industrial temperature range and a few more tweaks. From a software point of view, SPI flash booting and eMMC with voltage switching need some work, and maybe tweaks for the different GbE PHY. But apart from that we should already be there...

Yes, we get a pad for soldering SPI flash if I understood it correctly. As for Orange Pi PC 2 and 3 - I still think that it's SPI flash because people may have asked about it, but just a note: not every SO-8 chip on an SBC is SPI flash; for example the one on the bottom of Lime2-eMMC is a 2KB EEPROM for MAC address storage and more.
pfeerick Posted September 2, 2016

I added the remark 'Armbian support in Armbian forum only (possible). There no moderators are constantly deleting/editing others posts' to my last post over there: http://forum.pine64.org/showthread.php?tid=2078&pid=19019#pid19019 Again, the most important part of the post was deleted by this person and Pine64 forum account N° 32 banned by N° 1323. It seems every product gets the 'supporters' it deserves.

And it looks like it got edited twice... as when I first saw it, it just said it was edited, and now it says "edit: moderator: inflammatory commentary was removed from the post, the technical content was left unaltered"... technical content like where to actually find support was left unaltered?
tkaiser (Author) Posted September 2, 2016

Yes, we get a pad for soldering SPI flash if I understood it correctly. As for Orange Pi PC 2 and 3 - I still think that it's SPI flash because people may have asked about it, but just a note: not every SO-8 chip on an SBC is SPI flash; for example the one on the bottom of Lime2-eMMC is a 2KB EEPROM for MAC address storage and more.

Agreed, let's wait and see -- also whether Olimex will sell their board with SPI flash populated but without eMMC (IIRC the eMMC on the Lime2 is pretty slow so I'm curious whether voltage switching will change anything significant).

And it looks like it got edited twice... as when I first saw it, it just said it was edited, and now it says "edit: moderator: inflammatory commentary was removed from the post, the technical content was left unaltered"... technical content like where to actually find support was left unaltered?

Well, it happened many times that this person edited the relevant parts away or deleted whole posts. And unfortunately the moderators there are part of... if not the problem itself: Lenny is providing OS images with a wrong cpufreq governor that leads to low/random Ethernet performance results, which prevents any progress isolating problems -- as soon as his OS images switched to interactive most GbE performance complaints would already be gone -- and this other guy is causing unbelievable damage. How long did it take you to convince him of the obvious? That valid use cases for GbE exist and that Pine64 can make use of GbE? Unbelievable.

Anyway: let's leave the problems behind and focus on real work and fun.
@lex Posted September 2, 2016

Anyway: let's leave the problems behind and focus on real work and fun.

Completely agree. I have some questions:

* I am looking for the regulators afvdd and afvdd_vol on the AXP81x, I can only find these:
/sys/kernel/debug/regulator/axp81x_eldo3/dvdd-csi-18
/sys/kernel/debug/regulator/axp81x_dldo3/avdd-csi
/sys/kernel/debug/regulator/axp81x_aldo1/vdd-csi-led
/sys/kernel/debug/regulator/axp81x_aldo1/iovdd-csi
Can someone comment on these?
* What would the changes in the DTS be (DRAM timings, GbE in use, etc.) on the other A64 board (aka M64)? Can you point to where to look?
tkaiser (Author) Posted September 2, 2016

Can you point to where to look?

To be honest: better ask in the linux-sunxi IRC. Andre (apritzel) got a BPi-M64 and the board seems to be almost identical to Pine64+/2GB. Apritzel's timezone is UTC. Apart from that I can not test anything currently since I sent my remaining Pain64 out to another user here who suffers from the Pain64 GbE disease. Maybe I get his board next week (don't mind if not) to test through the GbE issue.
pfeerick Posted September 3, 2016

Anyway: let's leave the problems behind and focus on real work and fun.

I concur... let the Pain64 bashing... er... smashing... um... using? begin!!

On the GbE note, what are some more realistic performance numbers you would expect to see from it? On the pine64.pro Debian, with the updated longsleep kernel, after running longsleep's optimisation script, I have been able to get 29-30MB/s samba upload to the Pine64, but only 10MB/s down... and someone commented that they had the exact opposite results when using a USB GbE adapter... so do you think I'll be able to lift the down throughput to nearer the up, or is that getting to about the best we're likely to see from it?

Before you lay into me about this not being the support forum for the Pain64 distro... I will be trying this again on Armbian, to see if there is something still wrong in the Pain64 distro build... or if this is all the Pine64 is willing to give me as things currently stand...
tkaiser (Author) Posted September 3, 2016

The Pine64+ I had here (sent to @androsch a few days ago) with Armbian / Xenial (Xenial matters, I explained above why using Xenial vs. Jessie makes a difference if anyone relies on iperf/iperf3 numbers) has not the slightest problem saturating its GbE interface. We also know that the USB 2.0 implementation of A64 is limited to ~35MB/s with the BSP kernel (if you measure higher numbers you measured fs buffers). We also know that Samba without extensive tuning on SBCs doesn't perform very well. We also know that copying a bunch of small files takes longer than 1 large file over Samba since both Explorer and Samba do a great job in slowing things down.

That being said, with good Samba settings you could expect ~30MB/s in both directions with large files. If @androsch's Pine64 arrives here and I find the time, I'll set up a RAID-0 with a bus-powered 2.5" disk (setting the DC5V/BAT jumper to power the disk from the Euler pins) and a 3.5" disk and test throughput (my dev sample does not have this jumper and due to the Pine64 design flaws reliable operation with a bus-powered disk is not possible).

Apart from that: this here is Armbian, Pine64 is just one of the 40+ boards we support, and there exist a lot of better choices for NAS use cases: http://linux-sunxi.org/Sunxi_devices_as_NAS
pfeerick Posted September 3, 2016

Perfect timing! I was about to post that I have finally done a test with Armbian, and with a fresh legacy install (after realising some fool had gone and installed vanilla before) + installing samba, my 'scientific' test of copying a 1GB file from a linux desktop with GbE to a USB 3 (I know, only USB2 for the pine, but higher speed memory has to help... right?) flash drive in the LOWER usb port... and out of the box... I got 29MB/s up (to pine64) and 24-25MB/s down (from pine64)... so pretty happy with those results... notwithstanding the fact that it should be possible to eke a smidgen more out of it!

I currently use a cubietruck with Cubian, acting partially as a file share, mostly as video streaming/storage, and one idea when getting the pine64 was that it would do as a backup for if (when) the Cubietruck finally croaks... especially for the price. I'm getting close to the speed of the cubietruck, as the bulk storage is on 4TB usb drives, so limited by the USB bottleneck also.

Thanks for that link though... will definitely go back to that when shopping for my next NAS board... looks like the BananaPi Pro would be a better choice.
pfeerick Posted September 3, 2016 (edited)

Hm... thought I posted this before, but must have closed the tab without actually sending.... Anyway, I don't know enough about the Armbian image build process to work out where to shove a PR/issue... so will document the issue here, and pin my ears back.

Edit: nevermind, found the source of the gremlin, and raised an issue ... feel free to delete this post as it's now sort of redundant.

When running legacy Armbian, I ran armbianmonitor in monitor mode (-m), and noticed the temperature was staying abnormally low at 25C. Went digging in the source, and realised that /etc/armbianmonitor/datasources/soctemp links to the relevant sysfs entry, and in the case of the pine64 was pointing to /sys/devices/virtual/thermal/thermal_zone1/temp instead of /sys/devices/virtual/thermal/thermal_zone0/temp (thermal_zone1 instead of thermal_zone0). Removed and recreated the link, and now getting more sensible readings of 44C idling, and changing with load...

I dug a little further, and it looks like /etc/update-motd.d/30-sysinfo needs to be updated so it creates the right link, but I don't get why line #127 doesn't trigger properly to prevent choosing the wrong thermal_zone.

Edited September 3, 2016 by pfeerick
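For anyone hitting the same symptom before a fix lands, recreating the link manually (as root) should do, using exactly the paths described above:

ln -sf /sys/devices/virtual/thermal/thermal_zone0/temp /etc/armbianmonitor/datasources/soctemp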
tkaiser (Author) Posted September 3, 2016

I got 29MB/s up (to pine64) and 24-25MB/s down (from pine64)... so pretty happy with those results... notwithstanding the fact that it should be possible to eke a smidgen more out of it!

Please try out longsleep's tweaks (except for switching to the performance governor) and report back. And then please try out Timo's (silentcreek) smb.conf modifications (first without longsleep's tweaks, then combining both). Regarding a replacement for the Cubietruck you should keep in mind that CT has 2 GB DRAM, which should help with NAS write speeds and file sizes of up to 1 GB (if configured correctly so that the filesharing daemon in question -- samba in your case -- uses the FS buffer and does not try to write immediately to disk). And I personally would wait in any case to see what performance we'll achieve when the first R40 boards are available (R40 is a quad-core A20 successor by Allwinner).
zador.blood.stained Posted September 3, 2016

Please try out longsleep's tweaks (except for switching to the performance governor) and report back. And then please try out Timo's (silentcreek) smb.conf modifications (first without longsleep's tweaks, then combining both). Regarding a replacement for the Cubietruck you should keep in mind that CT has 2 GB DRAM, which should help with NAS write speeds and file sizes of up to 1 GB (if configured correctly so that the filesharing daemon in question -- samba in your case -- uses the FS buffer and does not try to write immediately to disk). And I personally would wait in any case to see what performance we'll achieve when the first R40 boards are available (R40 is a quad-core A20 successor by Allwinner).

Since I'm using samba on CT and I tried some optimizations to improve read and write speed, the biggest improvement for me was achieved by setting use sendfile = yes.

I used a profiler on smbd after that, and it reported that the sendfile() call - copying data between userspace and kernel - was the main bottleneck. If I understand this correctly, it is almost impossible to significantly optimize it further without increasing CPU and DRAM frequencies, and that's also why I don't think that R40 will show significantly better performance with samba. I always do tests by reading and writing large enough files to tmpfs or SATA/USB disks; in the case of transferring small files other options will show different results.
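For reference, that setting lives in smb.conf; only 'use sendfile' is taken from the post above, the surrounding share definition is just illustrative context:

# /etc/samba/smb.conf (excerpt)
[global]
    use sendfile = yes

[storage]
    path = /mnt/usb
    read only = no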
tkaiser (Author) Posted September 5, 2016

Since I'm using samba on CT and I tried some optimizations to improve read and write speed, the biggest improvement for me was achieved by setting use sendfile = yes.

Ok, this is something that should be tried out. And at least with the Pine64 BSP kernel the cpufreq governor might still be an issue: https://www.youtube.com/watch?v=33YicbIvsH8#t=1m11s (in download direction the CPU remains at 480 MHz nearly all the time, but if one looks closer at CPU utilization and also the behaviour of the average load it seems the USB storage is the real bottleneck in this direction -- I would assume running an 'iostat 3' in parallel will show pretty high %iowait values). Anyway: if any further testing is done, switching to performance should be tested separately (with Armbian this is editing /etc/defaults/cpufrequtils and restarting the cpufrequtils daemon or the board).

@pfeerick: it seems you were testing on a 64GB USB thumb drive? If that's the case please be aware that some of these devices also show weird behaviour (thermal throttling after a certain amount of data has been written). Given the drive is mounted as /mnt/usb could you please provide the output from

cd /mnt/usb && iozone -a -g 6000m -s 6000m -i 0 -i 1 -r 4K -r 1024K

Another approach would be to run 'armbianmonitor -c /mnt/usb' but this also starts a reliability test over the whole free capacity and tests random IO, which might take ages on an average flash based medium (average means slow). BTW: another great test tool and some notes on how real-world file copy tasks might differ from benchmarks depending on this and that here: http://www.helios.de/web/EN/support/TI/157.html
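The governor switch mentioned above is a one-line change; the config file on Armbian looks roughly like this (the exact path and the variable names are from the Debian cpufrequtils package as I remember it, and the sane MIN/MAX values depend on the board, so verify on your image before copying blindly):

# /etc/default/cpufrequtils
ENABLE=true
GOVERNOR=performance
MIN_SPEED=480000
MAX_SPEED=1152000
# afterwards: service cpufrequtils restart (or simply reboot)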
pfeerick Posted September 5, 2016 Posted September 5, 2016 It'll be a day or two before I can do any more tests, but the results of the iozone test across the 64GB flash drive are as follow (after a reformat to exfat as it didn't like the idea of writing 6GB files to fat32... so I'll have to see if that has any impact on anything also now ). This is without any active (fan) cooling... just the heatsinks, and an ambient temperature of 24C. btw, I'm happy to do speed tests pushing the board to the limits, but the goal of videos atm is simply to discredit any claims that the best the board can do is 100Mbit, as made by some people, and also show that you can get those higher speeds with even the most dodgy of setups... ie. I still get those speeds with only the microUSB power, and on a longer questionable quality cable... so whilst power is most likely a issue and I'd always recommend bypassing the microUSB, it isn't always the case. Just wanting to indicate that it can be done without having to do anything special... ;-) Run began: Mon Sep 5 14:25:20 2016 Auto Mode Using maximum file size of 6144000 kilobytes. File size set to 6144000 kB Record Size 4 kB Record Size 1024 kB Command line used: iozone -a -g 6000m -s 6000m -i 0 -i 1 -r 4K -r 1024K Output is in kBytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 kBytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. random random bkwd record stride kB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread 6144000 4 32990 32838 33887 33715 6144000 1024 32915 32798 34592 34506 iozone test complete. Armbianmonitor log for the same time period Time CPU load %cpu %sys %usr %nice %io %irq CPU 14:24:52: 1152MHz 0.15 11% 1% 0% 0% 10% 0% 42°C 14:24:57: 480MHz 0.14 11% 1% 0% 0% 10% 0% 42°C 14:25:02: 480MHz 0.13 11% 1% 0% 0% 10% 0% 42°C 14:25:07: 480MHz 0.12 11% 1% 0% 0% 10% 0% 42°C 14:25:12: 480MHz 0.11 11% 1% 0% 0% 10% 0% 42°C 14:25:17: 480MHz 0.10 11% 1% 0% 0% 10% 0% 42°C 14:25:22: 480MHz 0.09 11% 1% 0% 0% 10% 0% 44°C 14:25:28: 1152MHz 0.32 11% 1% 0% 0% 10% 0% 49°C 14:25:33: 1152MHz 0.62 11% 1% 0% 0% 10% 0% 50°C 14:25:38: 1152MHz 0.81 11% 1% 0% 0% 10% 0% 50°C 14:25:43: 1152MHz 0.98 11% 1% 0% 0% 10% 0% 51°C 14:25:48: 1152MHz 1.23 11% 1% 0% 0% 10% 0% 50°C 14:25:53: 1152MHz 1.37 11% 1% 0% 0% 10% 0% 51°C 14:25:58: 1152MHz 1.50 11% 1% 0% 0% 10% 0% 52°C 14:26:03: 1152MHz 1.70 11% 1% 0% 0% 10% 0% 52°C 14:26:08: 1152MHz 1.80 11% 1% 0% 0% 10% 0% 51°C 14:26:13: 1152MHz 2.14 11% 1% 0% 0% 10% 0% 52°C 14:26:18: 1152MHz 2.21 11% 1% 0% 0% 10% 0% 52°C 14:26:23: 1152MHz 2.27 11% 1% 0% 0% 10% 0% 52°C 14:26:28: 1152MHz 2.33 11% 1% 0% 0% 10% 0% 51°C 14:26:33: 1152MHz 2.38 11% 1% 0% 0% 10% 0% 52°C 14:26:38: 1152MHz 2.43 11% 1% 0% 0% 10% 0% 52°C 14:26:43: 1152MHz 2.48 11% 1% 0% 0% 10% 0% 52°C 14:26:48: 1152MHz 2.52 11% 1% 0% 0% 10% 0% 53°C 14:26:53: 1152MHz 2.56 11% 1% 0% 0% 10% 0% 53°C 14:26:59: 1152MHz 2.67 11% 1% 0% 0% 10% 0% 54°C 14:27:04: 1152MHz 2.70 11% 1% 0% 0% 10% 0% 53°C 14:27:09: 1152MHz 2.72 11% 1% 0% 0% 10% 0% 54°C 14:27:14: 1152MHz 2.67 11% 1% 0% 0% 10% 0% 54°C 14:27:19: 1152MHz 2.69 11% 1% 0% 0% 10% 0% 54°C 14:27:24: 1152MHz 2.72 11% 1% 0% 0% 10% 0% 54°C 14:27:29: 1152MHz 2.74 11% 1% 0% 0% 10% 0% 55°C 14:27:34: 1152MHz 2.76 11% 1% 0% 0% 10% 0% 55°C 14:27:39: 1152MHz 2.78 12% 1% 0% 0% 10% 0% 54°C 14:27:44: 1152MHz 2.88 12% 1% 0% 0% 10% 0% 55°C 14:27:49: 1152MHz 2.89 12% 1% 0% 0% 10% 0% 54°C 14:27:54: 1152MHz 2.90 12% 1% 0% 0% 10% 0% 54°C 14:27:59: 1152MHz 2.90 
12% 1% 0% 0% 10% 0% 55°C 14:28:04: 1152MHz 2.91 12% 1% 0% 0% 10% 0% 55°C 14:28:09: 1152MHz 2.92 12% 1% 0% 0% 10% 0% 56°C 14:28:14: 1152MHz 2.93 12% 1% 0% 0% 10% 0% 55°C 14:28:19: 1152MHz 2.93 12% 1% 0% 0% 10% 0% 55°C 14:28:24: 1152MHz 2.94 12% 1% 0% 0% 10% 0% 55°C 14:28:30: 1152MHz 2.86 12% 1% 0% 0% 10% 0% 52°C 14:28:35: 1152MHz 2.87 12% 1% 0% 0% 10% 0% 57°C 14:28:40: 1152MHz 2.88 12% 1% 0% 0% 10% 0% 56°C 14:28:45: 1152MHz 2.89 12% 1% 0% 0% 10% 0% 56°C 14:28:50: 1152MHz 2.90 12% 1% 0% 0% 10% 0% 55°C 14:28:55: 1152MHz 2.83 12% 1% 0% 0% 10% 0% 56°C 14:29:00: 1152MHz 2.84 12% 1% 0% 0% 10% 0% 55°C 14:29:05: 1152MHz 2.78 12% 1% 0% 0% 10% 0% 55°C 14:29:10: 1152MHz 2.79 12% 1% 0% 0% 10% 0% 55°C 14:29:15: 1152MHz 2.89 12% 1% 0% 0% 10% 0% 56°C 14:29:20: 1152MHz 2.90 12% 1% 0% 0% 10% 0% 55°C 14:29:25: 1152MHz 2.91 12% 1% 0% 0% 10% 0% 55°C 14:29:30: 1152MHz 2.91 12% 1% 0% 0% 10% 0% 56°C 14:29:35: 1152MHz 2.92 12% 1% 0% 0% 10% 0% 56°C 14:29:40: 1152MHz 2.93 12% 1% 0% 0% 10% 0% 55°C 14:29:45: 1152MHz 2.85 12% 1% 0% 0% 10% 0% 56°C 14:29:50: 1152MHz 2.87 12% 1% 0% 0% 11% 0% 56°C 14:29:55: 1152MHz 2.88 12% 1% 0% 0% 11% 0% 57°C 14:30:01: 1152MHz 2.81 12% 1% 0% 0% 11% 0% 56°C 14:30:06: 1152MHz 2.82 12% 1% 0% 0% 11% 0% 57°C 14:30:11: 1152MHz 2.92 12% 1% 0% 0% 11% 0% 56°C 14:30:16: 1152MHz 2.92 12% 1% 0% 0% 11% 0% 56°C 14:30:21: 1152MHz 2.93 12% 1% 0% 0% 11% 0% 56°C 14:30:26: 1152MHz 2.93 12% 1% 0% 0% 11% 0% 56°C 14:30:31: 1152MHz 2.94 12% 1% 0% 0% 11% 0% 55°C 14:30:36: 1152MHz 2.94 12% 1% 0% 0% 11% 0% 56°C 14:30:41: 1152MHz 2.95 12% 1% 0% 0% 11% 0% 56°C 14:30:46: 1152MHz 2.95 12% 1% 0% 0% 11% 0% 56°C 14:30:51: 1152MHz 2.96 12% 1% 0% 0% 11% 0% 57°C 14:30:56: 1152MHz 3.04 12% 1% 0% 0% 11% 0% 57°C 14:31:01: 1152MHz 3.04 12% 1% 0% 0% 11% 0% 56°C 14:31:06: 1152MHz 3.03 12% 1% 0% 0% 11% 0% 57°C 14:31:11: 1152MHz 3.03 12% 1% 0% 0% 11% 0% 56°C 14:31:16: 1152MHz 3.03 12% 1% 0% 0% 11% 0% 56°C 14:31:21: 1152MHz 3.03 12% 1% 0% 0% 11% 0% 56°C 14:31:27: 1152MHz 3.02 12% 1% 0% 0% 11% 0% 56°C 14:31:32: 1152MHz 3.02 12% 1% 0% 0% 11% 0% 56°C 14:31:37: 1152MHz 3.02 12% 1% 0% 0% 11% 0% 57°C 14:31:42: 1152MHz 3.10 12% 1% 0% 0% 11% 0% 54°C 14:31:47: 1152MHz 3.01 12% 1% 0% 0% 11% 0% 54°C 14:31:52: 1152MHz 2.93 12% 1% 0% 0% 11% 0% 54°C 14:31:57: 1152MHz 2.94 12% 1% 0% 0% 11% 0% 53°C 14:32:02: 1152MHz 2.94 12% 1% 0% 0% 11% 0% 53°C 14:32:07: 1152MHz 2.95 12% 1% 0% 0% 11% 0% 53°C 14:32:12: 1152MHz 2.95 13% 1% 0% 0% 11% 0% 53°C 14:32:17: 1152MHz 2.95 13% 1% 0% 0% 11% 0% 53°C 14:32:22: 1152MHz 3.04 13% 1% 0% 0% 11% 0% 53°C 14:32:27: 1152MHz 3.12 13% 1% 0% 0% 11% 0% 53°C 14:32:32: 1152MHz 3.11 13% 1% 0% 0% 11% 0% 53°C 14:32:37: 1152MHz 3.18 13% 1% 0% 0% 11% 0% 53°C 14:32:42: 1152MHz 3.16 13% 1% 0% 0% 11% 0% 53°C 14:32:47: 1152MHz 3.15 13% 1% 0% 0% 11% 0% 53°C 14:32:52: 1152MHz 3.14 13% 1% 0% 0% 11% 0% 53°C 14:32:58: 1152MHz 3.05 13% 1% 0% 0% 11% 0% 53°C 14:33:03: 1152MHz 3.04 13% 1% 0% 0% 11% 0% 53°C 14:33:08: 1152MHz 3.04 13% 1% 0% 0% 11% 0% 52°C 14:33:13: 1152MHz 3.04 13% 1% 0% 0% 11% 0% 53°C 14:33:18: 1152MHz 3.03 13% 1% 0% 0% 11% 0% 53°C 14:33:23: 1152MHz 2.95 13% 1% 0% 0% 11% 0% 53°C 14:33:28: 1152MHz 2.95 13% 1% 0% 0% 11% 0% 53°C 14:33:33: 1152MHz 2.88 13% 1% 0% 0% 11% 0% 53°C 14:33:38: 1152MHz 2.89 13% 1% 0% 0% 11% 0% 52°C 14:33:43: 1152MHz 2.90 13% 1% 0% 0% 11% 0% 52°C 14:33:48: 1152MHz 2.90 13% 1% 0% 0% 11% 0% 52°C 14:33:53: 1152MHz 2.91 13% 1% 0% 0% 11% 0% 53°C 14:33:58: 1152MHz 3.00 13% 1% 0% 0% 11% 0% 52°C 14:34:03: 1152MHz 3.00 13% 1% 0% 0% 12% 0% 52°C 14:34:08: 1152MHz 3.00 13% 1% 0% 0% 12% 0% 52°C 14:34:13: 1152MHz 3.00 
13% 1% 0% 0% 12% 0% 52°C 14:34:18: 1152MHz 3.00 13% 1% 0% 0% 12% 0% 52°C 14:34:24: 1152MHz 3.00 13% 1% 0% 0% 12% 0% 52°C 14:34:29: 1152MHz 3.00 13% 1% 0% 0% 12% 0% 52°C 14:34:34: 1152MHz 3.00 13% 1% 0% 0% 12% 0% 52°C 14:34:39: 1152MHz 3.00 13% 1% 0% 0% 12% 0% 52°C 14:34:44: 1152MHz 3.08 13% 1% 0% 0% 12% 0% 52°C 14:34:49: 1152MHz 3.07 13% 1% 0% 0% 12% 0% 52°C 14:34:54: 1152MHz 3.15 13% 1% 0% 0% 12% 0% 52°C 14:34:59: 1152MHz 3.14 13% 1% 0% 0% 12% 0% 52°C 14:35:04: 1152MHz 3.12 13% 1% 0% 0% 12% 0% 52°C 14:35:09: 1152MHz 3.20 13% 1% 0% 0% 12% 0% 52°C 14:35:14: 1152MHz 3.17 13% 1% 0% 0% 12% 0% 52°C 14:35:19: 1152MHz 3.15 13% 1% 0% 0% 12% 0% 52°C 14:35:24: 1152MHz 3.22 14% 1% 0% 0% 12% 0% 52°C 14:35:29: 1152MHz 3.20 14% 1% 0% 0% 12% 0% 52°C 14:35:34: 1152MHz 3.27 14% 1% 0% 0% 12% 0% 52°C 14:35:39: 1152MHz 3.25 14% 1% 0% 0% 12% 0% 52°C 14:35:44: 1152MHz 3.23 14% 1% 0% 0% 12% 0% 52°C 14:35:49: 1152MHz 3.21 14% 1% 0% 0% 12% 0% 52°C 14:35:55: 1152MHz 3.35 14% 1% 0% 0% 12% 0% 51°C 14:36:00: 1152MHz 3.32 14% 1% 0% 0% 12% 0% 52°C 14:36:05: 1152MHz 3.30 14% 1% 0% 0% 12% 0% 52°C 14:36:10: 1152MHz 3.19 14% 1% 0% 0% 12% 0% 52°C 14:36:15: 1152MHz 3.18 14% 1% 0% 0% 12% 0% 52°C 14:36:20: 1152MHz 3.16 14% 1% 0% 0% 12% 0% 52°C 14:36:25: 1152MHz 3.15 14% 1% 0% 0% 12% 0% 52°C 14:36:30: 1152MHz 3.14 14% 1% 0% 0% 12% 0% 52°C 14:36:35: 1152MHz 3.13 14% 1% 0% 0% 12% 0% 52°C 14:36:40: 1152MHz 3.20 14% 1% 0% 0% 12% 0% 52°C 14:36:45: 1152MHz 3.18 14% 1% 0% 0% 12% 0% 52°C 14:36:50: 1152MHz 3.17 14% 1% 0% 0% 12% 0% 52°C 14:36:55: 1152MHz 3.15 14% 1% 0% 0% 12% 0% 52°C 14:37:00: 1152MHz 3.14 14% 1% 0% 0% 12% 0% 52°C 14:37:05: 1152MHz 3.21 14% 1% 0% 0% 12% 0% 52°C 14:37:10: 1152MHz 3.19 14% 1% 0% 0% 12% 0% 52°C 14:37:15: 1152MHz 3.18 14% 1% 0% 0% 12% 0% 52°C 14:37:21: 1152MHz 3.16 14% 1% 0% 0% 12% 0% 52°C 14:37:26: 1152MHz 3.15 14% 1% 0% 0% 12% 0% 51°C 14:37:31: 1152MHz 3.22 14% 1% 0% 0% 12% 0% 52°C 14:37:36: 1152MHz 3.12 14% 1% 0% 0% 12% 0% 52°C 14:37:41: 1152MHz 3.11 14% 1% 0% 0% 12% 0% 52°C 14:37:46: 1152MHz 3.10 14% 1% 0% 0% 12% 0% 52°C 14:37:51: 1152MHz 3.17 14% 1% 0% 0% 12% 0% 52°C 14:37:56: 1152MHz 3.16 14% 1% 0% 0% 12% 0% 53°C 14:38:01: 1152MHz 3.15 14% 1% 0% 0% 12% 0% 53°C 14:38:06: 1152MHz 3.14 14% 1% 0% 0% 13% 0% 52°C 14:38:11: 1152MHz 3.12 14% 1% 0% 0% 13% 0% 52°C 14:38:16: 1152MHz 3.11 14% 1% 0% 0% 13% 0% 52°C 14:38:21: 1152MHz 3.11 14% 1% 0% 0% 13% 0% 53°C 14:38:26: 1152MHz 3.18 14% 1% 0% 0% 13% 0% 52°C 14:38:31: 1152MHz 3.16 14% 1% 0% 0% 13% 0% 53°C 14:38:36: 1152MHz 3.23 14% 1% 0% 0% 13% 0% 52°C 14:38:42: 1152MHz 3.21 14% 1% 0% 0% 13% 0% 52°C 14:38:47: 1152MHz 3.19 14% 1% 0% 0% 13% 0% 53°C 14:38:52: 1152MHz 3.18 14% 1% 0% 0% 13% 0% 53°C 14:38:57: 1152MHz 3.16 14% 1% 0% 0% 13% 0% 53°C 14:39:02: 1152MHz 3.15 15% 1% 0% 0% 13% 0% 53°C 14:39:07: 1152MHz 3.14 15% 1% 0% 0% 13% 0% 52°C 14:39:12: 1152MHz 2.97 15% 1% 0% 0% 13% 0% 53°C 14:39:17: 1152MHz 2.97 15% 1% 0% 0% 13% 0% 53°C 14:39:22: 1152MHz 3.05 15% 1% 0% 0% 13% 0% 53°C 14:39:27: 1152MHz 3.05 15% 1% 0% 0% 13% 0% 52°C 14:39:32: 1152MHz 2.96 15% 1% 0% 0% 13% 0% 52°C 14:39:37: 1152MHz 2.97 15% 1% 0% 0% 13% 0% 53°C 14:39:42: 1152MHz 2.97 15% 1% 0% 0% 13% 0% 52°C 14:39:47: 1152MHz 2.97 15% 1% 0% 0% 13% 0% 53°C 14:39:52: 1152MHz 2.97 15% 1% 0% 0% 13% 0% 53°C 14:39:57: 1152MHz 2.90 15% 1% 0% 0% 13% 0% 53°C 14:40:03: 1152MHz 2.82 15% 1% 0% 0% 13% 0% 53°C 14:40:08: 1152MHz 2.92 15% 1% 0% 0% 13% 0% 53°C 14:40:13: 1152MHz 2.93 15% 1% 0% 0% 13% 0% 53°C 14:40:18: 1152MHz 2.93 15% 1% 0% 0% 13% 0% 53°C 14:40:23: 1152MHz 2.94 15% 1% 0% 0% 13% 0% 53°C 14:40:28: 1152MHz 2.94 
15% 1% 0% 0% 13% 0% 53°C 14:40:33: 1152MHz 2.95 15% 1% 0% 0% 13% 0% 53°C 14:40:38: 1152MHz 2.95 15% 1% 0% 0% 13% 0% 53°C 14:40:43: 1152MHz 2.95 15% 1% 0% 0% 13% 0% 53°C 14:40:48: 1152MHz 2.96 15% 1% 0% 0% 13% 0% 53°C 14:40:53: 1152MHz 3.04 15% 1% 0% 0% 13% 0% 53°C 14:40:58: 1152MHz 3.04 15% 1% 0% 0% 13% 0% 51°C 14:41:03: 1152MHz 2.87 15% 1% 0% 0% 13% 0% 52°C 14:41:08: 1152MHz 2.89 15% 1% 0% 0% 13% 0% 53°C 14:41:13: 1152MHz 2.89 15% 1% 0% 0% 13% 0% 53°C 14:41:18: 1152MHz 2.90 15% 1% 0% 0% 13% 0% 53°C 14:41:23: 1152MHz 2.91 15% 1% 0% 0% 13% 0% 53°C 14:41:29: 1152MHz 2.92 15% 1% 0% 0% 13% 0% 53°C 14:41:34: 1152MHz 2.84 15% 1% 0% 0% 13% 0% 53°C 14:41:39: 1152MHz 2.86 15% 1% 0% 0% 13% 0% 53°C 14:41:44: 1152MHz 2.87 15% 1% 0% 0% 13% 0% 53°C 14:41:49: 1152MHz 2.88 15% 1% 0% 0% 13% 0% 53°C 14:41:54: 1152MHz 2.89 15% 1% 0% 0% 13% 0% 53°C 14:41:59: 1152MHz 2.90 15% 1% 0% 0% 13% 0% 53°C 14:42:04: 1152MHz 2.91 15% 1% 0% 0% 13% 0% 53°C 14:42:09: 1152MHz 2.91 15% 1% 0% 0% 13% 0% 53°C 14:42:14: 1152MHz 2.92 15% 1% 0% 0% 13% 0% 53°C 14:42:19: 1152MHz 2.93 15% 1% 0% 0% 13% 0% 53°C 14:42:24: 1152MHz 2.93 15% 1% 0% 0% 13% 0% 53°C 14:42:29: 1152MHz 2.86 15% 1% 0% 0% 13% 0% 53°C 14:42:34: 1152MHz 2.95 15% 1% 0% 0% 13% 0% 53°C 14:42:39: 1152MHz 2.95 15% 1% 0% 0% 14% 0% 53°C 14:42:44: 1152MHz 2.96 15% 1% 0% 0% 14% 0% 53°C 14:42:49: 1152MHz 2.96 15% 1% 0% 0% 14% 0% 53°C 14:42:55: 1152MHz 2.96 15% 1% 0% 0% 14% 0% 53°C 14:43:00: 1152MHz 2.97 15% 1% 0% 0% 14% 0% 53°C 14:43:05: 1152MHz 2.97 15% 1% 0% 0% 14% 0% 53°C 14:43:10: 1152MHz 2.97 15% 1% 0% 0% 14% 0% 53°C 14:43:15: 1152MHz 2.97 15% 1% 0% 0% 14% 0% 53°C 14:43:20: 1152MHz 3.06 15% 1% 0% 0% 14% 0% 52°C 14:43:25: 1152MHz 3.05 15% 1% 0% 0% 14% 0% 53°C 14:43:30: 1152MHz 3.05 15% 1% 0% 0% 14% 0% 53°C 14:43:35: 1152MHz 2.96 15% 1% 0% 0% 14% 0% 53°C 14:43:40: 1152MHz 3.05 15% 1% 0% 0% 14% 0% 53°C 14:43:45: 1152MHz 3.04 16% 1% 0% 0% 14% 0% 53°C 14:43:50: 1152MHz 3.04 16% 1% 0% 0% 14% 0% 53°C 14:43:55: 1152MHz 3.03 16% 1% 0% 0% 14% 0% 53°C 14:44:00: 1152MHz 3.11 16% 1% 0% 0% 14% 0% 53°C 14:44:05: 1152MHz 3.10 16% 1% 0% 0% 14% 0% 53°C 14:44:10: 1152MHz 3.01 16% 1% 0% 0% 14% 0% 52°C 14:44:15: 1152MHz 3.01 16% 1% 0% 0% 14% 0% 52°C 14:44:21: 1152MHz 3.01 16% 1% 0% 0% 14% 0% 52°C 14:44:26: 1152MHz 3.01 16% 1% 0% 0% 14% 0% 52°C 14:44:31: 1152MHz 3.01 16% 1% 0% 0% 14% 0% 52°C 14:44:36: 1152MHz 3.01 16% 1% 0% 0% 14% 0% 52°C 14:44:41: 1152MHz 3.01 16% 1% 0% 0% 14% 0% 52°C 14:44:46: 1152MHz 3.17 16% 1% 0% 0% 14% 0% 52°C 14:44:51: 1152MHz 3.23 16% 1% 0% 0% 14% 0% 52°C 14:44:56: 1152MHz 3.22 16% 1% 0% 0% 14% 0% 52°C 14:45:01: 1152MHz 3.20 16% 1% 0% 0% 14% 0% 52°C 14:45:06: 1152MHz 3.18 16% 1% 0% 0% 14% 0% 52°C 14:45:11: 1152MHz 3.17 16% 1% 0% 0% 14% 0% 52°C 14:45:16: 1152MHz 3.15 16% 1% 0% 0% 14% 0% 52°C 14:45:21: 1152MHz 3.06 16% 1% 0% 0% 14% 0% 52°C 14:45:26: 1152MHz 3.06 16% 1% 0% 0% 14% 0% 52°C 14:45:31: 1152MHz 3.13 16% 1% 0% 0% 14% 0% 52°C 14:45:36: 1152MHz 3.12 16% 1% 0% 0% 14% 0% 52°C 14:45:42: 1152MHz 3.11 16% 1% 0% 0% 14% 0% 52°C 14:45:47: 1152MHz 3.10 16% 1% 0% 0% 14% 0% 52°C 14:45:52: 1152MHz 3.10 16% 1% 0% 0% 14% 0% 52°C 14:45:57: 1152MHz 3.01 16% 1% 0% 0% 14% 0% 52°C 14:46:02: 1152MHz 3.09 16% 1% 0% 0% 14% 0% 52°C 14:46:07: 1152MHz 3.08 16% 1% 0% 0% 14% 0% 52°C 14:46:12: 1152MHz 3.07 16% 1% 0% 0% 14% 0% 52°C 14:46:17: 1152MHz 3.15 16% 1% 0% 0% 14% 0% 52°C 14:46:22: 1152MHz 3.14 16% 1% 0% 0% 14% 0% 52°C 14:46:27: 1152MHz 3.13 16% 1% 0% 0% 14% 0% 52°C 14:46:32: 1152MHz 3.12 16% 1% 0% 0% 14% 0% 52°C 14:46:37: 1152MHz 3.11 16% 1% 0% 0% 14% 0% 52°C 14:46:42: 1152MHz 3.10 
16% 1% 0% 0% 14% 0% 52°C 14:46:48: 1152MHz 3.09 16% 1% 0% 0% 14% 0% 52°C 14:46:53: 1152MHz 3.00 16% 1% 0% 0% 14% 0% 52°C 14:46:58: 1152MHz 2.92 16% 1% 0% 0% 14% 0% 52°C 14:47:03: 1152MHz 2.93 16% 1% 0% 0% 14% 0% 52°C 14:47:08: 1152MHz 2.93 16% 1% 0% 0% 14% 0% 52°C 14:47:13: 1152MHz 2.94 16% 1% 0% 0% 14% 0% 52°C 14:47:18: 1152MHz 2.94 16% 1% 0% 0% 14% 0% 52°C 14:47:23: 1152MHz 2.95 16% 1% 0% 0% 14% 0% 52°C 14:47:28: 1152MHz 2.95 16% 1% 0% 0% 14% 0% 52°C 14:47:33: 1152MHz 2.96 16% 1% 0% 0% 14% 0% 52°C 14:47:38: 1152MHz 2.96 16% 1% 0% 0% 14% 0% 52°C 14:47:43: 1152MHz 2.96 16% 1% 0% 0% 14% 0% 52°C 14:47:48: 1152MHz 2.89 16% 1% 0% 0% 14% 0% 52°C 14:47:54: 1152MHz 2.90 16% 1% 0% 0% 15% 0% 52°C 14:47:59: 1152MHz 2.98 16% 1% 0% 0% 15% 0% 52°C 14:48:04: 1152MHz 2.99 16% 1% 0% 0% 15% 0% 52°C 14:48:09: 1152MHz 2.99 17% 1% 0% 0% 15% 0% 52°C 14:48:14: 1152MHz 2.99 17% 1% 0% 0% 15% 0% 52°C 14:48:19: 1152MHz 2.91 17% 1% 0% 0% 15% 0% 52°C 14:48:24: 1152MHz 2.92 17% 1% 0% 0% 15% 0% 52°C 14:48:29: 1152MHz 2.84 17% 1% 0% 0% 15% 0% 52°C 14:48:34: 1152MHz 2.85 17% 1% 0% 0% 15% 0% 52°C 14:48:39: 1152MHz 2.87 17% 1% 0% 0% 15% 0% 52°C 14:48:44: 1152MHz 2.96 17% 1% 0% 0% 15% 0% 52°C 14:48:49: 1152MHz 2.96 17% 1% 0% 0% 15% 0% 52°C 14:48:54: 1152MHz 2.96 17% 1% 0% 0% 15% 0% 52°C 14:49:00: 1152MHz 2.97 17% 1% 0% 0% 15% 0% 52°C 14:49:05: 1152MHz 2.97 17% 1% 0% 0% 15% 0% 52°C 14:49:10: 1152MHz 2.97 17% 1% 0% 0% 15% 0% 52°C 14:49:15: 1152MHz 2.97 17% 1% 0% 0% 15% 0% 52°C 14:49:20: 1152MHz 2.98 17% 1% 0% 0% 15% 0% 52°C 14:49:25: 1152MHz 2.98 17% 1% 0% 0% 15% 0% 52°C 14:49:30: 1152MHz 3.06 17% 1% 0% 0% 15% 0% 52°C 14:49:35: 1152MHz 3.06 17% 1% 0% 0% 15% 0% 52°C 14:49:40: 1152MHz 3.13 17% 1% 0% 0% 15% 0% 52°C 14:49:45: 1152MHz 3.20 17% 1% 0% 0% 15% 0% 52°C 14:49:50: 1152MHz 3.18 17% 1% 0% 0% 15% 0% 52°C 14:49:55: 1152MHz 3.09 17% 1% 0% 0% 15% 0% 52°C 14:50:00: 1152MHz 3.08 17% 1% 0% 0% 15% 0% 52°C 14:50:06: 1152MHz 3.08 17% 1% 0% 0% 15% 0% 52°C 14:50:11: 480MHz 3.07 17% 1% 0% 0% 15% 0% 49°C 14:50:16: 480MHz 2.90 17% 1% 0% 0% 15% 0% 48°C 14:50:21: 480MHz 2.67 17% 1% 0% 0% 15% 0% 47°C 14:50:26: 480MHz 2.62 17% 1% 0% 0% 15% 0% 47°C 14:50:31: 480MHz 2.22 17% 1% 0% 0% 15% 0% 46°C 14:50:36: 480MHz 2.04 17% 1% 0% 0% 15% 0% 46°C 14:50:41: 480MHz 1.88 17% 1% 0% 0% 15% 0% 46°C 14:50:46: 480MHz 1.72 17% 1% 0% 0% 15% 0% 46°C 14:50:51: 480MHz 1.59 17% 1% 0% 0% 15% 0% 46°C 14:50:56: 480MHz 1.54 17% 1% 0% 0% 15% 0% 46°C 14:51:01: 480MHz 1.42 17% 1% 0% 0% 15% 0% 45°C 14:51:06: 480MHz 1.30 17% 1% 0% 0% 15% 0% 45°C 14:51:12: 480MHz 1.20 17% 1% 0% 0% 15% 0% 45°C 14:51:17: 480MHz 1.10 17% 1% 0% 0% 15% 0% 45°C 14:51:22: 480MHz 1.01 17% 1% 0% 0% 15% 0% 45°C 14:51:27: 480MHz 0.93 17% 1% 0% 0% 15% 0% 44°C 14:51:32: 480MHz 0.86 17% 1% 0% 0% 15% 0% 44°C 14:51:37: 480MHz 0.79 17% 1% 0% 0% 15% 0% 45°C 14:51:42: 480MHz 0.81 17% 1% 0% 0% 15% 0% 44°C 14:51:47: 480MHz 0.74 17% 1% 0% 0% 15% 0% 44°C 14:51:52: 480MHz 0.68 17% 1% 0% 0% 15% 0% 44°C 14:51:57: 480MHz 0.63 17% 1% 0% 0% 15% 0% 44°C 14:52:02: 480MHz 0.58 17% 1% 0% 0% 15% 0% 44°C 14:52:07: 480MHz 0.53 17% 1% 0% 0% 15% 0% 44°C 14:52:12: 480MHz 0.49 17% 1% 0% 0% 15% 0% 44°C 14:52:17: 480MHz 0.45 17% 1% 0% 0% 15% 0% 44°C 14:52:23: 480MHz 0.41 17% 1% 0% 0% 15% 0% 44°C 14:52:28: 480MHz 0.46 17% 1% 0% 0% 15% 0% 44°C 14:52:33: 480MHz 0.42 17% 1% 0% 0% 15% 0% 44°C 14:52:38: 480MHz 0.39 17% 1% 0% 0% 15% 0% 43°C 14:52:43: 480MHz 0.36 17% 1% 0% 0% 15% 0% 43°C 14:52:48: 480MHz 0.33 17% 1% 0% 0% 15% 0% 43°C 14:52:53: 480MHz 0.30 17% 1% 0% 0% 15% 0% 43°C 14:52:58: 480MHz 0.28 17% 1% 0% 0% 15% 0% 43°C 14:53:03: 
480MHz 0.26 17% 1% 0% 0% 15% 0% 43°C 14:53:08: 480MHz 0.24 17% 1% 0% 0% 15% 0% 43°C 14:53:13: 480MHz 0.22 17% 1% 0% 0% 15% 0% 43°C 14:53:18: 480MHz 0.20 17% 1% 0% 0% 15% 0% 43°C 14:53:23: 480MHz 0.18 17% 1% 0% 0% 14% 0% 43°C 14:53:29: 480MHz 0.17 17% 1% 0% 0% 14% 0% 43°C 14:53:34: 480MHz 0.15 17% 1% 0% 0% 14% 0% 42°C 14:53:39: 480MHz 0.14 16% 1% 0% 0% 14% 0% 42°C 14:53:44: 480MHz 0.13 16% 1% 0% 0% 14% 0% 43°C 14:53:49: 480MHz 0.12 16% 1% 0% 0% 14% 0% 43°C 14:53:54: 480MHz 0.11 16% 1% 0% 0% 14% 0% 43°C 14:53:59: 480MHz 0.10 16% 1% 0% 0% 14% 0% 43°C 0 Quote
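To reproduce the run above, the following sequence is a sketch assembled from the command line shown in the post; the mount point /mnt/usb is an assumption, and iozone3 is the usual Debian/Ubuntu package name (it may live in non-free/universe):

apt-get install iozone3

# second shell: Armbian's monitoring output (the Time/CPU/load/temperature columns shown above)
armbianmonitor -m

# main shell: run the same test on the drive under test, assumed mounted at /mnt/usb
cd /mnt/usb
iozone -a -g 6000m -s 6000m -i 0 -i 1 -r 4K -r 1024K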
tkaiser (Author) Posted September 6, 2016

I'm happy to do speed tests pushing the board to the limits, but the goal of the videos atm is simply to discredit any claims that the best the board can do is 100Mbit, as made by some people, and also to show that you can get those higher speeds with even the most dodgy of setups...

Agreed, that's the right approach (when @androsch's board arrives and networking works in my environment I will provide some 'maximum tuning' numbers). But then you might go a step further and also compare the 'featured' Debian Jessie image from the Pine64 wiki (using the wrong cpufreq governor, which is mostly responsible for the bad/random iperf numbers; see the quick check after this post) without longsleep's network tuning scripts? And since your goal is educating users, maybe running an instance of 'iostat 5' in another session is interesting too (showing the %iowait percentage).

BTW: these 'some people' you're talking about are AFAIK only one individual causing real harm over there (I received several PMs about him before my ban and in the meantime confirmation that a few regulars left only because of him). This thread here shows the problem: a power cut occurred and the user feared SD card corruption (a common Raspberry Pi problem where it's also somewhat hardware related, and the reason why longsleep's image creation scripts use journaling for the rootfs -- rootfs corruption on Pine64 is almost impossible). Nothing had happened of course (the warning talks about a USB drive connected one minute after boot instead: "65.856602] FAT-fs (sda2"), yet this moderator starts to talk about the user's daughter and claims it's necessary to start from scratch with a new OS image, wiping out the SD card. While both installation and SD card are perfectly fine and the 'solution' would've been to stay calm and call fsck for the FAT partition on the correct device: fsck /dev/sda2. If anyone had jumped into the thread and pointed out what harm this guy caused yet again, he would've censored the posts and maybe banned that person. The weird stuff he constantly writes is one problem; that he constantly misuses his moderator role is the bigger one. But as already written: it seems every product gets the 'supporters' it deserves. 0 Quote
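To make the cpufreq governor point concrete, here is a quick check that can be run on any of these images; a sketch only, using the standard sysfs cpufreq interface, and the 'interactive' governor name is just an example rather than a statement about what a particular image ships:

# show which governor the running image uses, and which ones are available
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors

# show the current and maximum clockspeeds (in kHz)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq

# temporarily switch the governor for a test run (as root; example value)
echo interactive > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor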
pfeerick Posted September 6, 2016

Thanks for that tkaiser, I'm glad you're on the same page there. That is exactly what I will be doing... I have already uploaded several other videos showing the Pine64 Debian distro, both without tuning (~9MB/s both ways) and tuned (~28MB/s up, ~9MB/s down IIRC), but I haven't made them public as I will be re-doing them on the weekend with a webcam on the Pine64 board, so they can't complain that it's not the Pine64 being tested, and can see how it is set up. Indeed... I don't know enough about the tests that I can do, so if you tell me, I'll run them. Am I correct in thinking that iowait% is literally what it sounds like... % of time waiting for IO? I really need to get back into Linux... haven't really dug down into it since 2006 when I was doing IT at uni! It's funny you would say that... I missed that thread at first, and now it's closed so I couldn't reply, but the first thing that jumped out was that the error was in relation to sda2... not mmcblk... so I failed to see how you could conclude there was anything wrong with the microSD (since it is /dev/mmcblk)... So I have, as usual, sent a PM to the user being misinformed and explained pretty much what you said... the microSD wasn't the subject of the error, the journalling file system should pretty much make it bulletproof, and just run fsck /dev/sda2 if you're worried about it! Ah well, no point crying over someone else's spilt milk! :-/ 0 Quote
tkaiser (Author) Posted September 6, 2016

Am I correct in thinking that iowait% is literally what it sounds like... % of time waiting for IO?

It's 'the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request' and can be used as just another indicator of where bottlenecks might exist. If you deal with 'network performance' on any slow ARM board you need to understand where the bottlenecks are:

CPU clockspeed matters; both storage and network performance depend on it, so it's essential to keep an eye on the cpufreq governor used (since the wrong cpufreq governor lets the CPU cores remain at low clockspeeds)

some stuff is single threaded (e.g. iperf/iperf3 or smbd processes), which makes the cpufreq governor even more important since the process gets bottlenecked by a single CPU core running at 480 MHz while the 3 remaining cores are more or less idle or just do some kernel stuff handling network and IO IRQs

behaviour of some tasks depends on the distro used (see the top posts here where I explained why iperf/iperf3 numbers with Xenial look better than with Debian)

when testing a combined workload that includes both network and IO, the latter might become a bottleneck on its own (therefore storage has to be tested individually beforehand and monitored later)

So it's essential to monitor CPU clockspeed and affinity while testing (armbianmonitor and htop), but these won't tell the whole story since IO bottlenecks can also lead to low 'network performance' numbers. If htop shows an increased average load while the CPU is not maxing out, that is already an indication, since on Linux (unlike most if not all other Unix-based OSes) waiting for IO adds to the average load. But using a dedicated tool like 'iostat 5' is better (a minimal monitoring setup is sketched right after this post). IO bottlenecks can be caused by hardware (using a slow disk, SD card or thumb drive) and by settings (again cpufreq stuff, IO tunables, daemon config -- the smb.conf stuff we were already talking about -- and trivial stuff like the 'wrong' filesystem: NTFS when used on Linux can be such an example; I've no idea whether similar issues might exist when sharing FAT32 or exFAT since I would always use either ext4 or btrfs when the kernel version allows it).

Thanks for the numbers. So you got 9 MB/s SMB throughput in both directions when using Pine64's Debian Jessie and 34/23 MB/s when using Armbian's Jessie. Settings matter, ignorance hurts.

BTW: this whole Pain64 story is somewhat remarkable. With nearly all other SBCs the story almost always starts like this: a new board goes on sale and the vendor provides crappy OS images. Then the community jumps in, fixes the mistakes, improves settings, things evolve (this pretty much describes how and why Armbian was born). With Pain64 it was different. The Pine64 folks provided no OS images at all except Android/RemixOS, but were really helpful providing docs/info about the board and from the SoC vendor. The community did all the software work in the beginning, and the first OS images that appeared (longsleep's original Arch Linux and Xenial images) were in a very good state. Bugs that were identified got fixed within hours or days, basic settings were all chosen wisely, even complicated stuff like u-boot/kernel upgrades was kept simple, and improvements like HW accelerated video decoding were made available immediately. But the Pine64 folks decided, for whatever reasons, not to feature the good OS images but to take the community's work and convert it to crap.
The OS images they put online did not contain the already available bug fixes or new features, they used wrong settings and added even more bugs (the same MAC address on every board, due to /boot/uEnv.txt already containing a fixed address), they did not even link to the good stuff, and they did everything to keep users away from it. And at the same time they also ensured that no progress with the various Pain64 issues could be made, by not providing a real quick start guide or a real FAQ and by allowing their forum to be filled with junk and confusion. This was months ago and what happens there now (spreading even more nonsense and now also censoring) seems to be just the next level. 1 Quote
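As referenced above, a minimal monitoring setup for spotting these bottlenecks while a transfer runs; a sketch only: 'iostat' comes from the sysstat package, the sysfs paths are the generic cpufreq/thermal ones (adjust if the kernel exposes them elsewhere), and iperf3 is just an example workload:

apt-get install sysstat

# terminal 1: CPU utilisation incl. %iowait plus per-device transfer rates, every 5 seconds
iostat 5

# terminal 2: watch current CPU clockspeed (kHz) and SoC temperature while the test runs
# (on Armbian, 'armbianmonitor -m' combines most of this in one window)
watch -n 5 "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq /sys/class/thermal/thermal_zone0/temp"

# terminal 3: the actual workload, e.g. an iperf3 server to test against from another machine
iperf3 -s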
pfeerick Posted September 6, 2016

I'm sorry... I'll have to stop you there... I'm having trouble not bursting out laughing... nope... I did... you've just summed up what I've been shaking my head over for the past few months... lol... settings matter, ignorance hurts... so true... so true... OK, that makes sense... I'd gathered from what you'd said before that cpufreq played a big part here (which is understandable in the world of ARM, since it's all on the one chip, rather than spread across different chips like on an x86 platform...). It would be interesting to see what they stuffed up in the Debian distro though that still bottlenecks the download speed to 9MB/s instead of pushing it up to the 23MB/s that Armbian proves the board is easily capable of doing. Don't think I can really be bothered finding out for them, since the simple solution is 'use Armbian'... it is well supported and works! So, I should add iostat 5 to my video display/log/whatever... wilco... the more stats the better... harder to argue against documented empirical evidence! So the iozone results... is that indicating 32.9MB/s write and 33.8MB/s read for 4kB blocks? Thus meaning there is something still blocking for network downloads (if at 23MB/s)... which seems understandable since the Pine64 didn't really seem to throttle up for that side... Oh, if you want a real hoot... I initially tried the vanilla image before realising the error of my ways... and since I couldn't seem to mount the USB drive... I instead shared out a folder on the microSD... and got a whopping 3MB/s up and down... needless to say, I decided that wasn't very practical. btw, is the wireless module supported (yet? or likely to be?)... or am I just a bit too slow in working out how to enable it? 1 Quote
tkaiser (Author) Posted September 6, 2016

So the iozone results... is that indicating 32.9MB/s write and 33.8MB/s read for 4kB blocks? Thus meaning there is something still blocking for network downloads (if at 23MB/s)...

This was just to check whether your thumb drive is the bottleneck or the USB 2.0 port (BTW: the lower USB port is a real USB host port while the upper is the OTG port working in host mode. Performance on the upper port might therefore be lower, to be confirmed). As already said, Samba's defaults might not be that great; you could try to add this to the global section of smb.conf:

aio read size = 16384
aio write size = 16384
use sendfile = yes

And that to /etc/rc.local (or better a script called from there):

echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
echo 32768 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus
sysctl -w net.core.rmem_max=26214400
sysctl -w net.core.wmem_max=26214400
sysctl -w net.core.rmem_default=514400
sysctl -w net.core.wmem_default=514400
sysctl -w net.ipv4.tcp_rmem='10240 87380 26214400'
sysctl -w net.ipv4.tcp_wmem='10240 87380 26214400'
sysctl -w net.ipv4.udp_rmem_min=131072
sysctl -w net.ipv4.udp_wmem_min=131072
sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_window_scaling=1
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.core.optmem_max=65535
sysctl -w net.core.netdev_max_backlog=5000

(Changing any of these parameters requires a restart of the smbd daemon or of the whole board.) Maybe then read performance also increases.

You could also try to eliminate the storage bottleneck by using btrfs with transparent file compression and a test file that is highly compressible (for example consisting only of zeros; the fastest way might be a 'dd if=/dev/zero of=1gb_zeroes.file bs=1G count=1' on the freshly mounted compressed btrfs fs). This might require installation of the btrfs-tools package to get mkfs.btrfs, and then you would use 'mount -v -t btrfs -o compress=lzo,noatime' to mount the fs (then check the output of 'cat /etc/mtab' to see what you would add to fstab); a condensed version of this test is sketched after this post. But such a test is only of limited use since it would only show how fast the GbE interface could be if the USB bottleneck were removed. Also, btrfs should only be used with a very recent kernel version (since all the btrfs code is contained in the kernel, this is the only way to get countless bugfixes), and the whole approach doesn't make that much sense anyway since Pine64 is limited by USB 2.0.

Regarding WiFi: no idea at all (I don't use WiFi on Linux or the 2.4GHz band in general since it's too overcrowded here).

Update: I learned a few minutes ago from a FreeBSD dev that the Pine64 can be configured so that both USB ports are true USB 2.0 host ports, each using its own PHY: https://irclog.whitequark.org/linux-sunxi/2016-09-06#17478535; (that's great news since it means that once the Linux mainline kernel is ready regarding USB and we can set the 'magical bit' there to switch the OTG port from host role to true host mode, two independent disks can be connected, each maxing out at 40 MB/s with UASP capable disks. Combine this with btrfs + transparent file compression and the good GbE networking performance and Pine64 is back in the NAS game!) 0 Quote
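For reference, the btrfs/compression experiment described above as one sequence; treat it as a sketch: /dev/sda1 and the mount point are assumptions, and the commands wipe whatever is on the stick:

apt-get install btrfs-tools          # provides mkfs.btrfs (the package is called btrfs-progs on newer releases)

mkfs.btrfs -f /dev/sda1              # assumes the thumb drive is sda1; this destroys its contents
mkdir -p /mnt/btrfs-test
mount -v -t btrfs -o compress=lzo,noatime /dev/sda1 /mnt/btrfs-test

# create a highly compressible test file (all zeros); 1024 x 1MB writes the same amount of data
# as the 'bs=1G count=1' from the post but needs far less RAM for the buffer
dd if=/dev/zero of=/mnt/btrfs-test/1gb_zeroes.file bs=1M count=1024

grep btrfs /etc/mtab                 # shows the mount options to copy into /etc/fstab if desired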