
David


Reputation Activity

  1. Like
    David reacted to tkaiser in Allwinner R40 -- some already available information   
    Great news! R40 does not contain a self-destruction feature that destroys the SoC after 3 weeks: http://forum.banana-pi.org/t/bpi-m2-ultra-test-it-running-22-days-with-ubuntu-16-04-1-lts/2351
     
    (is it really newsworthy that Linux on a device has an uptime of more than 3 weeks? Or does it tell us something about @sinovoip's expectations? We have several Armbian installations with 200+ days of uptime whose only reboots were due to kernel updates)
     
    Any real information is missing, as expected (how hot does the SoC get, for example, when idle or when running cpuburn-a7? Why is the average load that high? Is it just @sinovoip forgetting USB OTG settings, as with every other device they have released so far, so that the community has to complain maybe 20 times until @sinovoip fixes stuff like this months later?). Unfortunately what I wrote above still applies:
     
  2. Like
    David reacted to wildcat_paris in [TEST] Team testers?   
    I can only agree.
     
    Booting by UUID will be very helpful when testing a new image on SD card while having a working system on eMMC (you usually cannot remove the eMMC flash memory -- Hardkernel is the only vendor I know offering removable eMMC)
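For illustration, booting by UUID means referencing the filesystem by its identifier instead of by device node (the device name and UUID below are made-up examples):

```shell
# Find the UUID of the freshly written SD card partition (example device node):
blkid /dev/mmcblk0p1
# e.g.: /dev/mmcblk0p1: UUID="f3b5a1c2-..." TYPE="ext4"

# Then reference that UUID instead of /dev/mmcblk0p1, so the right rootfs is
# found no matter whether SD card or eMMC is enumerated first:
#   kernel cmdline:  root=UUID=f3b5a1c2-...
#   /etc/fstab:      UUID=f3b5a1c2-...  /  ext4  defaults,noatime  0  1
```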
  3. Like
    David reacted to Igor in [TEST] Team testers?   
    Yes, we could (formally) engage more people in the development process as testers. It could be helpful and would relieve us a little.
     
    Perhaps we should start to seek & assign a few testers / maintainers for each board? We can provide access to daily upgrades to make things easy on a technical level. I can provide an extra repository (aptdev.armbian.com) with daily updated deb packages, while creating images takes way too much time for daily builds. I have already been working on this for some time and it's not far from running daily.
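Adding such a testing repository on a board could look roughly like this (a sketch only: the suite/component names and list file are assumptions; only the host aptdev.armbian.com is from the post above):

```shell
# Hypothetical sources.list entry for the daily package repository
echo "deb http://aptdev.armbian.com jessie main" \
    | sudo tee /etc/apt/sources.list.d/armbian-dev.list

# Pull in the daily updated deb packages
sudo apt-get update
sudo apt-get upgrade
```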
     
    Subforum "Development" is not that busy, so we can have those things here.
  4. Like
    David reacted to wildcat_paris in [TEST] Team testers?   
    thinking of http://forum.armbian.com/index.php/topic/2043-lamobo-r1-520-swconfig-problem-with-last-520-armbian-update/?p=15773
     
    We may ask the forum users and create a list of testers for each board (or family of boards)
     
    => we could create a sub forum Armbian/Tech/Dev/Armbian build framework/Testing/
    - to call for testers when needed (testers would register in a pinned thread managed by Igor/Mikhail/Thomas/others)
    - to report testing on the forum using a title like [5.20/Lamobo-R1] "Status" (OK/KO/minor issues)
    - to create bug reports in the appropriate forum
     
    each registered tester defines a minimum set of tests (like testing the switch config of the Lamobo-R1) and creates a thread as a reference
     
    (just an idea)
     
     
     
     
  5. Like
    David reacted to Igor in Why is there no *real* download verification method?   
    Builds can be reproducible, but they might not be exactly identical since we depend on external source projects and are fixing things daily.
     
     
    Sources are more or less trusted (mainline kernel and u-boot, legacy sources maintained by vendors, some by the community, some mixed, ...). They are git cloned over https, so nothing gets in between.
     
     
    Each developer has his own environment and our common end result is public on Github. Code is reviewed upon commit, so I would say yes. The official build is (so far) always done by me on dedicated hardware, accessible locally, and the output is securely transferred to the download server, which will get SSL when the server's admin finds time.
     
    The Armbian project is still a small one regardless of its importance. Our core team resources and budget are tiny, at hobby level, so even if some things are possible we might not be able to afford them. But we take security seriously and possible problems will be fixed ASAP ... it's just that this ASAP will take longer
     
    Sometimes it is hard to see even the most obvious things, so thank you for bringing these ideas up. I certainly don't want our work to become involved in something like this
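The kind of 'real' verification the thread is asking for boils down to two checks; a sketch with made-up filenames (integrity via a checksum, authenticity via a detached GPG signature, once images are signed):

```shell
# Integrity: detects corrupted downloads (checksum file published next to the image):
sha256sum -c Armbian_5.20_Pine64_Ubuntu_xenial.img.sha256

# Authenticity: detects tampering, provided the signing key was obtained out-of-band:
gpg --verify Armbian_5.20_Pine64_Ubuntu_xenial.img.asc Armbian_5.20_Pine64_Ubuntu_xenial.img
```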
  6. Like
    David reacted to tkaiser in Armbian on MiQi SBC hardware ?   
    There is a fan header; Zador found that out by looking at the schematic: http://forum.armbian.com/index.php/topic/1095-miqi-is-a-35-single-board-computer-with-rockchip-rk3288/?p=8338
     
    I'm pretty impressed by the performance (and the possible tweaks, see the bottom of this page) but believe it would be necessary to come up with a full-blown desktop Armbian image, since the board seems to perform pretty well there (GPU and video acceleration). For headless use cases the lack of I/O bandwidth could be a problem. Anyway, I would appreciate it if Peter started with that, and would assume Benn Huang could be asked to send out a few more developer samples.
  7. Like
    David reacted to tkaiser in Most suitable Web Browser   
    Care to accept the real issue? Random I/O. 'Browsing the web' with Firefox means opening a bunch of databases that constantly sync their contents to 'disk'. If the disk is an average SD card (looooooow random I/O performance) then this will be slow as hell.
     
    Further readings:
    At first, I ran Ubuntu from a cheap SD Card, and the desktop was rather sluggish, but I still decided to launch Libre Office 3 Writer… and it took over 5 minutes! So I decided to copy the Ubuntu image to the eMMC module. Ubuntu became more responsive, and Libre Office 3 Writer launched within 20 to 45 seconds the few times I tested.
     
    A Google search for 'places.sqlite':
    https://www.reddit.com/r/firefox/comments/2379lw/minimizing_firefox_disk_io/
    https://wiki.mozilla.org/Performance/Avoid_SQLite_In_Your_Next_Firefox_Feature
     
    'Desktop Linux' means you need fast random I/O. That's why you can use an ODROID-C1/C1+/C2 as a desktop replacement when you ordered the eMMC, but not when you saved the cost and try the same running on an average, horribly slow SD card.
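To see this on your own system, you can list the SQLite databases a Firefox profile keeps syncing to 'disk' (the profile path is the usual default; adjust as needed):

```shell
# List Firefox's SQLite databases and their sizes, largest first:
find ~/.mozilla/firefox -name '*.sqlite' -exec du -h {} + 2>/dev/null | sort -rh | head

# Then watch the write activity they generate while browsing (iotop needs root):
sudo iotop -o -b -n 3 | grep -i firefox
```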
     
    Apart from that, regarding H3 and any sort of 'acceleration' so many are asking for: AFAIK at least Chromium could use OpenGLES acceleration but would need a higher Mali driver version than what is available. We have some sort of 2D acceleration but nothing else. So the other key performance factor with a Linux desktop environment is to disable as many effects as possible (especially everything that involves transparency).
     
    BTW: You could also replace 'H3' with 'A64'. The CPU cores are a bit faster but the problems are the same.
  8. Like
    David reacted to tkaiser in Armbian running on Pine64 (and other A64/H5 devices)   
    Now the fun with A64 begins: Olimex posted an update on A64-OLinuXino-eMMC:  https://olimex.wordpress.com/2016/09/01/a64-olinuxino-emmc-rev-b-oshw-64-bit-arm-development-board-prototypes-are-testing/
     
    We get bootable SPI flash on the lower PCB side (and now I really believe we get the same with Orange Pi PC 2 and 3), eMMC with fast modes and voltage switching, most probably a cost-down variant without eMMC (otherwise adding SPI flash would not make that much sense), less power-hungry DDR3L, a Gigabit Ethernet PHY available in industrial temperature range, and a few more tweaks
     
    From a software point of view SPI flash booting and eMMC with voltage switching needs some work and maybe tweaks for the different GbE PHY. But apart from that we should already be there...
  9. Like
    David reacted to tkaiser in New Oranges with H5 and H2+   
    Most likely it's 40nm as with the A64 now. Allwinner's dual-core A7 (A20) is made in 55nm, the quad-core A7 (H3) in 40nm. The former does not overheat (that much), the latter does. Allwinner's octa-core A7 (A83T/H8/R58 and the 4 other names it has that I've already forgotten) is made in 28nm and overheats like hell. It's really not that simple
     
    I would suppose H5 is more or less like the A64 internally as far as the CPU cores are concerned (it's said to contain a better GPU and video engines). But unlike the A64, which comes with PMIC support where we're able to adjust VDD_CPUX in 20mV steps, the H5 does not.
     
    That's one of the open questions: how does voltage regulation look? If it's done like with H3 (the board vendor can choose between 3 options) then we might end up just like with H3: after resolving the overvoltage mess we got some boards that do not overheat that much (the bigger Oranges), some that overheat a little (the smaller Oranges), some that overheat even more (NanoPi M1) and some that perform really badly here (NEO, Beelink X2 and Banana Pi M2+)
     
    So at the time of this writing it's only speculation.
  10. Like
    David reacted to Klym in Marriage between A20 and H3, UPS mode, sunxi-pio utility   
    Banana PI M1+ is also working fine with battery backup:

    2x18650 3400mAh cells. ~5 hours with always-on 4TB HDD
     
     
  11. Like
    David reacted to tkaiser in SBC consumption/performance comparisons   
    LOL, today I did some testing with NanoPi NEO, kernel 4.7.2 and the new schedutil cpufreq governor. I let the following run to check thermal readouts after allowing 1200 MHz max cpufreq:
     
    sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$(grep -c '^processor' /proc/cpuinfo)
     
    To my surprise the result was just 117.5 seconds -- 'better' than an RPi 3 with the same settings. And with an Orange Pi PC clocked higher (1.3 GHz vs. 1.2 GHz) I got the following a few days ago: 'sysbench takes 142 seconds, H3 constantly running at 1296 MHz, SoC temperature reached 74°C but no throttling happening'
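To watch thermal readouts and the actual clockspeed while such a run executes, a small loop like this helps (a sketch only: the sysfs paths are assumptions and differ per kernel and SoC):

```shell
# Poll SoC temperature and current cpufreq every 5 seconds (paths assumed):
while true; do
    temp=$(cat /sys/class/thermal/thermal_zone0/temp)
    freq=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq)
    echo "$(date '+%H:%M:%S')  temp: ${temp}  cpufreq: ${freq} kHz"
    sleep 5
done
```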
     
    Wow!!! An increase in performance of ~30 percent just by using a new kernel! With a benchmark that should not be affected by the kernel version at all?! That's magic.
    So I immediately tried out our 3.4.112 Xenial image. Same thermal readouts, same result: 117.5 seconds! What happened? I had tried out Xenial 16.04 LTS with both the 4.7.2 and the 3.4.112 kernel, while before I had always used Debian Jessie. OK, downloaded our Jessie image for NanoPi NEO, executed the same sysbench call and got 153.5 seconds (which is the correct value given that no throttling occurred, max cpufreq was at 1200 MHz and the OPi PC clocked at 1296 MHz finishes in 142 seconds!)
     
    What can we learn from this? Sysbench is used nearly everywhere to 'get an idea about CPU performance' while it is horrible crap for comparing different systems! You always have to ensure that you're using the very same sysbench binary -- at the very least it has to be built with the exact same compiler version and settings! We get a whopping 30 percent performance increase simply because the Ubuntu folks use different compiler switches/versions than the Debian folks.
     
    This is 2 times 'sysbench 0.4.12'.
     
    Ubuntu Xenial Xerus:
     
    root@nanopineo:~# file /usr/bin/sysbench
    /usr/bin/sysbench: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=2df715a7bcb84cb03205fa3a5bc8474c6be1eac2, stripped
    root@nanopineo:~# lsb_release -c
    Codename: xenial
    root@nanopineo:~# sysbench --version
    sysbench 0.4.12
     
    Debian Jessie:
     
    root@nanopineo:~# file /usr/bin/sysbench
    /usr/bin/sysbench: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 2.6.32, BuildID[sha1]=664005ab6bf45166f9882338db01b59750af0447, stripped
    root@nanopineo:~# lsb_release -c
    Codename: jessie
    root@nanopineo:~# sysbench --version
    sysbench 0.4.12
     
    It's just the same effect when comparing sysbench numbers on RPi 2 or 3 when running Raspbian or Ubuntu Mate -- see post #12 above (but there the difference is only 15 percent, so it seems either the Raspbian people aren't using compiler switches as conservative as Jessie's, or Ubuntu Mate for Raspberries does not optimize as much as our 16.04 packages from the Ubuntu repositories)
     
    TL;DR: Never trust any sysbench numbers you find on the net if you don't know which compiler settings and version have been used. Sysbench is crap for comparing different systems. You can use sysbench's cpu test only for a very limited set of tasks: creating identical CPU utilization situations (to compare throttling settings as I did earlier in this thread), estimating multi-threaded results when adding/removing CPU cores, or testing CPU performance without the results being tampered with by memory bandwidth (sysbench is so primitive that all code runs inside the CPU caches!)
     
    Everything else always requires using the exact same sysbench binary on the different systems being compared. So no cross-platform comparisons are possible, no comparisons between systems running different OS images, no comparisons between different CPU architectures. Using sysbench as a general purpose CPU benchmark is always just fooling yourself!
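The compiler-switch effect is easy to reproduce without sysbench at all; the toy prime-counting loop below (my own example, not sysbench code) gives noticeably different timings from the very same source depending only on the optimization level:

```shell
cat > prime.c <<'EOF'
#include <stdio.h>
int main(void) {
    long count = 0;
    for (long n = 2; n < 200000; n++) {
        int is_prime = 1;
        for (long d = 2; d * d <= n; d++)
            if (n % d == 0) { is_prime = 0; break; }
        count += is_prime;
    }
    printf("%ld primes\n", count);
    return 0;
}
EOF
gcc -O0 -o prime_slow prime.c   # conservative optimization
gcc -O2 -o prime_fast prime.c   # typical distro optimization
time ./prime_slow
time ./prime_fast               # same source, same machine, different 'benchmark' result
```

Both binaries compute the same result; only the wall-clock time differs, which is exactly why comparing sysbench numbers across distros compares compilers rather than CPUs.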
  12. Like
    David reacted to tkaiser in Some storage benchmarks on SBCs   
    Since I've seen some really weird disk/IO benchmarks made on SBCs in the last few days, and both a new SBC and a new SSD arrived in the meantime, I thought let's give it a try with a slightly better test setup.
    I tested with 4 different SoCs/SBCs: NanoPi M3 with an octa-core S5P6818 Samsung/Nexell SoC, ODROID-C2 featuring a quad-core Amlogic S905, Orange Pi PC with a quad-core Allwinner H3, and an old Banana Pi Pro with a dual-core A20. The device considered the slowest (dual-core A20 at just 960 MHz) is in reality the fastest when it's about disk I/O.
     
    Since most if not all storage 'benchmarks' for SBCs moronically focus on sequential transfer speeds only and completely forget that random I/O is way more important on any SBC (it's not a digital camera or video recorder!), I tested that as well. Since it's also somewhat moronic, when you want to test the storage implementation of a computer, to choose a disk that is the limiting factor, the main test device is a brand new Samsung SSD 750 EVO 120GB; I tested it first on a PC to check whether the SSD is OK and to get a baseline of what to expect.
     
    Since NanoPi M3, ODROID-C2 and Orange Pi PC only feature USB 2.0, I tested with 2 different USB enclosures that are known to be USB Attached SCSI (UAS) capable. The nice thing with UAS is that while it's an optional USB feature that arrived together with USB 3.0, we can use it with more recent sunxi SoCs when running mainline kernel (A20, H3, A64 -- all only USB 2.0 capable).
     
    When clicking on the link you can also see how differently USB enclosures (to be more precise: the included USB-to-SATA bridges) perform. Keep that in mind when you see 'disk performance' numbers somewhere and people write SBC A would be 2MB/s faster than SBC B -- not only the SBC is responsible for the variation in numbers, it is for sure also influenced by both the disk used and the enclosure / USB-SATA bridge inside! The same applies to the kernel the SBC is running. So never trust any numbers you find on the internet that are the results of tests at different times, with different disks or different enclosures. Such numbers are just BS.
The two enclosures I tested with are equipped with a JMicron JMS567 and an ASMedia ASM1153. With sunxi SBCs running mainline kernel UAS will be used; with other SoCs/SBCs or when running legacy kernels it will be USB Mass Storage instead. Banana Pi Pro is an exception since its SoC features true SATA (with limited sequential write speeds), which will outperform every USB implementation. With this device I also used a rather fast SD card and a normal HDD connected to USB through a non-UASP-capable disk enclosure to show how badly this affects the important performance factors (again: random I/O!).
 
I used iozone with 3 different runs:
 
- 1 MB test size with 1k, 2k and 4k record sizes
- 100 MB test size with 4k, 16k, 512k, 1024k and 16384k (16 MB) record sizes
- 4 GB test size with 4k and 1024k record sizes
 
The variation in results is interesting. If the 4K results between 1 MB and 100 MB test size differ, you know that your benchmark is not testing disk throughput but instead the (pretty small) disk cache. Using 4GB for sequential transfer speeds ensures that the whole amount of data exceeds DRAM size.
 
The results:
 
NanoPi M3 @ 1400 MHz / 3.4.39-s5p6818 / jessie / USB Mass Storage:
 
Sequential transfer speeds with USB: 30MB/s with 1MB record size and just 7.5MB/s at 4K/100MB; the lowest random I/O numbers of all. All USB ports are behind a USB hub and it's already known that performance on the USB OTG port is higher. Unfortunately my SSD in both enclosures prevented negotiating a USB connection on the OTG port, since each time I connected the SSD the following happened: WARN::dwc_otg_hcd_hub_control:2544: Overcurrent change detected
 
ODROID-C2 @ 1536 MHz / 3.14.74-odroidc2 / xenial / USB Mass Storage:
 
Sequential transfer speeds with USB: ~39MB/s with 1MB record size and ~10.5MB/s at 4K/100MB. All USB ports are behind a USB hub and the performance numbers look like there's always some buffering involved (not a true disk test; the kernel's caches are partially involved).
 
Orange Pi PC @ 1296 MHz / 4.7.2-sun8i / xenial / UAS:
 
Sequential transfer speeds with USB: ~40MB/s with 1MB record size and ~9MB/s at 4K/100MB; the best random I/O with very small files. All USB ports are independent (just like on Orange Pi Plus 2E, where identical results will be achieved since it's the same SoC and the same settings when running Armbian).
 
Banana Pi Pro @ 960 MHz / 4.6.3-sunxi / xenial / SATA-SSD vs. USB-HDD:
 
This test setup is totally different since the SSD is connected through SATA and I use a normal HDD in a UAS-incapable disk enclosure to show how huge the performance differences are.
 
SATA sequential transfer speeds are unbalanced for still unknown reasons: write/read ~40/170MB/s with 1MB record size, 16/44MB/s with 4K/100MB (huge compared to all the USB numbers above!). Best random I/O numbers of all (magnitudes faster, since no USB-to-SATA bottleneck is present as with every USB disk).
 
The HDD test shows the worst numbers: just 29MB/s sequential speed at 1MB record size and only ~5MB/s with 4K/100MB. Also, the huge difference between the 1MB and 100MB tests at 4K record size clearly shows that with 1MB test size only the HDD's internal DRAM cache was exercised (no disk involved): this was not a disk test but a disk cache test only.
 
Lessons to learn?
 
- HDDs are slow. So slow that they are the bottleneck and invalidate every performance test when you want to test the performance of the host (the SBC in question).
- With HDDs data size matters, since you get different results depending on whether the benchmark runs inside the HDD's internal caches or not.
- SSDs behave differently here since they do not contain ultra-slow rotating platters; their different types of internal storage (DRAM cache and flash) do not perform that differently.
- When you have both USB and SATA, not using the latter is almost always simply stupid (even if sequential write performance looks identical, sequential read speeds are way higher and random I/O will always be superior -- and this is more important).
- It always depends on the use case in question. Imagine you want to set up a lightweight web server dealing with static content on an SBC that features only USB. Most of the accessed files are rather small, especially when you configure your web server to deliver all content pre-compressed. So if you compare random reads with 4k and 16k record size and 100MB data size, you'll notice that a good SD card performs magnitudes faster! For small files (4k) it's ~110 IOPS (447 KB/s) vs. 1950 IOPS (7812 KB/s), so the SD card is ~18 times faster; at 16k it's ~110 IOPS (1716 KB/s) vs. 830 IOPS (13329 KB/s), so the SD card is still 7.5 times faster than the USB disk. File size has to reach 512K for the USB disk to perform as well as the SD card! Please note that I used a Samsung Pro 64GB for this test. The cheaper EVO/EVO+ with 32 and 64GB show identical sequential transfer speeds while being a lot faster when it's about random I/O with small files. So you save money and get better performance by choosing the cards that look worse on paper!
- Record size always matters. Most filesystem accesses on an SBC are not large streamed data but small chunks of randomly read/written data. Therefore check random I/O results with small record sizes, since this is what's important, and compare the 1MB vs. 100MB data sizes to get an idea of when you're only testing your disk's caches and when you're testing your disk in reality.
 
If you compare random I/O numbers from crap SD cards (Kingston, noname, Verbatim, noname, PNY, noname, Intenso, noname and so on) with the results above, then even the slow HDD connected through USB can shine. But better SD cards exist, as do some pretty fast eMMC implementations on some boards (ODROID-C2 being the best performer here). By comparing with the SSD results you get an idea how to improve performance when your workload depends on it (desktop Linux, web server, database server). Even a simple 'apt-get upgrade', when done after months without upgrades, depends heavily on fast random I/O (especially writes).
 
So by relying on the usual bullshit benchmarks only showing sequential transfer speeds, an HDD (30 MB/s) and an SD card (23 MB/s) seem to perform nearly identically, while in reality the way more important random I/O performance might differ a lot. And this depends solely on the SD card you bought and not on the SBC you use! For many server use cases with small file accesses, good SD cards or eMMC will be magnitudes faster than HDDs (again, it's mostly about random I/O and not sequential transfer speeds).
 
I personally used/tested SD cards that show only 37 KB/s in the 16K random write test (some cheap Intenso crap). Compared to the same test with the A20 and a SATA SSD this is 'just' over 800 times slower (31000 KB/s). Compared to the best performers we currently know (EVO/EVO+ with 32/64GB) it is still 325 times slower (12000 KB/s). And this speed difference (again: random I/O) is what makes an 'apt-get upgrade' with 200 packages take hours on the Intenso card while finishing in less than a minute on the SATA disk and in 2 minutes with the good Samsung cards, given your Internet connection is fast enough.
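For reference, the three iozone runs described above translate roughly into these invocations (a sketch reconstructed from the description, not the verbatim commands: -a selects automatic mode, -e includes flush in the timings, -I uses O_DIRECT to bypass the page cache, -i 0/1/2 select the write/read/random tests):

```shell
# 1 MB test size with 1k, 2k and 4k record sizes:
iozone -e -I -a -s 1M -r 1k -r 2k -r 4k -i 0 -i 1 -i 2

# 100 MB test size with 4k, 16k, 512k, 1024k and 16384k record sizes:
iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2

# 4 GB test size with 4k and 1024k record sizes (exceeds DRAM size):
iozone -e -I -a -s 4G -r 4k -r 1024k -i 0 -i 1 -i 2
```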
  13. Like
    David reacted to tkaiser in Armbian running on Pine64 (and other A64/H5 devices)   
    A little update regarding 'network performance' since people wrote me the results with our Armbian/Xenial image look too good to be true.
     
    So back to the basics: most benchmarks you find on the internet regarding SBC performance show meaningless numbers, since they were made in 'passive benchmarking' mode without taking care of what's important.
What's different on SBCs compared to real servers (be it x86, Sparc, MIPS, ARMv8)?
 
- Network and I/O performance on cheap ARM SoCs is affected by the current CPU clockspeed (that's different on 'real' servers).
- A benchmarking tool that adds negligibly to CPU utilization on a real server might max out CPU resources on a weak ARM SoC, so the tool used for benchmarking might bastardize the performance numbers itself. Also, when acting single-threaded it might in reality test CPU and not network -- true for iperf in most operation modes.
- The OS distribution might use horribly wrong settings (cpufreq governor, IRQ distribution and so on), might contain mechanisms that are counterproductive (screensavers that start after periods of inactivity and influence performance numbers massively) and might be optimized for different use cases. For example, on any ARM SoC around, CPU and GPU cores share access to DRAM, so we already know that by disabling GPU/HDMI on headless servers we automagically improve performance: less consumption/temperature leads to better throttling behaviour under load, and more memory bandwidth leads to higher throughput numbers for some tasks.
 
Since we already know that common tools like iperf are CPU intensive, let's try out the lowest and highest clockspeed on Pine64 to get an idea how a wrong cpufreq governor might influence results (switching way too slowly from lower to upper clockspeeds when starting short benchmark executions) and how bottlenecked iperf is by CPU anyway.
 
Results as follows (please remember: this is always the same hardware and test setup, the only real difference is the OS image used!):
 
                                  TX / RX
Armbian Xenial @ 480 MHz:        630 / 940 Mbits/sec
Armbian Jessie @ 480 MHz:        620 / 600 Mbits/sec
pine64.pro Jessie @ 480 MHz:     410 / 595 Mbits/sec
 
Armbian Xenial @ 1152 MHz:       920 / 940 Mbits/sec
Armbian Jessie @ 1152 MHz:       920 / 810 Mbits/sec
pine64.pro Jessie @ 1152 MHz:    740 / 770 Mbits/sec
 
What does this mean?
Obviously the OS image matters. Someone wanting to benchmark Pine64+ and relying on the OS images from their official download location gets 500 Mbits/sec on average while someone choosing our Xenial image gets 930 Mbits/sec on average.
     
    So let's switch from passive to active benchmarking mode (monitoring the benchmark itself) and check what's different:
    - When using Xenial's iperf the tool acts single-threaded only in one direction (TX); with the iperf version in Jessie it's single-threaded in both directions. So by using Jessie you ensure that CPU clockspeed tampers with network throughput in both directions (iperf maxing out one CPU core at 100%), while with Xenial that happens only in TX direction. With Xenial you would also see the full 940 Mbits/sec in both directions by adjusting maximum cpufreq from 1152 to 1200 MHz.
    - The performance numbers also show that compiler switches do not affect iperf performance that much when comparing Xenial with Jessie (with other tools like sysbench you get a whopping 30 percent better numbers on Xenial compared to Jessie).
    - The differences between Armbian Jessie and the image from pine64.pro are: my Jessie image is headless while pine64.pro runs a desktop, and more importantly the pine64.pro OS image uses the wrong cpufreq governor (ondemand instead of interactive), which affects standard iperf/iperf3 test execution using the 10s defaults. With longer test executions the benchmark numbers improve, but this is obviously the wrong way to handle the problem -- switching to interactive is the real solution.
    - With iperf in single-threaded mode you will also see performance differences caused by CPU affinity. If for whatever reason the kernel decides to assign the iperf task to cpu0, performance numbers might differ compared to running on cpu1 (depending on IRQ distribution). So to really get a clue what's going on, you have to monitor IRQ distribution (/proc/interrupts) and assign the benchmark tool to a specific core using taskset, to find out whether tool and IRQ handling on the same core improve performance or not (since then you could decide whether improving IRQ distribution is worth a try).
     
    What do these numbers tell us?
     
    - iperf/iperf3 are both unable to measure network throughput reliably on SBCs since they are too CPU bound (and as used by most people, with default window sizes, the results are useless anyway).
    - On the other hand that means that the actual CPU clockspeed matters a lot, and therefore choosing an inappropriate cpufreq governor results in worse performance (simple solution: switch to performance or interactive with Allwinner's BSP kernel, or schedutil on mainline kernel 4.7 or later).
    - Inappropriate heat dissipation also leads to benchmarks showing lower network performance, so in case you cram your Pine64 into a tiny enclosure without a heatsink you ensure that performance sucks.
    - iperf seems to perform better on Xenial since it behaves differently there (maybe caused by Jessie compiling distro packages with GCC 4.9 while Xenial uses GCC 5.4 instead). So you get better performance numbers by switching from Jessie to Xenial, but it's important to understand that this ONLY affects meaningless benchmark numbers! Real-world workloads behave differently. Don't trust any benchmark blindly.
    - iperf used with default settings (that's a 10 second test execution here) might show weird/random numbers, since we have to deal with a phenomenon called TX/RX delay, and the cpufreq scaling governor chosen adds to the random behaviour. With iperf you should always test 5 times with 10 seconds, then 1 x 60 seconds and 1 x 300 seconds, since then you immediately see why ondemand is wrong for this workload and how the other phenomenon might influence results. Use iperf3 too, since it outputs 1 second statistics and might report different numbers (so you learn at least not to trust benchmark tools blindly!).
    - Most important lesson: try to understand how these iperf/iperf3 numbers correlate with reality and then measure/monitor real-world workloads. If you let Pine64 run as a web server, for example, with Jessie from pine64.pro, then with light workloads Pine64 will be magnitudes slower than with Armbian (remaining at 480 MHz vs. jumping to 1152 MHz when needed)... unless you switch to the interactive governor with the pine64.pro image. The numbers above do not tell you this real difference for this specific use case. It's always important to try to understand what a benchmark actually tells you.
    - We at Armbian should ask ourselves whether we will provide desktop images for Pine64 at all (please, please not; let's prevent our forum from getting flooded with all the HDMI and DVI issues and 'why does firefox not play youtube videos?' and crap like that) and whether we make our CLI images 'true headless' (disabling GPU/HDMI like we did with H3 boards, since both SoCs are pretty identical in this regard, and I would assume this helps a lot with specific server workloads due to more memory bandwidth, more usable RAM and better throttling behaviour)
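The CPU-affinity point above can be checked with taskset and /proc/interrupts; a sketch (the interface name and server address are placeholders; the other machine runs 'iperf3 -s'):

```shell
# See which CPU core services the Ethernet IRQs (eth0 is an example name):
grep eth0 /proc/interrupts

# Run the identical test pinned to different cores and compare the numbers:
taskset -c 1 iperf3 -c 192.168.1.100 -t 60
taskset -c 2 iperf3 -c 192.168.1.100 -t 60
```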
  14. Like
    David reacted to tkaiser in Armbian running on Pine64 (and other A64/H5 devices)   
    We have had initial support for Pine64/Pine64+ in our repository for a long time but have not released any official images yet. Since this will change soon, here's a sneak preview of what to expect.
     
    Hardware related issues:
     
    Please don't blame Armbian for the few design flaws Pine64 and Pine64+ show:
    - These boards use Micro USB for DC-IN, which is the worst possible decision. Most USB cables have a resistance way too high and are responsible for severe voltage drops when consumption increases; the tiny Micro USB contacts also have a pretty high contact resistance, and the maximum amperage for this connector is limited to 1.8A by the USB specs. So in case you want to do heavy stuff, immediately look into the linux-sunxi wiki page for Pine64 to get an idea how to use the pins on the so-called Euler connector to power the board more reliably. If you think about buying a Pine now, consider ordering their PSU too, since its cable resistance shouldn't be a problem (this should also apply to the Micro USB cables they sell).
    - The only led on this board is a power led that lights immediately when power is provided. Pre-production samples had a green led; on the normal batches this has been replaced with a red led. So there's no way for an OS image to provide user feedback (activate a led when u-boot or the kernel boots), and the red light has often been interpreted as 'something is wrong'.
    - USB: you find 2 USB type A receptacles on the board but only one is a true USB host port; the other/upper one is the A64's USB OTG port exposed not as Mini/Micro USB (with an ID pin to be able to switch roles) but as a normal type A port. Expect performance to be lower on this port. I've also never been able to do disk benchmarking on the upper port, but that might have changed in the meantime (I only have a pre-production developer sample here). Please note also that the maximum amperage available on the USB port is 650mA, so connecting bus-powered USB disks might already exceed this -- be prepared to use a powered USB hub in between.
    - A64 is prone to overheating, but unfortunately the Pine64 folks do not sell the board with an effective heatsink by default (compare with ODROID-C1+ or ODROID-C2, for example, for how it looks when the vendor cares about heat dissipation). They promised to provide a good heatsink as an option but at least I'm not able to find one in their online store. A heatsink is mandatory if you plan to run this device constantly under high load, otherwise throttling will occur (when we tested an unrealistically heavy workload without a heatsink -- cpuburn-a53 -- the A64 had to throttle down to as low as 600 MHz; for some numbers see the IRC log from a while ago).
    - Not a real hardware issue but a problem anyway: the HDMI driver in Allwinner's BSP does not negotiate any display output with a lot of displays that are connected through a HDMI <--> DVI converter or use non-common resolutions. Better not expect any display output if your display is neither connected directly using HDMI nor capable of 1080p (we can't do anything here since Allwinner's driver uses closed source blobs and no documentation or code with a useable license exists).
    - On a couple of Gbit-equipped Pine64+ boards users report that they're not able to negotiate Gbit Ethernet reliably and have to force the connection to Fast Ethernet (since we know that the RTL8211E PHY used on the boards needs an additional ~350 mW when negotiating a Gbit Ethernet connection, this might be related to power problems, or maybe different PHY batches, or something else). Confirmed in the meantime to be a hardware issue.
     
    Now combine Micro USB (encouraging users to combine this SBC with crappy phone chargers, 'smart' hubs/chargers that only provide 500mA since Pine64 isn't able to ask for more, and crappy USB cables leading to voltage drops -- all sorts of power related issues 'by design' due to the crappy Micro USB connector) with the missing custom led able to provide user feedback while booting and the inability to use a lot of displays, and you might already get what a support nightmare this device is.
     
    The only reliable DOA detection method without a serial console is to ensure you have a working SD card (test it beforehand with either F3 or H2testw as outlined in our docs), then check the download integrity of the Armbian image (again see the documentation), then ensure you burn the image correctly to SD card (see docs), insert the SD card, power on the board and wait 20 seconds. If the LEDs on the Ethernet jack then start to flash randomly, at least the kernel boots, and after waiting an additional 2 minutes you'll be able to log in via SSH or serial console (for the latter better choose the EXP header over the Euler connector -- reason here)
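    The verify-then-burn part of this procedure can be sketched as below. File names and the /dev/sdX device path are hypothetical placeholders; the demo at the end just exercises the checksum step with a dummy file:

```shell
# Sketch of the verify-then-burn procedure (hypothetical file names):
# 1) Check download integrity against the published .sha256 file:
#      sha256sum -c Armbian_Pine64.img.sha256
# 2) Burn to SD card (replace /dev/sdX with YOUR card device, this is destructive!):
#      dd if=Armbian_Pine64.img of=/dev/sdX bs=1M conv=fsync status=progress

# Self-contained demo of step 1 with a dummy file:
tmp=$(mktemp -d)
printf 'dummy image' > "$tmp/demo.img"
sha256sum "$tmp/demo.img" > "$tmp/demo.img.sha256"
sha256sum -c "$tmp/demo.img.sha256"   # prints "<path>: OK" on an intact file
```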
     
    Anyway: in case you run into booting or stability problems with Armbian on Pine64/Pine64+, be assured that it's not an Armbian issue. You are running into one of the problems above, therefore please try to resolve them on your own and send your complaints to the Pine64 forum and not ours: http://forum.pine64.org/forumdisplay.php?fid=21  (really, we don't do hardware and these issues are all related to hardware design decisions)
     

     
    Expectations:
     
    The Pine64 folks did a great job raising expectations to the maximum. They advertised this board as the 'first $15 64-Bit Single Board Super Computer', promised an average consumption of just 2.5W, the SoC remaining at 32°C and a few other weird things, while they already knew that reality differs a lot (the journey started here last Dec).
     
    Pine64 is not a 'Super Computer' but most probably the slowest 64-bit ARM board around, due to A64 being limited regarding maximum cpufreq, overheating issues (the 40nm process being partly responsible) and a lack of fast IO interconnections (only one real USB 2.0 host port present, no eMMC option possible, no SD card implementation using the faster modes). If you then combine the high expectations with a rather clueless kickstarter crowd (many of them not even getting that they did not buy products but backed a project) and the hardware flaws, it's pretty obvious why their forums are full of complaints and why they receive so many boards reported as DOA that work flawlessly in reality.
     
    So why bring Armbian to Pine64? Because for some (headless) use cases these boards are really nice and also cheap, A64 support is progressing nicely thanks to our awesome linux-sunxi community, and a few more A64 devices will be available soon.
     
    What do you get with Armbian on Pine64?
     
    User experience will not be much different compared to longsleep's minimal Ubuntu image. If you prefer Debian then at least you can be assured that our images do not contain bad settings and silly bugs like the ones from the official Pine64 downloads section (since they fiddle around manually with their OS images; for example, all Pine boards running these have the same MAC address by default, which will cause network troubles if you've more than one board in the same collision domain).
     
    We use the same thermal/throttling settings as OS images based on longsleep's kernel (since we helped developing them back in March), we use the same BSP kernel (patched by Zador up to the most recent version back in May) and share a few more similarities since our modifications were sent back to longsleep, so all OS images for Pine64 might benefit from them.
     
    Differences: you don't need to execute longsleep's various platform scripts since kernel and u-boot updates are done using the usual apt-get upgrade mechanism in Armbian. You also don't need (and should not use) scripts like pine64_tune_network.sh since they decrease network performance with Armbian (stay with our defaults unless you're an expert). A few more tweaks might result in better performance, and at least by using Armbian you get the usual Armbian experience with some additional tools at the usual location, automatic fs resize on first boot and so on.
     
    We already provide a vanilla image currently based on kernel 4.7 but that's stuff for developers only, see below.
     
    Performance with legacy Armbian image:
     
    'Out of the box' CPU performance with A64 is not that great unless you are able to benefit from the new CPU features: A64 uses Cortex-A53 CPU cores that feature 64-bit capabilities (which are not that interesting since A64 devices are limited to 2 GB DRAM anyway at the moment) but, more interestingly, the ARMv8 instruction set can be used, which might increase performance a lot when software is compiled for this platform. Best example: the commonly mis-used sysbench cpu test. When running an ARMv6 'optimized' sysbench binary on an ARMv8 CPU, performance will be 15 times slower than necessary (applies to the RPi 3 or the upcoming Banana Pi M64 when used with their OS images)
     
    But as soon as ARMv8 optimized code is used A64 can really shine in some areas. I used the default sysbench contained in Ubuntu Xenial's arm64 version, tried it with a 20000 max-prime setting and got less than 8 seconds execution time (an RPi 3 running Raspbian has the faster CPU cores but here it will take 120 seconds -- just due to different compiler switches!). Then I tried whether I could optimize performance by building sysbench from source using
    export AM_CFLAGS="-march=armv8-a -mtune=cortex-a53" and got 11 seconds execution time. So optimized code led to a huge performance loss? Not really: I had checked out sysbench version 0.5 by accident, and there for whatever reason execution with ARMv8 optimization (or in general) takes longer (great! benchmark version influences execution time, so one more reason to never trust sysbench numbers found on the net!). Using the '0.4' branch at version 0.4.12 I got an execution time of less than 7.5 seconds, which is a 10 percent performance increase for free just by using appropriate compiler flags:
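    As a sketch, the build-from-source experiment looks roughly like this. The CFLAGS export is the one quoted above; the autotools steps are the standard ones for the sysbench 0.4 tree and are shown as comments since they only make sense inside that source checkout:

```shell
# Pick the ARMv8-tuned CFLAGS only when actually building on an ARMv8/A64 host;
# on anything else fall back to the compiler defaults.
if [ "$(uname -m)" = "aarch64" ]; then
    export AM_CFLAGS="-march=armv8-a -mtune=cortex-a53"
else
    export AM_CFLAGS=""
fi
echo "building with AM_CFLAGS='${AM_CFLAGS}'"

# Then the usual autotools dance in the sysbench 0.4.12 source tree:
#   ./autogen.sh && ./configure && make
# and afterwards compare runs with:
#   sysbench --test=cpu --cpu-max-prime=20000 run
```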
     
     
     
     
    Another great example of how using CPU features or not (NEON in this case) influences performance, and of 'benchmarking gone wrong' numbers, are Linpack's MFLOPS scores. By choosing the package your distro provides instead of using one that makes use of your CPU's features you lose a lot of performance, ruin every performance per watt ratio and behave somewhat strangely.
     
    Someone sent me Linpack MFLOPS numbers generated with Debian Jessie, which is known for horribly conservative compiler settings when building packages -- if you switch your distro from Jessie to Ubuntu Xenial for example you get a 30 percent improvement in sysbench numbers; yeah, that's the 'benchmark' we already laughed at above.
     
    With Jessie's/Raspbian's hpcc package, Pine64+ gets a score of 1625 MFLOPS and RPi 3 just 1035. So is Pine64 1.6 times faster than RPi 3? Nope, that's just 'benchmarking gone wrong' since these numbers are the result of a joke: Using tools for 'High performance computing' with standard settings (no one interested in HPC would ever do that). By using the correct Linpack version that makes use of NEON optimizations on both CPUs we end up with 3400 MFLOPS (Pine64 at 1.3 GHz) vs 3600 MFLOPS (RPi 3 at 1.2 GHz).
     
    So if we're talking about this use case (HPC -- high performance computing) RPi 3 easily outperforms A64 (please keep in mind that the 3400 MFLOPS I got are the result of overclocking/overvolting at 1296 MHz; Pine64 is limited to 1152 MHz by default, so we're talking about 3000 MFLOPS for A64 vs. 3600 MFLOPS for RPi 3's SoC). So it's not Pine64 being 1.6 times faster but RPi 3 being more suited for Linpack numbers, and this type of benchmark only shows how wrong it is to use distro packages that are built using conservative settings (which is a must if the distro wants to support a wide range of different SoCs!)
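    The arithmetic behind these claims is easy to redo (all numbers taken from the paragraphs above):

```python
# Linpack MFLOPS numbers quoted above
jessie_pine64, jessie_rpi3 = 1625, 1035   # distro hpcc packages (Jessie/Raspbian)
neon_pine64, neon_rpi3 = 3400, 3600       # NEON-optimized builds

# The misleading "1.6x faster" comes from comparing unoptimized builds:
print(round(jessie_pine64 / jessie_rpi3, 2))   # 1.57

# With proper NEON builds the picture flips and RPi 3 is slightly ahead:
print(round(neon_rpi3 / neon_pine64, 2))       # 1.06

# Normalize the overclocked 1296 MHz result down to the default 1152 MHz:
print(round(neon_pine64 * 1152 / 1296))        # 3022, i.e. the ~3000 MFLOPS above
```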
     
    Anyway: it's obvious that in case you want to use Pine64 for number crunching or performance stuff in general, evaluating whether compiling packages from source might improve performance is a great idea (at least it's obvious that from a performance point of view using an ARMv6 distro with ARMv8 SoCs is stupid -- the reality with Raspbian running on RPi 3 and BPi M64). ARMv8 also provides crypto extensions that might be used with OpenSSL for example. Didn't look into it yet, but maybe huge performance gains are possible when using a Pine64 as HTTPS enabled web server or VPN endpoint, just like we've already seen with sysbench.
     
    Network performance: Pine64+ combines the SoC internal GbE MAC implementation (the same as in H3 and A83T SoCs from Allwinner) with an external RTL8211E PHY as used on most GbE capable SBCs. Default iperf performance with Armbian/Xenial: 900+ Mbits/sec in both directions (920/940 Mbits/sec), so no need for further tuning (please read through this explanation here why blindly trusting iperf numbers is always stupid and why it's neither necessary nor useful to further tune network settings to get better iperf numbers).
     
     
      Please keep in mind that for yet unknown reasons a couple of Pine64+ are reported to not work reliably at Gbit Ethernet speeds. Please also keep in mind how settings might matter. If you run a standard iperf test in 'passive benchmarking' mode you might get throughput numbers 200-250 Mbits/sec lower than ours, maybe just due to a wrong cpufreq governor. Ethernet throughput scales linearly with CPU clockspeed on most cheap ARM SoCs (our only known exception is Solid-Run's Clearfog, which uses a SoC optimized for IO and network throughput), so by using the ondemand governor with wrong/default settings for example you ensure that an idle SBC will only slowly increase clockspeed when you start your iperf test. This is Armbian after switching from the interactive to the ondemand governor, now being below 700 Mbits/sec just due to adjusting CPU clockspeed too slowly:
     
    The other stuff normally 'benchmarked' is not worth mentioning/testing, so just a few quick notes:
    - A64 shows the same SDIO limitation as most other SoCs, limiting sequential transfer speeds to/from SD card to ~23MB/s (do the math yourself: SDIO with 4 bit @ 50 MHz minus some overhead is 23 MB/s) -- fortunately that's rather uninteresting since random IO matters on SBCs, and there it's your choice between crappy cards that horribly suck or following our recommendations and choosing a really fast card. But Pine64 can not use the faster eMMC interface, so if you really need high IO bandwidth and high IOPS better choose a different device.
    - USB is USB 2.0, so expect ~35MB/s with the BSP kernel and ~40MB/s with mainline kernel and UASP capable disk enclosures for individual USB connections (UASP + mainline kernel might show high random IO numbers if used together with an SSD!)
    - HW accelerated video decoding is already possible (see here for the codec matrix) and the situation with HW accelerated video encoding looks promising too: http://forum.armbian.com/index.php/topic/1855-ffmpeg-with-cedrus-h264-hw-encoder-a64-cmos-camera/
    - In case one is interested in performance testing on SBCs, monitoring what's happening is mandatory. Currently our armbianmonitor tool does not install the necessary templates on A64, so my script to install this stuff on A64 should still be used: http://kaiser-edv.de/tmp/4U4tkD/install-rpi-monitor-for-a64.sh (read the script's header how to install)
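    The 'do the math yourself' part for the SD card ceiling works out like this (the ~8 percent protocol overhead is an illustrative assumption to bridge the gap between the theoretical number and the ~23 MB/s observed):

```python
# SDIO high-speed mode as referenced above: 4-bit bus at 50 MHz
bus_width_bits = 4
clock_hz = 50_000_000

raw_bytes_per_sec = bus_width_bits * clock_hz / 8
print(raw_bytes_per_sec / 1e6)                 # 25.0 MB/s theoretical ceiling

# Subtract command/protocol overhead (~8 percent, illustrative assumption)
# and you land at the ~23 MB/s sequential throughput seen in practice:
print(round(raw_bytes_per_sec * 0.92 / 1e6))   # 23
```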
     
    Performance with vanilla Armbian image:
     
    Not interesting at all at the time of this writing: while Pine64 happily boots mainline u-boot/kernel it's way too early to do tests in this area. Currently there's no access to the AXP803 PMIC from the mainline kernel, so not even VDD_CPUX voltage regulation works; as a result cpufreq scaling is also not working and the SoC is clocked pretty conservatively. Since most performance relevant stuff running on cheap ARM SoCs depends on (switching as fast as possible to) high CPU clockspeeds, benchmarking is absolutely useless now.
     
    You should also keep in mind that many core features still do not work with the mainline kernel, so this is really stuff for developers (who normally prefer their own way to boot their freshly compiled kernels). So please don't expect that much from vanilla images for A64 boards now; better choose the legacy variant.
     
    The future?
     
    A few more A64 boards are announced or already available as dev samples, for example the aforementioned BPi M64 (possible advantages over Pine64: sane DC-IN, real USB OTG, more USB host ports behind an internal USB hub, eMMC available and custom LEDs able to provide user feedback; everything else is more or less the same as the 2 GB Pine64+) or Olimex working on both an SBC and an A64 based laptop.
     
    And then Xunlong announced 2 new SBCs based on Allwinner's H5. H5 (product brief) seems to be A64's bigger sibling, providing video/GPU enhancements, 3 true USB host ports in addition to one USB OTG (just like H3 where we can use all 4 USB ports that do not have to share bandwidth), integrating a Fast Ethernet PHY (just like H3) but lacking PMIC support (again just like H3, so no mobile usage, no battery support out of the box, and it gets interesting how VDD_CPUX voltage regulation will work there -- maybe 'just like H3' again).
     
    Since A64 shares many/most IP blocks with H3 and A83T from Allwinner I still hope that H5 will be just a mixture of A64 and H3 and we will get full support based on what we now have for these 2 other SoCs pretty fast. But that's 100 percent speculation at this moment
     
    Update regarding longsleep's pine64_tune_network.sh script. Benchmark results don't get automatically worse when applying the tweaks from his script but the result variation gets huge (730 - 950 Mbits/sec, exceeding 940 Mbits/sec is already an indication that buffers are invoked):
     
     
     
    So better enjoy defaults unless you really know what you do since network performance tuning works in different directions. Stuff that might increase throughput might negatively affect latency and vice versa. So if you start to tune, tune for your specific use case!
  15. Like
    David reacted to zador.blood.stained in [RfC] Make Armbian more IoT friendly?   
    Started to write a long wall of text several times, but wasn't happy with the result, so to keep this short:
     
    Making Armbian for selected number of boards and kernel configurations more IoT friendly is possible, but requires some effort to maintain (for mainline kernel).
    Compared to RPI we don't have uniform interface for activating and configuring different hardware interfaces, so everything should be done (for legacy kernel) and maintained (for mainline) for each SoC and board/group of similar boards individually.
     
    GPIO group and udev rules should be provided by default in the board support package (since we don't have any GUI/dialog based system configurator); users will then be able to use this by adding themselves to the "gpio" group.
  16. Like
    David reacted to tkaiser in [RfC] Make Armbian more IoT friendly?   
    Just a thought. Over half of the boards we support feature a 26- or 40-pin RPi compatible GPIO header, and all the other boards expose various protocols and GPIO on various pins and headers.
     
    Users coming from Raspbian and trying out any of our boards expect that the basics work the same. While it's both impossible and not desirable to be 100% compatible with Raspbian, a few things would make the transition from Raspbian to Armbian easier for IoT folks.
     
    Simple example: Default user pi in Raspbian has access to GPIO pins -- on Armbian it's currently root only. How's that done? A group called gpio exists in Raspbian, pi is added to it and /etc/udev/rules.d/99-com.rules does all the magic:
     
     
     
    So by adding a detection in automatic user creation (we don't want to ship with pre-defined users) to check for pi as the name, we could add this in Armbian too.
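    For reference, the 'magic' in Raspbian's rules file boils down to handing the sysfs GPIO tree over to the gpio group. A minimal sketch (modeled on Raspbian's 99-com.rules; exact sysfs paths may differ per kernel, so treat this as an approximation):

```
# /etc/udev/rules.d/99-com.rules (sketch)
SUBSYSTEM=="gpio*", PROGRAM="/bin/sh -c '\
    chown -R root:gpio /sys/class/gpio && chmod -R 770 /sys/class/gpio; \
    chown -R root:gpio /sys/devices/virtual/gpio && chmod -R 770 /sys/devices/virtual/gpio'"
```

    Together with `groupadd gpio` and `usermod -aG gpio pi` this is what lets the unprivileged user drive GPIO pins without root.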
     
    Another area is the various WiringPi library variants that exist for different SoCs and partially need adjusted settings since pin mappings differ (see the example for BPi M2+). AFAIK the knowledge of which lib has to be used on which SoC is present here but scattered all over the place. Wouldn't it be cool if people experienced with this stuff started to collect that information (as a first step as part of our documentation, later maybe providing our own variants of these libs for our most popular SoC variants)?
     
    Disclaimer: me is the wrong person for this stuff since I'm still an absolute IoT NOOB and always have a hard time trying out GPIO stuff
     
    So this thread is meant to collect ideas and opinions on this.
  17. Like
    David reacted to tkaiser in Repository management   
    2.11 contains an outdated version of my sunxi-temp-daemon that will be started by rpimonitord: https://github.com/XavierBerger/RPi-Monitor/blob/devel/src/usr/lib/systemd/system/sunxi-temp-daemon.service
     
    When installing 2.11 and our sun8i adaptations (armbianmonitor -r), RPi-Monitor's /usr/share/rpimonitor/scripts/sunxi-temp-daemon.sh and our /usr/local/sbin/rpimonitor-helper.sh will conflict (at least /tmp/disktemp will cause troubles). Better postpone any RPi-Monitor upgrades until armbianmonitor-daemon is ready (it will then take care of stuff like this and disable sunxi-temp-daemon, since the whole daemon is useless when RPi-Monitor runs on Armbian).
     
    For the moment having the stable 2.10 in our repo this is the best solution possible.
  18. Like
    David reacted to tkaiser in [RfC] Images for "new" boards   
    Why not decrease DRAM clockspeeds for both variants and keep one image? My understanding is that literally no one ever tested DRAM reliability on any Lime2 (Olimex' Tsvetan proudly announced DRAM trace routing would've been improved compared to Lime and that 532 MHz is possible, based on playing video for a few hours since this is their usual test procedure). On the other hand, based on forum discussions the serious Lime2 users are all interested in stability over laughable performance gains (downclock DRAM to 432 MHz and you lose how much performance in real world applications? 2 percent? Or even 3?)
     
    Edit: Here it's still 384 MHz, in our fex files it's 384 MHz and I wouldn't be surprised if the very same value is still used by Olimex themselves in their OS image for Lime2. So we currently deal with one 532 MHz claim made by a proud vendor that led to u-boot maintainer choosing 480 MHz (upper limit -- 532 not possible) and now all Lime2 users who rely on mainline u-boot have a problem, right? And everyone thinks this 480 MHz value is something that's the result of reliability testing but in reality it's quite the opposite and users face all sorts of problems.
     
    Edit2: And here we're seeing 384 MHz, there again but there 480 MHz. But we still don't know where the 480 MHz originate from? Reliability testing or Tsvetan simply claiming '532 MHz!'?
     
    Edit3: In mainline u-boot they rely on defaults which are 480 MHz (for whatever reasons) and according to their instructions they use it for both normal and eMMC variant. So we have 532 MHz as a 'it's working that fast!1!!' claim, 480 MHz as the results of trusting in this and 384 MHz as leftover from older Lime settings scattered all over the place. Why do we trust in any of these numbers?
     
    Edit4: And after reading this I'm convinced we're dealing with random numbers without meaning
  19. Like
    David reacted to tkaiser in [SOLVED] Orange PI PC and $2 Ralink MT7601U dongle (USB ID 148f:7601)   
    No, you used Mediatek's original sources which are known to contain countless bugs. As written above: our previous sources are now available at https://github.com/porjo/mt7601u (repo URL has a trailing u now) and they contain tons of fixes compared to the sources you use now. Just read through the posts above and the commit log: https://github.com/porjo/mt7601u/commits/master
     
    BTW: This other commit also worsens things (enabling audio in on CT and breaking it on all other sunxi boards that use microphone instead)
  20. Like
    David reacted to tkaiser in [RfC] Default settings for NanoPi NEO/Air   
    Only switching to userspace in linux-sun8i-default.config -- in /etc/default/cpufrequtils we should stay with interactive, since all the heavy stuff happens before the cpufrequtils daemon is loaded. This way the 'less than 2W consumption by default' claim is true and the only drawback is an average delay of 1.3 seconds on every boot.
     
    If then a user wants to further tune settings he is free to do so by altering /etc/default/cpufrequtils and/or script.bin together with /etc/rc.local to disable CPU cores. But this way we retain performance behaviour (lowering maximum performance by 25 percent when looking at 1200 vs. 912 MHz maximum cpufreq) but make everything important accessible from userspace (no need to recompile the kernel)
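    The proposed defaults would then look roughly like this in /etc/default/cpufrequtils (a sketch; GOVERNOR and MAX_SPEED are the values discussed here, MIN_SPEED is an illustrative placeholder):

```
# /etc/default/cpufrequtils (sketch)
ENABLE=true
GOVERNOR=interactive
MIN_SPEED=480000      # kHz, illustrative value
MAX_SPEED=912000      # kHz -- stays at 1.1V VDD_CPUX, caps worst case at ~2W
```

    Raising MAX_SPEED back to 1200000 is the single-line change a user would make to unlock the last 33 percent of CPU performance at the cost described above.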
  21. Like
    David reacted to tkaiser in [RfC] Default settings for NanoPi NEO/Air   
    To keep things short... (since I go on vacation and this most probably will be my last contribution to NEO stuff prior to release).
     
    I did another consumption test: enabling all USB ports, audio and so on 'costs' 30mW --> already enabled. The autoboot and USB issues seem to be related to noisy DC-IN, at least no one else complained about this stuff but people already helped testing (see NEO thread in H3 forums)
     
    Then the reason why I'm after lowering consumption is pretty simple: How to dimension PSUs when you power a fleet of devices? Imagine you use a PoE injector panel like this:

     
    Maximum default consumption defines how you have to dimension your PSU and the individual step-down converters used (a real world use case not only I have to deal with). We're talking about three different types of consumption:
    - idle: irrelevant
    - boot peak consumption: important. Imagine you power on all 24 boards at the same time, then 500mW more or less per board adds up to +12W peak consumption the PSU has to deal with (can not be controlled from userspace, only by changing u-boot and kernel defaults!)
    - 'worst case' consumption: scripts/daemons run amok for whatever reasons; taking sysbench and cpuburn-a7 numbers we get the idea how maximum PSU dimensions have to look. Sysbench provides the more realistic 'worst case' numbers and cpuburn-a7 the absolute maximum (can be controlled from userspace pretty easily through the cpufrequtils config and, with disabled corekeeper, also from /etc/rc.local -- limiting the count of active CPU cores)
    My goal was to get default maximum consumption (defined by both peak and 'worst case' consumption) as low as possible. With my proposed settings we can provide an OS image for NEO which guarantees that NEO itself does not exceed 2000 mW consumption.
     
    The three most important changes:
     
    1. Setting MAX_SPEED=912000 in /etc/default/cpufrequtils: this is responsible for remaining at 1.1V VDD_CPUX and limits worst case consumption to around ~2W: I measured 1980 mW when running sysbench @ 912 MHz (vs. 1190 mW when idle). When we use MAX_SPEED=1200000 instead then we're talking about ~5W (!): currently testing NEO with cpuburn, FA's heatsink and a fan I get 5105 mW; using sysbench it's 2940 mW instead. In reality this is also more or less a peak consumption value since no one will use the NEO with a fan and the budget_cooling stuff will limit consumption after a short period of time. But defaulting to 1200 MHz means that 5W instead of 2W maximum consumption have to be taken into account per board. Given the overheating issues of the board I think it's better to default to 912 MHz and tell users in /etc/default/cpufrequtils which 'price' they have to pay for unlocking another laughable 33 percent more CPU performance.
    2. Changing interactive to userspace in the kernel config's cpufreq governor settings. This switch is not really performance relevant since it only affects the few seconds between the kernel starting and cpufrequtils taking over. The difference in numbers: with userspace as kernel default booting gets delayed by less than 1.3 seconds while reducing peak consumption compared to interactive by 500 mW! On the other H3 boards that are set to 816/1008 MHz in u-boot booting gets delayed by less than 0.5 seconds, or even just 0.2 seconds on all boards that are allowed to use the 1.3GHz clockspeed. So this little change only affects the few seconds between u-boot and the start of the cpufrequtils daemon but saves a whopping 500 mW peak consumption that has to be considered in PoE scenarios or if a few boards should be powered by a single PSU. Changing this parameter would require rebuilding the kernel, so I strongly vote for changing the default there since it helps staying below the 2W barrier and costs only 1.3 seconds per boot.
    3. Lowering DRAM clockspeed from FA's 432 MHz to 408 MHz: no performance implications but saves 130 mW (630mW vs. 760mW).
    Items 1) and 2) really help in real world situations when it's about dimensioning amperage requirements of step-down converters and PSUs (or step-up converters when we're talking about battery usage!).
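    The PoE panel dimensioning argument in numbers (24 ports as in the panel example; per-board figures are the measurements quoted above):

```python
boards = 24

# measured per-board figures from the post (in watts)
worst_case_912  = 1.980   # sysbench @ 912 MHz
worst_case_1200 = 5.105   # cpuburn @ 1200 MHz with heatsink + fan
boot_peak_delta = 0.5     # interactive vs. userspace governor at boot

# Extra PSU headroom needed when all boards power on simultaneously:
print(boards * boot_peak_delta)           # 12.0 -> the "+12W peak" above

# Total worst-case budget at the two cpufreq caps:
print(round(boards * worst_case_912))     # 48 W with the 912 MHz cap
print(round(boards * worst_case_1200))    # 123 W with the 1200 MHz cap
```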
     
    And I also consider the whole approach I've been busy with the last days as a differentiation feature of Armbian. We care about details. And with NEO and Air the use cases are pretty obvious (at least no media center where people prefer performance over consumption) and our settings should help users here. If anyone wants to get the laughable 33 percent speed improvement risking +2W more consumption and overheating, then he's free to adjust the single line in /etc/default/cpufrequtils
     
    But please let us stay with the current settings and switch from interactive to userspace as kernel's default cpufreq governor (applies to other kernel branches as well but needs more testing regarding change of boot times so this is stuff for another time).
     
    All my assumptions above are backed by the measurements that can be found in the latest posts of this thread.
     
    BTW: I did worst case tests yesterday to check corekeeper behaviour (see post #123 in the NEO thread). Trying to emulate an 'enclosure from hell' by putting NEO flat on a table with SoC/DRAM/U7 on the lower side w/o heatsink and only the table spreading a little bit of the heat, NEO throttled down to 312 MHz, sometimes being at 240 MHz, but CPU cores were never killed. So it's safe to keep it enabled? Anyway: I burned my fingers later when touching Ethernet and SD card slot -- it seems the NEO also uses copper layers since the whole board and every component was simply hot (another good reason to stay at 912MHz / 1.1V by default). Did the same 'test' later with Orange Pi Lite/PC and there the SD card slot felt not that hot.
     
    Now off for vacation...
  22. Like
    David reacted to tkaiser in [RfC] Default settings for NanoPi NEO/Air   
    Well, regarding the rush for images: for whatever reasons there is already one image available and NEO got its own download page (and the 'test build' notice mentioned there is IMO too small)
     
    Regarding minimum consumption I still think we should go this route with reasonable defaults since these boards might be powered through PoE or from battery (even if OPi One/Lite would be the better choices since at least OPi Lite is way more energy efficient most probably due to better components and maybe also due to single bank DRAM configuration on NEO being responsible for more heat/consumption?).
     
    But IMO we should come up with settings that don't 'hurt'. E.g. lowering DRAM clock from the vendor's default 432 MHz to 408 MHz helps a lot with consumption while not affecting performance at all (to my big surprise). Or reducing maximum cpufreq via cpufrequtils to 912 MHz, since this board is prone to overheating and avoiding 1.3V VDD_CPUX seems to be a good idea anyway. Same with switching from interactive to userspace as default cpufreq governor in the kernel: avoids consumption peaks with NEO, doesn't hurt on all other H3 devices.
     
    We could stay with 1200 MHz max cpufreq by default, but why not make a clear distinction through settings: since NEO is headless, disabling HDMI/Mali already reduces consumption, and the use cases are also pretty limited (unlike with OPi One/Lite no one will try to use NEO as a media center). 'You want low power operation? Choose NEO or Air with Armbian, otherwise use the larger variants like NanoPi M1 or OPi PC'
     
    BTW: All the results from this consumption monitoring approach will end up in documentation so owners of other H3 boards that want to lower consumption can also benefit (especially One/Lite are the champions in this area with improved settings). But I wanted to wait a bit since apart from the obvious (disable HDMI/Mali on a headless device, reduce CPU clockspeeds) the biggest win is reducing DRAM clock. This area needs still more testing and maybe someone looks into budget cooling stuff and adds dynamic DRAM clocking there too?
     
     
    I tested with 3 different 'PSUs' over the last days. With default settings (usb_detect_type = 0 / usb_host_init_state = 1) it's as follows:
    - 'Monitoring PSU' (Banana Pro): dmesg flooded with ehci messages, and also the 'hub 1-0:1.0: connect-debounce failed, port 1 disabled' messages appear every few seconds
    - USB PSU I've used for almost 2 years for various sunxi board tests: dmesg only reports a few ehci messages at system start [1]
    - Using my MacBook Pro as power source: not a single occurrence of any ehci connect/disconnect message
    So it's definitely related to 'noise' (note to myself: get a JHT header and power the Banana Pro no longer through the Micro USB DC-IN jack but use the SATA power connector instead; maybe the voltage fluctuations are gone then or at least minimized), and I made a lot of wrong assumptions the last days blaming software/settings where the 'PSU' in question made the real difference. But this would also explain why we didn't receive complaints regarding NanoPi M1 (or BPi M2+), since my 'monitoring PSU' is obviously providing noisy power.
     
    I then tested with the same USB PSU with usb_detect_type = 1 / usb_host_init_state = 1 (symptoms gone as expected) and also usb_detect_type = 0 / usb_host_init_state = 0 (symptoms also gone). But to be sure I went back to my 'monitoring PSU', the Banana Pro. And guess what: the autoboot problem immediately appeared again. So it seems noisy DC-IN provided by Banana Pro is responsible not only for these USB messages/errors but also for u-boot getting stuck at booting (at least good to know when interpreting complaints in the future)
     
    Connected to Banana Pro with usb_detect_type = 0 / usb_host_init_state = 0 I had not a single USB related error in the logs. Otg_role was set to 0 as expected, and loading a gadget driver and changing the role to 2 seems to work since lsusb on Banana Pro then also lists 'ID 0525:a4a7 Netchip Technology, Inc. Linux-USB Serial Gadget (CDC ACM mode)'. Then I tested again with usb_detect_type = 0 / usb_host_init_state = 1 and again not a single error. Several reboots later (sometimes with the UART adapter temporarily disconnected) I've not been able to reproduce the behaviour. Only 'being stuck in u-boot' happened nearly every time the UART adapter was disconnected.
     
    I need more time for tests to get a clue what's going on. What are the downsides of defining usb_host_init_state = 0 in the fex file?
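    For reference, these switches live in the [usbc0] section of the board's fex file. A minimal sketch assuming the usual sunxi legacy key names (the values show the combination that made the symptoms disappear above, everything else is illustrative):

    ```
    [usbc0]
    usb_used            = 1
    usb_port_type       = 2   ; 0 = device, 1 = host, 2 = OTG
    usb_detect_type     = 0   ; 0 = no detection, 1 = VBUS/ID pin detection
    usb_host_init_state = 0   ; 0 = start in device role, 1 = start as host
    ```

    The comments reflect the common interpretation of these keys in the sunxi legacy kernel; double-check against your board's shipped fex before changing anything.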
     
    [1] Only a few ehci messages when using a USB PSU:
     
    [   15.201508] ehci_irq: highspeed device connect
    [   15.220124] ehci_irq: highspeed device disconnect
    [   15.361448] ehci_irq: highspeed device connect
    [   15.365806] ehci_irq: highspeed device disconnect
    [   15.441436] ehci_irq: highspeed device connect
    [   15.445869] ehci_irq: highspeed device disconnect
    [   15.521504] ehci_irq: highspeed device connect
    [   15.525870] ehci_irq: highspeed device disconnect
    [   15.581434] ehci_irq: highspeed device connect
    [   15.585792] ehci_irq: highspeed device disconnect
    [   15.641502] ehci_irq: highspeed device connect
    [   15.645867] ehci_irq: highspeed device disconnect
  23. Like
    David reacted to tkaiser in SBC consumption/performance comparisons   
    Since I realized that getting peak consumption values while booting is not a task that needs automation but can be done manually and in short time (count of samples can be rather low) I decided to give it a try. I let NanoPi NEO boot 10 times each with interactive, powersave and userspace governor (ondemand isn't interesting here) and simply watched my powermeter's display for peak numbers shown. At the end of the test I used my usual consumption monitoring setup to get an idea how the numbers the powermeter provided (including PSU's own consumption!) match with the usual numbers when using a Banana Pro as 'monitoring PSU'.
      Results as follows:

                    boot time    peak consumption shown
      interactive:  10.6824 s    9 x 2.8W, 1 x 2.9W
      powersave:    14.7607 s    3 x 2.2W, 6 x 2.3W, 1 x 2.4W
      userspace:    11.9503 s    10 x 2.4W

      So based on this quick test the powersave governor doesn't help avoiding high consumption values since its peak consumption values are pretty close to the results with userspace. On the other hand switching from interactive to powersave would increase boot time by ~4.1 seconds while userspace only delays boot by ~1.3 seconds on the NEO. On all other H3 devices switching from interactive to userspace shouldn't matter at all since boot times are only slightly delayed -- see above (0.22 seconds more on OPi Plus 2E)
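    The PoE fleet argument can be quantified quickly. A sketch (psu_headroom is a hypothetical helper; the 2.8 W / 2.4 W peaks are the powermeter readings from the table above, and a 5 V rail behind the PoE step-down converter is assumed):

    ```python
    def psu_headroom(boards, peak_w_per_board, supply_v=5.0):
        """Amps a shared supply must deliver if every board hits its
        boot-time consumption peak at the same moment (P = U * I)."""
        return boards * peak_w_per_board / supply_v

    # 20 NEOs booting with interactive (~2.8 W peak each) vs userspace (~2.4 W):
    print(round(psu_headroom(20, 2.8), 1))  # 11.2 (A)
    print(round(psu_headroom(20, 2.4), 1))  # 9.6 (A)
    ```

    So for a rack of 20 NEOs the governor change alone would shave roughly 1.6 A off the required supply rating.
    
    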
      How would boot behaviour of H3 devices currently supported by Armbian change when switching the default cpufreq governor in kernel config from interactive to userspace? Let's have a look at cpufreq scaling behaviour before the cpufrequtils daemon is started (the short 2-3 second consumption peaks happen prior to cpufrequtils start!). The numbers given for interactive are meant as 'spent most of the time at'; when using userspace the board simply remains at the clockspeed set in u-boot config until the cpufrequtils daemon is started:

                                    interactive    userspace
      NanoPi NEO                    1008-1200      480
      NanoPi M1, Banana Pi M2+      1008-1200      816
      Beelink X2, OPi One/Lite      1008-1200      1008
      All other OPi                 1008-1296      1008

      That means that on NanoPi M1 and BPi M2+ booting might be delayed by ~0.5 seconds, with X2 or OPi Lite/One we're talking about 0.3 seconds and with the larger Oranges it's even less when switching to userspace. Consumption savings on all these boards are negligible but with NanoPi NEO we get a reduction of peak consumption while booting of approx. 500 mW (not worth a look when powering a single NEO with a good PSU, but if a fleet of NEOs should be powered through PoE then these 500 mW multiplied by the count of NEOs can make a huge difference regarding the PSU's amperage dimensioning).

      The baseline of my tests was a NEO/512 powered through FriendlyARM's PSU-ONECOM with only Ethernet connected. Xunlong's 5V/3A PSU powered PSU-ONECOM and sits in a 'Brennenstuhl PM 231E' powermeter reporting ~1.6W idle consumption with userspace governor, all CPU cores active, default DRAM clockspeed --> means the board was idling at 480 MHz CPU clockspeed and 408 MHz DRAM clockspeed.

      When I ran 'sysbench --test=cpu --cpu-max-prime=2000000 run --num-threads=4' the powermeter showed 2.4W consumption at 912 MHz and 3.0W at 1008 MHz (the huge increase is due to VDD_CPUX switching from 1.1V to 1.3V).

      So how do these values translate to consumption measurements 'behind PSU'?
I used my usual Banana Pro monitoring PSU setup:

                         with PSU (powermeter)    w/o PSU (Banana Pro)
  idle:                  1.6W                     1190 mW
  sysbench @ 912MHz:     2.4W (+800mW)            1980 mW (+790mW)
  sysbench @ 1008MHz:    3.0W (+1400mW)           2720 mW (+1530mW)

  TL;DR: Switching from interactive to userspace as default cpufreq governor in the sun8i kernel config helps reducing NanoPi NEO's peak consumption at booting by ~500mW while it does not delay booting a lot (~1.3 seconds longer on the NEO). With this change the situation for all other H3 devices does not change much, both regarding peak consumption and boot times. Switching to userspace seems reasonable to me since we can benefit a lot from NEO's low power mode while not negatively affecting other boards.
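    The jump between 912 MHz and 1008 MHz is larger than the frequency step alone would suggest because VDD_CPUX rises from 1.1 V to 1.3 V, and dynamic CPU power scales roughly with f * V^2. A back-of-the-envelope sketch (a rough model only -- leakage, DRAM and peripherals are ignored, which is why the measured behind-PSU deltas above grow even faster than this ratio):

    ```python
    def dynamic_power_ratio(f1_mhz, v1, f2_mhz, v2):
        """Rough dynamic-power scaling: P is proportional to f * V^2."""
        return (f2_mhz / f1_mhz) * (v2 / v1) ** 2

    # 912 MHz @ 1.1 V  ->  1008 MHz @ 1.3 V
    r = dynamic_power_ratio(912, 1.1, 1008, 1.3)
    print(round(r, 2))  # 1.54 -- the ~10% frequency bump alone would only give ~1.11
    ```

    In other words, most of the extra consumption at 1008 MHz comes from the voltage step, not from the clock increase itself.
    
    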
  24. Like
    David reacted to tkaiser in SBC consumption/performance comparisons   
    Another round of tests. This time it's about lowering peak consumption. With our default settings we allow pretty low idle consumption but at boot time we always have rather high consumption peaks compared to idle behaviour later. In case someone wants to use a really weak PSU or powers a couple of boards with one step-down converter (via PoE -- Power over Ethernet -- for example) then it's important to be able to control consumption peaks also.
      With most if not all board/kernel combinations we have three places to control this behaviour:

      - u-boot: brings up the CPU cores, defines initial CPU and DRAM clockspeed
      - kernel defaults: as soon as the kernel takes over these settings are active (they might rely on u-boot's settings or use their own; minimum/maximum depends on the device tree or on fex settings with Allwinner legacy kernels)
      - userspace: in Armbian we ship cpufrequtils which controls minimum/maximum cpufreq settings and the governor used -- have a look at /etc/default/cpufrequtils

      So how to get for example a NanoPi NEO to boot with as little peak consumption as possible with the legacy kernel? With our most recent NEO settings we bring up all 4 cores and define the CPU clockspeed in u-boot as low as 480 MHz. As soon as the kernel takes over we use the interactive governor and allow cpufreq scaling from 240 MHz up to 1200 MHz, and since booting is pretty CPU intensive the kernel will stay at 1008 MHz or above most of the time while booting, being responsible for consumption peaks that exceed idle consumption by 4-5 times. As soon as cpufrequtils takes over, behaviour can be controlled again (e.g. lowering MAX_SPEED=1296000 down to just 240 MHz).

      So the problem is the time between the kernel starting and the invocation of the cpufrequtils daemon, since our default 'interactive' cpufreq governor lets H3 run all 4 cores at 1200 MHz on the NEO even if we defined maximum cpufreq in normal operation mode to be 912 MHz (everything defined in /etc/default/cpufrequtils will only be active once cpufrequtils has been started by systemd/upstart).

      Since we can choose between a few different cpufreq governors with H3's legacy kernel I thought: let's try out the differences (leaving out the performance governor since this one does the opposite of what we're looking for).
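    For reference, the userspace part is just a handful of variables. A sketch of what /etc/default/cpufrequtils could look like on a NEO (variable names as used by cpufrequtils/Armbian, values purely illustrative):

    ```
    # /etc/default/cpufrequtils
    ENABLE=true
    GOVERNOR=interactive
    MIN_SPEED=240000   # in kHz, i.e. 240 MHz
    MAX_SPEED=912000   # in kHz, i.e. 912 MHz -- only enforced once the daemon runs
    ```

    The key point is that nothing in this file takes effect during the consumption-critical early boot phase discussed here.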
I modified the cpufrequtils startscript to do some monitoring (time of invocation and the cpufreq steps the kernel used before), added a script to log start times in a file to create average values later, let the board reboot automatically and exchanged the kernel after every 100 reboots, testing 4 different settings: interactive, ondemand, powersave and userspace as default cpufreq governor.

  To get an idea how changing the default cpufreq governor in kernel config might influence other H3 boards I chose the strongest one to compare: OPi Plus 2E. NanoPi NEO is configured to use 480 MHz cpufreq set by u-boot and to allow cpufreq scaling between 240 MHz and 1200 MHz. OPi Plus 2E uses 1008 MHz as cpufreq in u-boot and jumps between 480 MHz and 1296 MHz with our default settings.

  So how do the 4 different cpufreq governors behave with both boards?

  - interactive: does the best job from a performance perspective since this governor switches pretty fast from lower clockspeeds to higher ones (also highest consumption peaks seen)
  - ondemand: in our tests cpufreq only switched between the lowest allowed and the highest clockspeed while remaining at the lowest most of the time (240/1200 on the NEO and 480/1296 on the Plus 2E). Please be aware that ondemand is considered broken.
  - powersave: with this setting cpufreq remains at the lowest allowed clockspeed (240 MHz on NEO and 480 MHz on Plus 2E)
  - userspace: no adjustments at all, simply re-using the clockspeed set by u-boot (480 MHz on NEO and 1008 MHz on Plus 2E)

  Let's have a look how boot times changed. I simply monitored the time in seconds between the start of the kernel and the invocation of cpufrequtils (since this is the time span where changing the default cpufreq governor in kernel config matters):

                  NEO     Plus 2E
  interactive:   10.06      9.93
  ondemand:      12.43     10.90
  powersave:     14.16     11.51
  userspace:     11.36     10.15

  Shorter times correlate with higher peak consumption.
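    The averaging step described above is trivial to reproduce. A sketch (the log format is an assumption: one boot-time value in seconds per line, as the modified startscript could have written it):

    ```python
    def average_boot_time(log_lines):
        """Average of per-boot 'kernel start -> cpufrequtils invocation'
        times, one value in seconds per line (blank lines ignored)."""
        values = [float(line) for line in log_lines if line.strip()]
        return sum(values) / len(values)

    # e.g. a few of the ~100 logged reboots with one governor:
    print(round(average_boot_time(["10.1", "9.9", "10.18", ""]), 2))  # 10.06
    ```

    Running this over each 100-reboot log yields the per-governor averages shown in the table.
    
    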
So it's obvious that changing the default cpufreq governor for H3 boards from interactive to powersave would help a lot in reducing boot consumption. On the NEO this delays booting by ~4.1 seconds and on the Plus 2E by just ~1.65 seconds -- the reason is simple: NEO boots at 240 MHz instead of remaining above 1008 MHz most of the time, and OPi Plus 2E boots at 480 MHz instead of 1200+ MHz.

  But userspace is also interesting. This governor doesn't alter the cpufreq set by u-boot, so the NEO boots at 480 MHz and the OPi Plus 2E at 1008 MHz (also true for all other H3 devices except the overheating ones -- Beelink X2, Banana Pi M2+ and NanoPi M1 use 816 MHz instead) while delaying boot times just by 1.3 seconds (NEO) or 0.23 seconds (Plus 2E).

  The 'less consumption' champion is clearly powersave, but since we want to maintain only one single kernel config for all H3 boards it might be the better idea to choose userspace as default cpufreq governor in the sun8i legacy kernel config, since with this setting the NEO still reduces boot consumption a lot while other H3 devices aren't affected much.

  All consumption numbers are just 'looking at the powermeter while the board boots'. My measurement setup using average values totally fails when it's about peak consumption. I already thought about using a RPi 3, its camera module, the motion daemon and an OCR solution to monitor my powermeter. But based on the information we already have (consumption numbers based on cpufreq/dvfs settings) it seems switching from interactive to userspace is a good idea to save peak current while booting. Though if anyone is after the lowest consumption possible then choosing powersave is the better choice.

  In case someone wants to test on their own, here's the procedure and test logs:
  25. Like
    David reacted to tkaiser in New Oranges with H5 and H2+   
    Allwinner redesigned their website. H64 is gone, H5 has 3 real USB host ports (and a 6-core Mali450 GPU and other display/video improvements compared to A64) and they finally announced R40 (quad-core A7, 2 x USB host + SATA, GbE). R40 is said to be the 'A20 Upgrade Edition: Dual-core upgrade to Quad-core(CPU), 55nm upgrade to 40nm(Craft), lower power consumption, smaller package.' -- ok, time to stop wasting efforts with H3 and H5, let's wait for R40 hardware to appear 
     
    BTW: For R40 is also mentioned: 'Open Sources: Supports our own lightweight Linux OS called Tina, which specialized designed for smart hardware' -- well, let's see how Allwinner's definition of Open Source will look like
     
    Edit: The first device has already appeared: Banana Pi BPi-M2 Ultra. Well, 'Team BPi' continues with their 'copy&paste gone wrong' crap (according to their 'documentation' the board has either 3 USB host ports -- which would mean they use an internal USB hub -- or just 2, which would be the count of host ports R40 features) and they chose a pretty misleading name: BPi M2 was based on A31s, BPi M2+ on H3, and 'M2 Ultra' has nothing in common with either since it's in a direct line with the original A20 based Bananas. Also, the M2 name would suggest that the M3 is a better choice (which it clearly isn't: worst software/support ever, hardware design flaws, not possible to get the performance promised by the specs in reality)