5 Node Cluster of Orange Pi Plus 2Es

sheffield_nick · July 7, 2016

I've just written an article about my latest cluster build: a 5-node cluster using Orange Pi Plus 2E single board computers, each with a quad-core ARM Cortex-A7 running at 1.3 GHz.

http://climbers.net/sbc/orange-pi-plus-2e-cluster/

Please share the link on Facebook/Twitter/etc if you find it interesting. Thanks

tkaiser · July 7, 2016

Thanks for the insights. A few things to mention:

GPIO pins 2/4 (5V) and 6 (GND) are directly routed to DC-IN and GND from the barrel jack, so you can power the board through these GPIO pins (no drawbacks, unlike RPi where you have less protection when you power through GPIO pins instead of crappy Micro USB) and also connect consumers there without caring too much.
The 1296 MHz maximum we use is a sane default, and the same applies to our throttling settings. If you want more performance, don't mind using active cooling (even a slow/silent large fan) and know what you're doing, you can exceed this. To use higher clockspeeds you have to test them for reliability (steps outlined here), and once you have found working DVFS OPPs you can think about adjusting throttling settings (our defaults start to throttle at 75°C, but in a controlled cluster setup where you know your workloads you might want to increase the trip points).
The approach of using 1536 MHz at 1.5V is as brain-dead as it was in the beginning when 3rd parties started to use these settings on Orange Pis. The key to success is testing every H3 board individually for its limits and using those. But keep in mind that the available cpufreq steps above 1296 MHz are both limited and hardcoded in the kernel sources, so if you want to use e.g. 1392 MHz you would have to patch the kernel (check /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state for the cpufreqs available).
Information regarding network/USB on RPi is somewhat wrong. The RPi SoC has just one single USB 2.0 connection to the outside. All available USB ports as well as the Fast Ethernet adapter are behind an internal USB hub and have to share bandwidth. So it's neither "4 USB ports" (well, the same as 1 USB port with a 5 port hub connected) nor true Fast Ethernet, since this is just a USB-Ethernet adapter hanging off the single USB 2.0 connection (behind the hub!). So for anything that needs bandwidth, every RPi always horribly sucks, regardless of the ARM cores they put into the SoC.
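
For checking the cpufreq steps mentioned above, here is a minimal Python sketch that reads the time_in_state file and lists the frequencies the kernel exposes. It assumes the usual sysfs layout of one "frequency-in-kHz time" pair per line; the function names are made up for illustration, and the file only exists when cpufreq stats are enabled in the kernel.

```python
from pathlib import Path

# Path taken from the post above; only present if cpufreq stats are enabled.
STATS_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state")


def parse_time_in_state(text):
    """Return a sorted list of (freq_mhz, time_units) tuples.

    Assumes each line holds two integers: frequency in kHz and the
    time spent at that frequency, in whatever unit the kernel reports.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        freq_khz, time_units = int(parts[0]), int(parts[1])
        entries.append((freq_khz // 1000, time_units))
    return sorted(entries)


def available_mhz(text):
    """Return just the available frequencies in MHz, lowest first."""
    return [mhz for mhz, _ in parse_time_in_state(text)]


if __name__ == "__main__":
    if STATS_PATH.exists():
        for mhz, time_units in parse_time_in_state(STATS_PATH.read_text()):
            print(f"{mhz:5d} MHz  {time_units}")
    else:
        print(f"{STATS_PATH} not found")
```

If a frequency such as 1392 MHz does not show up in this list, that matches the point above: the step is not defined in the running kernel and cannot simply be selected.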

On a related note: interesting insights regarding your NanoPC-T3/Nexell cluster (especially the thermal stuff). Can you please provide how long the 'classic' (and potentially inappropriate) sysbench run with the following settings takes on the octa-core NanoPC? Please provide the info in this thread: http://forum.armbian.com/index.php/topic/1285-nanopi-m3-cheap-8-core-35/?view=getlastpost

sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8
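
To report how long the run takes, one option is to wrap the command in a small timing script. The following is only a sketch: the sysbench arguments are copied verbatim from the line above (other sysbench versions may expect a different syntax), and the helper names are hypothetical.

```python
import shutil
import subprocess
import time


def build_command(threads=8, max_prime=20000):
    """Build the argument list for the sysbench run quoted above."""
    return [
        "sysbench",
        "--test=cpu",
        f"--cpu-max-prime={max_prime}",
        "run",
        f"--num-threads={threads}",
    ]


def timed_run(threads=8):
    """Run sysbench and return the wall-clock time in seconds.

    Returns None if sysbench is not installed.
    """
    if shutil.which("sysbench") is None:
        return None
    start = time.perf_counter()
    subprocess.run(build_command(threads), check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = timed_run()
    if elapsed is None:
        print("sysbench not installed")
    else:
        print(f"wall time: {elapsed:.1f} s")
```

sysbench prints its own total time as well; the wrapper just makes it easy to repeat the run with different thread counts.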

sheffield_nick · July 7, 2016

Many thanks for the useful feedback. I've just updated my article with a note about powering things from the GPIO pins.

I'll run that test on my NanoPC-T3s and post my results.

tkaiser · July 7, 2016

I've just updated my article with a note about powering things from the GPIO pins.

IMO it would be good to correct the USB/network information too. One of the strengths of the cheap H3 SoC is 3 real USB host ports + 1 USB OTG port (that can be used as host port -- Armbian default!) + dedicated networking. In case of OPi Plus 2E that's Gbit Ethernet using a dedicated RGMII connection. And H3 is powerful enough to saturate all 4 USB 2.0 ports and the GbE connection in parallel (depends on kernel version) while any RPi is always limited to one single USB 2.0 connection.

There are other SBC designs that show similar behaviour (e.g. the horrible Banana Pi M3, where the 'engineers' forgot to use one of the 2 USB host ports and connected both USB type A receptacles and the slow USB-to-SATA bridge to one internal USB hub that only uses one USB host port of the SoC; at least there Gbit Ethernet also uses a dedicated RGMII connection), but RPi's design is most probably the worst possible when it comes to bandwidth (regardless of whether we're talking about IO or networking).

sheffield_nick · July 7, 2016

Thanks - I'm writing a separate article benchmarking the different boards/clusters, so I'll definitely mention those details about shared vs. separate USB/Ethernet, and about USB-to-SATA.

Ford Prefect · July 7, 2016

Nice work.

Although I wonder if having all the Pis in the same box is really advantageous.

sheffield_nick · July 8, 2016

In case of OPi Plus 2E that's Gbit Ethernet using a dedicated RGMII connection. And H3 is powerful enough to saturate all 4 USB 2.0 ports and the GbE connection in parallel (depends on kernel version) while any RPi is always limited to one single USB 2.0 connection.

Any idea how I could benchmark that? Ideally without needing 4 external USB hard drives... or 4 host-to-host USB cables and connecting to other computers.

Thanks

tkaiser · July 10, 2016

Any idea how I could benchmark that? Ideally without needing 4 external USB hard drives... or 4 host-to-host USB cables and connecting to other computers.

There's no need to benchmark this. I tested with mainline kernel and 3 external USB disks half a year ago (back then SMP wasn't working, so H3 was running on a single CPU core at 1008 MHz) and could saturate all three host ports. Performance of the OTG port when used as host port is also known, so when H3 is running on all 4 cores it's no problem to exceed 150 MB/s while using all 4 USB ports in parallel. Network performance is also already known, and since H3 is powerful enough it's also not a problem to saturate USB lines and GbE in parallel. You can find some numbers in this thread: http://forum.armbian.com/index.php/topic/1440-h3-devices-as-nas/

Performance might vary depending on kernel version (GbE with mainline kernel is currently slower compared to legacy kernel; regarding USB storage it's the other way around), but the important thing here is: 4 USB ports and 1 GbE network port that neither have to share bandwidth nor block each other when used in parallel. And this is the most important difference from any RPi, where everything is limited by the single USB 2.0 connection. Any data that has to be transferred between a disk connected to RPi (USB) and a client behind the network (also USB) has to pass this bottleneck twice. It's really not comparable, and RPi suffers a lot when it's about transferring data.
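
The "passes the bottleneck twice" point can be put into a rough back-of-the-envelope model. This is a simplified sketch, not a measurement: the function names are invented for illustration and all throughput figures in the example are assumed placeholder values, not numbers from this thread.

```python
def shared_link_rate(usb_link_mb_s, ethernet_limit_mb_s):
    """Disk-to-network rate when disk and Ethernet share one USB 2.0 link.

    Every byte crosses the shared link twice (disk -> SoC, SoC -> Ethernet
    adapter), so at best half the link bandwidth is usable, further capped
    by the Ethernet adapter itself.
    """
    return min(usb_link_mb_s / 2, ethernet_limit_mb_s)


def dedicated_link_rate(disk_mb_s, network_mb_s):
    """Disk-to-network rate when USB and Ethernet have separate connections.

    Nothing is shared, so the slower of the two sets the limit.
    """
    return min(disk_mb_s, network_mb_s)


if __name__ == "__main__":
    # Assumed illustrative values in MB/s, not measured results.
    usb2_usable = 35.0      # assumption: usable USB 2.0 throughput
    fast_ethernet = 12.5    # 100 Mbit/s expressed in MB/s
    gbe_usable = 100.0      # assumption: usable GbE throughput

    print("shared USB 2.0 link:", shared_link_rate(usb2_usable, fast_ethernet))
    print("dedicated links:    ", dedicated_link_rate(usb2_usable, gbe_usable))
```

The model ignores protocol overhead and contention from other USB devices, which would only make the shared-link case worse.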

But to be honest: Since we're talking about clustering it's all about the use case. Do cluster nodes need high network bandwidth? Then forget about any RPi and choose SoCs with true GbE networking. Do they need high local storage throughput? If so forget about any RPi and maybe also about any non-SATA SoC. And so on.

Clustering with cheap ARM boards looks nice and might be a useful tinkering exercise if you do some things right (e.g. an intelligent cooling approach instead of the brain-dead ones mostly seen). But there are only a few reasonable use cases for such stuff (e.g. an automated build/test farm for ARM installations, since testing stuff natively might be 20 times faster than using QEMU). And especially when people start thinking about HPC stuff, choosing ARM boards that aren't OpenCL capable and can't compute stuff on powerful GPU cores is laughable (since you would need a few hundred boards and an insane amount of networking equipment and cabling, and the whole cluster would still be outperformed by any cheap middle class GPGPU capable PC).