Flooding threads with BS again and again. Why? Are you able to click on links and compare specs? Do you spot the difference between 1GB DDR3L and 2GB LPDDR3? Dimensions of an enclosure? 1.3MP vs. 0.3MP?
TL Lim is in constant contact with the linux-sunxi guys, and @Xalius, for example, just recently managed to convince him to reroute PCB stuff in a way that makes debugging easier.
Well, MPI is an implementation (needing GbE network tuning in another direction compared to the usual moronic iperf tests: MPI needs low latency!) and not a specific benchmark, but you're right. This is one of the few use cases where Pine64+ could shine when used with optimized software. Compare the Linpack results above or those from cluster setups where unoptimized software has been used.
A not yet published 'Cluster Deathmatch: Raspberry Pi3 vs NanoPC-T3 vs Orange Pi Plus2E vs PINE A64+' article, for example, shows really weird numbers if you compare what happens when you use the correct compiler switches (and HPC is all about this, since you get results magnitudes better at the same consumption figures, so the performance per watt ratio in particular increases a lot):
Board      Debian Jessie hpcc   NEON/hpcc
Pine64+    7731 MFLOPS          15000 MFLOPS
Plus 2E    -                    8650 MFLOPS
NanoPC     27740 MFLOPS         62500 MFLOPS
RPi 3      3402 MFLOPS          18000 MFLOPS
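To make the gap between the stock Jessie hpcc binary and the NEON optimized build easier to see, here is a minimal sketch that computes the speedup per board from the figures listed above. The names `results` and `speedup` are made up for illustration; only the MFLOPS numbers come from the table.

```python
# MFLOPS figures copied from the table above: (Debian Jessie hpcc, NEON/hpcc).
# None marks the result that is missing in the table.
results = {
    "Pine64+": (7731, 15000),
    "Plus 2E": (None, 8650),
    "NanoPC": (27740, 62500),
    "RPi 3": (3402, 18000),
}


def speedup(stock, neon):
    """Return the NEON/stock ratio, or None if either number is missing."""
    if stock is None or neon is None:
        return None
    return neon / stock


for board, (stock, neon) in results.items():
    ratio = speedup(stock, neon)
    label = "n/a" if ratio is None else f"{ratio:.2f}x"
    print(f"{board:8} {label}")
```

With these inputs the optimized build comes out at roughly 1.9x on Pine64+, 2.3x on NanoPC and 5.3x on RPi 3.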
So if we take the 15 GFLOPS of a 5 node Pine64+ cluster into account, it gets interesting. On the other hand, the result listed there for NanoPC-T3 already indicates that NanoPi M3, which uses the very same SoC, might be the better choice. It's GbE capable and has twice as many Cortex-A53 cores, but just like RPi 3 it suffers from the lack of an ARMv8 optimized distro/kernel. With the correct compiler switches and ARMv8/NEON optimizations, though, the 27.6 GFLOPS above from Jessie's hpcc (ARMv7!) binary may well double in reality (M3 still costs less than $40 if you add the necessary heatsink).
Update: I tested on the M3, as can be seen here. With my setup, which uses only a laughable 5V/2A PSU connected to the power pins and a rather inefficient fan, I'm not able to fully unlock the performance potential of this board. With a beefier PSU and better cooling, an SBC cluster made of 5 NanoPi M3 should reach a total score of 62 GFLOPS using an (ARMv7) optimized Linpack version (confirmed in the meantime). So once you add up all costs and compare the 15 GFLOPS of a 5 node Pine64+ cluster with the 62 GFLOPS NanoPi M3 is able to achieve, it's easy to decide (against Pine64+).
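For anyone who wants to redo this comparison with their own prices, a rough sketch of the arithmetic follows. The helper `cost_per_gflops` is hypothetical, and the $40 figure is only the upper bound mentioned above for an M3 plus heatsink. PSU, cabling and switch costs are not included, and no Pine64+ price is assumed here.

```python
def cost_per_gflops(node_price_usd, nodes, cluster_gflops):
    """Board cost of the whole cluster divided by its Linpack score."""
    return node_price_usd * nodes / cluster_gflops


# 5 x NanoPi M3 at (at most) $40 each, 62 GFLOPS in total.
m3 = cost_per_gflops(40, 5, 62)
print(f"NanoPi M3: at most ${m3:.2f} per GFLOPS")

# Raw throughput ratio between the two 5 node clusters (62 vs. 15 GFLOPS).
print(f"Throughput ratio: {62 / 15:.1f}x")
```

Plug in whatever you actually pay for a Pine64+ plus the large heatsink it needs to get the matching figure for the other cluster.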
Anyway, Pine64+ can be booted through FEL (using non-standard USB type-A-to-type-A cables) or from cheap SPI flash modules, so building a cluster out of a few Pine64+ might not even require adding SD cards to the setup, which will further reduce costs. And if one knows how to test individual clockspeed reliability, it can be overclocked up to 1296 MHz without issues (but that requires large heatsinks and a lot of airflow!).
Since Armbian is not a distro but a convenient build system, it would be a perfect basis for such experiments: a few tweaks are enough to produce custom OS images that already contain all the necessary cluster stuff. See customization and NFS boot options.