This is something hopefully suitable to become a 'Board Bring up' thread.
The NanoPC T4 was the smallest RK3399 board around featuring a full set of interfaces (Rock960 was smaller, but there you can't use GbE without a proprietary expander). In the meantime it got two smaller siblings: NanoPi M4 and the cute NEO4.
Pros:
Another RK3399 board, so software support is already pretty mature
Rich set of interfaces (2 x USB2 without shared bandwidth, 2 x USB3, triple display output and so on)
No powering hassles due to 12V (2A) PSU requirement
16GB superfast eMMC 5.1
Usable and performant Wi-Fi (dual band and dual antenna so MIMO can be used, for numbers see here)
All 4 PCIe lanes exposed (M.2 M key connector on the bottom, suitable for NVMe SSDs, or to attach a 4 port SATA controller or a PCIe riser card)
Cons:
A bit pricey (but if you compare with RockPro64 for example and order all Add-Ons you end up with a similar price)
High idle consumption (4W at idle, PSU included); maybe this is just bad settings we can improve over time
Heatsink too small for continuous loads
I started relying on @hjc's work since he's currently using different kernels than we use on RockPro64 or ODROID-N1 (though all the 4.4 kernels are more or less just RK's 4.4 LTS branch with some modifications; with mainline I haven't looked at what's different between Heiko's tree and 'true' mainline).
Tinymembench numbers with RK 4.4 vs. mainline kernel (the latter showing both lower latency and higher bandwidth).
Internal 16 GB eMMC performance:
eMMC / ext4 / iozone                                   random   random
      kB  reclen    write  rewrite     read   reread     read    write
  102400       4    23400    28554    26356    26143    27061    29546
  102400      16    48364    48810    85421    85847    84017    47607
  102400     512    48789    49075   273380   275699   258495    47858
  102400    1024    48939    49053   290198   291462   270699    48099
  102400   16384    48673    49050   295690   295705   294706    48966
 1024000   16384    49243    49238   298010   298443   299018    49255
That's what's to be expected with 16 GB, and exactly the same numbers as I generated on ODROID-N1 with 16 GB size. SD card performance maxed out at 23.5 MB/s, which indicates that no higher speed modes are enabled (and according to the schematics they are not possible, since the slot can't switch to 1.8V here). I didn't try to adjust the DT as with ODROID-N1, where SDR104 mode is possible and led to some nice speed improvements when using a fast card -- see here and there.
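To put the 23.5 MB/s figure in context, here is a small sketch that computes the theoretical ceiling of the 4-bit SD bus per speed mode. The clock rates and signalling voltages are the nominal values from the SD specification; the helper names are made up for illustration and nothing here was measured on the board.

```python
# Theoretical peak throughput of a 4-bit SD bus per speed mode.
# Clock rates are the nominal maxima from the SD specification;
# real cards and hosts always land somewhat below these ceilings.
SD_MODES = {
    # name: (clock in MHz, transfers per clock, signalling voltage)
    "Default Speed": (25, 1, 3.3),
    "High Speed": (50, 1, 3.3),
    "SDR50": (100, 1, 1.8),
    "DDR50": (50, 2, 1.8),
    "SDR104": (208, 1, 1.8),
}
BUS_WIDTH_BITS = 4


def peak_mb_per_s(clock_mhz, transfers_per_clock):
    """Bus ceiling in MB/s: clock * transfers per clock * width / 8."""
    return clock_mhz * transfers_per_clock * BUS_WIDTH_BITS / 8


def modes_explaining(measured_mb_s):
    """Modes whose ceiling is at or above the measured rate, slowest first."""
    fits = []
    for name, (clock, per_clock, _volt) in SD_MODES.items():
        peak = peak_mb_per_s(clock, per_clock)
        if peak >= measured_mb_s:
            fits.append((peak, name))
    return [name for _peak, name in sorted(fits)]


if __name__ == "__main__":
    for name, (clock, per_clock, volt) in SD_MODES.items():
        print(f"{name:<14} {volt} V  {peak_mb_per_s(clock, per_clock):6.1f} MB/s")
    print("slowest mode that fits 23.5 MB/s:", modes_explaining(23.5)[0])
```

A result sitting just below 25 MB/s is what you would expect from High Speed mode at 3.3V, which fits the observation that the slot cannot switch to 1.8V signalling.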
Quick USB3 performance test via the USB-3A port:
Rockchip 4.4.132                                       random   random
      kB  reclen    write  rewrite     read   reread     read    write
  102400       4    24818    29815    33896    34016    24308    28656
  102400      16    79104    90640   107607   108892    80643    89896
  102400     512   286583   288045   285021   293431   285016   298604
  102400    1024   315033   322207   320545   330923   320888   327650
  102400   16384   358314   353818   371869   384292   383404   354743
 1024000   16384   378748   381709   383865   384704   384113   381574
mmind 4.17.0-rc6-firefly                               random   random
      kB  reclen    write  rewrite     read   reread     read    write
  102400       4    37532    42871    22224    21533    21483    39841
  102400      16    86016   104508    87895    87253    84424   102194
  102400     512   274257   294262   287394   296589   287757   304003
  102400    1024   294051   312527   317703   323938   323353   325371
  102400   16384   296354   340272   336480   352221   339591   340985
 1024000   16384   367949   189404   328094   330342   328136   139675
This was with an ASM1153 enclosure, which shows slightly lower numbers than my usual JMS567 ones (all currently busy with other stuff). Performance with the RK 4.4 kernel is as expected; with mainline it is lower for whatever reason. I also tried to test with my VIA VL716 enclosure directly attached to the USB-C port but ran into similar issues as with RockPro64. Since my enclosure and the cable also show problems when used with a MacBook Pro, I suspect I should blame the hardware here and not USB-C PHY problems with RK3399.
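To quantify "lower with mainline", a quick sketch that compares the two 1024000 kB / 16384 kB rows from the USB3 runs above. The numbers are copied from the iozone output; the function name is just for illustration.

```python
# Relative change of mainline vs. RK 4.4 for the large-file USB3 run
# (1024000 kB file, 16384 kB records). Values are kB/s from iozone above.
COLUMNS = ("write", "rewrite", "read", "reread", "random read", "random write")
RK_44 = (378748, 381709, 383865, 384704, 384113, 381574)
MAINLINE = (367949, 189404, 328094, 330342, 328136, 139675)


def relative_change(baseline, candidate):
    """Percent change of candidate against baseline, per iozone column."""
    return {
        name: (new - old) / old * 100
        for name, old, new in zip(COLUMNS, baseline, candidate)
    }


if __name__ == "__main__":
    for name, pct in relative_change(RK_44, MAINLINE).items():
        print(f"{name:>12}: {pct:+6.1f} %")
```

The rewrite and random write columns stand out, so the mainline gap in this single run is mostly on the write side rather than across the board.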
This is NanoPC T4 with vendor's heatsink, lying flat on a surface that allows for some airflow below, running cpuburn-a53 on all 6 cores after half an hour:
13:57:31: 1008/1416MHz 8.44 100% 0% 99% 0% 0% 0% 91.1°C 0/5
13:57:40: 1008/1416MHz 8.52 100% 0% 99% 0% 0% 0% 91.1°C 0/5
13:57:48: 1008/1416MHz 8.51 100% 0% 99% 0% 0% 0% 91.1°C 0/5
13:57:57: 1200/1416MHz 8.47 100% 0% 99% 0% 0% 0% 90.6°C 0/5
13:58:05: 1200/1416MHz 8.47 100% 0% 99% 0% 0% 0% 91.1°C 0/5
So with heavy workloads you most probably need a fan to prevent throttling.
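If you want to check a longer log for throttling automatically, something like the following sketch could work. It assumes the monitoring lines look exactly like the ones above, that the first frequency is the big cluster and the second the little one, and that the big cores' nominal maximum is 1800 MHz (an assumption, adjust it to your cpufreq settings). All names are hypothetical.

```python
import re

# Matches lines such as:
# 13:57:31: 1008/1416MHz 8.44 100% 0% 99% 0% 0% 0% 91.1°C 0/5
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):\s+"
    r"(?P<big>\d+)/(?P<little>\d+)MHz\s+"
    r"(?P<load>[\d.]+)\s+.*?"
    r"(?P<temp>[\d.]+)°C"
)


def parse_line(line):
    """Return a dict for one monitoring line, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": m.group("time"),
        "big_mhz": int(m.group("big")),      # assumed: big cluster first
        "little_mhz": int(m.group("little")),
        "load": float(m.group("load")),
        "temp_c": float(m.group("temp")),
    }


def throttled(samples, big_max_mhz=1800):
    """Samples where the big cluster runs below its assumed maximum clock."""
    return [s for s in samples if s["big_mhz"] < big_max_mhz]


if __name__ == "__main__":
    log = [
        "13:57:31: 1008/1416MHz 8.44 100% 0% 99% 0% 0% 0% 91.1°C 0/5",
        "13:57:57: 1200/1416MHz 8.47 100% 0% 99% 0% 0% 0% 90.6°C 0/5",
    ]
    samples = [s for s in map(parse_line, log) if s is not None]
    for s in throttled(samples):
        print(s["time"], s["big_mhz"], "MHz at", s["temp_c"], "°C")
```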
Development related questions: IMO we should try to rely on single sources for all the various RK3399 boards that are now available or will be soon. And I would prefer ayufan's since he's somewhat in contact with RK guys and there's a lot of great information/feedback provided by TL Lim. What do others think?
Another issue is IRQ affinity: on boards where PCIe is in use those interrupts should clearly end up on one of the big cores, while on other boards USB3 and network IRQs are better candidates. I already talked about this with @Xalius ages ago, and most probably the best idea is to switch from static IRQ affinity set at boot by armbian-hardware-optimization to a daemon that analyzes the IRQ situation every minute and then dynamically adopts the best strategy.
Regarding information for end users: all RK3399 boards basically behave the same since the relevant stuff is inside the SoC. The only differences are the DRAM (which matters with regard to consumption and performance), the interfaces exposed and the power circuitry (and obviously settings like cpufreq behaviour, but I think we should consolidate those for all RK3399 boards). So you already find a lot of information in my ODROID-N1 'review', my SBC storage performance overview and most probably also a lot around RockPro64. No idea where to inform about RK3399 GPU/VPU stuff since I'm not interested in these areas at all (I hope others add references or direct information).
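A very rough sketch of what one pass of such a daemon could look like: take two snapshots of /proc/interrupts, find the IRQs that fired most in between, and pin those to the big cores through /proc/irq/<n>/smp_affinity_list. The function names are hypothetical, the assumption that the big cores are cpu4 and cpu5 needs checking per board, and this is not how armbian-hardware-optimization works today.

```python
def parse_interrupts(text):
    """Map IRQ number -> (total count over all CPUs, description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue                      # skip IPI, Err and similar rows
        fields = rest.split()
        counts = fields[:ncpu]
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue
        totals[int(label)] = (sum(map(int, counts)), " ".join(fields[ncpu:]))
    return totals


def plan_affinity(before, after, big_cpus=(4, 5), top_n=2):
    """Assign the top_n busiest IRQs (by delta) round-robin to big_cpus."""
    deltas = {irq: after[irq][0] - before.get(irq, (0, ""))[0] for irq in after}
    busiest = sorted(
        (irq for irq, delta in deltas.items() if delta > 0),
        key=lambda irq: deltas[irq],
        reverse=True,
    )[:top_n]
    return {irq: big_cpus[i % len(big_cpus)] for i, irq in enumerate(busiest)}


def apply_plan(plan, proc_root="/proc"):
    """Write the plan to smp_affinity_list; needs root on a real system."""
    for irq, cpu in plan.items():
        try:
            with open(f"{proc_root}/irq/{irq}/smp_affinity_list", "w") as fh:
                fh.write(str(cpu))
        except OSError:
            pass                          # some IRQs cannot be moved
```

A daemon would read /proc/interrupts, sleep for a minute, read it again, and then call plan_affinity() followed by apply_plan(). Whether the busiest IRQ is always the right one to move is a design question left open here.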
Well, let's take a break from important decision-making, and post some benchmarks about VPU/GPU. I have used FriendlyELEC's official Ubuntu Xenial image, kernel 4.4.126, 32-bit architecture. That allowed me to easily create a media configuration script, based on the existing RK3288 one. All tests were done with the "performance" governor, for both CPU and GPU.
The 32-bit script can be downloaded here. It is not yet a release version, so there may be some rough edges. If the community demands it, I might create a 64-bit version in the future.
1. VPU 4K Video Decoding capabilities
I'll make a chart comparing NanoPC-T4 (RK3399) with ASUS TinkerBoard (RK3288) and Khadas VIM2 (Amlogic S912). Rockchip's acceleration was tested with our well-known MPV, while for Amlogic we used @balbes150's LibreElec:
BOARD         H.264   HEVC    VP9     HEVC-10  VP9-10
TB (RK3288)   ✓       ✓       x       x        x
T4 (RK3399)   ✓       ✓       ✓       ✓        x
VIM2 (S912)   ✓       ✓       ✓       ✓        ✓
The first thing to comment here is that VP9 10-bit HDR video playback is not supported by the board, which on the other hand is in accordance with Rockchip's official specs. HDR is only supported for H.264 and H.265 (we didn't test the former, though, but we can assume it works if the latter does). It should not be a software issue, since we tested also in Android with the Rockchip Media Player App, and we only got a messagebox saying "10-bit video is not supported". So, in this aspect, RK3399 is inferior to the competitor's S912, which can play VP9-HDR10 videos.
Besides that, all supported videos played with perfect smoothness and vivid colors.
2. GPU OpenGL-ES & WebGL
Time for the 3D capabilities of the Mali-T860 with 4 cores @ 800 MHz. As references for the comparison, we chose this time the TinkerBoard (Mali-T760, 4 cores @ 600 MHz) and the Odroid XU4 (Mali-T628, 6 cores @ 600 MHz). All three Malis share the Midgard architecture, though they represent the first, second and third generation. We left out the S912 this time, because it has no Linux Mali support, plus performance would be much lower with only 3 cores. Kernel for TB is Armbian's 4.4.131-rockchip, while for XU4 it is 4.14.30. For the Rockchips we used their custom X server with Glamor enabled, while for XU4 we used Crashoverride's armsoc X driver:
Results are in frames per second:
BOARD         Glmark2-X   Glmark2-offscreen   WebGL Aquarium (Chromium)
TB (RK3288)   57          547                 30-34
T4 (RK3399)   52          340                 33-36
XU4 (S5422)   428         747                 39-44
In the first place, Rockchip's X server seems to be better tuned for the RK3288, where we see that FPS almost match the Vsync (60 FPS). In the case of RK3399, we see it is significantly lower, while we also see tearing that does not happen with the other Rockchip SoC. XU4's driver works in a different way, and so it is not limited by Vsync, but nevertheless it doesn't show any tearing.
Offscreen performance is strangely low on RK3399 (probably a matter of a not-so-well tuned Mali binary?). WebGL performs better on the 3399 than on the 3288, but the difference is not as big as one would expect considering the superiority of both CPU and GPU. XU4 clearly stands out from the others.
3. GPU OpenCL performance
Now OpenCL performance with GPU miners. That can give us a better idea of the raw processing power of the GPU since there are no X drivers getting in the way. We used cgminer for the skein algo, and sgminer for lyra2rev2. Results are averages from running the miner for two hours, with an intensity as high as possible monitoring that no hardware errors happened.
BOARD         Skein (MH/s)   Lyra2rev2 (kH/s)
TB (RK3288)   1.150          42
T4 (RK3399)   1.150          60
XU4 (S5422)   1.440          72
It is shocking that performance in the Skein algo is about the same for RK3288 and RK3399, but using CPU miners for the same algorithm the TinkerBoard also gives surprisingly high numbers (higher than XU4). For that reason, I think there must be something in the TB that makes it especially suited for Skein (maybe the dual-channel RAM?), so these results must be taken with care. On the other hand, Lyra2 numbers are more or less as expected.
Remember that all these tests were made on an armhf system. Maybe arm64 can give us some surprise in the future. Or maybe Rockchip can make improvements to their binaries/libraries giving better performance.
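One crude way to see what "as expected" means is to normalise the hashrates by shader cores times GPU clock, using the core counts and clocks quoted in the OpenGL section. This ignores architectural differences between the Mali generations and memory bandwidth, so treat it as a sanity check rather than a proper model; the names below are made up for the example.

```python
# Hashrates from the table above, core counts and clocks from section 2.
BOARDS = {
    "TB (RK3288)": {"cores": 4, "mhz": 600, "skein_mhs": 1.150, "lyra2rev2_khs": 42},
    "T4 (RK3399)": {"cores": 4, "mhz": 800, "skein_mhs": 1.150, "lyra2rev2_khs": 60},
    "XU4 (S5422)": {"cores": 6, "mhz": 600, "skein_mhs": 1.440, "lyra2rev2_khs": 72},
}


def per_core_ghz(board, metric):
    """Hashrate divided by (shader cores * clock in GHz)."""
    b = BOARDS[board]
    return b[metric] / (b["cores"] * b["mhz"] / 1000)


if __name__ == "__main__":
    for board in BOARDS:
        skein = per_core_ghz(board, "skein_mhs")
        lyra = per_core_ghz(board, "lyra2rev2_khs")
        print(f"{board}: skein {skein:.3f} MH/s, lyra2rev2 {lyra:.2f} kH/s per core-GHz")
```

With this normalisation the Lyra2rev2 figures of all three boards end up close together, while the TinkerBoard's Skein figure sits clearly above the other two, which matches the suspicion that something specific to the TB helps with Skein.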