hjc

NanoPi M4 performance and consumption review


I've been running some tests on the NanoPi M4 these days. While I'm not a professional board reviewer, I can share some early performance numbers with you. Beware that none of these tests reflect real-world use cases; they are provided as-is. Besides, Armbian development on RK3399 boards is still at a very early stage, so any of these numbers may change in the future due to software changes.

Unless mentioned otherwise, all tests were done using an Armbian nightly image with the FriendlyARM 4.4 kernel and the CPU clocked at 2.0/1.5GHz.

 

Powering

NanoPi M4 is my first board powered by USB-C. While RK3399 is not power-hungry under normal load, I doubt whether a 5V/3A power supply is sufficient when the CPU load goes higher or when a lot of USB devices are connected. So I ran a series of power measurements with this tool:

power.thumb.jpg.afe143b4d99cc41711c2cce697555b24.jpg

It measures the power consumption on the USB side, excluding the consumption of the PSU itself.

The board is powered by the USB-C charger that came with my Huawei MateBook E, which supports 5V/2A, 9V/2A, and 12V/2A, so theoretically it is insufficient to power the NanoPi M4. Unfortunately I couldn't find a USB-C charger capable of 5V/3A output, so I had to run the tests with this one.

What if I connected a lot of USB 3.0 devices and exceeded the 5V/2A limit? Well, I did try that (connecting 4 USB HDDs and running cpuburn, or even connecting 2 SBCs to the USB ports), and the answer is simple: the board crashed. But normally the board's consumption does not exceed 10W, so the charger works just fine.

 

Test setup

1) Idle consumption

This is the typical consumption when you use it as a headless server.

 

2) Idle consumption with HDMI display output (console tty interface, no Desktop/X11/GPU stuff)

Testing with Dell P2415Q 4k 60Hz display. HDMI connected, with 2560*1440 60Hz video output. Also connect the USB 3.0 hub to

 

3) Display connected, 802.11ac WiFi with iperf sending

With HDMI display connected (same as (2)), and WiFi connected to 802.11ac 5GHz AP in another room, run the following command:

iperf3 -c 10.24.0.1 -t 60

The WiFi throughput is around 110Mbps

 

4) Display connected, running cpuburn

With HDMI display connected (same as (2)), run cpuburn on all 6 cores

 

5) Idle consumption of 4.19-rc1 mainline kernel

Same as (1), but running mainline kernel.

 

Test results

image.png.ee782a35ffa1c9015ba52437413bf81f.png

 

The idle consumption is 1.79W, and it might need some tuning to reduce the consumption. When WiFi and display are connected, it goes higher to 2.87W.

With active WiFi traffic, the board consumes 4.67W, and with all CPU cores active, it consumes 9.86W.

The mainline kernel has a higher idle consumption; the reason might be that DDR DVFS and/or devfreq are not implemented there yet.

Based on these results, it seems that 5V/2A power is okay if no peripheral devices are connected. However if you connect any USB devices, it may easily exceed the 2A limit when CPU load goes higher.
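
As a rough cross-check of the 2A concern, the measured figures can be converted to current. This is only a sketch: it assumes a nominal 5.0V at the board (the actual voltage under load was not recorded in this post, and any sag would raise the current), and the function and dictionary names are purely illustrative.

```python
# Convert the measured power figures (W) into current (A) at an assumed
# nominal 5.0 V and compare them against a charger's current limit.
NOMINAL_VOLTS = 5.0  # assumption: the real voltage under load was not logged

# Values quoted in the test results above
MEASURED_WATTS = {
    "idle, headless": 1.79,
    "idle, display connected": 2.87,
    "display + WiFi iperf3": 4.67,
    "display + cpuburn": 9.86,
}


def amps(watts, volts=NOMINAL_VOLTS):
    """Current drawn for a given power at a given voltage."""
    return watts / volts


def headroom(watts, limit_amps, volts=NOMINAL_VOLTS):
    """Current (A) left before the supply's limit is reached."""
    return limit_amps - amps(watts, volts)


if __name__ == "__main__":
    for name, watts in MEASURED_WATTS.items():
        print(f"{name:26s} {amps(watts):.2f} A, "
              f"headroom at 2 A: {headroom(watts, 2.0):.2f} A")
```

Under these assumptions the cpuburn case alone sits just below 2A, which leaves practically nothing for bus-powered USB devices.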

 

CPU/RAM and IO Performance

While RK3399 is not a super fast chip, its performance fits its position. To reveal the full potential of the board, I'm posting some visualized sbc-bench results taken with the mainline 4.19-rc1 kernel, because there might be some DRAM performance issues on RK3399 with the 4.4 kernel. For comparison, I'm also posting the results of Firefly-RK3399 (2.2/1.8GHz overclock, tested by myself), Raspberry Pi 3 B+, ROCK64 and RockPro64 (taken from existing sbc-bench results).

 

You can see the full sbc-bench log here.

 

Memory

image.png.a6634cc186547423ce4c99e6eb6f52d8.png

 

7-zip

image.png.584ce2f025d7c5d8b95b58b9ea31db30.png

 

cpuminer

image.png.b3004bf8b512e2fbccf5f57125022697.png

 

For IO performance, I used iozone to measure the performance of SD card, eMMC and USB SSD. NanoPC T4's NVMe SSD results are added as a reference.

SSD performance was measured with the command "iozone -e -I -a -s 1G -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2". SD card and eMMC tests used a 100M size instead of 1G.
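
To make the two variants of that command explicit, here is a small sketch that assembles the argument list. The helper name `iozone_args` is made up for illustration; the flags are exactly the ones from the command above, and the comments reflect my reading of the iozone options.

```python
# Assemble the iozone argument list used for the storage tests.
# Flags as they appear in the command above:
#   -e  include flush (fsync) in the timing
#   -I  use direct I/O, bypassing the page cache
#   -a  automatic mode
#   -s  test file size, -r record size
#   -i  test selection (0 = write/rewrite, 1 = read/reread,
#       2 = random read/write)
RECORD_SIZES = ("4k", "16k", "512k", "1024k", "16384k")
TESTS = (0, 1, 2)


def iozone_args(file_size, record_sizes=RECORD_SIZES, tests=TESTS):
    """Return the iozone command as a list suitable for subprocess.run()."""
    args = ["iozone", "-e", "-I", "-a", "-s", file_size]
    for record_size in record_sizes:
        args += ["-r", record_size]
    for test in tests:
        args += ["-i", str(test)]
    return args


if __name__ == "__main__":
    print(" ".join(iozone_args("1G")))    # USB SSD
    print(" ".join(iozone_args("100M")))  # SD card and eMMC
```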

image.png.a592212a7c3909291407784586c0f81e.png

 

Networking

NanoPi M4 comes with a 1Gbps Ethernet port and an 802.11ac 2x2 MIMO WiFi module, and I tested both with iperf3.

GbE iperf3 full duplex test:

hjc@nanopim4:~$ iperf3 -c 10.20.0.1 & iperf3 -Rc 10.20.0.1 -p 5202
[1] 27486
Connecting to host 10.20.0.1, port 5201
Connecting to host 10.20.0.1, port 5202
Reverse mode, remote host 10.20.0.1 is sending
[  4] local 10.20.0.2 port 43782 connected to 10.20.0.1 port 5201
[  4] local 10.20.0.2 port 45102 connected to 10.20.0.1 port 5202
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  64.6 MBytes   542 Mbits/sec                  
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  95.1 MBytes   798 Mbits/sec    0    314 KBytes       
[  4]   1.00-2.00   sec   110 MBytes   919 Mbits/sec                  
[  4]   1.00-2.00   sec  94.5 MBytes   793 Mbits/sec    0    320 KBytes       
[  4]   2.00-3.00   sec   110 MBytes   920 Mbits/sec                  
[  4]   2.00-3.00   sec  95.8 MBytes   803 Mbits/sec    0    317 KBytes       
[  4]   3.00-4.00   sec   110 MBytes   920 Mbits/sec                  
[  4]   3.00-4.00   sec  94.5 MBytes   792 Mbits/sec    0    317 KBytes       
[  4]   4.00-5.00   sec   110 MBytes   920 Mbits/sec                  
[  4]   4.00-5.00   sec  94.6 MBytes   794 Mbits/sec    0    314 KBytes       
[  4]   5.00-6.00   sec   110 MBytes   919 Mbits/sec                  
[  4]   5.00-6.00   sec  95.7 MBytes   803 Mbits/sec    0    314 KBytes       
[  4]   6.00-7.00   sec   110 MBytes   919 Mbits/sec                  
[  4]   6.00-7.00   sec  95.5 MBytes   801 Mbits/sec    0    317 KBytes       
[  4]   7.00-8.00   sec   110 MBytes   920 Mbits/sec                  
[  4]   7.00-8.00   sec  94.8 MBytes   795 Mbits/sec    0    314 KBytes       
[  4]   8.00-9.00   sec   110 MBytes   920 Mbits/sec                  
[  4]   8.00-9.00   sec  94.5 MBytes   792 Mbits/sec    0    314 KBytes       
[  4]   9.00-10.00  sec  97.2 MBytes   816 Mbits/sec    0    320 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   952 MBytes   799 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   949 MBytes   796 Mbits/sec                  receiver
[  4]   9.00-10.00  sec   110 MBytes   921 Mbits/sec                  

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
iperf Done.
[  4]   0.00-10.00  sec  1.03 GBytes   884 Mbits/sec    9             sender
[  4]   0.00-10.00  sec  1.03 GBytes   882 Mbits/sec                  receiver

iperf Done.
[1]  + 27486 done       iperf3 -c 10.20.0.1

 

Wireless

hjc@nanopim4:~$ iperf3 -c 10.24.0.1
Connecting to host 10.24.0.1, port 5201
[  4] local 10.23.4.116 port 39730 connected to 10.24.0.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  13.0 MBytes   109 Mbits/sec   13   1.21 MBytes       
[  4]   1.00-2.01   sec  12.9 MBytes   107 Mbits/sec    5    618 KBytes       
[  4]   2.01-3.00   sec  12.6 MBytes   106 Mbits/sec    0    618 KBytes       
[  4]   3.00-4.00   sec  9.35 MBytes  78.7 Mbits/sec    4    329 KBytes       
[  4]   4.00-5.00   sec  11.1 MBytes  92.9 Mbits/sec    0    348 KBytes       
[  4]   5.00-6.00   sec  10.2 MBytes  85.5 Mbits/sec    0    363 KBytes       
[  4]   6.00-7.00   sec  9.37 MBytes  78.6 Mbits/sec    0    387 KBytes       
[  4]   7.00-8.00   sec  10.9 MBytes  91.5 Mbits/sec    0    409 KBytes       
[  4]   8.00-9.00   sec  13.6 MBytes   114 Mbits/sec    0    409 KBytes       
[  4]   9.00-10.00  sec  13.8 MBytes   116 Mbits/sec    0    410 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   117 MBytes  98.0 Mbits/sec   22             sender
[  4]   0.00-10.00  sec   116 MBytes  97.0 Mbits/sec                  receiver

iperf Done.
hjc@nanopim4:~$ iperf3 -c 10.24.0.1 -R
Connecting to host 10.24.0.1, port 5201
Reverse mode, remote host 10.24.0.1 is sending
[  4] local 10.23.4.116 port 39734 connected to 10.24.0.1 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  10.6 MBytes  88.8 Mbits/sec                  
[  4]   1.00-2.00   sec  10.9 MBytes  91.5 Mbits/sec                  
[  4]   2.00-3.00   sec  4.41 MBytes  37.0 Mbits/sec                  
[  4]   3.00-4.00   sec  2.07 MBytes  17.3 Mbits/sec                  
[  4]   4.00-5.00   sec  1018 KBytes  8.34 Mbits/sec                  
[  4]   5.00-6.00   sec  1.29 MBytes  10.8 Mbits/sec                  
[  4]   6.00-7.00   sec  6.48 MBytes  54.4 Mbits/sec                  
[  4]   7.00-8.00   sec  10.8 MBytes  91.0 Mbits/sec                  
[  4]   8.00-9.00   sec  10.7 MBytes  89.9 Mbits/sec                  
[  4]   9.00-10.00  sec  10.7 MBytes  89.8 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  70.1 MBytes  58.8 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  69.1 MBytes  58.0 Mbits/sec                  receiver

iperf Done.

It's too complicated to analyze the performance of a WiFi connection, but so far I've never seen more than 200Mbps throughput on AP6356S.
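
For anyone who wants to sanity-check the summary lines in logs like the ones above, here is a rough parsing sketch. It assumes iperf3's default formatting, where (as far as I know) the Transfer column uses binary units (1 MByte = 2^20 bytes) and the Bandwidth column uses decimal megabits; the regex and function name are my own and only cover the summary lines shown here.

```python
# Recompute the bandwidth of an iperf3 summary line from its Transfer
# column and compare it with the Bandwidth column that iperf3 printed.
import re

SUMMARY = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-([\d.]+)\s+sec\s+([\d.]+)\s+([KMG])Bytes\s+"
    r"([\d.]+)\s+([KMG])bits/sec.*\b(sender|receiver)\s*$"
)
BINARY = {"K": 2**10, "M": 2**20, "G": 2**30}   # assumed units for Transfer
DECIMAL = {"K": 1e3, "M": 1e6, "G": 1e9}        # assumed units for Bandwidth


def parse_summary(line):
    """Return reported and derived Mbit/s for a summary line, else None."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    start, end, xfer, xunit, bw, bunit, role = m.groups()
    seconds = float(end) - float(start)
    nbytes = float(xfer) * BINARY[xunit]
    return {
        "role": role,
        "reported_mbps": float(bw) * DECIMAL[bunit] / 1e6,
        "derived_mbps": nbytes * 8 / seconds / 1e6,
    }


if __name__ == "__main__":
    line = "[  4]   0.00-10.00  sec   117 MBytes  98.0 Mbits/sec   22             sender"
    print(parse_summary(line))
```

Because the Transfer column is rounded, the derived value only matches the reported one to within a fraction of a Mbit/s.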

22 hours ago, hjc said:

What if I connected a lot of USB 3.0 device and exceeded the 5V/2A limit? Well, I did try that (connect 4 USB HDD and run cpuburn, or even connect 2 SBCs to the USB), and the answer is simple: the board crashed

 

Thank you for the detailed test, and especially for also covering the underpowering situation (which a lot of users might run into). I don't know whether it was available from the beginning, but FriendlyELEC now lists these options in their web shop:

  • 5V 4A Power Adapter (+$8.99)
  • German Plug Adapter (applies to: France, Germany, Portugal, Spain, Korea) (+$5.99)

The PSU as well as the heatsink seem like mandatory accessories to me.

On 9/2/2018 at 12:34 AM, hjc said:

I've been doing some tests with NanoPi M4 these days. While I'm not a professional board reviewer, here I can share some early performance numbers to you. Beware that none of these tests fit into real world use cases, they are just provided as-is. Besides, Armbian development on RK3399 boards are still at a very early stage, so any of these numbers may change in the future, due to software changes.

Thanks for your testing, this will be very helpful for many end-users. I'd like to know the USB 3.0 speed.


Just a quick note about DRAM latency effects. We noticed that with default kernel settings the memory controller, when the dmc code is active, increases memory access latency a lot (details). While testing the efficiency of zram swap compression, I more or less by accident tested another use case that is highly affected by higher memory latency.

 

When trying to build ARM's Compute Library on a NanoPC T4 limited to 1 GB DRAM (adding extraargs="mem=1110M" to /boot/armbianEnv.txt) my first run was with default settings (/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/governor set to dmc_ondemand). Execution time with the build job relying heavily on zram based swap: 107m9.612s.

 

Next try with /sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/governor set to performance: 80m55.042s.

 

That is a massive difference caused only by some code trying to save some energy and therefore increasing memory latency (details). In Armbian we'll use an ugly hack to 'fix' this, but this is something board makers who provide their own OS images should also care about (@mindee for example).
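
To put the two build times into proportion, here is a small sketch that converts the `time` output format to seconds and compares the runs. The function name is illustrative; the two timings are the ones quoted above.

```python
# Compare the two Compute Library build times reported above.
import re


def parse_real_time(text):
    """Convert a shell `time` value such as '107m9.612s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", text.strip())
    if m is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


ondemand = parse_real_time("107m9.612s")     # dmc governor: dmc_ondemand
performance = parse_real_time("80m55.042s")  # dmc governor: performance

saved = ondemand - performance
print(f"time saved: {saved:.0f} s ({saved / ondemand:.1%} of the dmc_ondemand run)")
print(f"dmc_ondemand took {ondemand / performance:.2f}x as long")
```

That works out to roughly a quarter less wall-clock time with the performance governor for this particular swap-heavy workload.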

 

 

 

 

12 hours ago, mindee said:

Want to know the USB 3.0 speed. 

In the "IO performance" chart, NanoPi M4 + JMS578 + 850 EVO is connected via USB 3.0, so the performance is about 388/300 MB/s and 70k/99k IOPS (read/write). This is lower than what I measured on my Intel NUC (435/440 MB/s read/write, tested using diskspd64.exe), but it's acceptable, since most people won't connect an SSD this way, and typically the HDD is the bottleneck.

 

Besides, at the time of the IO testing I hadn't thought about the DDR DVFS performance impact, so the IO performance was tested with the CPU governor set to "performance" but the dmc governor unchanged, and there might be a slight negative performance impact. Also, USB 3.0 on the M4 still doesn't work on mainline (maybe due to kernel configuration or DT issues), so I did the IO testing with the 4.4 kernel, and the IO performance may improve further with mainline.

6 hours ago, hjc said:

This result is lower than that tested on my Intel NUC

 

It's also lower compared to the numbers I made with my first RK3399 device some time ago: ODROID-N1 (Hardkernel built RK's 4.4 just like ayufan without CONFIG_ARM_ROCKCHIP_DMC_DEVFREQ). But on NanoPi M4 there's always the internal VIA VL817 hub in between SuperSpeed devices and USB3 host controller and this usually affects performance as well.

 

Update: EVO840 behind JMS567 attached to NanoPC-T4 with same 4.4 kernel and dmc governor set to performance (but CPU crippled down to a quad-core A53 clocked at 800 MHz):

                                                              random    random
              kB  reclen    write  rewrite    read    reread    read     write
         1024000   16384   396367   402479   372088   373177   373097   402228

373/400 MB/s read/write. So M4 numbers above need a second test anyway.

On 9/2/2018 at 9:34 AM, hjc said:

NanoPi M4 is my first board powered by USB-C

 

Yesterday 2 sets of NanoPi M4 arrived (one 2GB and one 4GB), which will make it convenient to test stuff like heat dissipation in less time (on one board I use the supplied thermal pad with the large heatsink; on the 2nd board I replaced the thermal pad with a 20x20mm copper shim of 1mm height that I took from my RockPro64).

 

But now talking just about USB-C: NanoPi M4, while using a USB-C connector for power, is obviously not USB PD compliant but simply uses the USB-C connector in combination with a 'dumb' PSU that provides the necessary amount of current without any negotiation. FriendlyELEC shipped a new 5V/4A PSU with a USB-A receptacle combined with a very short USB-C cable. Since the PSU is not EU style and I lack the necessary adapter, I currently use my standard 'dumb' 5V/2A USB PSU, which works fine for the tests I'm currently doing.

 

Not being compliant with USB PD specs (or USB-C powering specifications) means two things:

Pro: no expensive USB-C charger needed (check the funny discussion in Khadas forum and there the link to tested USB-C chargers and their prices). A dumb 5V PSU that can deliver the needed amount of current at a stable voltage will be sufficient.

 

Cons:

 

Edit: tested with my MacBook Pro USB-C charger not exceeding 500mA with 'dumb' consumers on the other end of the USB-C cable: endless boot/crash cycle. So be prepared to run into troubles combining NanoPi M4 with a real USB-C charger compliant to specs. This didn't work with the Vim2 and it also does not work with NanoPi M4. @hjc: seems you can consider yourself lucky that your USB-C charger is outputting more than 500/900 mA when feeding 'dumb' consumers.

On 9/3/2018 at 4:38 PM, mindee said:

SSD performance are measured by command "iozone -e -I -a -s 1G -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2", SD card and eMMC are using 100M instead of 1G size.

 

I'm now also testing IO performance on NanoPi M4 to get an idea of how much influence the integrated VIA VL817 USB3 hub has. I use the following test commands since I test with an el cheapo Samsung EVO 840 (known to exceed 500 MB/s on a SATA port, but low capacity and implementing TurboWrite):

taskset -c 4 iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
sleep 300
taskset -c 4 iozone -e -I -a -s 2000M -r 16384k -i 0 -i 1 -i 2

Why do I test first with 100 MB, then wait 5 minutes and test again with 2 GB?

  • Tests with 100 MB to keep results comparable with my previous collection
  • An additional test with 2 GB since we found that with fast USB3 implementations and fast SSDs the maximum throughput isn't reported when testing with low test sizes. 1GB or better 2GB seem like a reasonable additional test to look for real maximum bandwidth possible
  • Why wait 5 minutes (sleep 300) in between? Because my 128GB EVO840 implements TurboWrite and is therefore only capable of writing data at maximum speed until the TurboWrite buffer is full. So if I had tested all block sizes with a 1GB test data size, the EVO itself would've turned into a bottleneck after approx. 2.5GB of data written
  • Why 'taskset -c 4' (executing the test on a big core)? Since we're looking for maximum performance here

Raw results as follows

                                                              random    random
              kB  reclen    write  rewrite    read    reread    read     write
          102400       4    25559    28658    21964    22012    21957    28183
          102400      16    84846    94435    80322    80926    81083    92975
          102400     512   325556   326434   289616   292997   294076   322974
          102400    1024   347800   362005   324009   327371   329460   352963
          102400   16384   381091   386458   370522   373441   373337   378399
         2048000   16384   401465   402916   375361   375457   375509   398665

In other words: even with a USB3 hub in between, storage performance is excellent as long as only one disk is connected (with more than one disk and parallel accesses, performance will of course drop, since bus contention issues might occur and bandwidth has to be shared). But with HDDs this is no issue at all since they're too slow anyway.
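
The raw iozone rows are easier to read once converted. Below is a minimal sketch that parses them and derives 4k random IOPS and sequential MB/s. It assumes iozone's throughput columns are in kB/s and the record length in kB, and it divides by 1000 to get MB/s the way this thread does; the column and function names are my own.

```python
# Parse raw iozone result rows and derive IOPS and MB/s figures.
COLUMNS = ("kB", "reclen", "write", "rewrite", "read", "reread",
           "random_read", "random_write")

RAW = """\
          102400       4    25559    28658    21964    22012    21957    28183
          102400      16    84846    94435    80322    80926    81083    92975
          102400     512   325556   326434   289616   292997   294076   322974
          102400    1024   347800   362005   324009   327371   329460   352963
          102400   16384   381091   386458   370522   373441   373337   378399
         2048000   16384   401465   402916   375361   375457   375509   398665
"""


def parse_rows(text):
    """Turn purely numeric 8-column lines into dicts keyed by COLUMNS."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return rows


def random_iops(row):
    """(read, write) operations per second: kB/s divided by record size in kB."""
    return (row["random_read"] // row["reclen"],
            row["random_write"] // row["reclen"])


def sequential_mb_s(row):
    """(read, write) in MB/s, using the thread's kB/s / 1000 convention."""
    return (round(row["read"] / 1000), round(row["write"] / 1000))


if __name__ == "__main__":
    rows = parse_rows(RAW)
    print("4k random IOPS (read, write):", random_iops(rows[0]))
    print("2 GB sequential MB/s (read, write):", sequential_mb_s(rows[-1]))
```

The 4k random figures computed this way agree with the NanoPi M4 row of the summary that follows.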

 

Storage performance summary: I had some fear that the internal USB3 hub would trash storage performance, but that's not true (at least when only one disk is connected). For reference, here are numbers generated with the same SSD, same settings and same test methodology on the discontinued ODROID-N1 (as an example of RK3399 USB3 without a hub, and also with the same SSD connected to a cheap PCIe SATA controller):

                      Random IO in IOPS     Sequential IO in MB/sec
                        4K read/write           1M read/write
ODROID-N1                 5994/6308               330 / 340
NanoPi M4                 5489/7045               330 / 355
ASM1061 powersave         6010/7900               320 / 325 
ASM1061 performance       9820/16230              330 / 330

The ASM1061 numbers are especially dedicated to all those SBC users believing in 'native SATA'. Sequential storage performance with a good UAS-capable USB3-to-SATA bridge is higher than with cheap PCIe SATA controllers. 'Native' SATA does not exist with RK3399 anyway, and really high SATA storage performance requires more expensive SATA controllers that make use of more than 1 PCIe lane. But with just an ASM1061 the only two real benefits of PCIe-attached SATA over USB3-attached SATA are

  • higher random IO performance
  • less IRQs to process which results in less CPU utilization (see ODROID-N1 link above)

But to attach one or two USB3 HDDs and share them at maximum speed (~100 MB/s through network) the NanoPi M4 is perfectly fine.

 

Edit: Debug output here. We can see the usual 'ERROR Transfer event for disabled endpoint or incorrect stream ring' XHCI error when connecting a USB3 disk enclosure for the first time (a known problem that doesn't cause any harm), and the amount of DRAM is reported differently since I swapped the SD card between my two NanoPi M4.


And now a quick look at NanoPi M4's massive heatsink. Since I have 2 boards, on one I used the supplied blue thermal pad between RK3399 and heatsink, and on the other, instead of the thermal pad, a 20x20mm copper shim of 1 mm height with a thin film of thermal compound on both sides.

 

Test methodology is sbc-bench.sh -T 70 ; sbc-bench.sh -t 50 (reasons why). Full results: Thermal pad and copper shim

 

Yes, it's a bit hard to interpret this stuff, but a detailed interpretation is necessary, keeping the design of the heatsink in mind:

  • the thing is massive (huge own thermal mass)
  • the heatsink has fins where air can be directed through (one large and silent fan can easily cool down a whole bunch of M4 with controlled airflow)
  • the heat conductivity of the material between heatsink and SoC plays also an important role

I'm in the process of doing a lot more tests (to generate insights, not fancy graphs and numbers), but for now only the following data is available (it needs confirmation; I'm not entirely sure whether testing with two boards is a flaw).

 

Let's look at temperatures at the end of both sbc-bench thermal tests (the first pre-heats to 70°C, the second one waits until SoC temperature is as low as 50°C):

               start 70°C     start 50°C
thermal pad      85.0°C         85.0°C
copper shim      79.4°C         74.4°C

Copper shim with thermal compound 'wins' but with such a massive heatsink better heat conductivity also has its downsides. So let's look at how fast switching between temperatures works:

              idle to 70°C   high to 50°C
thermal pad      75 sec        330 sec
copper shim     600 sec        750 sec

Results in the first row look reasonable to me (the better the SoC's heat can be dissipated into the massive heatsink, the longer it takes until SoC/board temperature rises). The 2nd row (cooling down) IMO needs more investigation, since I would've thought that the cooling time would be somewhat similar. But for this I most probably need precise thermal measurements to get a clue about the temperature of the heatsink itself. At least now, when testing with the copper shim efficiently dissipating the SoC's heat into the aluminium block, the heatsink gets too hot to handle when the SoC temperature is reported as higher than ~65°C for some time.

 

All of the above was talking about true passive mode. But of course the heatsink is sufficient to be used with active cooling too. Just blow a little bit of air laterally through the heatsink fins and the board can do whatever power hungry task you want and stays cool forever.

 

Quick experiment now at a customer: I put the NanoPi M4 directly behind the air outlet of an annoying noisy Xeon server's PSU so that the 'waste heat' coming out of the server goes through the heatsink fins. Running cpuminer on the M4 in the background scoring 8.8kH/s constantly and temperature below 43°C (at an ambient temp of 18°C in the server room, but the 'waste heat' coming out of the PSU is definitely a bit 'warmer'): https://pastebin.com/raw/s1zvyfU6

 

TL;DR: the massive heatsink is great even in total passive mode when only short load bursts happen (then heat gets efficiently dissipated into the aluminium block, even better with a copper shim instead of thermal pad). For high loads over longer periods the heatsink is still fine without additional ventilation as long as you're comfortable with electronic devices being operated at higher temperatures. To improve heat dissipation simply blowing some air through the heatsink fins is sufficient.

 

I think my further testing will focus on the difference between FriendlyELEC's approach (huge thermal mass, with heatsink fins more or less only efficient when combined with some airflow) and RockPro64's different strategy, which is now even more efficient since there the RK3399 and heatsink can also be connected directly with just some thermal compound in between:

 

IMG_8022.JPG

2 hours ago, tkaiser said:

tested with my MacBook Pro USB-C charger not exceeding 500mA with 'dumb' consumers on the other end of the USB-C cable: endless boot/crash cycle.

This is surprising, so using a MacBook charger on M4 is not even better than those old USB-A phone chargers with 5V 2A output without any quick charging protocol?

Anyway this supports your view that the official 5V/4A charger should be purchased as well.

34 minutes ago, hjc said:

using a MacBook charger on M4 is not even better than those old USB-A phone chargers with 5V 2A output without any quick charging protocol?

 

The problem is that 'true' USB-C chargers limit current to 500mA or 900mA if the other end of the USB-C cable is not able to negotiate the demand for higher currents. And those 5V SBC with USB-C right now (Vim2 and now also NanoPi M4) don't do this. They act in 'dumb' mode and require the PSU to provide 'current without asking', which standards-compliant USB-C chargers will refuse. So you need a dumb or tolerant charger too.

 

I've measured booting current with my dumb PSU and this was reported as above 5.6W, which would translate to more than 1100mA, well above what a standards-compliant USB-C charger is allowed to provide to a dumb device.
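
The conversion from the measured boot power to current can be checked with a quick calculation. This is only a sketch: it assumes a nominal 5 V on the USB-C side (the real voltage under load will sag a bit), and the function name `current_ma` is made up for illustration. The 500 mA and 900 mA limits are the defaults mentioned above.

```python
# Default current limits (mA) a compliant USB-C source offers to a
# device that does not negotiate, as quoted in the post above.
DEFAULT_LIMITS_MA = (500, 900)


def current_ma(power_w: float, voltage_v: float = 5.0) -> float:
    """Return the current in mA for a given power and voltage (I = P / V)."""
    return power_w / voltage_v * 1000.0


# 5.6 W was measured during boot; 5 V nominal is an assumption.
boot_ma = current_ma(5.6)
print(f"boot current: {boot_ma:.0f} mA")
for limit in DEFAULT_LIMITS_MA:
    print(f"exceeds {limit} mA default limit: {boot_ma > limit}")
```

At 5 V this gives roughly 1120 mA, so the board exceeds both default limits already while booting.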

 

34 minutes ago, hjc said:

Anyway this supports your view that the official 5V/4A charger should be purchased as well.

 

Yes, I hope this one does well. No chance to test yet since missing the needed adapter (also available in FriendlyELEC's online shop and also a recommendation for those affected then):

 

IMG_8025.JPG

1 hour ago, tkaiser said:

Yes, I hope this one does well. No chance to test yet since missing the needed adapter (also available in FriendlyELEC's online shop and also a recommendation for those affected then):

 

IMG_8025.JPG

I spotted two versions of the charger being sold by local resellers: 5V/2A and 5V/4A. They claim that 5V/2A is sufficient unless you need to connect any peripheral devices (like USB).

19 hours ago, tkaiser said:

And those 5V SBC with USB-C right now (Vim2 and now also NanoPi M4)

Also VIM "1".  I agree with you that a power delivery compliant USB-C is the only proper way to approach this, if one truly wishes to use USB-C.  Can both of you keep notes on whether you get SD card corruption or failure to boot (with no obvious reason)?  @botfap confirmed my hypothesis (or at the very least has strong evidence leading to the same conclusion) that the 4mA default SD drive level is not enough.  ASUS (or Rockchip, or whoever did the dev on the "miniarm" hardware) seem to have had the same idea, Tinker uses 8mA as well.  @botfap has stated 12 mA was really the only "safe" setting, I'm waiting to get more feedback. 

 

TL;DR:  Rockchip SoCs appear to have a default setting on the SD GPIOs that is generic to all their GPIOs and is no good for 50 MHz+ operation across varying SD cards (even decent quality ones). Let me know if anything weird is going on.

 

[edit] come to think of it, that would mess with SPI as well at higher speeds (I guess that should be obvious)


I'm not sure that 12mA was the only safe setting; it was the only setting that helped to stop SD corruption when using very low end or fake SD cards. 8mA was rock solid stable when using any normal SD card. It's also probably a better default for Armbian. If the user's SD card won't work properly with 8mA of current then it has no place for SBC use, I would have thought; the SD card is almost certain to cause other problems later on.

11 hours ago, botfap said:

If the user's SD card won't work properly with 8mA of current then it has no place for SBC use, I would have thought

Agreed.  I'm just wondering how good the routing on these various boards really is.  In any case I've set the rk3328 to 8mA, if anything presents itself on the rk3399 I'd recommend the same.

9 hours ago, TonyMac32 said:

Can both of you keep notes on whether you get SD card corruption or failure to boot (with no obvious reason)?

 

Not happened here with Rockchip so far (with EspressoBin I had issues but there I boot now from USB3 and ignore the SD card slot).

 

Another interesting observation: My 2GB NanoPi M4 has higher memory bandwidth and lower memory latency compared to the 4GB version:

BTW: I now also replaced the thermal pad on my 4GB NanoPi M4 with a copper shim + thermal compound and the results are impressive again: At the end of a sbc-bench run max. temperature is now at just 73.3°C (no throttling) compared to reaching 85°C throttling included with thermal pad.

 

Those thermal pads suck.

 

@hjc did you already look into lowering DVFS voltages? In the past we did this using a special Linpack version that is able to spot undervolted DVFS OPP before the board actually crashes.


And two quick storage performance updates:

 

1) This is the 8GB eMMC FriendlyELEC sells (write performance limited to 43 MB/s as it's mostly the case with low capacity eMMC, read performance exceeding 130 MB/s, nice random IO values -- compare with our overview):

                                                              random    random
              kB  reclen    write  rewrite    read    reread    read     write
          102400       4     8565     8793    24853    24808    19463     8523
          102400      16    25604    25986    66627    66701    56571    24459
          102400     512    42217    42326   125788   126508   125959    40394
          102400    1024    42762    43178   129692   129726   130452    42636
          102400   16384    43914    42877   132828   133254   133417    43012

 

2) In the meantime I also built OMV images for NanoPC T4 and NanoPi M4 and did a quick LanTest benchmark with M4 and an externally connected Seagate Barracuda in a USB3 enclosure (not UAS capable -- but this doesn't matter with HDDs)

 

Close to 100 MB/s sequential speed without any more optimizations is simply awesome:

Bildschirmfoto%202018-09-14%20um%2022.47

(debug output)

16 hours ago, tkaiser said:

Not yet, recently I'm trying to use M4 (with mainline kernel) as a network router & gateway (connect 2-3 RTL8153, set up VLAN, routing, NAT, and site to site VPN), and still trying to resolve some USB related issues. Currently with mainline kernel the USB hub must be manually reset (USBDEVFS_RESET) after reboot, or it wouldn't be usable.
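
For anyone who wants to script that manual reset, here is a minimal sketch of issuing `USBDEVFS_RESET` from Python. It assumes the usual ioctl encoding used on arm64 and x86 (`_IO('U', 20)` from `linux/usbdevice_fs.h`); the helper name `reset_usb_device` and the device path in the comment are only placeholders, the real bus and device numbers have to be looked up with `lsusb`.

```python
import fcntl
import os

# USBDEVFS_RESET is defined in <linux/usbdevice_fs.h> as _IO('U', 20).
# On arm64 and x86 this encodes to (type << 8) | nr; other architectures
# may encode ioctl numbers differently.
USBDEVFS_RESET = (ord("U") << 8) | 20


def reset_usb_device(dev_path: str) -> None:
    """Issue USBDEVFS_RESET on a usbfs device node (usually needs root)."""
    fd = os.open(dev_path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)


# Example call (hypothetical path, pick the hub's node from `lsusb` output):
#   reset_usb_device("/dev/bus/usb/001/002")
```

Calling this once after boot, for example from a systemd unit or rc.local, would be one way to work around the issue until the kernel side is fixed.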

On 9/13/2018 at 11:05 AM, tkaiser said:

 

The problem is that 'true' USB-C chargers limit current to 500mA or 900mA if the other end of the USB-C cable is not able to negotiate the demand for higher currents. And those 5V SBC with USB-C right now (Vim2 and now also NanoPi M4) don't do this. They act in 'dumb' mode and require the PSU to provide 'current without asking', which standards-compliant USB-C chargers will refuse. So you need a dumb or tolerant charger too.

 

I've measured booting current with my dumb PSU and this was reported as above 5.6W, which would translate to more than 1100mA, well above what a standards-compliant USB-C charger is allowed to provide to a dumb device.

 

 

Yes, I hope this one does well. No chance to test yet since missing the needed adapter (also available in FriendlyELEC's online shop and also a recommendation for those affected then):

 

IMG_8025.JPG

The official FriendlyELEC adapter is NOT needed if you have another type.

 

It is a BULL GN901-G plug converter, impressively built and worth the money, but you can use any other plug adapter as well. I am currently running the NanoPi M4 with an el cheapo adapter for 0.18€/piece, and it works well.*

 

*) PS: Please take note that these are actually illegal in the EU (and in Japan and most probably US), since they expose the hot plugs if you pull them out halfway. Do not use them with children around.

On 9/13/2018 at 10:20 AM, tkaiser said:

 

Test methodology is sbc-bench.sh -T 70 ; sbc-bench.sh -t 50 (reasons why). Full results: Thermal pad and copper shim

 

I am quite surprised about your results, my own ones with the standard 

sudo /bin/bash bin/sbc-bench.sh -c

show significant differences on my NanoPi M4 (4GB). Yes, I have noticed the differences in compiler versions etc., but that you get 8.7 kH/s in CPUMiner and I get 10.6 kH/s is quite a difference: about 22%. My system also has a copper shim (15x15x1.0mm) and reaches only 66.1 °C after cpuminer. The version is also 1.3.3, same as you used.
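
For reference, the relative difference between the two cpuminer scores can be worked out like this (a small sketch; the helper name `percent_faster` is made up, and 8.7 kH/s is taken as the baseline):

```python
def percent_faster(mine: float, reference: float) -> float:
    """Relative difference of `mine` over `reference`, in percent."""
    return (mine - reference) / reference * 100.0


# 10.6 kH/s (my run) compared to 8.7 kH/s (the quoted run)
print(f"{percent_faster(10.6, 8.7):.1f}% faster")
```

With these two numbers the difference comes out at roughly 21.8%.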

9 minutes ago, emk2203 said:

Yes, I have noticed the differences in compiler versions etc., but that you get 8.7 kH/s in CPUMiner and I get 10.6 kH/s is quite a difference: about 22%

 

As expected, see the first 3 NanoPC-T4 numbers in my list: https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md (it's just GCC 6.3 vs. 7.3 vs. 8.2 that's responsible for this benchmark number variation. Should be a reminder why updating software is a good idea in general -- and why using Debian/Ubuntu or any other 'everything outdated as hell' distro is sometimes so annoying)

 

12 minutes ago, emk2203 said:

My system also has  a copper shim (15x15x1.0mm) and reaches only 66.1 °C after cpuminer. The version is also 1.3.3, same as you used

 

The sbc-bench -t and -T modes provide different results (and possible insights) since the purpose of this test is to use -T as first run to heat up the board to +70°C for sure, then with the -t 50 you allow the board to cool down to 50°C which means in this mode the thermal mass of the huge heatsink plays an important role. A normal sbc-bench run starts from low load to high load so the heatsink's temperature is constantly rising but always below the SoC temperature and the final temperatures will be lower compared to the situation when the heatsink is in fact hotter than the board when the 'sbc-bench -t 50' run starts.

 

Would be really interesting if you run with your setup 'sbc-bench.sh -T 70 ; sbc-bench.sh -t 50' as well.

23 hours ago, tkaiser said:

The sbc-bench -t and -T modes provide different results (and possible insights) since the purpose of this test is to use -T as first run to heat up the board to +70°C for sure, then with the -t 50 you allow the board to cool down to 50°C which means in this mode the thermal mass of the huge heatsink plays an important role. A normal sbc-bench run starts from low load to high load so the heatsink's temperature is constantly rising but always below the SoC temperature and the final temperatures will be lower compared to the situation when the heatsink is in fact hotter than the board when the 'sbc-bench -t 50' run starts.

 

Would be really interesting if you run with your setup 'sbc-bench.sh -T 70 ; sbc-bench.sh -t 50' as well.

Shows that you shouldn't trust unfounded assumptions. I thought gcc variation would account for 3 - 4% variation at most, but not the 22% I see.

 

I tried to run the setup you suggested; my SoC won't heat up beyond 69 °C. The temperature fluctuates between 67 and 69 °C, but never reaches 70 °C. Took it off the table and put it back in the box it came in, but this accounts only for 1 K difference. Your benchmark shows for a moment "Heating SoC from 70 °C to 70 °C", but nothing happens, and then the temperature drops again. So, would like to run the test, but cannot get it to work.

 

I had switched over to the development kernel 4.19rc1, since your table shows 4.17 as kernel version for tests, it should be even better comparable than the 4.4 version from the first test.

 

 @Igor @hjc: With the Armbian Bionic image for download, I couldn't get wireless to work. armbian-config accepted my input for wireless up to the password, but after exiting and restarting, the NanoPi M4 didn't show up on the WLAN. I went to the nightly with the dev version of 4.19rc1. Files were needed from the full Armbian package, but even after switching to full firmware, I still get the error about `brcmfmac4356-sdio.txt` missing shown below. `brcmfmac4356-sdio.bin` is in the full firmware image, but was not in the "lite" firmware package which came with the downloaded Bionic image.

 

When I google the error, it seems that I need to copy NVRAM contents from the device to the txt file mentioned in the error message.

 

Any pointer on how to do this is welcome.

 

Worrisome things from `dmesg` (for zswap, I have also attached statistics below):

 

zswap: default zpool zbud not available
zswap: pool creation failed
...
rockchip_mmc_get_phase: invalid clk rate
...
dwc3 fe800000.dwc3: Failed to get clk 'ref': -2
dwc3 fe800000.dwc3: failed to initialize core
...
brcmfmac mmc0:0001:1: Direct firmware load for brcm/brcmfmac4356-sdio.txt failed with error -2

sudo grep -R . /sys/kernel/debug/zswap
[sudo] password for emk2203: 
/sys/kernel/debug/zswap/same_filled_pages:0
/sys/kernel/debug/zswap/stored_pages:0
/sys/kernel/debug/zswap/pool_total_size:0
/sys/kernel/debug/zswap/duplicate_entry:0
/sys/kernel/debug/zswap/written_back_pages:0
/sys/kernel/debug/zswap/reject_compress_poor:0
/sys/kernel/debug/zswap/reject_kmemcache_fail:0
/sys/kernel/debug/zswap/reject_alloc_fail:0
/sys/kernel/debug/zswap/reject_reclaim_fail:0
/sys/kernel/debug/zswap/pool_limit_hit:0

 

3 hours ago, emk2203 said:

I tried to run the setup you suggested; my SoC won't heat up beyond 69 °C. The temperature fluctuates between 67 and 69 °C, but never reaches 70 °C. Took it off the table and put it back in the box it came in, but this accounts only for 1 K difference

 

Interesting. The thermal compound I used is 30+ years old so maybe I should buy a new one (or search for the stuff lying around here somewhere anyway ;) ) and try again.


Used M4 with RTL8153 for routing for several days. I'm having some trouble with 4.19-rc1: when the board is doing NAT between RTL8153 and the internal Ethernet port at about 600Mbps, the system crashes with an invalid memory page request.

nanopim4-oops.thumb.png.a3d3e1c47d68492f4b3fe576daa5686f.png

 

After digging I found that this is a kernel software issue, and ayufan's 4.18-rc8 kernel does not have the problem at all.

 

Crash logs:


[  274.231695] Internal error: Oops: 96000004 [#1] SMP
[  274.232128] Modules linked in: xt_nat xt_tcpudp xt_mark xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_REDIRECT veth fuse 8021q garp mrp bridge stp llc cdc_ether usbnet r8152 rockchip_rga videobuf2_dma_sg videobuf2_memops snd_soc_rockchip_spdif snd_soc_rockchip_i2s snd_soc_rockchip_pcm v4l2_mem2mem videobuf2_v4l2 videobuf2_common videodev media nvmem_rockchip_efuse brcmfmac rockchip_io_domain ip6table_filter rockchipdrm cfg80211 ip6_tables rfkill rockchip_thermal brcmutil dw_hdmi analogix_dp iptable_filter ipt_MASQUERADE drm_kms_helper rockchip_saradc drm iptable_nat pcie_rockchip_host nf_nat_ipv4 drm_panel_orientation_quirks nf_nat adc_keys input_polldev nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle tcp_bbr sch_fq ip_tables x_tables ipv6 btrfs libcrc32c xor zstd_decompress zstd_compress
[  274.238767]  xxhash zlib_deflate raid6_pq phy_rockchip_typec phy_rockchip_pcie rtc_rk808 realtek dwmac_rk stmmac_platform stmmac
[  274.240188] CPU: 2 PID: 21 Comm: ksoftirqd/2 Not tainted 4.19.0-rc1-rk3399 #35
[  274.240822] Hardware name: FriendlyElec NanoPi M4 (DT)
[  274.241276] pstate: 40000005 (nZcv daif -PAN -UAO)
[  274.241708] pc : dev_hard_start_xmit+0x48/0x128
[  274.242110] lr : dev_hard_start_xmit+0x90/0x128
[  274.242509] sp : ffff0000092236f0
[  274.242803] x29: ffff0000092236f0 x28: 0000000000000001
[  274.243272] x27: dead000000000100 x26: ffff0000092237bc
[  274.243742] x25: ffff000008c76000 x24: ffff80007abc5090
[  274.244212] x23: ffff000008c7d8d0 x22: ffff000008c7d0c0
[  274.244681] x21: ffff800075186c00 x20: ffff80007abc5000
[  274.245150] x19: dead000000000100 x18: ffff80004600a140
[  274.245620] x17: 0000000000000000 x16: 0000000000000000
[  274.246089] x15: 0000000000000000 x14: 000000000a18006c
[  274.246558] x13: 000000000000001c x12: 0000000000000018
[  274.247028] x11: ffff000009223398 x10: 0000000000000004
[  274.247497] x9 : ffff00000094b8f8 x8 : 0000000000000000
[  274.247966] x7 : 00000000a0000000 x6 : 000000018e36996c
[  274.248436] x5 : 00ffffffffffffff x4 : 0029df11ad4bd57e
[  274.248905] x3 : 0000000000000018 x2 : c574e174c02a4100
[  274.249374] x1 : 0000000000000000 x0 : 0000000000000000
[  274.249845] Process ksoftirqd/2 (pid: 21, stack limit = 0x00000000b03e934a)
[  274.250455] Call trace:
[  274.250676]  dev_hard_start_xmit+0x48/0x128
[  274.251046]  __dev_queue_xmit+0x7b0/0x838
[  274.251402]  dev_queue_xmit+0x10/0x18
[  274.251730]  ip_finish_output2+0x248/0x388
[  274.252094]  ip_finish_output+0x114/0x1f8
[  274.252449]  ip_output+0xa8/0x118
[  274.252744]  ip_forward_finish+0x64/0xa0
[  274.253092]  ip_forward+0x38c/0x4c0
[  274.253401]  ip_rcv_finish+0x54/0x88
[  274.253718]  ip_rcv+0x54/0xc0
[  274.253983]  __netif_receive_skb_one_core+0x54/0x80
[  274.254414]  __netif_receive_skb+0x14/0x60
[  274.254777]  netif_receive_skb_internal+0x38/0xe0
[  274.255194]  napi_gro_complete+0x68/0xb8
[  274.255541]  dev_gro_receive+0x660/0x678
[  274.255889]  napi_gro_receive+0x30/0xc8
[  274.256235]  r8152_poll+0x4a8/0x1068 [r8152]
[  274.256614]  net_rx_action+0x10c/0x2d0
[  274.256948]  __do_softirq+0x10c/0x200
[  274.257274]  run_ksoftirqd+0x30/0x40
[  274.257594]  smpboot_thread_fn+0x1a0/0x1d0
[  274.257957]  kthread+0x128/0x130
[  274.258245]  ret_from_fork+0x10/0x1c
[  274.258564] Code: 91024038 f90023b9 f0002399 f9002fbc (f940027b)
[  274.259107] ---[ end trace b81d4210d61a59cb ]---
[  274.259516] Kernel panic - not syncing: Fatal exception in interrupt
[  274.260075] SMP: stopping secondary CPUs
[  274.260546] Kernel Offset: disabled
[  274.260856] CPU features: 0x0,21806008
[  274.261188] Memory Limit: none
[  274.261462] Rebooting in 10 seconds..

 

@Igor IMO it might be better to move to the same dev kernel as Rockchip64 (I've published the commit here, in case you're willing to merge the changes)

On 9/12/2018 at 11:36 PM, tkaiser said:

 

I'm now also testing IO performance on NanoPi M4 to get an idea how much influence the integrated VIA VL817 USB3 hub has. The following test command is used since I test with an el cheapo Samsung EVO 840 (known to exceed 500 MB/s on a SATA port but low capacity and implementing TurboWrite)


taskset -c 4 iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
sleep 300
taskset -c 4 iozone -e -I -a -s 2000M -r 16384k -i 0 -i 1 -i 2

Why do I test first with 100 MB, then wait 5 minutes and test again with 2 GB?

  • Tests with 100 MB to keep results comparable with my previous collection
  • An additional test with 2 GB since we found that with fast USB3 implementations and fast SSDs the maximum throughput isn't reported when testing with low test sizes. 1GB or better 2GB seem like a reasonable additional test to look for real maximum bandwidth possible
  • Why wait for 5 minutes (sleep 300) in between? Since my 128GB EVO840 implements TurboWrite and therefore is only capable of writing data at maximum speed until the TurboWrite buffer is full. So if I had tested all block sizes with 1GB test data size the EVO itself would've turned into a bottleneck after approx. 2.5GB data written
  • Why 'taskset -c 4' (executing the test on a big core)? Since we're looking for maximum performance here
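
The TurboWrite reasoning in the list above can be sketched with some rough arithmetic. The assumptions here are loud ones: every record size is assumed to cause three full write passes over the test file (write and rewrite from `-i 0`, random write from `-i 2`), and the ~2.5GB figure is simply the approximate number given above, not a measured buffer size. `total_written_mb` is a made-up helper name.

```python
# Record sizes passed via -r in the first iozone command above.
RECORD_SIZES = ["4k", "16k", "512k", "1024k", "16384k"]
# Assumption: write + rewrite (-i 0) and random write (-i 2) each
# write the whole test file once per record size.
WRITE_PASSES = 3
# Approximate point where the EVO becomes the bottleneck (from the post).
TURBOWRITE_LIMIT_MB = 2500


def total_written_mb(file_size_mb: int, record_sizes=RECORD_SIZES) -> int:
    """Rough amount of data written by one iozone run, in MB."""
    return file_size_mb * WRITE_PASSES * len(record_sizes)


for size_mb in (100, 1000):
    written = total_written_mb(size_mb)
    print(f"{size_mb} MB test file: ~{written} MB written, "
          f"exceeds ~{TURBOWRITE_LIMIT_MB} MB: {written > TURBOWRITE_LIMIT_MB}")
```

Under these assumptions the 100 MB run stays below the limit across all five record sizes, while a 1 GB run over all record sizes would cross it early on, which is why the large test uses a single record size after a pause.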

Raw results as follows


                                                              random    random
              kB  reclen    write  rewrite    read    reread    read     write
          102400       4    25559    28658    21964    22012    21957    28183
          102400      16    84846    94435    80322    80926    81083    92975
          102400     512   325556   326434   289616   292997   294076   322974
          102400    1024   347800   362005   324009   327371   329460   352963
          102400   16384   381091   386458   370522   373441   373337   378399
         2048000   16384   401465   402916   375361   375457   375509   398665

In other words: Even with a USB3 hub in between, storage performance is excellent as long as only one disk is connected (with more than one disk and accesses in parallel performance will of course drop since then bus contention issues might occur and bandwidth has to be shared). But with HDDs this is no issue at all since they're too slow anyway.

 

Storage performance summary: I've had some fear that the internal USB3 hub would trash storage performance but that's not true (at least when only one disk is connected). As a reference, here are numbers generated with same SSD, same settings and same test methodology on the discontinued ODROID-N1 (as an example for RK3399 USB3 without hub and also with same SSD connected to a cheap PCIe SATA controller):


                      Random IO in IOPS     Sequential IO in MB/sec
                        4K read/write           1M read/write
ODROID-N1                 5994/6308               330 / 340
NanoPi M4                 5489/7045               330 / 355
ASM1061 powersave         6010/7900               320 / 325 
ASM1061 performance       9820/16230              330 / 330
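
The NanoPi M4 random IO numbers in this table follow directly from the 4k row of the iozone output further up: throughput in kB/s divided by the 4 kB record size gives IOPS. A small sketch of that conversion (the helper name `iops` is made up, the two throughput values are copied from the raw results):

```python
def iops(throughput_kb_s: int, record_kb: int = 4) -> int:
    """Convert iozone throughput (kB/s) at a given record size to IOPS."""
    return throughput_kb_s // record_kb


# 4k random read / random write on NanoPi M4, from the raw results above
random_read_kb_s = 21957
random_write_kb_s = 28183

print(f"{iops(random_read_kb_s)}/{iops(random_write_kb_s)}")  # 5489/7045
```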

The ASM1061 numbers are especially dedicated to all those SBC users believing in 'native SATA'. Sequential storage performance with a good UAS capable USB3-to-SATA bridge is higher compared to cheap PCIe SATA controllers. 'Native' SATA does not exist with RK3399 anyway and for really high SATA storage performance more expensive SATA controllers making use of more than 1 PCIe lane are necessary. But with just an ASM1061 the only two real benefits of PCIe attached SATA over USB3 attached SATA are

  • higher random IO performance
  • less IRQs to process which results in less CPU utilization (see ODROID-N1 link above)

But to attach one or two USB3 HDDs and share them at maximum speed (~100 MB/s through network) the NanoPi M4 is perfectly fine.

 

Edit: Debug output here. We can see the usual 'ERROR Transfer event for disabled endpoint or incorrect stream ring' XHCI error when connecting a USB3 disk enclosure the first time (known problem but doesn't cause any harm), and the amount of DRAM is reported differently since I swapped the SD card between my two NanoPi M4

 

 

Thanks for your suggestion, we made a SATA HAT prototype for NanoPi M4; it can connect 4x 3.5-inch hard drives and works well.

 

 IMG_1132.thumb.jpg.69c80a5030248bdb5cdbe0e7c3965689.jpg

10 hours ago, mindee said:

Thanks for your suggestion, we made a SATA HAT prototype for NanoPi M4; it can connect 4x 3.5-inch hard drives and works well

 

Really looking forward to this HAT :)

 

BTW: I've not the slightest idea how much effort and initial development cost is needed for such a HAT. But in case the Marvell based 4-port SATA solution will be somewhat expensive maybe designing another one with an ASMedia ASM1062 (PCIe x2 and 2 x SATA) and just one Molex for 2 drives might be an idea. Could be one design made for NEO4 that will work on M4 too with a larger (or dual-purpose) HAT PCB?

21 minutes ago, tkaiser said:

 

Really looking forward to this HAT :)

 

BTW: I've not the slightest idea how much effort and initial development cost is needed for such a HAT. But in case the Marvell based 4-port SATA solution will be somewhat expensive maybe designing another one with an ASMedia ASM1062 (PCIe x2 and 2 x SATA) and just one Molex for 2 drives might be an idea. Could be one design made for NEO4 that will work on M4 too with a larger (or dual-purpose) HAT PCB?

 

I think the manufacturing cost is not so high, but I am not sure about that yet. The NEO4 uses a different PCIe connector, so it's not an easy thing to fit both with one HAT, considering the signal quality.
