hjc Posted September 2, 2018

I've been running some tests on the NanoPi M4 these days. I'm not a professional board reviewer, but I can share some early performance numbers here. Be aware that none of these tests reflect real-world use cases; they are provided as-is. Besides, Armbian development on RK3399 boards is still at a very early stage, so any of these numbers may change in the future due to software changes.

Unless mentioned otherwise, all tests were done using an Armbian nightly image, the FriendlyARM 4.4 kernel, and the CPU clocked at 2.0/1.5GHz.

Powering

The NanoPi M4 is my first board powered via USB-C. While the RK3399 is not power-hungry under normal load, I doubt that a 5V/3A power supply is sufficient when the CPU load goes higher or when a lot of USB devices are connected. So I did a series of power measurements with this tool. It measures the power consumption on the USB side, excluding the consumption of the PSU itself.

The board is powered by the USB-C charger that came with my Huawei MateBook E, which supports 5V/2A, 9V/2A, and 12V/2A, so in theory it is insufficient to power the NanoPi M4. Unfortunately I couldn't find a USB-C charger capable of 5V/3A output, so I had to run the tests with this one. What if I connected a lot of USB 3.0 devices and exceeded the 5V/2A limit? Well, I did try that (connecting 4 USB HDDs and running cpuburn, or even connecting 2 SBCs to the USB ports), and the answer is simple: the board crashed. But normally the board's consumption does not exceed 10W, so the charger works just fine.

Test setup

1) Idle consumption
This is the typical consumption when you use the board as a headless server.

2) Idle consumption with HDMI display output (console tty interface, no Desktop/X11/GPU stuff)
Tested with a Dell P2415Q 4K 60Hz display. HDMI connected, with 2560*1440 60Hz video output. Also connect the USB 3.0 hub to

3) Display connected, 802.11ac WiFi with iperf sending
With the HDMI display connected (same as (2)) and WiFi connected to an 802.11ac 5GHz AP in another room, run the following command:
    iperf3 -c 10.24.0.1 -t 60
The WiFi throughput is around 110Mbps.

4) Display connected, running cpuburn
With the HDMI display connected (same as (2)), run cpuburn on all 6 cores.

5) Idle consumption of 4.19-rc1 mainline kernel
Same as (1), but running the mainline kernel.

Test results

The idle consumption is 1.79W, and it might need some tuning to reduce it. When WiFi and display are connected, it goes up to 2.87W. With active WiFi networking the board consumes 4.67W, and with all CPU cores active it consumes 9.86W. The mainline kernel has a higher idle consumption; the reason might be that DDR DVFS and/or devfreq are not implemented yet.

Based on these results, it seems that a 5V/2A power supply is okay if no peripheral devices are connected. However, if you connect any USB devices, the board may easily exceed the 2A limit when CPU load goes higher.

CPU/RAM and IO Performance

While the RK3399 is not a super fast chip, its performance fits its position. To reveal the full potential of the board, I'm posting some visualized sbc-bench results taken with the mainline 4.19-rc1 kernel here, because there might be some DRAM performance issues on RK3399 with the 4.4 kernel. For comparison, I'm also posting the results of Firefly-RK3399 (2.2/1.8GHz overclock, tested by myself), Raspberry Pi 3 B+, ROCK64 and RockPro64 (taken from existing sbc-bench results). You can see the full sbc-bench log here.

[Charts: Memory, 7-zip, cpuminer]

For IO performance, I used iozone to measure the performance of SD card, eMMC and USB SSD. NanoPC T4's NVMe SSD results are added as a reference. SSD performance was measured with the command "iozone -e -I -a -s 1G -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2"; SD card and eMMC used 100M instead of 1G size.
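As a rough cross-check of the power figures in the test results above, dividing each measured wattage by the supply voltage gives the current drawn in that scenario. This is only a sketch: the function name `watts_to_amps` is made up for illustration, and it assumes the supply holds a nominal 5 V, which a real charger under load may not.

```shell
# Convert a measured power draw (watts) into current (amps): I = P / V.
# The voltage argument is optional and defaults to a nominal 5 V (assumption).
watts_to_amps() {
    awk -v p="$1" -v v="${2:-5}" 'BEGIN { printf "%.2f\n", p / v }'
}

# Power figures reported in the post, one "label:watts" pair per entry.
for entry in "idle:1.79" "display connected:2.87" "wifi iperf3:4.67" "cpuburn:9.86"; do
    label=${entry%%:*}
    watts=${entry#*:}
    printf '%-18s %5s W  ~%s A\n' "$label" "$watts" "$(watts_to_amps "$watts")"
done
```

Under these assumptions the cpuburn case alone works out to roughly 1.97 A, which is consistent with the observation that a 2 A supply leaves almost no margin for USB peripherals.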

Networking

NanoPi M4 comes with a 1Gbps ethernet port and an 802.11ac 2x2 MIMO WiFi module, and I tested both with iperf3.

GbE

iperf3 full duplex test:

    hjc@nanopim4:~$ iperf3 -c 10.20.0.1 & iperf3 -Rc 10.20.0.1 -p 5202
    [1] 27486
    Connecting to host 10.20.0.1, port 5201
    Connecting to host 10.20.0.1, port 5202
    Reverse mode, remote host 10.20.0.1 is sending
    [ 4] local 10.20.0.2 port 43782 connected to 10.20.0.1 port 5201
    [ 4] local 10.20.0.2 port 45102 connected to 10.20.0.1 port 5202
    [ ID] Interval        Transfer     Bandwidth
    [ 4] 0.00-1.00 sec    64.6 MBytes  542 Mbits/sec
    [ ID] Interval        Transfer     Bandwidth      Retr  Cwnd
    [ 4] 0.00-1.00 sec    95.1 MBytes  798 Mbits/sec  0     314 KBytes
    [ 4] 1.00-2.00 sec    110 MBytes   919 Mbits/sec
    [ 4] 1.00-2.00 sec    94.5 MBytes  793 Mbits/sec  0     320 KBytes
    [ 4] 2.00-3.00 sec    110 MBytes   920 Mbits/sec
    [ 4] 2.00-3.00 sec    95.8 MBytes  803 Mbits/sec  0     317 KBytes
    [ 4] 3.00-4.00 sec    110 MBytes   920 Mbits/sec
    [ 4] 3.00-4.00 sec    94.5 MBytes  792 Mbits/sec  0     317 KBytes
    [ 4] 4.00-5.00 sec    110 MBytes   920 Mbits/sec
    [ 4] 4.00-5.00 sec    94.6 MBytes  794 Mbits/sec  0     314 KBytes
    [ 4] 5.00-6.00 sec    110 MBytes   919 Mbits/sec
    [ 4] 5.00-6.00 sec    95.7 MBytes  803 Mbits/sec  0     314 KBytes
    [ 4] 6.00-7.00 sec    110 MBytes   919 Mbits/sec
    [ 4] 6.00-7.00 sec    95.5 MBytes  801 Mbits/sec  0     317 KBytes
    [ 4] 7.00-8.00 sec    110 MBytes   920 Mbits/sec
    [ 4] 7.00-8.00 sec    94.8 MBytes  795 Mbits/sec  0     314 KBytes
    [ 4] 8.00-9.00 sec    110 MBytes   920 Mbits/sec
    [ 4] 8.00-9.00 sec    94.5 MBytes  792 Mbits/sec  0     314 KBytes
    [ 4] 9.00-10.00 sec   97.2 MBytes  816 Mbits/sec  0     320 KBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval        Transfer     Bandwidth      Retr
    [ 4] 0.00-10.00 sec   952 MBytes   799 Mbits/sec  0     sender
    [ 4] 0.00-10.00 sec   949 MBytes   796 Mbits/sec        receiver
    [ 4] 9.00-10.00 sec   110 MBytes   921 Mbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval        Transfer     Bandwidth      Retr
    iperf Done.
    [ 4] 0.00-10.00 sec   1.03 GBytes  884 Mbits/sec  9     sender
    [ 4] 0.00-10.00 sec   1.03 GBytes  882 Mbits/sec        receiver
    iperf Done.
    [1] + 27486 done iperf3 -c 10.20.0.1

Wireless

    hjc@nanopim4:~$ iperf3 -c 10.24.0.1
    Connecting to host 10.24.0.1, port 5201
    [ 4] local 10.23.4.116 port 39730 connected to 10.24.0.1 port 5201
    [ ID] Interval        Transfer     Bandwidth       Retr  Cwnd
    [ 4] 0.00-1.00 sec    13.0 MBytes  109 Mbits/sec   13    1.21 MBytes
    [ 4] 1.00-2.01 sec    12.9 MBytes  107 Mbits/sec   5     618 KBytes
    [ 4] 2.01-3.00 sec    12.6 MBytes  106 Mbits/sec   0     618 KBytes
    [ 4] 3.00-4.00 sec    9.35 MBytes  78.7 Mbits/sec  4     329 KBytes
    [ 4] 4.00-5.00 sec    11.1 MBytes  92.9 Mbits/sec  0     348 KBytes
    [ 4] 5.00-6.00 sec    10.2 MBytes  85.5 Mbits/sec  0     363 KBytes
    [ 4] 6.00-7.00 sec    9.37 MBytes  78.6 Mbits/sec  0     387 KBytes
    [ 4] 7.00-8.00 sec    10.9 MBytes  91.5 Mbits/sec  0     409 KBytes
    [ 4] 8.00-9.00 sec    13.6 MBytes  114 Mbits/sec   0     409 KBytes
    [ 4] 9.00-10.00 sec   13.8 MBytes  116 Mbits/sec   0     410 KBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval        Transfer     Bandwidth       Retr
    [ 4] 0.00-10.00 sec   117 MBytes   98.0 Mbits/sec  22    sender
    [ 4] 0.00-10.00 sec   116 MBytes   97.0 Mbits/sec        receiver
    iperf Done.

    hjc@nanopim4:~$ iperf3 -c 10.24.0.1 -R
    Connecting to host 10.24.0.1, port 5201
    Reverse mode, remote host 10.24.0.1 is sending
    [ 4] local 10.23.4.116 port 39734 connected to 10.24.0.1 port 5201
    [ ID] Interval        Transfer     Bandwidth
    [ 4] 0.00-1.00 sec    10.6 MBytes  88.8 Mbits/sec
    [ 4] 1.00-2.00 sec    10.9 MBytes  91.5 Mbits/sec
    [ 4] 2.00-3.00 sec    4.41 MBytes  37.0 Mbits/sec
    [ 4] 3.00-4.00 sec    2.07 MBytes  17.3 Mbits/sec
    [ 4] 4.00-5.00 sec    1018 KBytes  8.34 Mbits/sec
    [ 4] 5.00-6.00 sec    1.29 MBytes  10.8 Mbits/sec
    [ 4] 6.00-7.00 sec    6.48 MBytes  54.4 Mbits/sec
    [ 4] 7.00-8.00 sec    10.8 MBytes  91.0 Mbits/sec
    [ 4] 8.00-9.00 sec    10.7 MBytes  89.9 Mbits/sec
    [ 4] 9.00-10.00 sec   10.7 MBytes  89.8 Mbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval        Transfer     Bandwidth       Retr
    [ 4] 0.00-10.00 sec   70.1 MBytes  58.8 Mbits/sec  0     sender
    [ 4] 0.00-10.00 sec   69.1 MBytes  58.0 Mbits/sec        receiver
    iperf Done.

It's too complicated to analyze the performance of a WiFi connection, but so far I've never seen more than 200Mbps throughput on AP6356S.
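
To compare runs like the ones above without reading every interval, the averaged receiver-side figure can be pulled out of iperf3's text output. The helper below is a minimal sketch; the name `iperf3_receiver_rate` is invented here, and it assumes the summary line ends in the word "receiver" as it does in the output quoted in this post.

```shell
# Print the averaged receiver-side throughput from iperf3 text output on stdin.
# Matches the summary line whose last field is "receiver" and prints the two
# fields before it (the value and its unit).
iperf3_receiver_rate() {
    awk '$NF == "receiver" { print $(NF-2), $(NF-1) }'
}

# Demo on the reverse-mode WiFi summary quoted in the post.
iperf3_receiver_rate <<'EOF'
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  70.1 MBytes  58.8 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  69.1 MBytes  58.0 Mbits/sec                  receiver
EOF
# prints: 58.0 Mbits/sec
```

On a live system the same function can sit at the end of a pipe, for example `iperf3 -c 10.24.0.1 -R | iperf3_receiver_rate`.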
tkaiser Posted September 3, 2018

22 hours ago, hjc said:
What if I connected a lot of USB 3.0 device and exceeded the 5V/2A limit? Well, I did try that (connect 4 USB HDD and run cpuburn, or even connect 2 SBCs to the USB), and the answer is simple: the board crashed

Thank you for the detailed test, and especially for also covering the underpowering situation (which a lot of users might run into). I don't know whether it was available from the beginning, but FriendlyELEC now lists these options in their web shop:

- 5V 4A Power Adapter (+$8.99)
- German Plug Adapter (applies to: France, Germany, Portugal, Spain, Korea) (+$5.99)

The PSU as well as the heatsink seem like mandatory accessories to me.
mindee Posted September 3, 2018

On 9/2/2018 at 12:34 AM, hjc said:
I've been doing some tests with NanoPi M4 these days. [...]

Thanks for your testing, this would be very helpful for many end users. Want to know the USB 3.0 speed.
tkaiser Posted September 3, 2018

Just a quick note about DRAM latency effects. We noticed that with default kernel settings the memory controller, when the dmc code is active, increases memory access latency a lot (details).

While testing the efficiency of zram swap compression I tested, more or less by accident, another use case that is highly affected by higher memory latency: building ARM's Compute Library on a NanoPC T4 limited to 1 GB DRAM (by adding extraargs="mem=1110M" to /boot/armbianEnv.txt).

My first run was with default settings (/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/governor set to dmc_ondemand). Execution time with the build job relying heavily on zram based swap: 107m9.612s.

Next try with /sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/governor set to performance: 80m55.042s.

That is a massive difference caused only by some code trying to save some energy and thereby increasing memory latency (details). In Armbian we'll use an ugly hack to 'fix' this, but this is something board makers who provide their own OS images should also care about (@mindee for example).
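
The governor switch described above (which cut the build from 107m9.612s to 80m55.042s, roughly 24% less time) can be sketched as a small helper. This is not the Armbian fix mentioned in the post, just an illustration: the function name and the `DMC_GOVERNOR_FILE` override variable are assumptions, and only the sysfs path itself is taken from the post.

```shell
# Sysfs path quoted in the post; DMC_GOVERNOR_FILE is a hypothetical override
# so the helper can be pointed at another path (or a scratch file for testing).
DMC_GOVERNOR_FILE=${DMC_GOVERNOR_FILE:-/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/governor}

# Write the requested governor (default: performance), then print back what
# the file now contains. Returns non-zero if the file is not writable.
set_dmc_governor() {
    governor=${1:-performance}
    if [ ! -w "$DMC_GOVERNOR_FILE" ]; then
        echo "cannot write $DMC_GOVERNOR_FILE (not root, or no rockchip-dmc driver?)" >&2
        return 1
    fi
    printf '%s\n' "$governor" > "$DMC_GOVERNOR_FILE" || return 1
    cat "$DMC_GOVERNOR_FILE"
}
```

Run as root, `set_dmc_governor performance` would apply the faster setting and `set_dmc_governor dmc_ondemand` would restore the default named in the post. Values written to sysfs do not survive a reboot, so this would have to be repeated at every boot.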
hjc Posted September 4, 2018 Author

12 hours ago, mindee said:
Want to know the USB 3.0 speed.

In the "IO performance" chart, NanoPi M4 + JMS578 + 850 EVO is connected via USB 3.0, so the performance is about 388/300 MB/s and 70k/99k IOPS (read/write). This result is lower than what I measured on my Intel NUC (435/440 MB/s read/write, tested using diskspd64.exe), but it's acceptable, since most people won't connect an SSD in this way and typically the HDD is the bottleneck.

Besides, at the time of the IO testing I hadn't thought about the DDR DVFS performance impact, so the IO performance was tested with the CPU governor set to "performance" but the dmc governor left unchanged, which might have a slightly negative performance impact. Also, USB 3.0 on the M4 still doesn't work on mainline (maybe due to kernel configuration or dt issues), so I did the IO testing with the 4.4 kernel, and the IO performance may be further improved by using mainline.
tkaiser Posted September 4, 2018

6 hours ago, hjc said:
This result is lower than that tested on my Intel NUC

It's also lower compared to the numbers I made with my first RK3399 device some time ago: ODROID-N1 (Hardkernel built RK's 4.4 just like ayufan without CONFIG_ARM_ROCKCHIP_DMC_DEVFREQ). But on NanoPi M4 there's always the internal VIA VL817 hub in between SuperSpeed devices and the USB3 host controller, and this usually affects performance as well.

Update: EVO840 behind JMS567 attached to NanoPC-T4 with the same 4.4 kernel and dmc governor set to performance (but CPU crippled down to a quad-core A53 clocked at 800 MHz):

                                                          random   random
          kB  reclen    write  rewrite     read   reread     read    write
     1024000   16384   396367   402479   372088   373177   373097   402228

373/400 MB/s read/write. So the M4 numbers above need a second test anyway.
tkaiser Posted September 13, 2018

On 9/2/2018 at 9:34 AM, hjc said:
NanoPi M4 is my first board powered by USB-C

Yesterday 2 sets of NanoPi M4 arrived (one 2GB, one 4GB), which will make it convenient to test stuff like heat dissipation in less time (on one board I use the supplied thermal pad with the large heatsink, on the 2nd board I replaced the thermal pad with a 20x20mm copper shim of 1mm height that I took from my RockPro64).

But now talking just about USB-C: the NanoPi M4, while using a USB-C connector to be powered, is obviously not USB PD compliant but simply uses the USB-C connector to be combined with a 'dumb' PSU that provides the necessary amount of current without any negotiation. FriendlyELEC shipped a new 5V/4A PSU with a USB-A receptacle combined with a very short USB-C cable. Since the PSU is not EU style and I lack the necessary adapter, I currently use my standard 'dumb' 5V/2A USB PSU, which works fine for the tests I currently do.

Not being compliant with the USB PD specs (or USB-C powering specifications) means two things:

Pro:
- No expensive USB-C charger needed (check the funny discussion in the Khadas forum and there the link to tested USB-C chargers and their prices). A dumb 5V PSU that can deliver the needed amount of current at a stable voltage will be sufficient.

Cons:
- Possible voltage drop N° 1: at 5V with higher currents the majority of cheap PSUs won't deliver. They can do either 5V or 4A but not both at the same time (this issue can be studied around ODROID boards that are also designed with 5V/4A in mind).
- Possible voltage drop N° 2: the cable quality and length between board and PSU become an issue, since an additional voltage drop happens here. The longer the cable and the thinner the wires, the less voltage is available to the board (symptoms as above: especially USB disks or peripherals in general will be affected).
- Possible undercurrent: I don't know yet, but in case the NanoPi M4 really does no USB PD power signalling at all, stuff like with the Khadas Vim2 might happen.
- Not possible to use 'real' or standards compliant USB-C PSUs, since they do not provide enough current when the powered device is not able to negotiate a higher current demand.

Edit: tested with my MacBook Pro USB-C charger, which does not exceed 500mA with 'dumb' consumers on the other end of the USB-C cable: endless boot/crash cycle. So be prepared to run into trouble when combining the NanoPi M4 with a real USB-C charger compliant with the specs. This didn't work with the Vim2 and it also does not work with the NanoPi M4. @hjc: it seems you can consider yourself lucky that your USB-C charger outputs more than 500/900 mA when feeding 'dumb' consumers.
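
The cable-related voltage drop in the list above follows Ohm's law: the voltage reaching the board is the PSU voltage minus current times cable resistance. The sketch below only illustrates that arithmetic. The function name is invented, and the 0.15 ohm round-trip resistance is a purely hypothetical value, not a measurement of any cable mentioned in this thread.

```shell
# Voltage left at the board after the cable: V_board = V_psu - I * R_cable.
# Arguments: PSU voltage (V), current (A), round-trip cable resistance (ohm).
voltage_at_board() {
    awk -v v="$1" -v i="$2" -v r="$3" 'BEGIN { printf "%.2f\n", v - i * r }'
}

# Hypothetical cable with 0.15 ohm round-trip resistance (assumed value).
voltage_at_board 5 2 0.15   # prints 4.70
voltage_at_board 5 4 0.15   # prints 4.40
```

With the same assumed cable, doubling the current doubles the drop, which is why a short, thick cable matters more as the load approaches 4A.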
tkaiser Posted September 13, 2018

On 9/3/2018 at 4:38 PM, mindee said:
SSD performance are measured by command "iozone -e -I -a -s 1G -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2", SD card and eMMC are using 100M instead of 1G size.

I'm now also testing IO performance on the NanoPi M4 to get an idea how much influence the integrated VIA VL817 USB3 hub has. The following test commands are used, since I test with an el cheapo Samsung EVO 840 (known to exceed 500 MB/s on a SATA port, but low capacity and implementing TurboWrite):

    taskset -c 4 iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
    sleep 300
    taskset -c 4 iozone -e -I -a -s 2000M -r 16384k -i 0 -i 1 -i 2

Why do I test first with 100 MB, then wait 5 minutes and test again with 2 GB?
- Tests with 100 MB to keep results comparable with my previous collection.
- An additional test with 2 GB since we found that with fast USB3 implementations and fast SSDs the maximum throughput isn't reported when testing with small test sizes. 1GB or better 2GB seems like a reasonable additional test to look for the real maximum bandwidth possible.

Why wait 5 minutes (sleep 300) in between?
- Since my 128GB EVO840 implements TurboWrite, it is only capable of writing data at maximum speed until the TurboWrite buffer is full. So if I had tested all block sizes with 1GB test data size, the EVO itself would have turned into a bottleneck after approx. 2.5GB of data written.

Why 'taskset -c 4' (executing the test on a big core)?
- Since we're looking for maximum performance here.

Raw results as follows:

                                                          random   random
          kB  reclen    write  rewrite     read   reread     read    write
      102400       4    25559    28658    21964    22012    21957    28183
      102400      16    84846    94435    80322    80926    81083    92975
      102400     512   325556   326434   289616   292997   294076   322974
      102400    1024   347800   362005   324009   327371   329460   352963
      102400   16384   381091   386458   370522   373441   373337   378399
     2048000   16384   401465   402916   375361   375457   375509   398665

In other words: even with a USB3 hub in between, storage performance is excellent as long as only one disk is connected (with more than one disk and accesses in parallel, performance will of course drop, since then bus contention issues might occur and bandwidth has to be shared). But with HDDs this is no issue at all since they're too slow anyway.

Storage performance summary: I had some fear that the internal USB3 hub would trash storage performance, but that's not true (at least when only one disk is connected). As a reference, here are numbers generated with the same SSD, same settings and same test methodology on the discontinued ODROID-N1 (as an example of RK3399 USB3 without a hub, and also with the same SSD connected to a cheap PCIe SATA controller):

                           Random IO in IOPS   Sequential IO in MB/sec
                             4K read/write          1M read/write
    ODROID-N1                  5994/6308              330 / 340
    NanoPi M4                  5489/7045              330 / 355
    ASM1061 powersave          6010/7900              320 / 325
    ASM1061 performance       9820/16230              330 / 330

The ASM1061 numbers are especially dedicated to all those SBC users believing in 'native SATA'. Sequential storage performance with a good UAS capable USB3-to-SATA bridge is higher compared to cheap PCIe SATA controllers. 'Native' SATA does not exist with RK3399 anyway, and for really high SATA storage performance more expensive SATA controllers making use of more than 1 PCIe lane are necessary. With just an ASM1061, the only two real benefits of PCIe attached SATA over USB3 attached SATA are:
- higher random IO performance
- fewer IRQs to process, which results in less CPU utilization (see ODROID-N1 link above)

But to attach one or two USB3 HDDs and share them at maximum speed (~100 MB/s through the network) the NanoPi M4 is perfectly fine.

Edit: Debug output here. We can see the usual 'ERROR Transfer event for disabled endpoint or incorrect stream ring' XHCI error when connecting a USB3 disk enclosure the first time (known problem, but it doesn't cause any harm), and the amount of DRAM is reported differently since I swapped the SD card between my two NanoPi M4.
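
The 4K IOPS figures in the summary table for the NanoPi M4 can be reproduced from the raw iozone rows: iozone reports throughput in kB/s, so dividing the random read and random write columns by the record size gives operations per second. A minimal sketch follows; the function name `iozone_random_iops` is made up for illustration, and truncating to an integer is an assumption that happens to match the table.

```shell
# Read iozone result rows on stdin with the column order used in the post:
#   kB reclen write rewrite read reread random-read random-write
# and print random read/write IOPS per record size (kB/s divided by reclen,
# truncated to an integer by awk's %d).
iozone_random_iops() {
    awk 'NF == 8 && $1 ~ /^[0-9]+$/ { printf "%sk: %d/%d IOPS\n", $2, $7 / $2, $8 / $2 }'
}

# Demo on the 4k row of the raw results above.
iozone_random_iops <<'EOF'
102400 4 25559 28658 21964 22012 21957 28183
EOF
# prints: 4k: 5489/7045 IOPS
```

The output matches the "NanoPi M4 5489/7045" entry in the summary table.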
tkaiser Posted September 13, 2018

And now a quick look at NanoPi M4's massive heatsink. Since I have 2 boards, on one board I used the supplied blue thermal pad between RK3399 and heatsink, and on the other board, instead of the thermal pad, a 20x20mm copper shim of 1 mm height with a thin film of thermal compound on both sides. Test methodology is sbc-bench.sh -T 70 ; sbc-bench.sh -t 50 (reasons why). Full results: Thermal pad and copper shim

Yes, it's a bit hard to interpret this stuff, but a detailed interpretation is necessary with the design of the heatsink in mind:
- the thing is massive (huge own thermal mass)
- the heatsink has fins where air can be directed through (one large and silent fan can easily cool down a whole bunch of M4 with controlled airflow)
- the heat conductivity of the material between heatsink and SoC also plays an important role

I'm in the process of doing a lot more tests (to generate insights, not fancy graphs and numbers), but for now only the following data is available (it needs confirmation, I'm not entirely sure whether testing with two boards is a flaw).

Let's look at temperatures at the end of both sbc-bench thermal tests (the first pre-heats to 70°C, the second one waits until SoC temperature is as low as 50°C):

                   start 70°C   start 50°C
    thermal pad      85.0°C       85.0°C
    copper shim      79.4°C       74.4°C

Copper shim with thermal compound 'wins', but with such a massive heatsink better heat conductivity also has its downsides. So let's look at how fast switching between temperatures works:

                   idle to 70°C   high to 50°C
    thermal pad       75 sec         330 sec
    copper shim      600 sec         750 sec

Results in the first column look reasonable to me (the better the SoC's heat can be dissipated into the massive heatsink, the longer it takes until SoC/board temperature rises). The 2nd column (cooling down) IMO needs more investigation, since I would've thought that the cooling time would be somewhat similar. But for this I most probably need precise thermal measurements to get a clue about the heatsink temperature itself. At least now, when testing with the copper shim efficiently dissipating the SoC's heat into the aluminium block, the heatsink gets too hot to handle when SoC temperature is reported as being higher than ~65°C for some time.

All of the above was talking about true passive mode. But of course the heatsink can be used with active cooling too. Just blow a little bit of air laterally through the heatsink fins and the board can do whatever power hungry task you want and stays cool forever. Quick experiment now at a customer: I put the NanoPi M4 directly behind the air outlet of an annoyingly noisy Xeon server's PSU so that the 'waste heat' coming out of the server goes through the heatsink fins. Running cpuminer on the M4 in the background, scoring 8.8kH/s constantly with temperature below 43°C (at an ambient temp of 18°C in the server room, but the 'waste heat' coming out of the PSU is definitely a bit 'warmer'): https://pastebin.com/raw/s1zvyfU6

TL;DR: the massive heatsink is great even in total passive mode when only short load bursts happen (then heat gets efficiently dissipated into the aluminium block, even better with a copper shim instead of the thermal pad). For high loads over longer periods the heatsink is still fine without additional ventilation as long as you're comfortable with electronic devices being operated at higher temperatures. To improve heat dissipation, simply blowing some air through the heatsink fins is sufficient.

I think my further testing will focus on the difference between FriendlyELEC's approach (huge thermal mass, and heatsink fins more or less only efficient when combined with some airflow) and RockPro64's, which implements a different strategy that is now even more efficient, since there RK3399 and heatsink can also be connected directly with just some thermal compound in between.
hjc Posted September 13, 2018 Author

2 hours ago, tkaiser said:
tested with my MacBook Pro USB-C charger not exceeding 500mA with 'dumb' consumers on the other end of the USB-C cable: endless boot/crash cycle.

This is surprising. So a MacBook charger on the M4 does worse than those old USB-A phone chargers with 5V 2A output and no quick charging protocol? Anyway, this supports your view that the official 5V/4A charger should be purchased as well.
tkaiser Posted September 13, 2018

34 minutes ago, hjc said:
using a MacBook charger on M4 is not even better than those old USB-A phone chargers with 5V 2A output without any quick charging protocol?

The problem is that 'true' USB-C chargers limit current to 500mA or 900mA if the other end of the USB-C cable is not able to negotiate the demand for higher currents. And those 5V SBCs with USB-C right now (Vim2 and now also NanoPi M4) don't do this. They act in 'dumb' mode and require the PSU to provide 'current without asking', which standards compliant USB-C chargers will refuse. So you need a dumb or tolerant charger too. I measured the booting current with my dumb PSU and this was reported as above 5.6W, which would translate to more than 1100mA. That is well above what a standards compliant USB-C charger is allowed to provide to a dumb device.

34 minutes ago, hjc said:
Anyway this supports your view that the official 5V/4A charger should be purchased as well.

Yes, I hope this one does well. No chance to test yet since I'm missing the needed adapter (also available in FriendlyELEC's online shop, and also a recommendation for those affected).
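
The reasoning in the post above, that a boot draw above 5.6W cannot fit within 500mA or 900mA at 5V, can be checked with a small helper. This is a sketch under stated assumptions: the function name `fits_current_limit` is invented, and the 500/900 mA limits and the 5.6W figure are simply the values given in this thread.

```shell
# Succeed (exit status 0) if drawing <watts> at <volts> stays within <limit_ma>.
# Arguments: power in W, voltage in V, current limit in mA.
fits_current_limit() {
    awk -v p="$1" -v v="$2" -v lim="$3" 'BEGIN { exit !((p / v) * 1000 <= lim) }'
}

# Boot draw reported in the post (5.6 W at 5 V is 1120 mA).
for limit in 500 900 2000; do
    if fits_current_limit 5.6 5 "$limit"; then
        echo "5.6 W fits within ${limit} mA"
    else
        echo "5.6 W exceeds ${limit} mA"
    fi
done
```

Only the 2000 mA case passes, which lines up with the boot/crash cycle on the limited charger and the working 5V/2A 'dumb' PSU.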
hjc Posted September 13, 2018 Author

1 hour ago, tkaiser said:
Yes, I hope this one does well. No chance to test yet since missing the needed adapter (also available in FriendlyELEC's online shop and also a recommendation for those affected then):

I spotted two versions of the charger being sold by local resellers: 5V/2A and 5V/4A. They claim that 5V/2A is sufficient unless you need to connect any peripheral devices (like USB).
TonyMac32 Posted September 14, 2018 Posted September 14, 2018 19 hours ago, tkaiser said: And those 5V SBCs with USB-C right now (Vim2 and now also NanoPi M4) Also VIM "1". I agree with you that a Power Delivery compliant USB-C implementation is the only proper way to approach this, if one truly wishes to use USB-C. Can both of you keep notes on whether you get SD card corruption or failure to boot (with no obvious reason)? @botfap confirmed my hypothesis (or at the very least has strong evidence leading to the same conclusion) that the default 4mA SD drive strength is not enough. ASUS (or Rockchip, or whoever did the dev on the "miniarm" hardware) seems to have had the same idea: Tinker uses 8mA as well. @botfap has stated 12mA was really the only "safe" setting; I'm waiting to get more feedback. TL;DR: Rockchip SoCs appear to have a default drive-strength setting on the SD GPIOs that is generic to all their GPIOs and is no good for 50 MHz+ operation across varying SD cards (even decent quality ones). Let me know if anything weird is going on. [edit] Come to think of it, that would mess with SPI as well at higher speeds (I guess that should be obvious)
botfap Posted September 14, 2018 Posted September 14, 2018 I'm not sure that 12mA was the only safe setting; it was the only setting that helped to stop SD corruption when using very low end or fake SD cards. 8mA was rock solid stable when using any normal SD card. It's also probably a better default for Armbian. If the user's SD card won't work properly with 8mA of current then it has no place for SBC use, I would have thought; the SD card is almost certain to cause other problems later on.
TonyMac32 Posted September 14, 2018 Posted September 14, 2018 11 hours ago, botfap said: If the user's SD card won't work properly with 8mA of current then it has no place for SBC use, I would have thought Agreed. I'm just wondering how good the routing on these various boards really is. In any case I've set the rk3328 to 8mA; if anything presents itself on the rk3399 I'd recommend the same.
tkaiser Posted September 14, 2018 Posted September 14, 2018 9 hours ago, TonyMac32 said: Can both of you keep notes on whether you get SD card corruption or failure to boot (with no obvious reason)? Not happened here with Rockchip so far (with EspressoBin I had issues but there I boot now from USB3 and ignore the SD card slot). Another interesting observation: My 2GB NanoPi M4 has higher memory bandwidth and lower memory latency compared to the 4GB version: 2GB kernel 4.4.155 4GB kernel 4.4.155 2GB kernel 4.19.0-rc1 4GB kernel 4.19.0-rc1 BTW: I now also replaced the thermal pad on my 4GB NanoPi M4 with a copper shim + thermal compound and the results are impressive again: At the end of a sbc-bench run max. temperature is now at just 73.3°C (no throttling) compared to reaching 85°C throttling included with thermal pad. Those thermal pads suck. @hjc did you already look into lowering DVFS voltages? In the past we did this using a special Linpack version that is able to spot undervolted DVFS OPP before the board actually crashes.
tkaiser Posted September 14, 2018 Posted September 14, 2018 And two quick storage performance updates:

1) This is the 8GB eMMC FriendlyELEC sells (write performance limited to 43 MB/s as it's mostly the case with low capacity eMMC, read performance exceeding 130 MB/s, nice random IO values -- compare with our overview):

                                                      random   random
        kB  reclen    write  rewrite     read   reread     read    write
    102400       4     8565     8793    24853    24808    19463     8523
    102400      16    25604    25986    66627    66701    56571    24459
    102400     512    42217    42326   125788   126508   125959    40394
    102400    1024    42762    43178   129692   129726   130452    42636
    102400   16384    43914    42877   132828   133254   133417    43012

2) In the meantime I also built OMV images for NanoPC T4 and NanoPi M4 and did a quick LanTest benchmark with M4 and an externally connected Seagate Barracuda in a USB3 enclosure (not UAS capable -- but this doesn't matter with HDDs). Close to 100 MB/s sequential speed without any more optimizations is simply awesome: (debug output)
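If you want to compare your own eMMC against these numbers, a small parser for the iozone rows may help. This is only a sketch: `parse_iozone`, `to_mb_per_s` and the column names are my own, and I am assuming 1 MB = 1024 kB when converting, which may not be the convention used for the "43 MB/s" figure above.

```python
# Rows copied from the eMMC result above (values are kB/s).
IOZONE_ROWS = """\
102400 4 8565 8793 24853 24808 19463 8523
102400 16 25604 25986 66627 66701 56571 24459
102400 512 42217 42326 125788 126508 125959 40394
102400 1024 42762 43178 129692 129726 130452 42636
102400 16384 43914 42877 132828 133254 133417 43012
"""

# Column order printed by iozone for the tests -i 0 -i 1 -i 2.
COLUMNS = ("write", "rewrite", "read", "reread", "random_read", "random_write")


def parse_iozone(text):
    """Return {record_size_kB: {column: kB_per_s}} for every numeric data row."""
    results = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a full 8-column numeric row.
        if len(fields) != 8 or not all(f.isdigit() for f in fields):
            continue
        _size_kb, reclen_kb, *values = map(int, fields)
        results[reclen_kb] = dict(zip(COLUMNS, values))
    return results


def to_mb_per_s(kb_per_s):
    """Convert kB/s to MB/s (assumption: 1 MB = 1024 kB)."""
    return kb_per_s / 1024


if __name__ == "__main__":
    for reclen, row in parse_iozone(IOZONE_ROWS).items():
        print(reclen, {k: round(to_mb_per_s(v), 1) for k, v in row.items()})
```

The same function should work on a full iozone report, since header and banner lines are skipped by the numeric check.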
hjc Posted September 15, 2018 Author Posted September 15, 2018 16 hours ago, tkaiser said: @hjc did you already look into lowering DVFS voltages? In the past we did this using a special Linpack version that is able to spot undervolted DVFS OPP before the board actually crashes. Not yet. I've recently been trying to use the M4 (with mainline kernel) as a network router & gateway (connect 2-3 RTL8153, set up VLAN, routing, NAT, and site-to-site VPN), and I'm still trying to resolve some USB related issues. Currently with the mainline kernel the USB hub must be manually reset (USBDEVFS_RESET) after reboot, or it won't be usable.
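For readers who have not used USBDEVFS_RESET before, a manual reset can be sketched roughly like this. The helper names `usb_device_path` and `reset_usb_device` are hypothetical, the bus/device numbers have to be looked up first (for example with `lsusb`), and this is not necessarily how the reset was done in the post above. The ioctl number is `_IO('U', 20)` from `<linux/usbdevice_fs.h>`.

```python
import fcntl
import os

# _IO('U', 20) from <linux/usbdevice_fs.h>: no direction bits, no size.
USBDEVFS_RESET = (ord("U") << 8) | 20


def usb_device_path(bus, device):
    """Build the usbfs node path for a bus/device pair as listed by lsusb."""
    return f"/dev/bus/usb/{bus:03d}/{device:03d}"


def reset_usb_device(bus, device):
    """Issue USBDEVFS_RESET on one device node (needs root privileges)."""
    fd = os.open(usb_device_path(bus, device), os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)
```

Calling `reset_usb_device(2, 1)` as root would reset whatever sits at bus 002, device 001; those numbers are placeholders and will differ per boot.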
emk2203 Posted September 15, 2018 Posted September 15, 2018 (edited) On 9/13/2018 at 11:05 AM, tkaiser said: The problem is that 'true' USB-C chargers limit current to 500mA or 900mA if the other end of the USB-C cable is not able to negotiate the demand for higher currents. And those 5V SBCs with USB-C right now (Vim2 and now also NanoPi M4) don't do this. They act in 'dumb' mode and require the PSU to provide 'current without asking', which standards-compliant USB-C chargers will refuse. So you need a dumb or tolerant charger too. I've measured booting current with my dumb PSU and this was reported as above 5.6W, which at 5V translates to more than 1100mA, well above what a standards-compliant USB-C charger is allowed to provide to a dumb device. Yes, I hope this one does well. No chance to test yet since missing the needed adapter (also available in FriendlyELEC's online shop and also a recommendation for those affected then): The official FriendlyELEC adapter is NOT needed if you have another type. It is a BULL GN901-G plug converter, impressively built and worth the money, but you can use any other plug adapter as well. I am currently running the NanoPi M4 with an el cheapo adapter for 0.18€/piece, and it works well.* *) PS: Please take note that these are actually illegal in the EU (and in Japan and most probably the US), since they expose the hot pins if you pull them out halfway. Do not use them with children around. Edited January 13, 2019 by Tido removed pic
emk2203 Posted September 15, 2018 Posted September 15, 2018 On 9/13/2018 at 10:20 AM, tkaiser said: Test methodology is sbc-bench.sh -T 70 ; sbc-bench.sh -t 50 (reasons why). Full results: Thermal pad and copper shim I am quite surprised about your results, my own ones with the standard sudo /bin/bash bin/sbc-bench.sh -c show significant differences to my NanoPi M4 (4GB). Yes, I have noticed the differences in compiler versions etc., but that you get 8.7 kH/s in CPUMiner and I get 10.6 kH/s is quite a difference - more than 22%. My system also has a copper shim (15x15x1.0mm) and reaches only 66.1 °C after cpuminer. The version is also 1.3.3, same as you used.
tkaiser Posted September 15, 2018 Posted September 15, 2018 9 minutes ago, emk2203 said: Yes, I have noticed the differences in compiler versions etc., but that you get 8.7 kH/s in CPUMiner and I get 10.6 kH/s is quite a difference - more than 22% As expected, see the first 3 NanoPC-T4 numbers in my list: https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md (it's just GCC 6.3 vs. 7.3 vs. 8.2 that's responsible for this benchmark number variation. Should be a reminder why updating software is a good idea in general -- and why using Debian/Ubuntu or any other 'everything outdated as hell' distro is sometimes so annoying) 12 minutes ago, emk2203 said: My system also has a copper shim (15x15x1.0mm) and reaches only 66.1 °C after cpuminer. The version is also 1.3.3, same as you used The sbc-bench -t and -T modes provide different results (and possible insights) since the purpose of this test is to use -T as first run to heat up the board to +70°C for sure, then with the -t 50 you allow the board to cool down to 50°C which means in this mode the thermal mass of the huge heatsink plays an important role. A normal sbc-bench run starts from low load to high load so the heatsink's temperature is constantly rising but always below the SoC temperature and the final temperatures will be lower compared to the situation when the heatsink is in fact hotter than the board when the 'sbc-bench -t 50' run starts. Would be really interesting if you run with your setup 'sbc-bench.sh -T 70 ; sbc-bench.sh -t 50' as well.
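To make the heat-up / cool-down idea concrete, here is a rough sketch of the two waiting phases. It is not sbc-bench's actual code: the function names are invented, it does not generate any CPU load by itself, and it assumes `thermal_zone0` is the SoC sensor and reports millidegrees Celsius, which may differ per board and kernel.

```python
import time

# Assumption: zone 0 is the SoC sensor; check /sys/class/thermal/*/type on your board.
THERMAL_ZONE = "/sys/class/thermal/thermal_zone0/temp"


def read_soc_temp_c(path=THERMAL_ZONE):
    """Read the sensor value (millidegrees Celsius) and return degrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def wait_until(predicate, read_temp=read_soc_temp_c, poll_s=1.0,
               timeout_s=600.0, sleep=time.sleep):
    """Poll until predicate(temp) is true; return that temp, or None on timeout."""
    waited = 0.0
    while waited <= timeout_s:
        temp = read_temp()
        if predicate(temp):
            return temp
        sleep(poll_s)
        waited += poll_s
    return None


def heat_then_cool(heat_to_c=70.0, cool_to_c=50.0, **kwargs):
    """Wait for the SoC to reach heat_to_c, then for it to drop to cool_to_c."""
    if wait_until(lambda t: t >= heat_to_c, **kwargs) is None:
        return None  # never got hot enough within the timeout
    return wait_until(lambda t: t <= cool_to_c, **kwargs)
```

The timeout matters in practice: a board with good cooling may never reach the heat-up threshold, in which case the sketch returns `None` instead of waiting forever.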
emk2203 Posted September 16, 2018 Posted September 16, 2018 23 hours ago, tkaiser said: The sbc-bench -t and -T modes provide different results (and possible insights) since the purpose of this test is to use -T as first run to heat up the board to +70°C for sure, then with the -t 50 you allow the board to cool down to 50°C which means in this mode the thermal mass of the huge heatsink plays an important role. A normal sbc-bench run starts from low load to high load so the heatsink's temperature is constantly rising but always below the SoC temperature and the final temperatures will be lower compared to the situation when the heatsink is in fact hotter than the board when the 'sbc-bench -t 50' run starts. Would be really interesting if you run with your setup 'sbc-bench.sh -T 70 ; sbc-bench.sh -t 50' as well. Shows that you shouldn't trust unfounded assumptions. I thought gcc variation would account for 3 - 4% variation at most, but not the 22% I see. I tried to run the setup you suggested; my SoC won't heat up beyond 69 °C. The temperature fluctuates between 67 and 69 °C, but never reaches 70 °C. Took it off the table and put it back in the box it came in, but this accounts only for 1 K difference. Your benchmark shows for a moment "Heating SoC from 70 °C to 70 °C", but nothing happens, and then the temperature drops again. So, would like to run the test, but cannot get it to work. I had switched over to the development kernel 4.19rc1, since your table shows 4.17 as kernel version for tests, it should be even better comparable than the 4.4 version from the first test. @Igor @hjc: With the Armbian Bionic image for download, I couldn't get wireless to work. armbian-config accepted my input for wireless up to the password, but after exiting and restarting, the NanoPi M4 didn't show up on the WLAN. I went to the nightly with the dev version of 4.19rc1. 
Files were needed from the full Armbian package, but even after switching to full firmware, I still get the error about `brcmfmac4356-sdio.txt` missing shown below. `brcmfmac4356-sdio.bin` is in the full firmware image, but was not in the "lite" firmware package which came with the downloaded Bionic image. When I google the error, it seems that I need to copy NVRAM contents from the device to the txt file mentioned in the error message. Any pointer on how to do this is welcome.

Worrisome things from `dmesg` (the zswap statistics are attached below as well):

    zswap: default zpool zbud not available
    zswap: pool creation failed
    ...
    rockchip_mmc_get_phase: invalid clk rate
    ...
    dwc3 fe800000.dwc3: Failed to get clk 'ref': -2
    dwc3 fe800000.dwc3: failed to initialize core
    ...
    brcmfmac mmc0:0001:1: Direct firmware load for brcm/brcmfmac4356-sdio.txt failed with error -2

    sudo grep -R . /sys/kernel/debug/zswap
    [sudo] password for emk2203:
    /sys/kernel/debug/zswap/same_filled_pages:0
    /sys/kernel/debug/zswap/stored_pages:0
    /sys/kernel/debug/zswap/pool_total_size:0
    /sys/kernel/debug/zswap/duplicate_entry:0
    /sys/kernel/debug/zswap/written_back_pages:0
    /sys/kernel/debug/zswap/reject_compress_poor:0
    /sys/kernel/debug/zswap/reject_kmemcache_fail:0
    /sys/kernel/debug/zswap/reject_alloc_fail:0
    /sys/kernel/debug/zswap/reject_reclaim_fail:0
    /sys/kernel/debug/zswap/pool_limit_hit:0
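A side note on reading those `-2` codes: kernel messages usually print negated errno values, so they can be decoded with a few lines. The helper name `describe_kernel_error` is made up for this sketch.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel return code such as -2 to (errno name, description)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    # Both the dwc3 clock lookup and the brcmfmac firmware load report -2.
    print(describe_kernel_error(-2))
```

On Linux, `-2` decodes to `ENOENT`, i.e. the requested thing (here the `ref` clock and the `.txt` NVRAM file) was simply not found.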
tkaiser Posted September 16, 2018 Posted September 16, 2018 3 hours ago, emk2203 said: I tried to run the setup you suggested; my SoC won't heat up beyond 69 °C. The temperature fluctuates between 67 and 69 °C, but never reaches 70 °C. Took it off the table and put it back in the box it came in, but this accounts only for 1 K difference Interesting. The thermal compound I used is +30 years old so maybe I should buy a new one (or search for the stuff laying around here anyway somewhere ) and try again.
hjc Posted September 20, 2018 Author Posted September 20, 2018 I've used the M4 with RTL8153 for routing for several days. I'm having some trouble with 4.19-rc1: when the board is doing NAT between the RTL8153 and the internal Ethernet port at about 600Mbps, the system crashes with an invalid memory page request. After digging I found that this is a kernel software issue, and ayufan's 4.18-rc8 kernel does not have the problem at all. Crash logs:

    [ 274.231695] Internal error: Oops: 96000004 [#1] SMP
    [ 274.232128] Modules linked in: xt_nat xt_tcpudp xt_mark xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_REDIRECT veth fuse 8021q garp mrp bridge stp llc cdc_ether usbnet r8152 rockchip_rga videobuf2_dma_sg videobuf2_memops snd_soc_rockchip_spdif snd_soc_rockchip_i2s snd_soc_rockchip_pcm v4l2_mem2mem videobuf2_v4l2 videobuf2_common videodev media nvmem_rockchip_efuse brcmfmac rockchip_io_domain ip6table_filter rockchipdrm cfg80211 ip6_tables rfkill rockchip_thermal brcmutil dw_hdmi analogix_dp iptable_filter ipt_MASQUERADE drm_kms_helper rockchip_saradc drm iptable_nat pcie_rockchip_host nf_nat_ipv4 drm_panel_orientation_quirks nf_nat adc_keys input_polldev nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle tcp_bbr sch_fq ip_tables x_tables ipv6 btrfs libcrc32c xor zstd_decompress zstd_compress
    [ 274.238767] xxhash zlib_deflate raid6_pq phy_rockchip_typec phy_rockchip_pcie rtc_rk808 realtek dwmac_rk stmmac_platform stmmac
    [ 274.240188] CPU: 2 PID: 21 Comm: ksoftirqd/2 Not tainted 4.19.0-rc1-rk3399 #35
    [ 274.240822] Hardware name: FriendlyElec NanoPi M4 (DT)
    [ 274.241276] pstate: 40000005 (nZcv daif -PAN -UAO)
    [ 274.241708] pc : dev_hard_start_xmit+0x48/0x128
    [ 274.242110] lr : dev_hard_start_xmit+0x90/0x128
    [ 274.242509] sp : ffff0000092236f0
    [ 274.242803] x29: ffff0000092236f0 x28: 0000000000000001
    [ 274.243272] x27: dead000000000100 x26: ffff0000092237bc
    [ 274.243742] x25: ffff000008c76000 x24: ffff80007abc5090
    [ 274.244212] x23: ffff000008c7d8d0 x22: ffff000008c7d0c0
    [ 274.244681] x21: ffff800075186c00 x20: ffff80007abc5000
    [ 274.245150] x19: dead000000000100 x18: ffff80004600a140
    [ 274.245620] x17: 0000000000000000 x16: 0000000000000000
    [ 274.246089] x15: 0000000000000000 x14: 000000000a18006c
    [ 274.246558] x13: 000000000000001c x12: 0000000000000018
    [ 274.247028] x11: ffff000009223398 x10: 0000000000000004
    [ 274.247497] x9 : ffff00000094b8f8 x8 : 0000000000000000
    [ 274.247966] x7 : 00000000a0000000 x6 : 000000018e36996c
    [ 274.248436] x5 : 00ffffffffffffff x4 : 0029df11ad4bd57e
    [ 274.248905] x3 : 0000000000000018 x2 : c574e174c02a4100
    [ 274.249374] x1 : 0000000000000000 x0 : 0000000000000000
    [ 274.249845] Process ksoftirqd/2 (pid: 21, stack limit = 0x00000000b03e934a)
    [ 274.250455] Call trace:
    [ 274.250676] dev_hard_start_xmit+0x48/0x128
    [ 274.251046] __dev_queue_xmit+0x7b0/0x838
    [ 274.251402] dev_queue_xmit+0x10/0x18
    [ 274.251730] ip_finish_output2+0x248/0x388
    [ 274.252094] ip_finish_output+0x114/0x1f8
    [ 274.252449] ip_output+0xa8/0x118
    [ 274.252744] ip_forward_finish+0x64/0xa0
    [ 274.253092] ip_forward+0x38c/0x4c0
    [ 274.253401] ip_rcv_finish+0x54/0x88
    [ 274.253718] ip_rcv+0x54/0xc0
    [ 274.253983] __netif_receive_skb_one_core+0x54/0x80
    [ 274.254414] __netif_receive_skb+0x14/0x60
    [ 274.254777] netif_receive_skb_internal+0x38/0xe0
    [ 274.255194] napi_gro_complete+0x68/0xb8
    [ 274.255541] dev_gro_receive+0x660/0x678
    [ 274.255889] napi_gro_receive+0x30/0xc8
    [ 274.256235] r8152_poll+0x4a8/0x1068 [r8152]
    [ 274.256614] net_rx_action+0x10c/0x2d0
    [ 274.256948] __do_softirq+0x10c/0x200
    [ 274.257274] run_ksoftirqd+0x30/0x40
    [ 274.257594] smpboot_thread_fn+0x1a0/0x1d0
    [ 274.257957] kthread+0x128/0x130
    [ 274.258245] ret_from_fork+0x10/0x1c
    [ 274.258564] Code: 91024038 f90023b9 f0002399 f9002fbc (f940027b)
    [ 274.259107] ---[ end trace b81d4210d61a59cb ]---
    [ 274.259516] Kernel panic - not syncing: Fatal exception in interrupt
    [ 274.260075] SMP: stopping secondary CPUs
    [ 274.260546] Kernel Offset: disabled
    [ 274.260856] CPU features: 0x0,21806008
    [ 274.261188] Memory Limit: none
    [ 274.261462] Rebooting in 10 seconds..

@Igor IMO it might be better to move to the same dev kernel as Rockchip64 (I've published the commit here, in case you're willing to merge the changes)
Igor Posted September 20, 2018 Posted September 20, 2018 1 hour ago, hjc said: IMO it might be better move to the same dev kernel as Rockchip64 (I've published the commit here, in case you're willing to merge the changes) OK. Merged manually.
mindee Posted September 23, 2018 Posted September 23, 2018 (edited) On 9/13/2018 at 8:36 AM, tkaiser said: I'm now also testing IO performance on NanoPi M4 to get an idea how much influence the integrated VIA VL817 USB3 hub has. The following test commands are used since I test with an el cheapo Samsung EVO 840 (known to exceed 500 MB/s on a SATA port but low capacity and implementing TurboWrite):

    taskset -c 4 iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
    sleep 300
    taskset -c 4 iozone -e -I -a -s 2000M -r 16384k -i 0 -i 1 -i 2

Why do I test first with 100 MB, then wait 5 minutes and test again with 2 GB? The tests with 100 MB keep results comparable with my previous collection. The additional test with 2 GB is there since we found that with fast USB3 implementations and fast SSDs the maximum throughput isn't reported when testing with low test sizes. 1GB or better 2GB seem like a reasonable additional test to look for the real maximum bandwidth possible.

Why wait for 5 minutes (sleep 300) in between? Since my 128GB EVO840 implements TurboWrite and therefore is only capable of writing data at maximum speed until the TurboWrite buffer is full. So if I had tested all block sizes with 1GB test data size, the EVO itself would've turned into a bottleneck after approx. 2.5GB of data written.

Why 'taskset -c 4' (executing the test on a big core)? Since we're looking for maximum performance here.

Raw results as follows:

                                                      random   random
        kB  reclen    write  rewrite     read   reread     read    write
    102400       4    25559    28658    21964    22012    21957    28183
    102400      16    84846    94435    80322    80926    81083    92975
    102400     512   325556   326434   289616   292997   294076   322974
    102400    1024   347800   362005   324009   327371   329460   352963
    102400   16384   381091   386458   370522   373441   373337   378399
   2048000   16384   401465   402916   375361   375457   375509   398665

In other words: Even with a USB3 hub in between, storage performance is excellent as long as only one disk is connected (with more than one disk and accesses in parallel, performance will of course drop since then bus contention issues might occur and bandwidth has to be shared). But with HDDs this is no issue at all since they're too slow anyway.

Storage performance summary: I've had some fear that the internal USB3 hub will trash storage performance but that's not true (at least when only one disk is connected). As a reference, here are numbers generated with the same SSD, same settings and same test methodology on the discontinued ODROID-N1 (as an example for RK3399 USB3 without hub, and also with the same SSD connected to a cheap PCIe SATA controller):

                           Random IO in IOPS    Sequential IO in MB/sec
                           4K read/write        1M read/write
    ODROID-N1              5994/6308            330 / 340
    NanoPi M4              5489/7045            330 / 355
    ASM1061 powersave      6010/7900            320 / 325
    ASM1061 performance    9820/16230           330 / 330

The ASM1061 numbers are especially dedicated to all those SBC users believing in 'native SATA'. Sequential storage performance with a good UAS capable USB3-to-SATA bridge is higher compared to cheap PCIe SATA controllers. 'Native' SATA does not exist with RK3399 anyway, and for really high SATA storage performance more expensive SATA controllers making use of more than 1 PCIe lane are necessary. But with just an ASM1061 the only two real benefits of PCIe attached SATA over USB3 attached SATA are higher random IO performance and fewer IRQs to process, which results in less CPU utilization (see ODROID-N1 link above). But to attach one or two USB3 HDDs and share them at maximum speed (~100 MB/s through network) the NanoPi M4 is perfectly fine.

Edit: Debug output here. We can see the usual 'ERROR Transfer event for disabled endpoint or incorrect stream ring' XHCI error when connecting a USB3 disk enclosure the first time (known problem but doesn't cause any harm), and the amount of DRAM is reported differently since I swapped the SD card between my two NanoPi M4.

Thanks for your suggestion, we made a SATA HAT prototype for NanoPi M4; it can connect 4x 3.5-inch hard drives and works well. Edited January 13, 2019 by Tido added spoiler
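In case it is not obvious how the IOPS column in the quoted summary relates to the raw iozone rows: dividing the 4k random throughput (kB/s) by the record size (kB) reproduces the NanoPi M4 entry. A tiny sketch, with `iops` being a name I made up:

```python
def iops(throughput_kb_s, record_kb):
    """Whole IO operations per second for a given throughput and record size."""
    return throughput_kb_s // record_kb


if __name__ == "__main__":
    # 4k row of the NanoPi M4 run: random read 21957 kB/s, random write 28183 kB/s.
    print(f"{iops(21957, 4)}/{iops(28183, 4)}")
```

The sequential MB/s figures in the summary are rounded and I can't tell from the post which iozone column they were taken from, so the sketch leaves them alone.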
tkaiser Posted September 23, 2018 Posted September 23, 2018 10 hours ago, mindee said: Thanks for your suggestion, we made a SATA HAT prototype for NanoPi M4; it can connect 4x 3.5-inch hard drives and works well Really looking forward to this HAT. BTW: I've not the slightest idea how much effort and initial development cost is needed for such a HAT. But in case the Marvell based 4-port SATA solution will be somewhat expensive, maybe designing another one with an ASMedia ASM1062 (PCIe x2 and 2 x SATA) and just one Molex for 2 drives might be an idea. Could there be one design made for NEO4 that will work on M4 too with a larger (or dual-purpose) HAT PCB?
mindee Posted September 23, 2018 Posted September 23, 2018 21 minutes ago, tkaiser said: Really looking forward to this HAT. BTW: I've not the slightest idea how much effort and initial development cost is needed for such a HAT. But in case the Marvell based 4-port SATA solution will be somewhat expensive, maybe designing another one with an ASMedia ASM1062 (PCIe x2 and 2 x SATA) and just one Molex for 2 drives might be an idea. Could there be one design made for NEO4 that will work on M4 too with a larger (or dual-purpose) HAT PCB? I think the manufacturing cost is not so high, but I am not sure about that yet. The NEO4 uses a different PCIe connector, so it's not an easy thing to fit both with one HAT, considering the signal quality.
Jason Law Posted September 29, 2018 Posted September 29, 2018 Any idea when this HAT will be released? Price point? Will there be a Male 40 pin connector on it? Will this HAT be usable with an eMMC module in the eMMC slot right underneath it? This HAT is really the number 1 selling point for me right now. Looks way more usable than the Rockpro64 SATA setup. On a different note regarding the heat sink, are you better off with a fan blowing from the top down? Or from the side for heat dissipation?
Teal Posted October 3, 2018 Posted October 3, 2018 This hat really would be the number one selling point! If it would be a Marvell based 4-port SATA - and omv compatible like this one for example - this will be bought without a second thought. Speed for spinning rust is somewhat irrelevant, as probably nobody will connect SSDs to this anyway. As I am new to sbc, is there any way to get native PCIe x2 ports from the 24Pin Extension port? Maybe some hardware riser board / extension or similar? And will these work without armbian out of box? Or are any kernel driver related issues to expect?