mindee

NanoPI M4


Before running tinymembench:

hjc@nanopim4:/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc$ cat trans_stat 
   From  :   To
         :200000000300000000400000000528000000600000000800000000   time(ms)
*200000000:       0       0       0       0       0       5    117427
 300000000:       3       0       0       0       0       1      1157
 400000000:       0       0       0       0       0       0         0
 528000000:       0       1       0       0       0       0       200
 600000000:       0       0       0       0       0       0         0
 800000000:       2       3       0       1       0       0       446
Total transition : 16

 

When testing memory bandwidth:

   From  :   To
         :200000000300000000400000000528000000600000000800000000   time(ms)
 200000000:       0       0       0       0       0       6    142405
 300000000:       3       0       0       0       0       1      1157
 400000000:       0       0       0       0       0       0         0
 528000000:       0       1       0       0       0       0       200
 600000000:       0       0       0       0       0       0         0
*800000000:       2       3       0       1       0       0     13315
Total transition : 17

 

When testing latency:

   From  :   To
         :200000000300000000400000000528000000600000000800000000   time(ms)
*200000000:       0       0       0       0       0       6    310967
 300000000:       4       0       0       0       0       1      1257
 400000000:       0       1       0       0       0       0     17200
 528000000:       0       1       1       0       0       0       300
 600000000:       0       0       0       0       0       0         0
 800000000:       2       3       0       2       0       0    177565
Total transition : 21

(and tinymembench shows high latency)

 

After:

   From  :   To
         :200000000300000000400000000528000000600000000800000000   time(ms)
*200000000:       0       0       0       0       0       6   1029003
 300000000:       4       0       0       0       0       1      1257
 400000000:       0       1       0       0       0       0     17200
 528000000:       0       1       1       0       0       0       300
 600000000:       0       0       0       0       0       0         0
 800000000:       2       3       0       2       0       0    177565
Total transition : 21
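
The snapshots above are easier to compare once trans_stat is parsed into numbers. Below is a minimal Python sketch, assuming the layout shown in this post (one row per source frequency, last column is the time spent in that state in ms, `*` marks the current frequency). `parse_trans_stat` and `residency_delta` are hypothetical helper names, not part of any existing tool, and the two embedded snapshots are copied from the "memory bandwidth" and "latency" outputs above.

```python
import re

BANDWIDTH_SNAPSHOT = """\
 200000000:       0       0       0       0       0       6    142405
 300000000:       3       0       0       0       0       1      1157
 400000000:       0       0       0       0       0       0         0
 528000000:       0       1       0       0       0       0       200
 600000000:       0       0       0       0       0       0         0
*800000000:       2       3       0       1       0       0     13315
Total transition : 17
"""

LATENCY_SNAPSHOT = """\
*200000000:       0       0       0       0       0       6    310967
 300000000:       4       0       0       0       0       1      1257
 400000000:       0       1       0       0       0       0     17200
 528000000:       0       1       1       0       0       0       300
 600000000:       0       0       0       0       0       0         0
 800000000:       2       3       0       2       0       0    177565
Total transition : 21
"""

def parse_trans_stat(text):
    """Return ({freq_hz: time_ms}, current_freq_hz) from trans_stat text."""
    time_ms = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"^\s*(\*?)\s*(\d+):((?:\s+\d+)+)\s*$", line)
        if not m:
            continue  # header rows, the 'Total transition' line, blanks
        freq = int(m.group(2))
        time_ms[freq] = int(m.group(3).split()[-1])  # last column is time(ms)
        if m.group(1):
            current = freq
    return time_ms, current

def residency_delta(before, after):
    """Fraction of the elapsed time spent at each frequency between snapshots."""
    delta = {f: after[f] - before.get(f, 0) for f in after}
    total = sum(delta.values())
    return {f: d / total for f, d in delta.items()} if total else {}

bw, _ = parse_trans_stat(BANDWIDTH_SNAPSHOT)
lat, _ = parse_trans_stat(LATENCY_SNAPSHOT)
for freq, share in sorted(residency_delta(bw, lat).items()):
    print(f"{freq // 1_000_000:4d} MHz: {share:6.1%}")
```

Between those two snapshots the DMC spent almost as much time at 200 MHz as at 800 MHz, which would fit the high latency tinymembench reported. The snapshots were taken by hand, so some of that 200 MHz time may simply be idle time before or after the test.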

 

1 hour ago, tkaiser said:

echo performance >governor

After setting this:

 

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   2931.5 MB/s (4.2%)
 C copy backwards (32 byte blocks)                    :   2926.2 MB/s
 C copy backwards (64 byte blocks)                    :   2874.3 MB/s
 C copy                                               :   2903.6 MB/s
 C copy prefetched (32 bytes step)                    :   2866.9 MB/s
 C copy prefetched (64 bytes step)                    :   2863.8 MB/s
 C 2-pass copy                                        :   2583.9 MB/s
 C 2-pass copy prefetched (32 bytes step)             :   2640.9 MB/s
 C 2-pass copy prefetched (64 bytes step)             :   2635.6 MB/s
 C fill                                               :   4892.7 MB/s (0.5%)
 C fill (shuffle within 16 byte blocks)               :   4894.2 MB/s (0.1%)
 C fill (shuffle within 32 byte blocks)               :   4889.4 MB/s (0.4%)
 C fill (shuffle within 64 byte blocks)               :   4894.0 MB/s (0.2%)
 ---
 standard memcpy                                      :   2934.9 MB/s
 standard memset                                      :   4893.5 MB/s (0.3%)
 ---
 NEON LDP/STP copy                                    :   2927.2 MB/s
 NEON LDP/STP copy pldl2strm (32 bytes step)          :   2958.8 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :   2960.9 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   2864.1 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   2861.6 MB/s
 NEON LD1/ST1 copy                                    :   2925.8 MB/s
 NEON STP fill                                        :   4892.3 MB/s (0.4%)
 NEON STNP fill                                       :   4859.3 MB/s (0.1%)
 ARM LDP/STP copy                                     :   2925.9 MB/s
 ARM STP fill                                         :   4892.6 MB/s (0.4%)
 ARM STNP fill                                        :   4854.5 MB/s (0.4%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON LDP/STP copy (from framebuffer)                 :    668.8 MB/s
 NEON LDP/STP 2-pass copy (from framebuffer)          :    598.6 MB/s
 NEON LD1/ST1 copy (from framebuffer)                 :    711.0 MB/s
 NEON LD1/ST1 2-pass copy (from framebuffer)          :    649.0 MB/s
 ARM LDP/STP copy (from framebuffer)                  :    483.6 MB/s
 ARM LDP/STP 2-pass copy (from framebuffer)           :    467.4 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    4.1 ns          /     6.5 ns 
    131072 :    6.2 ns          /     8.7 ns 
    262144 :    8.9 ns          /    11.6 ns 
    524288 :   10.3 ns          /    13.3 ns 
   1048576 :   15.1 ns          /    21.4 ns 
   2097152 :  105.6 ns          /   159.7 ns 
   4194304 :  150.2 ns          /   199.5 ns 
   8388608 :  177.2 ns          /   219.2 ns 
  16777216 :  190.9 ns          /   227.3 ns 
  33554432 :  197.7 ns          /   232.0 ns 
  67108864 :  208.3 ns          /   245.0 ns 

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    4.1 ns          /     6.5 ns 
    131072 :    6.1 ns          /     8.7 ns 
    262144 :    7.2 ns          /     9.5 ns 
    524288 :    7.7 ns          /     9.9 ns 
   1048576 :   12.0 ns          /    16.9 ns 
   2097152 :  104.2 ns          /   156.9 ns 
   4194304 :  148.3 ns          /   195.3 ns 
   8388608 :  169.6 ns          /   207.0 ns 
  16777216 :  180.1 ns          /   210.5 ns 
  33554432 :  185.5 ns          /   212.5 ns 
  67108864 :  188.3 ns          /   213.7 ns 

That's the expected performance.
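
The governor change can also be scripted. This is a minimal sketch, assuming the devfreq directory from the shell prompt in the first output above (it may differ on other kernels) and root privileges. `set_dmc_governor` is a hypothetical helper; `governor` and `available_governors` are the usual devfreq sysfs attributes.

```python
from pathlib import Path

# Directory taken from the shell prompt shown earlier; adjust if yours differs.
DMC_DIR = Path("/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc")

def set_dmc_governor(name="performance", base=DMC_DIR):
    """Equivalent of `echo performance >governor`, with a sanity check first."""
    base = Path(base)
    avail_file = base / "available_governors"
    if avail_file.exists():
        available = avail_file.read_text().split()
        if name not in available:
            raise ValueError(f"{name!r} is not one of {available}")
    governor_file = base / "governor"
    governor_file.write_text(name + "\n")
    # Read back so the caller sees what the kernel actually accepted.
    return governor_file.read_text().strip()
```

Calling `set_dmc_governor()` as root should pin the DMC to its highest frequency. A sysfs write like this does not survive a reboot, so it would need to be repeated at boot.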

 

armbianmonitor -u: http://ix.io/1lAb

3 minutes ago, tkaiser said:

Did you run into other issues with M4 so far?

With the 4.4 kernel there are no other issues. Everything else works fine (at least as a headless server; I haven't tried connecting a monitor yet). I may try the Bionic desktop image this weekend.

As for mainline, WiFi and USB 3.0 do not work. lsusb only shows the OTG 2.0 root hub.

 

PCIe and MIPI are not tested yet.

 

1 minute ago, tkaiser said:

Different DRAM type seems to be no problem yet, right?

Yes, AFAIK Rockchip's binary blobs can detect the DDR type and initialize it accordingly. I can see different DRAM initialization log output on the T4 and the M4, although they use the same rk3399_ddr_800MHz_v1.14.bin file and exactly the same u-boot config.


Is the memory bus the same width for the 2GB and 4GB versions?

 

I just noticed in the pictures that the 4GB version has two DRAM chips (both on top), but the 2GB version has 4 (2 on top, 2 on bottom).

 

I just got my 4GB board in, so I can run tests if that's helpful.  I'm just starting to set it up now.

 

Dan

 

1 hour ago, Dan Christian said:

Is the memory bus the same width for the 2GB and 4GB versions?

 

Yep. Check the link to the review and compare @hjc's tinymembench numbers from his 2GB M4 (4 x DDR3) with e.g. RockPro64 (2 x LPDDR4). Both are dual-channel DRAM configurations. It's possible that we'll get more recent DRAM initialization BLOBs from Rockchip, and then RockPro64 with LPDDR4 might be slightly faster (I don't think this will change anything for the larger 4GB M4 configuration using LPDDR3).

 

BTW: For most use cases memory bandwidth is pretty much irrelevant.


Hi all.
I got my M4 today.
The Lubuntu from FriendlyElec had problems.
I quickly gave up on it and installed Armbian. Haven't had any problems. Great job with that!

I'm impressed by the board. The heatsink is awesome. It's great that it comes in this form factor.
I've raised the board and put a small 5V fan underneath. It's steady at 60°C maxed out.

Without the fan it throttles after a while. But not too bad.
It's a good match for my octa-core SBCs in Blender. The T3+ and XU4 are a bit faster.


But most important for me is Kdenlive. There it's by far the fastest of all my SBCs. No idea why it handles it so well.
M4 34m51s
XU4 46m23s
T3+ 1h15m
Vim2 1h44m
The results with that are always strange. Finally I've found a good render machine to take with me.
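
To put the Kdenlive times on one scale, they can be converted to seconds and expressed relative to the M4. A small sketch: `parse_duration` is a hypothetical helper that only handles the `1h15m` / `34m51s` notation used above.

```python
import re

# Render times exactly as posted above.
RENDER_TIMES = {"M4": "34m51s", "XU4": "46m23s", "T3+": "1h15m", "Vim2": "1h44m"}

def parse_duration(text):
    """Convert strings like '1h15m' or '34m51s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not m or not any(m.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, secs = (int(g) if g else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + secs

seconds = {board: parse_duration(t) for board, t in RENDER_TIMES.items()}
relative = {board: s / seconds["M4"] for board, s in seconds.items()}

for board, ratio in relative.items():
    print(f"{board:5s} {seconds[board]:5d} s  {ratio:.2f}x the M4 time")
```

By this measure the XU4 takes roughly 1.33x, the T3+ about 2.15x and the Vim2 nearly 3x as long as the M4 for the same render.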

YouTube playback is also good. 1080p in Chromium plays without stuttering, with 1/4 lost frames. No other SBC does it so well without hwaccel.


I'll do a lot more research on it and then make a video. If anyone has more info or suggestions, please let me know. Anyone playing Linux games on it? (I know, probably the wrong forum for that.)

@hjc Thank you for the great info. Did you test the voltage drop on the USB? I've seen dips of 0.4V with sudden bursts without a load on the USB. I wonder if it gets worse with a load.
The PSU that came with it is very stable. 
Thanks for the great work.
Cheers

On 9/27/2018 at 10:08 PM, NicoD said:

Did you test the voltage drop on the USB? I've seen dips of 0.4V with sudden bursts without a load on the USB. I wonder if it gets worse with a load.

 

@tkaiser

Did you notice this? I've just done a test with a 1A load on one USB port. Then I let the CPU run at 100% and it went under 4V. Is there a command to display USB voltage in Linux?

Cheers.

9 minutes ago, NicoD said:

Is there a command to display USB voltage in Linux?

No. You need circuitry to measure this in the first place (only some old and boring Allwinner A20 devices provide this, and the infamous Raspberry Pi has some circuitry to detect when input voltage drops below 4.63V +/- 10%), then a driver interpreting the ADC data appropriately. If those two prerequisites are met, the choice of tool to display this as a voltage value is close to irrelevant. But that's not the problem. The hardware is simply missing on the majority of boards.


Hello, 

 

Currently, I own a single Raspberry Pi 3. I use it mainly as a full-featured web/database/file-sharing server and for development projects. However, I plan to upgrade my equipment with a cluster of Pis! Today I discovered the M4, and it really interests me. So, in order to reduce the cost of my build, I want to know whether the M4 with Armbian supports the eMMC module. I think so, but I'd prefer to ask before buying anything. I really want to use Armbian for its flexibility and awesome community! 

 

For more information, my cluster would be made of one M4 with 4GB RAM and three M4s with 2GB RAM, just to make something rock solid :D !

 

Thanks in advance for your replies! 

3 minutes ago, tkaiser said:

 

Sure. Just read the review.

Thank you very much! Of course, I read that doc, but did not pay attention to this specific part ;) ! So I will really consider buying this board. One last thing: does Armbian need to be flashed from an SD card, like all other boards, or is it possible to flash Armbian directly to the eMMC chip through the micro SD adapter available for this board? :D

21 minutes ago, RoroTiti said:

Does Armbian need to be flashed from an SD card, like all other boards

 

'Like all the other boards' is not true since Allwinner boards for example can be flashed using FEL mode (directly accessing the board's eMMC as USB storage).

 

If you have an eMMC to SD card adapter you can burn an Armbian image directly to eMMC, if not it's just calling nand-sata-install and you're done.


Hi guys.
Some results from testing the undervoltage. It does seem pretty bad. Best not to power devices via the USB/5V GPIO on the M4.
I've done my tests with my Odroid C2 as the load. It's the most stable with bad voltages, but once under 4V it cuts out. I've ordered load resistors to do better testing.

Undervoltage
----------------
Lubuntu 32-bit
USB load   CPU load   USB voltage
none       none       5.07V
none       max        4.92V
1A         none       4.55V
1A         max        4.33V average, 4.26V lowest
1.5A       none       4.10V
1.5A       max        under 4V (my devices stop working and cut the power)

Powerbanks only charge at 0.2A - 0.4A.
Armbian @ 2GHz/1.5GHz is worse: with a 1A load and max CPU load it can dip under 4V.
Using multiple USB ports gives the same result.

So best not to power hard drives, charge smartphones, power other SBCs... with it.

Other than that it's an amazing board.
Cheers.

48 minutes ago, NicoD said:

1.5A load on USB no load on cpu    = 4.10V
                             max load cpu      = under 4V My devices stop working and cut the power

This may explain why I could not use multiple RTL8153 on the board stably.

 

1 minute ago, hjc said:

This may explain why I could not use multiple RTL8153 on the board stably.

 

They don't use too much power, but they are sensitive to low voltage.
Please check your usb voltage if you can so I can confirm this is not only on my board.
Greetings.

5 minutes ago, NicoD said:

Please check your usb voltage if you can so I can confirm this is not only on my board.

Unfortunately I'm on vacation and don't have physical access to my M4 right now. Will check that once I return home.

On 10/3/2018 at 2:43 PM, NicoD said:

Hi guys.
Some results from testing the undervoltage. It does seem pretty bad. Best not to power devices via the USB/5V GPIO on the M4.
I've done my tests with my Odroid C2 as the load. It's the most stable with bad voltages, but once under 4V it cuts out. I've ordered load resistors to do better testing.

Undervoltage
----------------
Lubuntu 32-bit
USB load   CPU load   USB voltage
none       none       5.07V
none       max        4.92V
1A         none       4.55V
1A         max        4.33V average, 4.26V lowest
1.5A       none       4.10V
1.5A       max        under 4V (my devices stop working and cut the power)

Powerbanks only charge at 0.2A - 0.4A.
Armbian @ 2GHz/1.5GHz is worse: with a 1A load and max CPU load it can dip under 4V.
Using multiple USB ports gives the same result.

So best not to power hard drives, charge smartphones, power other SBCs... with it.

Other than that it's an amazing board.
Cheers.

 

The thing is: the charger that FriendlyElec sells with the board is a copy of the OnePlus Dash charger, and for this charger to deliver its full power (nearly 4 amps) there may need to be some negotiation between the load and the PSU. As far as I know, OnePlus devices don't enter this mode until something happens in the phone, about 1 second after plugging in.

 

I don't think the charger is delivering that many amps, but maybe 2.5.

 

I'll have to test it one of these days.

 

 

EDIT: Confirmed, the OnePlus Dash charger follows a protocol before the 4 amps are delivered.

 

This quote is from a post on XDA: https://forum.xda-developers.com/oneplus-3/accessories/dash-charge-protocol-analysis-t3431917/post68125991

 

"The charger waits for a current draw (i didn't bother testing it's thresholds) and then reads/checks-for the E²PROM embedded in the plug of the cable (presumably containing an authentication for dash).
At a similar time the phone sends a kind of "preamble" consisting of high-low transitions of varying lengths (but this doesn't seem to important to the charger, it tries to "dash" even without it).
Afterwards the phone and charger start exchanging 9 bits of data in bursts. One USB-Data-Line is clock, the other one is data. The chargers supplies the clock and the first 9 bits and after a short pause the phone gets to reply with another 9 bits as the charger supplies another "clock-burst".

At the beginning (before dash-charging is in effect), the charger sends 150h and the phone replys with 158h.
Once the phone is ready to begin dash-charging it replys with 178h instead. 
The charger then configures it's Step-Down Converter for ~4.5V of output voltage and then sends 148h to the phone. 
It either replies with 170h if the voltage is too high or 178h if the voltage is correct (I didn't see a reply for "too low", but it might exist). 
If the phone replies 170h the charger lowers the voltage by about 100mV-200mV and "asks" again. 
Once the phone replies with 178h the charger stops lowering the voltage and sends 14Ch to which the phone replies with 141h. 
During the dash-charging process the charger periodically sends 144h to which the phone replies with a number which seems to roughly coincide with the state of charge (i have seen values from 16Eh to 178h). 
The charger seems to nudge-up the voltage every once in a while (presumably when the current dropped below a threshold). 

If the battery is relatively full (i tested at 90% charge) the "dash-charging-cycle" doesn't even start and the communication stays at an exchange of 150h/158h data "words".

I did some minor probing on the wall-wart with a stripped USB 3.0 extension and found that it uses the same commands but with the lowest bit set (i.e. it adds 1 to the command codes). Unfortunately the USB 3.0 cable had some internal resistance ruining most of the analog measurements. "
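
The exchange described in the quote can be summarized as a lookup table plus a small simulation of the voltage negotiation. This is only a sketch of the quoted description, not OnePlus's implementation. The word meanings are paraphrased from the quote. The phone's target voltage, the 150 mV step (picked from the quoted 100mV-200mV range) and all function names are assumptions made for illustration.

```python
# 9-bit words as reported in the quoted XDA post (meanings paraphrased).
CHARGER_WORDS = {
    0x150: "poll before dash charging",
    0x148: "output set, is the voltage ok?",
    0x14C: "voltage settled",
    0x144: "periodic status request",
}
PHONE_NOT_READY = 0x158
PHONE_READY_OR_OK = 0x178
PHONE_TOO_HIGH = 0x170
PHONE_ACK_SETTLED = 0x141

def phone_reply(charger_word, volts, target_volts, ready=True):
    """Hypothetical phone model answering the charger words listed above."""
    if charger_word == 0x150:
        return PHONE_READY_OR_OK if ready else PHONE_NOT_READY
    if charger_word == 0x148:
        return PHONE_TOO_HIGH if volts > target_volts else PHONE_READY_OR_OK
    if charger_word == 0x14C:
        return PHONE_ACK_SETTLED
    raise ValueError(f"unexpected word {charger_word:#x}")

def negotiate(target_volts=4.2, start_volts=4.5, step=0.15, phone_ready=True):
    """Simulate the charger side; returns (final_volts, [(sent, reply), ...])."""
    if step <= 0:
        raise ValueError("step must be positive")
    log = []
    reply = phone_reply(0x150, start_volts, target_volts, phone_ready)
    log.append((0x150, reply))
    if reply != PHONE_READY_OR_OK:
        return None, log  # stays at the 150h/158h exchange
    volts = start_volts
    while True:
        reply = phone_reply(0x148, volts, target_volts)
        log.append((0x148, reply))
        if reply == PHONE_READY_OR_OK:
            break
        volts = round(volts - step, 3)  # 170h means too high: step down, ask again
    log.append((0x14C, phone_reply(0x14C, volts, target_volts)))
    return volts, log

final_volts, exchange = negotiate()
for sent, reply in exchange:
    print(f"charger {sent:03X}h -> phone {reply:03X}h")
```

If the quoted analysis is right, a board that never answers these words would leave the charger in the initial 150h polling state, which would fit the suspicion that it does not deliver its full current to the M4.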

On 10/3/2018 at 9:39 PM, NicoD said:

Please check your usb voltage if you can so I can confirm this is not only on my board.

I can confirm that my M4 also has the voltage drop issue.

  • 1 RTL8153 connected, system idle: 4.9V
  • Running iperf3 and generate 2Gbps traffic: 4.7V
  • Running iperf3 with 6x cpuburn: 4.5V

On NanoPC T4 the voltage is always 5.0V no matter what workload I run.

34 minutes ago, hjc said:

I can confirm that my M4 also has the voltage drop issue.

  • 1 RTL8153 connected, system idle: 4.9V
  • Running iperf3 and generate 2Gbps traffic: 4.7V
  • Running iperf3 with 6x cpuburn: 4.5V

On NanoPC T4 the voltage is always 5.0V no matter what workload I run.

Thank you. That's very helpful. Do you still use another PSU?
 

 

7 hours ago, CabröX said:

The thing is: the charger that FriendlyElec sells with the board is a copy of the OnePlus Dash charger, and for this charger to deliver its full power (nearly 4 amps) there may need to be some negotiation between the load and the PSU. As far as I know, OnePlus devices don't enter this mode until something happens in the phone, about 1 second after plugging in.

A very nice explanation. It is possible. But I rather think the resistance in the board is too high for a large current to go through. The PSU is very stable.
The higher the current, the lower the voltage; that looks like resistance to me. It's a small 5V board without voltage protection. I didn't expect it to handle a lot of current.
I can be wrong...
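
The resistance hypothesis can be checked roughly against the idle-CPU measurements posted earlier (5.07V with no USB load, 4.55V at 1A, 4.10V at 1.5A). The sketch below fits a straight line V = V0 - R*I by least squares and reads R off the slope. Treating the whole supply path (PSU, cable, board) as one lumped resistance is an assumption, and three points are too few for a firm conclusion.

```python
# (USB load current in A, measured USB voltage in V), CPU idle, from the earlier post.
MEASUREMENTS = [(0.0, 5.07), (1.0, 4.55), (1.5, 4.10)]

def fit_series_resistance(points):
    """Least-squares fit of V = v0 - r * I; returns (v0, r)."""
    n = len(points)
    mean_i = sum(i for i, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((i - mean_i) ** 2 for i, _ in points)
    sxy = sum((i - mean_i) * (v - mean_v) for i, v in points)
    slope = sxy / sxx
    v0 = mean_v - slope * mean_i
    return v0, -slope  # a falling line means a positive resistance

v0, r = fit_series_resistance(MEASUREMENTS)
print(f"no-load voltage ~{v0:.2f} V, effective series resistance ~{r:.2f} ohm")
```

The fit gives a bit over 0.6 ohm for the whole path. It cannot tell whether that resistance sits in the cable, the connectors or the board itself.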


Hi all.
The undervoltage problem was a bad cable. It's a lot more stable with the original cable.
I've finished my review video about the NanoPi M4.
Again with a great working Armbian. Thank you to everyone who worked on Armbian. Great job.
Greetings.
NicoD

 

5 hours ago, NicoD said:

The undervoltage problem was a bad cable. It's a lot more stable with the original cable.

So it's not a defect of the board, right? That's good news. I'll get a shorter cable with lower resistance and try testing 2 x RTL8153 again.


 

3 hours ago, fossxplorer said:

@nicoD

Fantastic video you made for this board. I really really want to order one after watching your video!

Thanks a lot!

See if this is the right form factor for your use. Maybe you could benefit from an M.2 slot or a PCIe slot. Then you can go for another RK3399 board: NanoPC-T4, RockPro64, Orange Pi 3399, ...
For me this is the greatest SBC ever. I can't find many faults in it. I think it's going to get a lot of support, since many RK3399 boards will be sold.
I need a 5V board that I can power with my power banks (max 2.4A). I want it fast, and I need to be able to cool it. So all my boxes are ticked here.
You will probably use it for something else, so see if it's the right choice for you. All the RK3399 boards perform the same.
Greetings


Thanks for your advice @NicoD

I have several use cases, for which I've bought a couple of HP thin clients to see if they can cover my needs. This is because I really got fed up with the I/O options of the boxes I've played with.

 

Anyway, my use cases are different, one of them being NAS, and I've seen a 4-port SATA NAS HAT by @mindee somewhere here that I look forward to seeing tested/reviewed. Will that HAT work with this board?

And based on your video review, I really like this board, as I probably don't need to deal with heat dissipation thanks to the nice heatsink. Being able to feed it from a power bank is a plus for me, as I'm going to bring this "computer" to South Asia, where power comes and goes :)

 

 

 

 

 

 

 

1 hour ago, fossxplorer said:

Anyway, my use cases are different, one of them being NAS, and I've seen a 4-port SATA NAS HAT by @mindee somewhere here that I look forward to seeing tested/reviewed. Will that HAT work with this board?

That HAT is made specifically for this board. I'll order it when it's out.
For NAS this board is perfect, either with an external HD over USB 3 or with that SATA HAT.

15 hours ago, NicoD said:

That HAT is made specifically for this board. I'll order it when it's out.
For NAS this board is perfect, either with an external HD over USB 3 or with that SATA HAT.

 

Any idea when this HAT will be out? I plan to get one as my NAS board.

