NanoPi M4


mindee

Question

2 hours ago, lucho said:

I think that the hardware development is already complete, unlike the software development, which has only recently started and is progressing quickly (a new pre-release almost every day).

 

As to the hardware revisions, the difference between V1.0 and V2.0 is quite large. The difference between V2.0 and the July 2018 batch is only 1 omitted resistor (R895381), which was probably not needed in the first place. So the bare PCB can be the same as for V2.0. In general, the number of hardware revisions of a product is much lower than the number of software revisions. This is natural.

 

But what's interesting to me is how the price of RockPro64, Orange Pi RK3399, and other RK3399-based boards like the NanoPC T4 can be so low. As many of you probably don't know, the Chinese government subsidises Shenzhen Xunlong Software Company (the maker of the Orange Pi boards), and so they can afford to sell their boards at the BOM (Bill of Materials) cost, not following the "golden rule" that the retail price of a product must be 3 times higher than the BOM cost to pay salaries, depreciation allowances, etc. But their RK3399 board has only 2 GB of DRAM and 16 GB of eMMC flash memory and sells for $109. The 2 GB RockPro64 variant plus 16 GB eMMC sells for even less: just $75. How is this possible? Does the Chinese government subsidise Pine Microsystems as well? Unlikely! Then, what is the secret behind their astonishingly low prices?

 

NanoPi M4 will be cheaper and have the same form factor as the RPi 3. :)


Recommended Posts

1 hour ago, TonyMac32 said:

My point was, Aluminum is aluminum forever, unless some of the neutrons decay

Aluminum oxidizes to Al2O3 immediately.. What you see as 'alu look' is in fact oxidized aluminum... Steel is efficient to reuse whereas aluminum isn't (though it's still way more efficient than starting from bauxite). 

 

1 hour ago, TonyMac32 said:

Plastics are quite cheap to make, and relatively low energy.  It's the disposal and reuse that becomes the problem.

Hmm, not really.. a heretic would call HDPE 'highly viscous diesel' :lol: Okay.. diesel isn't trendy anymore, but you can burn it, boil water with it and gain energy out of it. Some plastics can be reused (e.g. I don't think that any of my cheap PETG filament is 'new'). America has a 'famous' history of combining plastics with benzene and gasoline to increase the viscosity... 

I prefer mixed cases where the SoC is on the bottom side. I would love to see board makers release CAD files of the top, so that if you need things like a pin header you just have to adjust it slightly and print it on your own.. 3D printers or print services are cheap as hell as long as you don't need fancy plastics (the reason I print with PETG is that it's the only one which is at least 'a little bit' chemically stable for my use cases and cheap enough; the one I would prefer is ~$900/kg and needs 350-400°C print temperature.. not that handy for a $300 printer :lol:). 

 



Is there an approximate release date for this board? I would totally pick one up. I was having a look at the NanoPC-T4, but I think it's too much of an overkill for my projects, and I would like to get some Raspberry Pi cases for this one too!

 

Thanks in advance.



Wow, looks amazing, definitely going to wait for it. Do the first versions of these SBCs usually come with hardware problems?

 

And another question: does the back part (the aluminium case) act like a heatsink for the SoC? And will the final version come with it? Seems so nice.


3 hours ago, mindee said:

Working on NanoPi M4 these days,  almost done

 

Really curious about what you did with PCIe here, how many of the USB receptacles get SuperSpeed and so on. Can't wait for the wiki page to appear :) 


4 hours ago, mindee said:

(attached image: B25D763E-E75F-4920-A6E2-3467E0875DCA.jpeg)

 

Thank you for updating the post with this second picture!

 

So I would assume we have either:

  • one USB3 port directly connected to RK3399 and three USB3 ports behind the internal VL817 hub sharing bandwidth or
  • all 4 USB3 ports behind the internal VL817 hub sharing bandwidth (then OTG is routed to USB-C but only as Hi-Speed?)

Same Wi-Fi chip as NanoPC-T4, optional eMMC, 2 PCIe lanes available on a header... nice! I really hope NanoPi M4 and NanoPC-T4 will be as compatible as possible so we can support them with a single image :)
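Either topology will be easy to verify once boards ship: `lsusb -t` prints the USB device tree, and in the "all 4 ports behind the VL817" case the four receptacles should show up behind a single 4-port SuperSpeed hub below the xhci root hub. A self-contained sketch (the sample output below is hypothetical, not captured from a real M4):

```shell
# Hypothetical 'lsusb -t' output for the "all 4 USB3 ports behind the
# VL817" topology: one 4-port SuperSpeed (5000M) hub below the root hub.
sample='/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M'

# On a real board, replace the sample with: sample="$(lsusb -t)"
hubs=$(echo "$sample" | grep -c 'Class=Hub.*5000M')
echo "SuperSpeed hubs found: $hubs"
```

If this prints 1 and the devices on all four receptacles enumerate below that hub, the bandwidth-sharing assumption is confirmed.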


On 7/13/2018 at 4:37 PM, tkaiser said:

I really hope NanoPi M4 and NanoPC-T4 will be as compatible as possible so we can support them with a single image :)

It seems that the DT files for NanoPi M4 and NanoPi NEO4 are already published on FriendlyARM's GitHub. They share most parts (rk3399-nanopi4-common.dtsi) with the NanoPC T4, with only minor differences.


3 hours ago, frottier said:


Ok, so it's confirmed:

On 7/13/2018 at 10:37 AM, tkaiser said:

all 4 USB3 ports behind the internal VL817 hub sharing bandwidth (then OTG is routed to USB-C but only as Hi-Speed?)

 

Powering is also possible through pins 2 and 4, so since the SoC is on the right side and heat dissipation is no problem, @mindee could evaluate a 'SATA HAT' using the 2 PCIe lanes, a Marvell 88SE9235 SATA controller (x2 PCIe to the host, 4 x SATA 3.0 to the disks) and power circuitry with a 12V input able to feed the board and 4 x 3.5" disks.
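To put rough numbers on the 12V input requirement of such a HAT, a back-of-the-envelope sketch; the per-disk currents and board wattage below are generic assumptions, not figures from any datasheet:

```python
# Rough 12 V power budget for the hypothetical 4-bay 'SATA HAT'.
# All figures are assumptions for illustration, not datasheet values.
DISK_SPINUP_A = 2.0   # typical 3.5" HDD spin-up draw on the 12 V rail (assumption)
DISK_IDLE_A   = 0.5   # typical running draw per disk (assumption)
BOARD_W       = 15.0  # board + peripherals, generous estimate (assumption)

def psu_amps(n_disks, staggered=False):
    """Minimum 12 V PSU current: board plus disks. Worst case is all
    disks spinning up at once, unless spin-up is staggered."""
    board_a = BOARD_W / 12.0
    if staggered:
        disk_a = DISK_SPINUP_A + (n_disks - 1) * DISK_IDLE_A
    else:
        disk_a = n_disks * DISK_SPINUP_A
    return board_a + disk_a

print(round(psu_amps(4), 2))                  # simultaneous spin-up
print(round(psu_amps(4, staggered=True), 2))  # staggered spin-up
```

With these assumptions a simultaneous spin-up needs roughly a 9-10 A supply at 12 V, while staggered spin-up halves that, which is why SATA controllers supporting staggered spin-up are attractive here.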

 

If I understood correctly, RK3399's VPU capabilities make it interesting as a transcoding NAS (once video support is ready in Linux, though I have no idea how far along things are. @JMCC, do you know about the state of video transcoding with RK3399?)


On 8/21/2018 at 4:23 PM, tkaiser said:

do you know about the state of video transcoding with RK3399?

Well, FFmpeg's RKMPP support includes only decoding so far, so there's not much chance that Plex/Emby will support accelerated encoding on RK3399 anytime soon. GStreamer should be pretty well supported, in theory just like on RK3288, but I haven't tested it. I'm returning to SBCs now after some weeks away, so I'll make sure to test video encoding when I do the RK3399 media script, God willing.



I've ordered one just now.

 

BTW, is there any recommended enclosure for both the M4 and the NanoPC T4? I don't want to expose these expensive boards to the open air any more. Air quality around here is really poor; (for my Firefly RK3399) half a year's exposure = lots of dust gathered on the PCB :( 


6 minutes ago, hjc said:

I've ordered one just now

 

I hope you added the heatsink?

 

My take on 'enclosure' is that all those boards land in a drawer. Since most recent boards generate more and more heat, I'm thinking about adding 2 large and silent fans + dust filters in a similar way as shown here: https://forum.openmediavault.org/index.php/Thread/18962


Just now, tkaiser said:

I hope you added the heatsink?

Yes I did, of course. Heatsinks are essential for RK3399 boards.

 

12 minutes ago, tkaiser said:

My take on 'enclosure' is that all those boards land in a drawer. Since most recent boards generate more and more heat, I'm thinking about adding 2 large and silent fans + dust filters in a similar way as shown here: https://forum.openmediavault.org/index.php/Thread/18962

It's a nice approach, I'll consider it. Although modifying the drawer seems like it would take a lot of time.



Looks like my M4 will arrive next Thursday. One thing that I'm still concerned about: I ordered the 2GB RAM model, which uses DDR3 instead of the LPDDR3 the 4GB ones use, so I wonder whether there are any RAM initialization differences to take care of when creating images for the board. At least in U-Boot, LPDDR3 and DDR3 use different timing parameters, specified in the device tree.
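For reference, in Rockchip's U-Boot tree the DRAM timing set is pulled in through an sdram `.dtsi` include in the board's device tree, so the 2GB DDR3 model would need its own include. The file names below are only illustrative; check `arch/arm/dts/` in the U-Boot sources for the real ones:

```dts
/* Illustrative sketch only -- verify the actual file names in
   arch/arm/dts/ of the U-Boot source tree before use. */

/* 4GB LPDDR3 variant: */
#include "rk3399-sdram-lpddr3-4GB-1866.dtsi"

/* 2GB DDR3 variant would instead include something like: */
/* #include "rk3399-sdram-ddr3-1866.dtsi" */
```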


On 8/21/2018 at 10:23 PM, tkaiser said:


Ok, so it's confirmed:

 

Powering is also possible through pins 2 and 4, so since the SoC is on the right side and heat dissipation is no problem, @mindee could evaluate a 'SATA HAT' using the 2 PCIe lanes, a Marvell 88SE9235 SATA controller (x2 PCIe to the host, 4 x SATA 3.0 to the disks) and power circuitry with a 12V input able to feed the board and 4 x 3.5" disks.

 

If I understood correctly, RK3399's VPU capabilities make it interesting as a transcoding NAS (once video support is ready in Linux, though I have no idea how far along things are. @JMCC, do you know about the state of video transcoding with RK3399?)

 

Thanks for your suggestion, we’ll check that.


4 minutes ago, hjc said:

M4 (4.4 armbian nightly kernel) w/ the official huge heatsink attached: http://ix.io/1lvP

 

Hmm... that's a bit underwhelming. I had hoped for better results (even taking into account the 29°C ambient temperature). Do the FE guys again use the blue thermal pad that is 1mm thick?

 

And as already reported: throttling needs some tuning, since we currently jump between 2.0 and 1.4 GHz and skip the OPPs in between:

1992 MHz: 3949.34 sec
1800 MHz:       0 sec
1608 MHz:       0 sec
1416 MHz:   35.42 sec
1200 MHz:       0 sec

Had no time to look into it yet (too busy with a boring VoIP project these days).
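For the record, a per-OPP table like the one above can be derived directly from cpufreq's accounting. A small sketch, assuming the usual `time_in_state` format (one `freq_kHz ticks` pair per line, ticks in 10ms units); the sample data is made up to match the figures quoted, not read from a real board:

```python
# Sketch: turn cpufreq's time_in_state (10 ms ticks per line) into a
# 'seconds per OPP' table. Sample data is invented to match the
# figures quoted above, not captured from hardware.
sample = """\
1992000 394934
1800000 0
1608000 0
1416000 3542
1200000 0
"""  # hypothetical /sys/devices/system/cpu/cpu4/cpufreq/stats/time_in_state

def time_per_opp(text):
    rows = []
    for line in text.splitlines():
        freq_khz, ticks = map(int, line.split())
        rows.append((freq_khz // 1000, ticks / 100.0))  # -> (MHz, seconds)
    return rows

for mhz, secs in time_per_opp(sample):
    print(f"{mhz} MHz: {secs:.2f} sec")
```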


1 minute ago, hjc said:

same as the one that came with NanoPC T4.

 

Good. I've already prepared a bunch of copper shims of varying heights and thermal paste, since I believe a lot of the poor thermal performance is due to heat not being transferred efficiently into the heatsink. Curious when my M4 will arrive...


22 hours ago, hjc said:

M4 (4.4 armbian nightly kernel) w/ the official huge heatsink attached: http://ix.io/1lvP

There's something wrong (DRAM related) when running the Rockchip 4.4 kernel, which causes the latency to be twice as high as on other RK3399 boards. This causes very poor 7-zip performance, and it takes a long time to run tinymembench (~20 minutes on both the big and the little cores).

However, the DRAM performs normally on the mainline kernel (4.19-rc1), and the benchmark numbers are identical to other boards.

 

Mainline kernel benchmark details: http://ix.io/1lzx. I didn't modify the OPP table and thermal trip point, so it's limited to 70°C and 1.8/1.4 GHz, and thermal throttling occurs very frequently. Though it's still very powerful running at 1.6/1.4 GHz and it keeps cool.

 

Edit: re-ran with the OPP table/trip point modified: http://ix.io/1lzP
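In case anyone wants to replicate the trip point modification: mainline `rk3399.dtsi` defines the passive trips inside the CPU thermal zone, so an overlay along these lines should raise the 70°C limit. The label and node names assume the usual rk3399.dtsi layout; verify against the kernel you actually build before using this:

```dts
/* Sketch: raise the first passive trip from 70°C to 80°C.
   Label/node names assume the usual rk3399.dtsi layout -- verify first. */
&cpu_thermal {
    trips {
        cpu_alert0: cpu_alert0 {
            temperature = <80000>;   /* millicelsius, was <70000> */
            hysteresis  = <2000>;
            type        = "passive";
        };
    };
};
```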


30 minutes ago, hjc said:

There's something wrong (DRAM related) when running the Rockchip 4.4 kernel, which causes the latency to be twice as high as on other RK3399 boards. This causes very poor 7-zip performance, and it takes a long time to run tinymembench (~20 minutes on both the big and the little cores).

 

Yes, you're right. Looking at the dmc stats we might find the culprit:

root@nanopct4:/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc# cat trans_stat 
    From  :   To
          :  200MHz  300MHz  400MHz  528MHz  600MHz  800MHz   time(ms)
*200000000:       0       0       0       0       0      14 224442114
 300000000:       8       0       0       0       0       0      2062
 400000000:       0       6       0       0       0       1     11499
 528000000:       1       1       5       0       0       0     25918
 600000000:       0       0       0       0       0       0         0
 800000000:       5       1       2       7       0       0    136407
Total transition : 51

This is NanoPC T4 running tinymembench:

root@nanopct4:/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc# cat cur_freq 
200000000

Now repeating the test after:

root@nanopct4:/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc# echo performance >governor 
root@nanopct4:/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc# cat cur_freq 
800000000
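Since `trans_stat` is a bit unwieldy to read, here is a small parser sketch that turns the last column (time in ms) into a time share per DMC frequency; the sample data is inlined from the output above:

```python
import re

# Sketch: compute the share of wall time the DMC spent at each frequency
# from devfreq's trans_stat output (sample pasted from above).
sample = """\
    From  :   To
          :  200MHz  300MHz  400MHz  528MHz  600MHz  800MHz   time(ms)
*200000000:       0       0       0       0       0      14 224442114
 300000000:       8       0       0       0       0       0      2062
 400000000:       0       6       0       0       0       1     11499
 528000000:       1       1       5       0       0       0     25918
 600000000:       0       0       0       0       0       0         0
 800000000:       5       1       2       7       0       0    136407
Total transition : 51
"""

def time_share(text):
    """Map frequency (Hz) -> fraction of total time, using the last
    column (time in ms) of each data row."""
    times = {}
    for line in text.splitlines():
        m = re.match(r'\*?\s*(\d+):((?:\s+\d+)+)\s*$', line)
        if m:
            times[int(m.group(1))] = int(m.group(2).split()[-1])
    total = sum(times.values())
    return {freq: t / total for freq, t in times.items()}

for freq, share in sorted(time_share(sample).items()):
    print(f"{freq // 1000000} MHz: {share:.2%}")
```

This makes the problem obvious at a glance: with dmc_ondemand the controller sits at 200 MHz for well over 99% of the time.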

 


55 minutes ago, tkaiser said:

Now repeating the test after:


root@nanopct4:/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc# echo performance >governor 
root@nanopct4:/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc# cat cur_freq 
800000000

 

Results: http://ix.io/1lzW -- comparing with my previous run without modifying dmc policy: http://ix.io/1lkG

 

So looking at the dmc memory governor: with dmc_ondemand we get 5870 7-zip MIPS and with performance it's above 6500 (I need to retest with a fan). Individual results:

A53                  dmc_ondemand   performance
memset (MB/s)            1417.3        1413.7
memcpy (MB/s)            4784.5        4786.8
single latency (ns)       381.0         207.5
dual latency (ns)         451.9         240.6
7-zip single (MIPS)         837          1037

A72                  dmc_ondemand   performance
memset (MB/s)            2809.3        2821.2
memcpy (MB/s)            4893.0        4895.7
single latency (ns)       381.7         217.8
dual latency (ns)         483.8         260.5
7-zip single (MIPS)        1336          1712

While memory bandwidth doesn't differ between the two governors, latency is highly affected. The same goes for single-threaded 7-zip runs, and for multi-threaded runs to a lesser extent.



Before running tinymembench:

hjc@nanopim4:/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc$ cat trans_stat 
    From  :   To
          :  200MHz  300MHz  400MHz  528MHz  600MHz  800MHz   time(ms)
*200000000:       0       0       0       0       0       5    117427
 300000000:       3       0       0       0       0       1      1157
 400000000:       0       0       0       0       0       0         0
 528000000:       0       1       0       0       0       0       200
 600000000:       0       0       0       0       0       0         0
 800000000:       2       3       0       1       0       0       446
Total transition : 16

 

When testing memory bandwidth:

    From  :   To
          :  200MHz  300MHz  400MHz  528MHz  600MHz  800MHz   time(ms)
 200000000:       0       0       0       0       0       6    142405
 300000000:       3       0       0       0       0       1      1157
 400000000:       0       0       0       0       0       0         0
 528000000:       0       1       0       0       0       0       200
 600000000:       0       0       0       0       0       0         0
*800000000:       2       3       0       1       0       0     13315
Total transition : 17

 

When testing latency:

    From  :   To
          :  200MHz  300MHz  400MHz  528MHz  600MHz  800MHz   time(ms)
*200000000:       0       0       0       0       0       6    310967
 300000000:       4       0       0       0       0       1      1257
 400000000:       0       1       0       0       0       0     17200
 528000000:       0       1       1       0       0       0       300
 600000000:       0       0       0       0       0       0         0
 800000000:       2       3       0       2       0       0    177565
Total transition : 21

(and tinymembench shows high latency)

 

After:

    From  :   To
          :  200MHz  300MHz  400MHz  528MHz  600MHz  800MHz   time(ms)
*200000000:       0       0       0       0       0       6   1029003
 300000000:       4       0       0       0       0       1      1257
 400000000:       0       1       0       0       0       0     17200
 528000000:       0       1       1       0       0       0       300
 600000000:       0       0       0       0       0       0         0
 800000000:       2       3       0       2       0       0    177565
Total transition : 21

 

1 hour ago, tkaiser said:

echo performance >governor

After setting this:

 

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   2931.5 MB/s (4.2%)
 C copy backwards (32 byte blocks)                    :   2926.2 MB/s
 C copy backwards (64 byte blocks)                    :   2874.3 MB/s
 C copy                                               :   2903.6 MB/s
 C copy prefetched (32 bytes step)                    :   2866.9 MB/s
 C copy prefetched (64 bytes step)                    :   2863.8 MB/s
 C 2-pass copy                                        :   2583.9 MB/s
 C 2-pass copy prefetched (32 bytes step)             :   2640.9 MB/s
 C 2-pass copy prefetched (64 bytes step)             :   2635.6 MB/s
 C fill                                               :   4892.7 MB/s (0.5%)
 C fill (shuffle within 16 byte blocks)               :   4894.2 MB/s (0.1%)
 C fill (shuffle within 32 byte blocks)               :   4889.4 MB/s (0.4%)
 C fill (shuffle within 64 byte blocks)               :   4894.0 MB/s (0.2%)
 ---
 standard memcpy                                      :   2934.9 MB/s
 standard memset                                      :   4893.5 MB/s (0.3%)
 ---
 NEON LDP/STP copy                                    :   2927.2 MB/s
 NEON LDP/STP copy pldl2strm (32 bytes step)          :   2958.8 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :   2960.9 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   2864.1 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   2861.6 MB/s
 NEON LD1/ST1 copy                                    :   2925.8 MB/s
 NEON STP fill                                        :   4892.3 MB/s (0.4%)
 NEON STNP fill                                       :   4859.3 MB/s (0.1%)
 ARM LDP/STP copy                                     :   2925.9 MB/s
 ARM STP fill                                         :   4892.6 MB/s (0.4%)
 ARM STNP fill                                        :   4854.5 MB/s (0.4%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON LDP/STP copy (from framebuffer)                 :    668.8 MB/s
 NEON LDP/STP 2-pass copy (from framebuffer)          :    598.6 MB/s
 NEON LD1/ST1 copy (from framebuffer)                 :    711.0 MB/s
 NEON LD1/ST1 2-pass copy (from framebuffer)          :    649.0 MB/s
 ARM LDP/STP copy (from framebuffer)                  :    483.6 MB/s
 ARM LDP/STP 2-pass copy (from framebuffer)           :    467.4 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    4.1 ns          /     6.5 ns 
    131072 :    6.2 ns          /     8.7 ns 
    262144 :    8.9 ns          /    11.6 ns 
    524288 :   10.3 ns          /    13.3 ns 
   1048576 :   15.1 ns          /    21.4 ns 
   2097152 :  105.6 ns          /   159.7 ns 
   4194304 :  150.2 ns          /   199.5 ns 
   8388608 :  177.2 ns          /   219.2 ns 
  16777216 :  190.9 ns          /   227.3 ns 
  33554432 :  197.7 ns          /   232.0 ns 
  67108864 :  208.3 ns          /   245.0 ns 

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    4.1 ns          /     6.5 ns 
    131072 :    6.1 ns          /     8.7 ns 
    262144 :    7.2 ns          /     9.5 ns 
    524288 :    7.7 ns          /     9.9 ns 
   1048576 :   12.0 ns          /    16.9 ns 
   2097152 :  104.2 ns          /   156.9 ns 
   4194304 :  148.3 ns          /   195.3 ns 
   8388608 :  169.6 ns          /   207.0 ns 
  16777216 :  180.1 ns          /   210.5 ns 
  33554432 :  185.5 ns          /   212.5 ns 
  67108864 :  188.3 ns          /   213.7 ns 

That's the expected performance.

 

armbianmonitor -u: http://ix.io/1lAb

