Posts posted by hjc

  1. 6 minutes ago, Nora Lee said:

     Bootloader and kernel under developing, draft version will be provided with developers, improving version will be released in public.

     

    This is indeed a GPL violation. As soon as you release any binary compiled from GNU GPLv2-licensed source, you should release the exact source that produced the binary, at the same time and under the same GNU GPLv2 license.

  2. 2 hours ago, tkaiser said:

    tested with my MacBook Pro USB-C charger not exceeding 500mA with 'dumb' consumers on the other end of the USB-C cable: endless boot/crash cycle.

    This is surprising. So a MacBook charger on the M4 is not even better than one of those old USB-A phone chargers with 5V/2A output and no quick-charging protocol?

    Anyway, this supports your view that the official 5V/4A charger should be purchased as well.

  3. 1 hour ago, tkaiser said:

     

    With something that's broken by design? Adding tons of complexity for no real reason and allowing to backdoor your system at a level the OS can never detect? Ever done a simple web search for 'uefi security vulnerabilities issue' or something like that?

    Speaking of firmware backdoors, x86 PC OEMs have never stopped planting them. Chip vendors and OEMs always have their ways to plant backdoors, regardless of whether it's a UEFI system or not. On x86 they can use SMM; on ARMv8 nowadays, EL3 is used.

    Although we love boards with an open source ATF implementation, most ARM devices don't open their firmware. Qualcomm chips, for example, will never allow you to modify code running in EL3, and the firmware is only provided as binary blobs.

     

    But, well, I do agree that UEFI is a complex, broken design. UEFI firmware nowadays is even larger than regular operating systems were in the good old days (when BIOS was used).

  4. 4 hours ago, chwe said:

    96boards have done it in the past. 

    That can be viewed as Android with the rootfs replaced by GNU/Linux, so it is not a real Linux experience (like updating the kernel with "apt upgrade" instead of connecting to a PC to run a set of fastboot commands).

    It is possible to load u-boot from LK, though, as people working on the DragonBoard 410c/820c do. In this way, u-boot acts as (pretends to be) the kernel.

  5. On 7/14/2018 at 6:59 PM, codnoscope said:

    Hey I've seen some reports that the on board XOR flash will be flashed with GUI enabled uEFI firmware, is this true?

    That might be true, since there's already an RK3399Pkg (haven't tried myself), but I'd still recommend u-boot, unless you are interested in one of the following use cases:

    1. CentOS/Fedora/openSUSE, based on UEFI and grub on ARM64. These distros are mostly made for servers rather than IoT stuff, so if you want something like KVM on ARM64, you could try them. But I doubt RK3399's 4GB of RAM could ever handle virtual machines. If you really want an ARM64 server, I recommend more serious solutions like Qualcomm Centriq or Cavium ThunderX. If you just want to isolate your own code into different environments, use lxc/lxd or docker on (u-boot based) Armbian.

    2. Windows on ARM64, which requires UEFI and ACPI. I'd admit it's quite an attractive lightweight desktop solution, and it can even handle simple x86 programs. Someone has already done this for the Raspberry Pi 3, but just forget it: that's definitely unusable, considering the horribly slow IO of the RPi and the crazy (heavily IO-bound) background tasks of the most bloated desktop OS on earth. On RK3399, yes, I think it is possible to handle Windows' crazy background IO with a USB, SATA or even NVMe SSD (actually a good eMMC could handle that as well), but you'll need to write a lot of Windows drivers on your own to bring up a new platform, including SDIO, eMMC, USB, GMAC, PCIe and, the most difficult part, display drivers (like Rockchip DRM on Linux); otherwise you have to use the (probably slow as hell) UEFI GOP for display. A GPU is actually not needed to run the Windows desktop, and GPU support is probably not possible unless you are an employee of ARM Holdings and have the source code of the Mali GPU Windows drivers, which have never been released but do exist. All of the above assumes that ACPI is implemented properly, and I don't think that would be the case. You can implement ACPI on your own, though.

     

    I can't think of any other reason why UEFI would be useful on this platform. What exact use case do you have in mind for UEFI?

     

    4 minutes ago, emk2203 said:

    It wouldn't hurt to ask Libre Computer and point out that their not-so-successful campaign (around 68% of target reached) might have been successful if they would have broad and stable support from Armbian.  

    IMO the biggest and only problem so far for all RK3399 boards is pricing. This was the reason I was excited about the NanoPi M4, since it has a much lower price while keeping the essential components on board (like WiFi/BT). Unfortunately, people around me still think $65 is a little too high for an SBC when I recommend the board to them. For most people, purchasing a $99 (or higher) SBC is really a crazy idea, even if Armbian has great support for it.

     

  6. 4 hours ago, Igor said:


    Don't have this board. Untested.

    https://dl.armbian.com/firefly-rk3399/

    There's one issue in the Ubuntu (4.4 kernel) image: the FriendlyARM kernel defines CONFIG_MACH_NANOPI4 and only compiles the NanoPi dtbs when it's set, so the Firefly-RK3399 dtb is missing from the linux-dtb-rk3399 package. Firefly-RK3399 might be moved into the rockchip64 board family, or a patch could be used to configure that. Anyway, I'll test both kernels when I'm free.

     

  7. 3 minutes ago, emk2203 said:

    Any chance for you to share your kernel? My first attempt to compile a 4.19rc1 was unsuccessful. Or, let's say the compile went through, but I am still not sure what to put where afterwards. I would like to try with something known to work.

    There's already a usable mainline kernel in the official Armbian beta repo (use armbian-config to switch to nightly, reboot, then switch to the dev kernel and reboot again), though USB 3.0 and PCIe/NVMe are not enabled.

  8. 12 hours ago, mindee said:

    Want to know the USB 3.0 speed. 

    In the "IO performance" chart, NanoPi M4 + JMS578 + 850 EVO is connected via USB 3.0, so the performance is about 388/300 MB/s and 70k/99k IOPS (read/write). This is lower than what I measured on my Intel NUC (435/440 MB/s read/write, tested using diskspd64.exe), but it's acceptable, since most people won't connect an SSD this way, and typically the HDD is the bottleneck.

     

    Besides, at the time of the IO testing I hadn't thought about the performance impact of DDR DVFS, so the IO tests were run with the CPU governor set to "performance" but the dmc governor unchanged, which might have a slightly negative performance impact. Also, USB 3.0 on the M4 still doesn't work on mainline (maybe due to kernel configuration or dt issues), so I did the IO testing with the 4.4 kernel; IO performance may improve further with mainline.

  9. I've been doing some tests with NanoPi M4 these days. While I'm not a professional board reviewer, I can share some early performance numbers with you here. Beware that none of these tests reflect real-world use cases; they are just provided as-is. Besides, Armbian development on RK3399 boards is still at a very early stage, so any of these numbers may change in the future due to software changes.

    Unless otherwise noted, all tests were done using the Armbian nightly image, the FriendlyARM 4.4 kernel, and the CPU clocked at 2.0/1.5GHz.

     

    Powering

    NanoPi M4 is my first board powered by USB-C. While RK3399 is not power-hungry under normal load, I do doubt whether a 5V/3A power supply is sufficient when the CPU load goes higher, or when a lot of USB devices are connected. So I did a series of power measurements with this tool:

    [Image: the power measurement tool]

    It measures the power consumption on the USB side, excluding the consumption of the PSU itself.

    The board is powered by the USB-C charger that came with my Huawei MateBook E, which supports 5V/2A, 9V/2A, and 12V/2A, so theoretically it is insufficient to power the NanoPi M4. Unfortunately I can't find a USB-C charger capable of 5V/3A output, so I had to do the tests with this one.

    What if I connect a lot of USB 3.0 devices and exceed the 5V/2A limit? Well, I did try that (connecting 4 USB HDDs and running cpuburn, or even connecting 2 SBCs to the USB ports), and the answer is simple: the board crashed. But normally the board's consumption will not exceed 10W, so the charger works just fine.

     

    Test setup

    1) Idle consumption

    This is the typical consumption when you use it as a headless server.

     

    2) Idle consumption with HDMI display output (console tty interface, no Desktop/X11/GPU stuff)

    Tested with a Dell P2415Q 4K 60Hz display, connected over HDMI with 2560*1440 60Hz video output. Also connect the USB 3.0 hub to

     

    3) Display connected, 802.11ac WiFi with iperf sending

    With HDMI display connected (same as (2)), and WiFi connected to 802.11ac 5GHz AP in another room, run the following command:

    iperf3 -c 10.24.0.1 -t 60

    The WiFi throughput is around 110Mbps

     

    4) Display connected, running cpuburn

    With HDMI display connected (same as (2)), run cpuburn on all 6 cores

     

    5) Idle consumption of 4.19-rc1 mainline kernel

    Same as (1), but running mainline kernel.

     

    Test results

    [Image: power consumption test results]

     

    The idle consumption is 1.79W, and it might need some tuning to bring that down. When WiFi and a display are connected, it rises to 2.87W.

    With active WiFi traffic, the board consumes 4.67W, and with all CPU cores active, it consumes 9.86W.

    The mainline kernel has a higher idle consumption; the reason might be that DDR DVFS and/or devfreq are not implemented yet.

    Based on these results, it seems that a 5V/2A supply is okay if no peripheral devices are connected. However, if you connect any USB devices, the board may easily exceed the 2A limit when CPU load goes higher.
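
    To make the 2A concern concrete, here is a rough sketch that converts the measured wattages above into current draw. It assumes the bus stays at exactly 5.0V, which is a simplification (the voltage normally sags under load), and the scenario labels are my own shorthand for the tests above.

```python
# Convert the measured wattages from the post into current draw at a nominal 5 V.
# Assumption: the bus voltage stays at exactly 5.0 V (in practice it sags under load).
NOMINAL_VOLTS = 5.0

measured_watts = {
    "idle (headless)": 1.79,
    "idle, display + WiFi": 2.87,
    "display + WiFi iperf3": 4.67,
    "display + cpuburn": 9.86,
}


def current_amps(watts, volts=NOMINAL_VOLTS):
    """I = P / V."""
    return watts / volts


def headroom_amps(watts, limit_amps, volts=NOMINAL_VOLTS):
    """Current left for peripherals before the supply limit is reached."""
    return limit_amps - current_amps(watts, volts)


for name, watts in measured_watts.items():
    print(f"{name:24s} {current_amps(watts):.2f} A, "
          f"headroom at 2 A: {headroom_amps(watts, 2.0):.2f} A, "
          f"at 3 A: {headroom_amps(watts, 3.0):.2f} A")
```

    Under that assumption the cpuburn case alone already sits just below 2A, so almost any bus-powered USB device would push a 5V/2A supply past its rating.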

     

    CPU/RAM and IO Performance

    While RK3399 is not a super fast chip, its performance matches its market position. To reveal the full potential of the board, I'm posting some visualized sbc-bench results taken with the mainline 4.19-rc1 kernel here, because there might be some DRAM performance issues on RK3399 with the 4.4 kernel. For comparison, I'm also posting the results of Firefly-RK3399 (2.2/1.8GHz overclock, tested by myself), Raspberry Pi 3 B+, ROCK64 and RockPro64 (taken from existing sbc-bench results).

     

    You can see the full sbc-bench log here.

     

    Memory

    [Image: memory benchmark chart]

     

    7-zip

    [Image: 7-zip benchmark chart]

     

    cpuminer

    [Image: cpuminer benchmark chart]

     

    For IO performance, I used iozone to measure the performance of the SD card, eMMC and USB SSD. NanoPC T4's NVMe SSD results are added as a reference.

    SSD performance was measured with the command "iozone -e -I -a -s 1G -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2"; the SD card and eMMC tests used a 100M file size instead of 1G.

    [Image: IO performance chart]
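
    For anyone who wants to repeat the IO runs, the sketch below rebuilds the iozone command quoted above and annotates each flag. It only assembles the argument list and does not run anything; the helper name iozone_args is made up for this example.

```python
# Build the iozone argument list quoted in the post, varying only the file size.
# Nothing is executed here; pass the list to subprocess.run() on a machine
# that has iozone installed.
RECORD_SIZES = ["4k", "16k", "512k", "1024k", "16384k"]


def iozone_args(file_size):
    args = [
        "iozone",
        "-e",  # include flush (fsync/fflush) in the timing
        "-I",  # use direct I/O, bypassing the page cache
        "-a",  # automatic mode
        "-s", file_size,  # size of the test file
    ]
    for record in RECORD_SIZES:
        args += ["-r", record]  # record (block) size for each pass
    # 0 = write/rewrite, 1 = read/re-read, 2 = random read/write
    for test in ("0", "1", "2"):
        args += ["-i", test]
    return args


ssd_cmd = iozone_args("1G")      # USB SSD
flash_cmd = iozone_args("100M")  # SD card and eMMC
print(" ".join(ssd_cmd))
print(" ".join(flash_cmd))
```

    Direct I/O (-I) matters most here: without it, a 100M test file on a board with several GB of RAM would mostly measure the page cache rather than the storage.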

     

    Networking

    NanoPi M4 comes with a 1Gbps Ethernet port and an 802.11ac 2x2 MIMO WiFi module, and I tested both with iperf3.

    GbE iperf3 full duplex test:

    hjc@nanopim4:~$ iperf3 -c 10.20.0.1 & iperf3 -Rc 10.20.0.1 -p 5202
    [1] 27486
    Connecting to host 10.20.0.1, port 5201
    Connecting to host 10.20.0.1, port 5202
    Reverse mode, remote host 10.20.0.1 is sending
    [  4] local 10.20.0.2 port 43782 connected to 10.20.0.1 port 5201
    [  4] local 10.20.0.2 port 45102 connected to 10.20.0.1 port 5202
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec  64.6 MBytes   542 Mbits/sec                  
    [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
    [  4]   0.00-1.00   sec  95.1 MBytes   798 Mbits/sec    0    314 KBytes       
    [  4]   1.00-2.00   sec   110 MBytes   919 Mbits/sec                  
    [  4]   1.00-2.00   sec  94.5 MBytes   793 Mbits/sec    0    320 KBytes       
    [  4]   2.00-3.00   sec   110 MBytes   920 Mbits/sec                  
    [  4]   2.00-3.00   sec  95.8 MBytes   803 Mbits/sec    0    317 KBytes       
    [  4]   3.00-4.00   sec   110 MBytes   920 Mbits/sec                  
    [  4]   3.00-4.00   sec  94.5 MBytes   792 Mbits/sec    0    317 KBytes       
    [  4]   4.00-5.00   sec   110 MBytes   920 Mbits/sec                  
    [  4]   4.00-5.00   sec  94.6 MBytes   794 Mbits/sec    0    314 KBytes       
    [  4]   5.00-6.00   sec   110 MBytes   919 Mbits/sec                  
    [  4]   5.00-6.00   sec  95.7 MBytes   803 Mbits/sec    0    314 KBytes       
    [  4]   6.00-7.00   sec   110 MBytes   919 Mbits/sec                  
    [  4]   6.00-7.00   sec  95.5 MBytes   801 Mbits/sec    0    317 KBytes       
    [  4]   7.00-8.00   sec   110 MBytes   920 Mbits/sec                  
    [  4]   7.00-8.00   sec  94.8 MBytes   795 Mbits/sec    0    314 KBytes       
    [  4]   8.00-9.00   sec   110 MBytes   920 Mbits/sec                  
    [  4]   8.00-9.00   sec  94.5 MBytes   792 Mbits/sec    0    314 KBytes       
    [  4]   9.00-10.00  sec  97.2 MBytes   816 Mbits/sec    0    320 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Retr
    [  4]   0.00-10.00  sec   952 MBytes   799 Mbits/sec    0             sender
    [  4]   0.00-10.00  sec   949 MBytes   796 Mbits/sec                  receiver
    [  4]   9.00-10.00  sec   110 MBytes   921 Mbits/sec                  
    
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Retr
    iperf Done.
    [  4]   0.00-10.00  sec  1.03 GBytes   884 Mbits/sec    9             sender
    [  4]   0.00-10.00  sec  1.03 GBytes   882 Mbits/sec                  receiver
    
    iperf Done.
    [1]  + 27486 done       iperf3 -c 10.20.0.1
    
    

     

    Wireless

    hjc@nanopim4:~$ iperf3 -c 10.24.0.1
    Connecting to host 10.24.0.1, port 5201
    [  4] local 10.23.4.116 port 39730 connected to 10.24.0.1 port 5201
    [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
    [  4]   0.00-1.00   sec  13.0 MBytes   109 Mbits/sec   13   1.21 MBytes       
    [  4]   1.00-2.01   sec  12.9 MBytes   107 Mbits/sec    5    618 KBytes       
    [  4]   2.01-3.00   sec  12.6 MBytes   106 Mbits/sec    0    618 KBytes       
    [  4]   3.00-4.00   sec  9.35 MBytes  78.7 Mbits/sec    4    329 KBytes       
    [  4]   4.00-5.00   sec  11.1 MBytes  92.9 Mbits/sec    0    348 KBytes       
    [  4]   5.00-6.00   sec  10.2 MBytes  85.5 Mbits/sec    0    363 KBytes       
    [  4]   6.00-7.00   sec  9.37 MBytes  78.6 Mbits/sec    0    387 KBytes       
    [  4]   7.00-8.00   sec  10.9 MBytes  91.5 Mbits/sec    0    409 KBytes       
    [  4]   8.00-9.00   sec  13.6 MBytes   114 Mbits/sec    0    409 KBytes       
    [  4]   9.00-10.00  sec  13.8 MBytes   116 Mbits/sec    0    410 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Retr
    [  4]   0.00-10.00  sec   117 MBytes  98.0 Mbits/sec   22             sender
    [  4]   0.00-10.00  sec   116 MBytes  97.0 Mbits/sec                  receiver
    
    iperf Done.
    hjc@nanopim4:~$ iperf3 -c 10.24.0.1 -R
    Connecting to host 10.24.0.1, port 5201
    Reverse mode, remote host 10.24.0.1 is sending
    [  4] local 10.23.4.116 port 39734 connected to 10.24.0.1 port 5201
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec  10.6 MBytes  88.8 Mbits/sec                  
    [  4]   1.00-2.00   sec  10.9 MBytes  91.5 Mbits/sec                  
    [  4]   2.00-3.00   sec  4.41 MBytes  37.0 Mbits/sec                  
    [  4]   3.00-4.00   sec  2.07 MBytes  17.3 Mbits/sec                  
    [  4]   4.00-5.00   sec  1018 KBytes  8.34 Mbits/sec                  
    [  4]   5.00-6.00   sec  1.29 MBytes  10.8 Mbits/sec                  
    [  4]   6.00-7.00   sec  6.48 MBytes  54.4 Mbits/sec                  
    [  4]   7.00-8.00   sec  10.8 MBytes  91.0 Mbits/sec                  
    [  4]   8.00-9.00   sec  10.7 MBytes  89.9 Mbits/sec                  
    [  4]   9.00-10.00  sec  10.7 MBytes  89.8 Mbits/sec                  
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Retr
    [  4]   0.00-10.00  sec  70.1 MBytes  58.8 Mbits/sec    0             sender
    [  4]   0.00-10.00  sec  69.1 MBytes  58.0 Mbits/sec                  receiver
    
    iperf Done.
    

    It's too complicated to analyze the performance of a WiFi connection, but so far I've never seen more than 200Mbps throughput on AP6356S.
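
    One simple way to quantify how unstable the link is: pull the per-second rates out of the iperf3 text output and look at the spread. The sketch below uses the reverse-mode WiFi run above as sample data. The regular expression is my own and only handles lines reported in Mbits/sec; iperf3 can also emit JSON with -J, which is more robust for scripting.

```python
import re

# Per-second lines copied from the reverse-mode (-R) WiFi run in the post.
SAMPLE = """\
[  4]   0.00-1.00   sec  10.6 MBytes  88.8 Mbits/sec
[  4]   1.00-2.00   sec  10.9 MBytes  91.5 Mbits/sec
[  4]   2.00-3.00   sec  4.41 MBytes  37.0 Mbits/sec
[  4]   3.00-4.00   sec  2.07 MBytes  17.3 Mbits/sec
[  4]   4.00-5.00   sec  1018 KBytes  8.34 Mbits/sec
[  4]   5.00-6.00   sec  1.29 MBytes  10.8 Mbits/sec
[  4]   6.00-7.00   sec  6.48 MBytes  54.4 Mbits/sec
[  4]   7.00-8.00   sec  10.8 MBytes  91.0 Mbits/sec
[  4]   8.00-9.00   sec  10.7 MBytes  89.9 Mbits/sec
[  4]   9.00-10.00  sec  10.7 MBytes  89.8 Mbits/sec
[  4]   0.00-10.00  sec  69.1 MBytes  58.0 Mbits/sec                  receiver
"""

# start-end interval, transfer amount with unit, then the rate in Mbits/sec
LINE = re.compile(
    r"\]\s+([\d.]+)-([\d.]+)\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Mbits/sec"
)


def interval_mbps(text):
    """Return the Mbits/sec value of every roughly one-second interval line."""
    rates = []
    for line in text.splitlines():
        m = LINE.search(line)
        # Skip the 0.00-10.00 summary lines: keep only ~1 s intervals.
        if m and float(m.group(2)) - float(m.group(1)) < 1.5:
            rates.append(float(m.group(3)))
    return rates


rates = interval_mbps(SAMPLE)
mean = sum(rates) / len(rates)
print(f"mean {mean:.1f} Mbit/s, min {min(rates)}, max {max(rates)}")
```

    The mean of the ten intervals comes out close to the 58 Mbit/s summary, while the minimum is below 10 Mbit/s, so the average hides a multi-second stall in the middle of the run.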

  10. 1 hour ago, DevDrake said:

    @hjc i tried your way and it works well :)

    Still i would love to have run-time installed to not deploy self-containing app every time.

    MS did not release deb/rpm packages for armhf/arm64, so you have to install the runtime manually, including its dependencies. The drawback is that every time the runtime is updated, you need to follow these steps again.

     

    If you only need to run your .NET Core apps on the board and not build them there, you can download the arm64 runtime at https://dotnetcli.blob.core.windows.net/dotnet/Runtime/2.1.3/dotnet-runtime-2.1.3-linux-arm64.tar.gz and follow the steps here to install.

    If you have a really powerful board (like one of those RK3399 boards) and want to build apps on it, use https://dotnetcli.blob.core.windows.net/dotnet/Sdk/2.1.401/dotnet-sdk-2.1.401-linux-arm64.tar.gz instead.

     

     

     

     

  11. 3 minutes ago, tkaiser said:

    Did you run into other issues with M4 so far?

    For the 4.4 kernel, there are no other issues. Everything else works fine (at least as a headless server; I haven't tried to connect a monitor yet). I may try the Bionic desktop image this weekend.

    As for mainline, WiFi and USB 3.0 do not work. lsusb only shows the OTG 2.0 root hub.

     

    PCIe and MIPI are not tested yet.

     

    1 minute ago, tkaiser said:

    Different DRAM type seems to be no problem yet, right?

    Yes, AFAIK Rockchip's binary blobs can detect the DDR type and initialize it accordingly. I can see different DRAM initialization log output on T4 and M4, although they use the same rk3399_ddr_800MHz_v1.14.bin file and exactly the same u-boot config.

  12. Before running tinymembench:

    hjc@nanopim4:/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc$ cat trans_stat 
       From  :   To
             :200000000300000000400000000528000000600000000800000000   time(ms)
    *200000000:       0       0       0       0       0       5    117427
     300000000:       3       0       0       0       0       1      1157
     400000000:       0       0       0       0       0       0         0
     528000000:       0       1       0       0       0       0       200
     600000000:       0       0       0       0       0       0         0
     800000000:       2       3       0       1       0       0       446
    Total transition : 16

     

    When testing memory bandwidth:

       From  :   To
             :200000000300000000400000000528000000600000000800000000   time(ms)
     200000000:       0       0       0       0       0       6    142405
     300000000:       3       0       0       0       0       1      1157
     400000000:       0       0       0       0       0       0         0
     528000000:       0       1       0       0       0       0       200
     600000000:       0       0       0       0       0       0         0
    *800000000:       2       3       0       1       0       0     13315
    Total transition : 17

     

    When testing latency:

       From  :   To
             :200000000300000000400000000528000000600000000800000000   time(ms)
    *200000000:       0       0       0       0       0       6    310967
     300000000:       4       0       0       0       0       1      1257
     400000000:       0       1       0       0       0       0     17200
     528000000:       0       1       1       0       0       0       300
     600000000:       0       0       0       0       0       0         0
     800000000:       2       3       0       2       0       0    177565
    Total transition : 21

    (and tinymembench shows high latency)

     

    After:

       From  :   To
             :200000000300000000400000000528000000600000000800000000   time(ms)
    *200000000:       0       0       0       0       0       6   1029003
     300000000:       4       0       0       0       0       1      1257
     400000000:       0       1       0       0       0       0     17200
     528000000:       0       1       1       0       0       0       300
     600000000:       0       0       0       0       0       0         0
     800000000:       2       3       0       2       0       0    177565
    Total transition : 21
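
    The time(ms) column is cumulative, so the interesting number is the difference between two snapshots. Below is a small sketch that parses the rows of two of the dumps above ("When testing memory bandwidth" and "When testing latency") and shows where the DRAM clock spent its time in between. The function name is made up for this example, and the two snapshots were not taken exactly at the start and end of the latency test, so treat the split as approximate.

```python
# Rows copied from two of the trans_stat dumps in the post.
BANDWIDTH = """\
 200000000:       0       0       0       0       0       6    142405
 300000000:       3       0       0       0       0       1      1157
 400000000:       0       0       0       0       0       0         0
 528000000:       0       1       0       0       0       0       200
 600000000:       0       0       0       0       0       0         0
*800000000:       2       3       0       1       0       0     13315
Total transition : 17
"""

LATENCY = """\
*200000000:       0       0       0       0       0       6    310967
 300000000:       4       0       0       0       0       1      1257
 400000000:       0       1       0       0       0       0     17200
 528000000:       0       1       1       0       0       0       300
 600000000:       0       0       0       0       0       0         0
 800000000:       2       3       0       2       0       0    177565
Total transition : 21
"""


def time_in_state(text):
    """Parse trans_stat rows into {frequency_hz: cumulative time in ms}."""
    times = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        freq = head.strip().lstrip("*")  # '*' marks the current frequency
        if not sep or not freq.isdigit():
            continue  # skip header lines and the "Total transition" line
        times[int(freq)] = int(rest.split()[-1])  # last column is time(ms)
    return times


before, after = time_in_state(BANDWIDTH), time_in_state(LATENCY)
delta = {freq: after[freq] - before[freq] for freq in after}
total = sum(delta.values())
for freq, ms in delta.items():
    print(f"{freq // 1_000_000:4d} MHz  {ms:7d} ms  {ms / total:6.1%}")
```

    Between those two snapshots the DRAM sat at 200MHz for about as long as at 800MHz, which fits the high latency that tinymembench reported with the default dmc governor.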
    

     

    1 hour ago, tkaiser said:

    echo performance >governor

    After setting this

     

    tinymembench v0.4.9 (simple benchmark for memory throughput and latency)
    
    ==========================================================================
    == Memory bandwidth tests                                               ==
    ==                                                                      ==
    == Note 1: 1MB = 1000000 bytes                                          ==
    == Note 2: Results for 'copy' tests show how many bytes can be          ==
    ==         copied per second (adding together read and writen           ==
    ==         bytes would have provided twice higher numbers)              ==
    == Note 3: 2-pass copy means that we are using a small temporary buffer ==
    ==         to first fetch data into it, and only then write it to the   ==
    ==         destination (source -> L1 cache, L1 cache -> destination)    ==
    == Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
    ==         brackets                                                     ==
    ==========================================================================
    
     C copy backwards                                     :   2931.5 MB/s (4.2%)
     C copy backwards (32 byte blocks)                    :   2926.2 MB/s
     C copy backwards (64 byte blocks)                    :   2874.3 MB/s
     C copy                                               :   2903.6 MB/s
     C copy prefetched (32 bytes step)                    :   2866.9 MB/s
     C copy prefetched (64 bytes step)                    :   2863.8 MB/s
     C 2-pass copy                                        :   2583.9 MB/s
     C 2-pass copy prefetched (32 bytes step)             :   2640.9 MB/s
     C 2-pass copy prefetched (64 bytes step)             :   2635.6 MB/s
     C fill                                               :   4892.7 MB/s (0.5%)
     C fill (shuffle within 16 byte blocks)               :   4894.2 MB/s (0.1%)
     C fill (shuffle within 32 byte blocks)               :   4889.4 MB/s (0.4%)
     C fill (shuffle within 64 byte blocks)               :   4894.0 MB/s (0.2%)
     ---
     standard memcpy                                      :   2934.9 MB/s
     standard memset                                      :   4893.5 MB/s (0.3%)
     ---
     NEON LDP/STP copy                                    :   2927.2 MB/s
     NEON LDP/STP copy pldl2strm (32 bytes step)          :   2958.8 MB/s
     NEON LDP/STP copy pldl2strm (64 bytes step)          :   2960.9 MB/s
     NEON LDP/STP copy pldl1keep (32 bytes step)          :   2864.1 MB/s
     NEON LDP/STP copy pldl1keep (64 bytes step)          :   2861.6 MB/s
     NEON LD1/ST1 copy                                    :   2925.8 MB/s
     NEON STP fill                                        :   4892.3 MB/s (0.4%)
     NEON STNP fill                                       :   4859.3 MB/s (0.1%)
     ARM LDP/STP copy                                     :   2925.9 MB/s
     ARM STP fill                                         :   4892.6 MB/s (0.4%)
     ARM STNP fill                                        :   4854.5 MB/s (0.4%)
    
    ==========================================================================
    == Framebuffer read tests.                                              ==
    ==                                                                      ==
    == Many ARM devices use a part of the system memory as the framebuffer, ==
    == typically mapped as uncached but with write-combining enabled.       ==
    == Writes to such framebuffers are quite fast, but reads are much       ==
    == slower and very sensitive to the alignment and the selection of      ==
    == CPU instructions which are used for accessing memory.                ==
    ==                                                                      ==
    == Many x86 systems allocate the framebuffer in the GPU memory,         ==
    == accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
    == PCI-E is asymmetric and handles reads a lot worse than writes.       ==
    ==                                                                      ==
    == If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
    == or preferably >300 MB/s), then using the shadow framebuffer layer    ==
    == is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
    == performance improvement. For example, the xf86-video-fbturbo DDX     ==
    == uses this trick.                                                     ==
    ==========================================================================
    
     NEON LDP/STP copy (from framebuffer)                 :    668.8 MB/s
     NEON LDP/STP 2-pass copy (from framebuffer)          :    598.6 MB/s
     NEON LD1/ST1 copy (from framebuffer)                 :    711.0 MB/s
     NEON LD1/ST1 2-pass copy (from framebuffer)          :    649.0 MB/s
     ARM LDP/STP copy (from framebuffer)                  :    483.6 MB/s
     ARM LDP/STP 2-pass copy (from framebuffer)           :    467.4 MB/s
    
    ==========================================================================
    == Memory latency test                                                  ==
    ==                                                                      ==
    == Average time is measured for random memory accesses in the buffers   ==
    == of different sizes. The larger is the buffer, the more significant   ==
    == are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
    == accesses. For extremely large buffer sizes we are expecting to see   ==
    == page table walk with several requests to SDRAM for almost every      ==
    == memory access (though 64MiB is not nearly large enough to experience ==
    == this effect to its fullest).                                         ==
    ==                                                                      ==
    == Note 1: All the numbers are representing extra time, which needs to  ==
    ==         be added to L1 cache latency. The cycle timings for L1 cache ==
    ==         latency can be usually found in the processor documentation. ==
    == Note 2: Dual random read means that we are simultaneously performing ==
    ==         two independent memory accesses at a time. In the case if    ==
    ==         the memory subsystem can't handle multiple outstanding       ==
    ==         requests, dual random read has the same timings as two       ==
    ==         single reads performed one after another.                    ==
    ==========================================================================
    
    block size : single random read / dual random read, [MADV_NOHUGEPAGE]
          1024 :    0.0 ns          /     0.0 ns 
          2048 :    0.0 ns          /     0.0 ns 
          4096 :    0.0 ns          /     0.0 ns 
          8192 :    0.0 ns          /     0.0 ns 
         16384 :    0.0 ns          /     0.0 ns 
         32768 :    0.0 ns          /     0.0 ns 
         65536 :    4.1 ns          /     6.5 ns 
        131072 :    6.2 ns          /     8.7 ns 
        262144 :    8.9 ns          /    11.6 ns 
        524288 :   10.3 ns          /    13.3 ns 
       1048576 :   15.1 ns          /    21.4 ns 
       2097152 :  105.6 ns          /   159.7 ns 
       4194304 :  150.2 ns          /   199.5 ns 
       8388608 :  177.2 ns          /   219.2 ns 
      16777216 :  190.9 ns          /   227.3 ns 
      33554432 :  197.7 ns          /   232.0 ns 
      67108864 :  208.3 ns          /   245.0 ns 
    
    block size : single random read / dual random read, [MADV_HUGEPAGE]
          1024 :    0.0 ns          /     0.0 ns 
          2048 :    0.0 ns          /     0.0 ns 
          4096 :    0.0 ns          /     0.0 ns 
          8192 :    0.0 ns          /     0.0 ns 
         16384 :    0.0 ns          /     0.0 ns 
         32768 :    0.0 ns          /     0.0 ns 
         65536 :    4.1 ns          /     6.5 ns 
        131072 :    6.1 ns          /     8.7 ns 
        262144 :    7.2 ns          /     9.5 ns 
        524288 :    7.7 ns          /     9.9 ns 
       1048576 :   12.0 ns          /    16.9 ns 
       2097152 :  104.2 ns          /   156.9 ns 
       4194304 :  148.3 ns          /   195.3 ns 
       8388608 :  169.6 ns          /   207.0 ns 
      16777216 :  180.1 ns          /   210.5 ns 
      33554432 :  185.5 ns          /   212.5 ns 
      67108864 :  188.3 ns          /   213.7 ns 
    

    That's the expected performance.
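
    A quick way to read the latency table is to look for the block size where latency jumps. The sketch below parses the MADV_NOHUGEPAGE rows from the run above (64 KiB and larger) and reports the biggest step in single random read latency. The regular expression and helper names are my own, not part of tinymembench.

```python
import re

# Single/dual random read latencies from the MADV_NOHUGEPAGE table in the post
# (block sizes from 64 KiB up; smaller blocks all report 0.0 ns).
SAMPLE = """\
      65536 :    4.1 ns          /     6.5 ns
     131072 :    6.2 ns          /     8.7 ns
     262144 :    8.9 ns          /    11.6 ns
     524288 :   10.3 ns          /    13.3 ns
    1048576 :   15.1 ns          /    21.4 ns
    2097152 :  105.6 ns          /   159.7 ns
    4194304 :  150.2 ns          /   199.5 ns
    8388608 :  177.2 ns          /   219.2 ns
   16777216 :  190.9 ns          /   227.3 ns
   33554432 :  197.7 ns          /   232.0 ns
   67108864 :  208.3 ns          /   245.0 ns
"""

ROW = re.compile(r"^\s*(\d+)\s*:\s*([\d.]+) ns\s*/\s*([\d.]+) ns")


def parse_latency(text):
    """Return [(block_size_bytes, single_ns, dual_ns), ...]."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows


def biggest_jump(rows):
    """Adjacent block sizes with the largest increase in single-read latency."""
    pairs = zip(rows, rows[1:])
    prev, cur = max(pairs, key=lambda p: p[1][1] - p[0][1])
    return prev[0], cur[0]


rows = parse_latency(SAMPLE)
small, large = biggest_jump(rows)
print(f"largest latency step: {small // 1024} KiB -> {large // 1024} KiB")
```

    The largest step falls between the 1 MiB and 2 MiB block sizes, which would be consistent with the working set outgrowing the last-level cache and going to DRAM. The DRAM-side numbers after that step are the ones the dmc governor setting affects.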

     

    armbianmonitor -u: http://ix.io/1lAb

  13. 22 hours ago, hjc said:

    M4 (4.4 armbian nightly kernel) w/ the official huge heatsink attached: http://ix.io/1lvP

    There's something wrong (DRAM related) when running the Rockchip 4.4 kernel, which makes the latency twice as high as on other RK3399 boards. This causes very poor 7-zip performance, and tinymembench takes a long time to run (~20 minutes on both the big and little cores).

    However, the DRAM performs normally on the mainline kernel (4.19-rc1), and the benchmark numbers are identical to other boards.

     

    Mainline kernel benchmark details: http://ix.io/1lzx. I didn't modify the opp table and thermal trip point, so it's limited to 70℃ and 1.8/1.4GHz, and thermal throttling occurs very frequently. It's still very powerful when running at 1.6/1.4GHz, though, and it keeps cool.

     

    Edit: Re-run with opp/trip point modified: http://ix.io/1lzP

  14. 2 hours ago, pbies said:

    What I am thinking now is to move this board to Armbian. Will it work?

    Of course it could work, but as I said before, you'd need at least a mainline or recent (2017.x) u-boot working. This means that if your board vendor does not provide mainline u-boot support (like most RK3399 devices except Firefly), you need to write a u-boot device tree for your board somehow, just like I did for NanoPC T4. Unfortunately, you need to do this yourself.

     

    2 hours ago, pbies said:

    How should I in this case organize the partitions on eMMC and where to load kernel and rootfs?

    Armbian uses a single ext4 partition, so you don't need to mess with the default Rockchip partition layout (7 partitions is really a mess for an SBC).

     

    BTW, I don't know exactly how Orange Pi RK3399 works, but normally, RK3399 will boot from SD card if eMMC/SPI flash is not available or there's no u-boot SPL on eMMC/SPI flash. You may erase the eMMC (blkdiscard) and use an SD card for debugging, which is more convenient.

  15. 3 hours ago, tkaiser said:

    Then added a large fan blowing directly over the heatsink and now cpu-miner kills the board

    Did the whole board power down immediately, or did it run into a kernel panic or a crash of the benchmark process?

     

    I haven't analyzed the schematics of both boards, but I guess there are two possibilities:

    1. Different voltage requirements in different batches of RK3399 cause the issue.

    There are already different opp tables in the kernel for different RK3399 batches: rk3399-opp.dtsi and rk3399-op1-opp.dtsi. The latter is for Chromebooks, which use a lower voltage and are more efficient.

    Insufficient voltage basically causes the same issues as in x86 overclocking, resulting in kernel panics and process crashes.

    2. On NanoPC T4, overclocking makes it exceed the maximum allowed current of the board, which triggers some sort of protection, and the board powers down immediately.

     

    IMO the 2.2/1.8GHz overclock could be an option, but it should never be enabled by default. We could, however, try a clock slightly higher than the official configuration (e.g. 2.2/1.5 or even 2.0/1.5), have it stress tested on multiple batches of the board, and make it the default value.

  16. 17 minutes ago, Igor said:

    Additional families?

     

    rk3399-pine64

    rk3399-fa

    joined once into rk3399?

    Is it possible to share a board family between Rock64 and RockPro64? If they could use exactly the same kernel source and u-boot source from ayufan...

     

    3 hours ago, mmarks said:

    I see you are running kernel 4.4.150 while the debian stretch version for download is 4.4.148. Do I need to  do an offline full build to pick up the latest changes?

    Use armbian-config and switch to nightly build, and you'll get the latest kernel.

  17. Looks like my M4 will arrive next Thursday. One thing I'm still concerned about is that I ordered the 2GB RAM model, which uses DDR3 instead of the LPDDR3 that the 4GB ones use, so I wonder whether there are any RAM initialization differences to take care of when creating images for the board. At least in u-boot, LPDDR3 and DDR3 use different timing parameters, specified in the device tree.

  18. 6 minutes ago, chwe said:

    well maybe ali has dynamic pricing but if you send me a RK3399 TV-box with eMMC for 40$, I'll buy 1-2 of them.. :P All the boxes I find are in the 100$ price-range.. 

    I'm a little curious how their price could be so low. AFAIK the RK3399 SoC itself is about $20, and 2GB DRAM costs around $20 (or more, these days).

  19. 4 hours ago, chwe said:

    Mines is sluggish but a second ssh connection with something like ping running 'to keep it active the whole time' helps

    Well, my Intel 8275 also suffers from this kind of problem (on Debian stretch), and I guess it's related to the firmware version and low-power control.
