Everything posted by tkaiser

  1. I would check/clean the contacts of the 'bad' slot first. Search for the mPCIe pinout and take care of both the data lines and the power pins.
  2. Preliminary 'performance' summary

Based on the tests done above and elsewhere let's try to collect some performance data. GPU data is missing below for the simple reason that I'm not interested in anything GPU related (or attaching a display at all). Besides being used for display stuff and 'retro gaming', RK3399's Mali T860 MP4 GPU is also OpenCL capable. If you search for results (RK3399, the SoC used on ODROID N1, has been available for some years now so you find a lot by searching for 'RK3399' -- for example here are some OpenCL/OpenCV numbers) please keep in mind that Hardkernel might use different clockspeeds for the GPU as well (with the CPU cores it's just like that: almost everywhere else the big/little cores are clocked at 1.8/1.4 GHz while the N1 settings use 2.0/1.5 GHz instead).

CPU horsepower

The situation with RK3399 is somewhat special since it's an HMP design combining two fast Cortex-A72 cores with four 'slow' A53. So depending on which CPU core a job lands on, execution time can vary by a factor of 2. With Android or 'Desktop Linux' workloads this shouldn't be an issue since there things are mostly single-threaded and the scheduler will move these tasks to the big cores automagically if performance is needed. With other workloads it differs: people wanting to use RK3399 as part of a compile farm might be disappointed and still prefer ARM designs that feature four instead of two fast cores (eg. RK3288 or Exynos 5422 -- for reasons why see again the comments section on CNX).

For 'general purpose' server use cases the 7-zip scores are interesting since they give a rough estimate of how fast an RK3399 device will perform as a server (or how many tasks you can run in parallel). The overall score is 6,500 (see this comparison list) but due to the big.LITTLE design we're talking about the big cluster scoring 3350 and the little cluster 3900. So per core, tasks that execute on the big cores finish almost twice as fast. Keep this in mind when setting up your environment. Experimenting with cgroups and friends to assign certain tasks to specific CPU clusters will be worth the effort (a sketch follows at the end of this CPU section)!

'Number crunchers' who can make use of NEON instructions should look at 'cpuminer --benchmark' results: we get a total 8.80 kH/s rate when running on all 6 cores (big cores only: 4.10 kH/s, little cores only: 4.90 kH/s -- so again 'per core' performance is almost twice as good on the big cores) which is at the same performance level as an RK3288 (4 x A17) but gets outperformed by an ODROID XU4 for example at ~10 kH/s since there the little cores add a little bit to the result. But this needs improved cooling, otherwise an XU4 will immediately throttle down. The RK3399 provides this performance with way lower consumption and heat generation!

Crypto performance: just awesome due to the ARMv8 Crypto Extensions being available and usable on all cores in parallel. Simply check the cryptsetup results above and our 'openssl speed' numbers and keep in mind that if your crypto stuff can run in parallel (eg. terminating a few different VPN sessions) you can almost add the individual throughput numbers (and even with 6 threads in parallel at full clockspeed the RK3399 just draws 10W more compared to idle).

Talking about 'two fast and four slow CPU cores': the A53 cores are clocked at 1.5 GHz so when comparing with RK3399's little sibling RK3328 with only 4xA53 (ROCK64, Libre Computer Renegade or Swiftboard/Transformer) the RK3399 when running on the 'slow' cores will compete with or already outperform the RK3328 boards but still has 2 big cores available for heavy stuff.
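A hedged sketch of that cluster pinning (core numbering as on the N1: cpu0-cpu3 are the A53 cores, cpu4/cpu5 the A72 cores; the PID and the make job are just examples):

# start a task on the big cluster only
taskset -c 4,5 make -j2

# or move an already running process (PID 1234 is hypothetical) to the little cluster
taskset -cp 0-3 1234

# cgroup v1 cpuset variant (assumes the cpuset controller is mounted at the usual place)
mkdir /sys/fs/cgroup/cpuset/big
echo 4-5  > /sys/fs/cgroup/cpuset/big/cpuset.cpus
echo 0    > /sys/fs/cgroup/cpuset/big/cpuset.mems
echo 1234 > /sys/fs/cgroup/cpuset/big/tasks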
But since a lot of workloads are bottlenecked by memory bandwidth you should have a look at the tinymembench results collected above (and use some google-fu to compare with other devices).

Storage performance

The N1 has 2 SATA ports provided by a PCIe attached ASM1061 controller and 2 USB3 ports directly routed to the SoC. The per port bandwidth limitation, which also seems to apply to both port groups, is around 390 MB/s (this applies to all ports regardless of whether SATA or USB3 -- also random IO performance with default settings is pretty much the same). But this is not an overall internal SoC bottleneck since when testing with fast SSDs on both USB3 and SATA ports at the same time we got numbers of around ~750 MB/s.

I just retested with an EVO840 on the N1 at SATA and USB3 ports with a good UAS capable enclosure and as a comparison repeated the same test with a 'true NAS SoC': the Marvell Armada 385 on Clearfog Pro which provides 'native SATA' by the SoC itself. If we look carefully at the numbers we see that USB3 slightly outperforms the ASM1061 when it's about top sequential performance. The two ASM1061 numbers are due to different settings of /sys/module/pcie_aspm/parameters/policy (defaults to powersave but can be changed to performance which not only results in ~250mW higher idle consumption but also a lot better performance with small block sizes).

While USB3 seems to perform slightly better when looking only at irrelevant sequential transfer speeds, better attach disks to the SATA ports for a number of reasons:

- With USB you need disk enclosures with good USB to SATA bridges that are capable of UAS --> 'USB Attached SCSI' (we can only recommend the following ones: ASMedia ASM1153/ASM1351, JMicron JMS567/JMS578 or VIA VL711/VL715/VL716 -- unfortunately even if those chipsets are used sometimes crappy firmwares need USB quirks or require UAS blacklisting and then performance sucks. A good example are Seagate USB3 disks)
- When you use SSDs you want to be able to use TRIM (helps with retaining drive performance and increases longevity). With SATA attached SSDs this is not a problem but on USB ports it depends on a lot of stuff and usually does NOT work. If you understand just half of what's written here then think about SSDs on USB ports, otherwise better choose the SATA ports here
- And PCIe is also less 'expensive' since it needs fewer resources (lower CPU utilization with disks on SATA ports and fewer interrupts to process, see the 800k IRQs for SATA/PCIe vs. 2 million for USB3 with exactly the same workload below):

226:        180     809128          0          0          0          0  ITS-MSI 524288 Edge      0000:01:00.0
226:          0          0          0          0          0          0  ITS-MSI 524288 Edge      0000:01:00.0
227:        277          0    2066085          0          0          0  GICv3 137 Level     xhci-hcd:usb5
228:          0          0          0          0          0          0  GICv3 142 Level     xhci-hcd:usb7

There's also eMMC and SD cards usable as storage. Wrt SD cards it's too early to talk about performance since a necessary kernel patch is missing to remove the current SD card performance bottleneck (at first it looked like the N1 developer samples only implement the slowest SD card speed mode). The eMMC performance is awesome! If we look only at random IO performance with smaller block sizes (that's the 'eMMC as OS drive' use case) then the Hardkernel eMMC modules starting at 32GB size perform as fast as an SSD connected to the USB3 or SATA ports. With the SATA ports we get a nice speed boost by changing ASPM (Active State Power Management) settings, switching from the 'powersave' default to performance (+250mW idle consumption, see below).
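A minimal sketch of that ASPM switch (sysfs path as used throughout this thread; the setting does not survive a reboot):

# show the available policies -- the active one is displayed in brackets
cat /sys/module/pcie_aspm/parameters/policy

# trade ~250mW higher idle consumption for better small block size performance
echo performance > /sys/module/pcie_aspm/parameters/policy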
Only then can an SSD behind a SATA port on the N1 outperform a Hardkernel eMMC module wrt random IO or 'OS drive' performance. But of course this has a price: when SATA or USB drives are used consumption is a lot higher.

Network performance

Too early to report 'success' but I'm pretty confident we'll get Gigabit Ethernet fully saturated after applying some tweaks. With RK3328 it was the same situation in the beginning and maybe the same fixes that helped there will fix it with RK3399 on the N1 too. I would assume progress can be monitored here: https://forum.odroid.com/viewtopic.php?f=150&t=30126
  3. You need a bunch of terminal/SSH windows and then as root run:

armbianmonitor -m   (to get an idea whether throttling happens, how %iowait looks, how cpufreq scaling is working)
htop                (to see whether the benchmark you're running is bottlenecked by one or more CPU cores being at 100% -- if that's the case you found a CPU bottleneck and need to find ways around it)
iostat 5            (to get some storage statistics)
iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2   (the real test)

If you see CPU cores in htop being maxed out at 80% or above you need to check /proc/interrupts for IRQ collisions (see the sketch at the end of this post) and might have to pin the benchmark execution itself to another CPU core, e.g. using

taskset -c 1 iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2

to send the test to the 2nd CPU core called cpu1. And then you need a 'baseline' when testing with storage (eg. a beefy x86 box with native SATA where you benchmark the storage device first, to get an idea about its real capabilities) since otherwise you have no idea where to look when numbers are lower than expected (is it the board bottlenecking or the storage device?). Unfortunately there's also something called software and since Armbian constantly tries to be bleeding edge with bootloaders and kernels it got 100% stupid to collect any benchmark numbers here since the performance influence of different kernel versions can be MASSIVE. As an example: if you choose for a Clearfog the Debian Stretch next image from https://dl.armbian.com/clearfogpro/ then SATA performance will be SHIT until you run 'apt upgrade ; reboot' since with the latest 4.14.18 kernel available through the Armbian repo performance is good again. So benchmarks made on 1st February look crappy while those made today with the latest kernel show an entirely different picture.
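A hedged sketch of the IRQ part (IRQ number 227 is just an example taken from the N1's /proc/interrupts output elsewhere in this thread; smp_affinity expects a hex CPU bitmask):

# check which CPU services which interrupt
cat /proc/interrupts

# move eg. IRQ 227 (xhci-hcd) to cpu2 (bitmask 0x4 = 0b100)
echo 4 > /proc/irq/227/smp_affinity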
  4. apt install device-tree-compiler
cd /media/boot/
dtc -I dtb -O dts -o rk3399-odroidn1-linux.dts rk3399-odroidn1-linux.dtb
cp rk3399-odroidn1-linux.dts rk3399-odroidn1-linux-mod.dts

Then you tweak rk3399-odroidn1-linux-mod.dts and afterwards:

dtc -I dts -O dtb -o rk3399-odroidn1-linux.dtb rk3399-odroidn1-linux-mod.dts

If something goes wrong your board might not even boot any more afterwards. And you'll get a ton of warnings with the above, see https://forum.armbian.com/topic/6496-odroid-n1-not-a-review-yet/?do=findComment&comment=49438
  5. Thermal update

Since I was curious why temperatures in idle and under load were that low, and to be assured that throttling with the 4.4 BSP kernel we're currently using works... I decided to remove N1's heatsink. Looks good, so now let's see how the board performs without the heatsink applied. Since I had not the slightest idea whether throttling works and how, I let a huge fan assist in the beginning. The board booted up nicely, the small PWM fan started to blow air around, the large one efficiently cooled somewhat and I decided to again run 'cpuminer --benchmark'. To my surprise (I expected ODROID XU4 behaviour) the big cores were only throttled to 1800 and 1608 MHz after a couple of minutes, so at least I knew throttling was working.

Then I decided to stop the 5V USB connected fan and let the benchmark run on its own (board lying flat on the table, neither heatsink nor fan involved). After about half an hour cpuminer still reported a hash rate of 'Total: 6.60 kH/s' (all 6 cores involved) and armbianmonitor output showed the current throttling behaviour:

Time        big.LITTLE   load %cpu %sys %usr %nice %io %irq   CPU  C.St.
17:40:23: 1008/1512MHz  6.56 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:40:28: 1008/1512MHz  6.91 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:40:33: 1008/1512MHz  6.84 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:40:38:  816/1512MHz  6.77 100%   0%   0%  99%   0%   0%  85.0°C  3/3
17:40:43:  816/1512MHz  6.71 100%   0%   0%  99%   0%   0%  85.0°C  3/3
17:40:48: 1008/1512MHz  6.73 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:40:53: 1008/1512MHz  6.67 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:40:59: 1008/1512MHz  6.62 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:41:04: 1008/1512MHz  6.57 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:41:09: 1008/1512MHz  6.52 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:41:14: 1008/1512MHz  6.48 100%   0%   0%  99%   0%   0%  83.9°C  3/3
17:41:19: 1200/1512MHz  6.44 100%   0%   0%  99%   0%   0%  85.0°C  3/3
17:41:24: 1200/1512MHz  6.41 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:41:29: 1008/1512MHz  6.37 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:41:34: 1008/1512MHz  6.34 100%   0%   0%  99%   0%   0%  84.4°C  3/3
Time        big.LITTLE   load %cpu %sys %usr %nice %io %irq   CPU  C.St.
17:41:39: 1008/1512MHz  6.40 100%   0%   0%  99%   0%   0%  84.4°C  3/3
17:41:45: 1200/1512MHz  6.37 100%   0%   0%  99%   0%   0%  85.6°C  3/3
17:41:50: 1992/1512MHz  5.86  24%   0%   0%  23%   0%   0%  78.8°C  3/3
17:41:55: 1992/1512MHz  5.39   0%   0%   0%   0%   0%   0%  75.0°C  3/3

So the big cores were throttled down to even 816 MHz but the board was still running under full load and generated 6.60 kH/s. Before I stopped the benchmark I checked the powermeter: it reported 8.2W. In other words: with these throttling settings (clocking only the big cores down) we're talking about a 5W delta compared to idle and 6.6 kH/s. That's 1.3 kH/s per W consumed. Pretty amazing, especially when comparing with ODROID XU4 or Tinkerboard...

After stopping the benchmark I put the board into an upright position and switched to the ondemand governor to watch the temperatures dropping down to 45°C (full armbianmonitor output):

Time        big.LITTLE   load %cpu %sys %usr %nice %io %irq   CPU  C.St.
18:04:31:  408/ 408MHz  0.01   0%   0%   0%   0%   0%   0%  45.6°C  0/3
18:04:36:  408/ 408MHz  0.01   0%   0%   0%   0%   0%   0%  45.0°C  0/3
18:04:41:  408/ 408MHz  0.01   0%   0%   0%   0%   0%   0%  45.0°C  0/3
18:04:46:  408/ 408MHz  0.01   0%   0%   0%   0%   0%   0%  45.6°C  0/3

That's really impressive. But be warned: once you use Android on this thing or GPU acceleration works within Linux then operation without a heatsink won't be a good idea (the Mali in this SoC is quite capable).
Anyway: with pure CPU workloads this all looks very nice and way more energy efficient than those beefy ARMv7 boards with Cortex-A15 or A17 cores.
  6. @tkaiser where do I find instructions on setting up the tests to get the results you post here? The link above gives a 500 error ... Well, since it's not in the Wayback Machine (https://web.archive.org/web/*/http://sprunge.us/WbOK/*) it's gone... sprunge.us stopped working some time ago...
  7. Since we were already talking about power consumption I gave cpuburn-a53 a try. I had to manually start it on the big cluster as well ('taskset -c 4,5 cpuburn-a53 &') but when the tool ran on all 6 CPU cores the fan started to spin at its lowest level and SoC temperature became stable at 52.8°C:

Time        big.LITTLE   load %cpu %sys %usr %nice %io %irq   CPU  C.St.
13:00:34: 1992/1512MHz  8.44 100%   0%  99%   0%   0%   0%  52.8°C  1/3
13:00:42: 1992/1512MHz  8.40 100%   0%  99%   0%   0%   0%  52.8°C  1/3
13:00:51: 1992/1512MHz  8.41 100%   0%  99%   0%   0%   0%  52.8°C  1/3
13:00:59: 1992/1512MHz  8.42 100%   0%  99%   0%   0%   0%  52.8°C  1/3
13:01:08: 1992/1512MHz  8.39 100%   0%  99%   0%   0%   0%  52.8°C  1/3
13:01:17: 1992/1512MHz  8.40 100%   0%  99%   0%   0%   0%  52.8°C  1/3
13:01:25: 1992/1512MHz  8.41 100%   0%  99%   0%   0%   0%  52.8°C  1/3
13:01:33: 1992/1512MHz  8.43 100%   0%  99%   0%   0%   0%  52.8°C  1/3
13:01:42: 1992/1512MHz  8.40 100%   0%  99%   0%   0%   0%  52.8°C  1/3^C

My powermeter showed just 12.1W then, so it seems with such heavy NEON workloads and RK3399 busy on all CPU cores we can't get the board to consume more than 9W compared to idle... Testing again with openssl and the crypto engine I see the powermeter reporting 13.2W maximum (that's 10W more compared to idle) while the fan is working harder but temperature stays below 60°C:

Time        big.LITTLE   load %cpu %sys %usr %nice %io %irq   CPU  C.St.
13:12:06: 1992/1512MHz  6.01 100%   0%  99%   0%   0%   0%  55.0°C  2/3
13:12:13: 1992/1512MHz  6.17  99%   0%  99%   0%   0%   0%  55.6°C  2/3
13:12:20: 1992/1512MHz  6.16 100%   0%  99%   0%   0%   0%  55.0°C  2/3
13:12:27: 1992/1512MHz  6.14 100%   0%  99%   0%   0%   0%  55.0°C  2/3
13:12:33: 1992/1512MHz  6.27  99%   0%  99%   0%   0%   0%  54.4°C  2/3
13:12:40: 1992/1512MHz  6.25 100%   0%  99%   0%   0%   0%  55.0°C  2/3
13:12:47: 1992/1512MHz  6.23  99%   0%  99%   0%   0%   0%  56.7°C  2/3

IMO this is pretty amazing and I have to admit that I start to like the fansink Hardkernel put on this board. While looking similar to the one on my XU4 bought last year this one is way less annoying. If one puts the N1 into a cabinet (as I do with all IT stuff I don't need on my desk) you can't hear the thing.
  8. Well, just setting two nodes to disabled results in PCIe being gone but just ~150mW (mW, not mA!) less consumption:

root@odroid:/media/boot# diff rk3399-odroidn1-linux.dts rk3399-odroidn1-linux-mod.dts
8c8
<       model = "Hardkernel ODROID-N1";
---
>       model = "Hardkernel ODROID-N1 low power";
1654c1654
<       status = "okay";
---
>       status = "disabled";
1682c1682
<       status = "okay";
---
>       status = "disabled";
root@odroid:/media/boot# cat /proc/device-tree/model ; echo
Hardkernel ODROID-N1 low power
root@odroid:/media/boot# lspci
root@odroid:/media/boot#

After reverting back to the original DT I've got PCIe back and consumption increased by a whopping ~150mW:

root@odroid:/home/odroid# lspci
00:00.0 PCI bridge: Device 1d87:0100
01:00.0 IDE interface: ASMedia Technology Inc. ASM1061 SATA IDE Controller (rev 02)
  9. Hmm...

root@odroid:/media/boot# dtc -I dtb -O dts -o rk3399-odroidn1-linux.dts rk3399-odroidn1-linux.dtb
Warning (unit_address_vs_reg): Node /usb@fe800000 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /usb@fe900000 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@0 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@1 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@2 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@3 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@4 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@5 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@6 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@7 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@8 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /thermal-zones/soc-thermal/trips/trip-point@9 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /phy@e220 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /efuse@ff690000/id has a reg or ranges property, but no unit name
Warning (unit_address_vs_reg): Node /efuse@ff690000/cpul-leakage has a reg or ranges property, but no unit name
Warning (unit_address_vs_reg): Node /efuse@ff690000/cpub-leakage has a reg or ranges property, but no unit name
Warning (unit_address_vs_reg): Node /efuse@ff690000/gpu-leakage has a reg or ranges property, but no unit name
Warning (unit_address_vs_reg): Node /efuse@ff690000/center-leakage has a reg or ranges property, but no unit name
Warning (unit_address_vs_reg): Node /efuse@ff690000/logic-leakage has a reg or ranges property, but no unit name
Warning (unit_address_vs_reg): Node /efuse@ff690000/wafer-info has a reg or ranges property, but no unit name
Warning (unit_address_vs_reg): Node /gpio-keys/button@0 has a unit name, but no reg property
Warning (unit_address_vs_reg): Node /gpiomem has a reg or ranges property, but no unit name
root@odroid:/media/boot# cat rk3399-odroidn1-linux.dts | curl -F 'f:1=<-' http://ix.io
http://ix.io/KQc

Anyway, I backed up the eMMC contents already yesterday so nothing can go wrong
  10. Since ODROID N1 with current default settings has a pretty high 'ground' consumption (most probably related to both the ASM1061 and the DC-DC circuitry) we should better talk about consumption differences. I get 3.2W at the wall in idle and 12.1W when running 'cpuminer --benchmark'. So that's 8.9W for '8.77 kH/s' or just about 1W per kH/s (12V PSU included!). Now let's try the same with ODROID XU4. To get an idea how much the ASM1061 adds to idle consumption, I would assume that we need to change CONFIG_PCIE_ROCKCHIP and friends from y to m? Or use DT overlays to disable the respective DT nodes?
  11. Impossible since neither exists https://github.com/hardkernel/linux/blob/ee38808d9fd0ea4e4db980c82ba717b09fb103ae/arch/arm64/configs/odroidn1_defconfig#L114
  12. Cryptsetup benchmark

Here we go. Same numbers with all cores active or just the big ones:

# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       669588 iterations per second for 256-bit key
PBKDF2-sha256    1315653 iterations per second for 256-bit key
PBKDF2-sha512     485451 iterations per second for 256-bit key
PBKDF2-ripemd160  365612 iterations per second for 256-bit key
PBKDF2-whirlpool  134847 iterations per second for 256-bit key
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b   661.7 MiB/s   922.4 MiB/s
 serpent-cbc   128b           N/A           N/A
 twofish-cbc   128b    80.0 MiB/s    81.2 MiB/s
     aes-cbc   256b   567.6 MiB/s   826.9 MiB/s
 serpent-cbc   256b           N/A           N/A
 twofish-cbc   256b    79.6 MiB/s    81.1 MiB/s
     aes-xts   256b   736.3 MiB/s   741.3 MiB/s
 serpent-xts   256b           N/A           N/A
 twofish-xts   256b    83.7 MiB/s    82.5 MiB/s
     aes-xts   512b   683.7 MiB/s   686.0 MiB/s
 serpent-xts   512b           N/A           N/A
 twofish-xts   512b    83.7 MiB/s    82.5 MiB/s

When killing the big cores it looks like this (all the time running with performance cpufreq governor):

# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       332670 iterations per second for 256-bit key
PBKDF2-sha256     623410 iterations per second for 256-bit key
PBKDF2-sha512     253034 iterations per second for 256-bit key
PBKDF2-ripemd160  193607 iterations per second for 256-bit key
PBKDF2-whirlpool   85556 iterations per second for 256-bit key
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b   369.9 MiB/s   449.0 MiB/s
 serpent-cbc   128b           N/A           N/A
 twofish-cbc   128b    33.5 MiB/s    35.1 MiB/s
     aes-cbc   256b   323.9 MiB/s   414.7 MiB/s
 serpent-cbc   256b           N/A           N/A
 twofish-cbc   256b    33.5 MiB/s    35.1 MiB/s
     aes-xts   256b   408.4 MiB/s   408.7 MiB/s
 serpent-xts   256b           N/A           N/A
 twofish-xts   256b    36.1 MiB/s    36.4 MiB/s
     aes-xts   512b   376.6 MiB/s   377.3 MiB/s
 serpent-xts   512b           N/A           N/A
 twofish-xts   512b    35.9 MiB/s    36.3 MiB/s

Other information as requested: https://pastebin.com/hMhKUStN
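For reference: the numbers above come from cryptsetup's built-in benchmark mode. A sketch of how one might pin it to a specific cluster with taskset instead of hotplugging cores (core numbering as on the N1):

# scheduler decides (single-threaded parts usually end up on a big core)
cryptsetup benchmark

# pinned to the big respectively the little cluster
taskset -c 4,5 cryptsetup benchmark
taskset -c 0-3 cryptsetup benchmark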
  13. Cpuminer test (heavy NEON optimizations)

And another test:

sudo apt install automake autoconf pkg-config libcurl4-openssl-dev libjansson-dev libssl-dev libgmp-dev make g++
git clone https://github.com/tkinjo1985/cpuminer-multi.git
cd cpuminer-multi/
./build.sh
./cpuminer --benchmark

When running on all 6 cores this benchmark scores at 'Total: 8.80 kH/s' without throttling. After killing the big cores (echo 0 >/sys/devices/system/cpu/cpu[45]/online) I get scores up to 'Total: 4.69 kH/s' which is the expected value since I got 3.9 kH/s on an overclocked A64 (also Cortex-A53, back then running at 1296MHz). And when bringing back the big cores and killing the littles we're at around 'Total: 4.10 kH/s':

root@odroid:/usr/local/src/cpuminer-multi# echo 1 >/sys/devices/system/cpu/cpu5/online
root@odroid:/usr/local/src/cpuminer-multi# echo 1 >/sys/devices/system/cpu/cpu4/online
root@odroid:/usr/local/src/cpuminer-multi# echo 0 >/sys/devices/system/cpu/cpu3/online
root@odroid:/usr/local/src/cpuminer-multi# echo 0 >/sys/devices/system/cpu/cpu2/online
root@odroid:/usr/local/src/cpuminer-multi# echo 0 >/sys/devices/system/cpu/cpu1/online
root@odroid:/usr/local/src/cpuminer-multi# echo 0 >/sys/devices/system/cpu/cpu0/online
root@odroid:/usr/local/src/cpuminer-multi# ./cpuminer --benchmark
** cpuminer-multi 1.3.3 by tpruvot@github **
BTC donation address: 1FhDPLPpw18X4srecguG3MxJYe4a1JsZnd (tpruvot)
[2018-02-16 10:41:28] 6 miner threads started, using 'scrypt' algorithm.
[2018-02-16 10:41:29] CPU #0: 0.54 kH/s
[2018-02-16 10:41:29] CPU #5: 0.54 kH/s
[2018-02-16 10:41:30] CPU #2: 0.44 kH/s
[2018-02-16 10:41:30] CPU #3: 0.45 kH/s
[2018-02-16 10:41:30] CPU #1: 0.44 kH/s
[2018-02-16 10:41:30] CPU #4: 0.44 kH/s
[2018-02-16 10:41:32] Total: 3.90 kH/s
[2018-02-16 10:41:33] Total: 3.95 kH/s
[2018-02-16 10:41:37] CPU #4: 0.73 kH/s
[2018-02-16 10:41:37] CPU #3: 0.65 kH/s
[2018-02-16 10:41:38] CPU #1: 0.60 kH/s
[2018-02-16 10:41:38] CPU #2: 0.68 kH/s
[2018-02-16 10:41:38] CPU #0: 0.59 kH/s
[2018-02-16 10:41:38] CPU #5: 0.81 kH/s
[2018-02-16 10:41:38] Total: 4.01 kH/s
[2018-02-16 10:41:43] CPU #3: 0.66 kH/s
[2018-02-16 10:41:43] CPU #4: 0.71 kH/s
[2018-02-16 10:41:44] CPU #5: 0.73 kH/s
[2018-02-16 10:41:44] Total: 4.10 kH/s
[2018-02-16 10:41:47] CPU #0: 0.68 kH/s
[2018-02-16 10:41:48] CPU #2: 0.67 kH/s
[2018-02-16 10:41:48] Total: 4.08 kH/s
[2018-02-16 10:41:48] CPU #1: 0.68 kH/s
[2018-02-16 10:41:53] CPU #3: 0.68 kH/s
[2018-02-16 10:41:53] CPU #5: 0.72 kH/s
[2018-02-16 10:41:53] Total: 4.13 kH/s
[2018-02-16 10:41:53] CPU #4: 0.68 kH/s
[2018-02-16 10:41:54] CPU #1: 0.65 kH/s
[2018-02-16 10:41:54] CPU #0: 0.68 kH/s
[2018-02-16 10:41:58] Total: 4.05 kH/s
[2018-02-16 10:41:58] CPU #2: 0.65 kH/s
[2018-02-16 10:42:03] CPU #1: 0.64 kH/s
[2018-02-16 10:42:03] CPU #3: 0.66 kH/s
[2018-02-16 10:42:03] CPU #0: 0.65 kH/s
[2018-02-16 10:42:03] CPU #5: 0.73 kH/s
[2018-02-16 10:42:03] Total: 4.02 kH/s
[2018-02-16 10:42:03] CPU #4: 0.71 kH/s
^C[2018-02-16 10:42:05] SIGINT received, exiting

With ODROID-XU4/HC1/HC2 it looks like this: when forced to run on the little cores cpuminer gets 2.43 khash/s (no throttling occurring), running on the big cores it starts with 8.2 khash/s at 2.0GHz but even with the fansink on the XU4 the cpufreq immediately drops down to 1.8 or even 1.6 GHz. At least that's what happens on my systems, maybe others have seen different behaviour.
Let's do a 'per core' comparison:

A15 @ 2.0GHz: 2.35 khash/s
A72 @ 2.0GHz: 2.05 khash/s
A7  @ 1.5GHz: 0.61 khash/s
A53 @ 1.5GHz: 1.18 khash/s

In other words: with such or similar workloads ('number crunching', NEON optimized stuff) an A15 core might be slightly faster than an A72 core (and since the Exynos has twice as many fast cores it performs better with such workloads) while there's a great improvement when looking at the little cores: an A53 performs almost twice as fast as an A7 at the same clockspeed, but this is due to this specific benchmark making heavy use of NEON instructions where switching to the 64-bit/ARMv8 ISA makes a huge difference. Please be also aware that cpuminer is heavily dependent on memory bandwidth so these cpuminer numbers are not a good representation for other workloads. This is just 'number cruncher' stuff where NEON can be used.
  14. 7-zip

Running on all 6 cores in parallel (no throttling occurred, I start to like the small fansink):

root@odroid:/tmp# 7zr b
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,6 CPUs LE)
LE
CPU Freq:  401  400  401 1414 1985 1985 1985 1984 1985
RAM size:    3882 MB,  # CPU hardware threads:   6
RAM usage:   1323 MB,  # Benchmark threads:      6

                 Compressing          |        Decompressing
Dict     Speed Usage    R/U Rating    |    Speed Usage    R/U Rating
          KiB/s    %    MIPS   MIPS   |    KiB/s    %    MIPS   MIPS
22:       4791   499    934    4661   |   100897  522   1647   8605
23:       4375   477    935    4458   |    98416  522   1631   8516
24:       4452   524    914    4787   |    95910  523   1610   8418
25:       4192   524    914    4787   |    92794  523   1579   8258
----------------------------------    | ------------------------------
Avr:             506    924    4673   |          523   1617   8449
Tot:             514   1270    6561

Now still 6 threads but pinned only to the little cores:

root@odroid:/tmp# taskset -c 0,1,2,3 7zr b
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,6 CPUs LE)
LE
CPU Freq: 1492 1500 1499 1500 1499 1493 1498 1499 1499
RAM size:    3882 MB,  # CPU hardware threads:   6
RAM usage:   1323 MB,  # Benchmark threads:      6

                 Compressing          |        Decompressing
Dict     Speed Usage    R/U Rating    |    Speed Usage    R/U Rating
          KiB/s    %    MIPS   MIPS   |    KiB/s    %    MIPS   MIPS
22:       2475   375    642    2408   |    64507  396   1387   5501
23:       2440   385    646    2487   |    60795  383   1374   5261
24:       2361   391    649    2539   |    58922  381   1359   5172
25:       2249   394    652    2568   |    58033  388   1332   5165
----------------------------------    | ------------------------------
Avr:             386    647    2501   |          387   1363   5275
Tot:             387   1005    3888

And now 6 threads but bound to the A72 cores:

root@odroid:/tmp# taskset -c 4,5 7zr b
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,6 CPUs LE)
LE
CPU Freq:  400  401  498 1984 1985 1981 1985 1985 1985
RAM size:    3882 MB,  # CPU hardware threads:   6
RAM usage:   1323 MB,  # Benchmark threads:      6

                 Compressing          |        Decompressing
Dict     Speed Usage    R/U Rating    |    Speed Usage    R/U Rating
          KiB/s    %    MIPS   MIPS   |    KiB/s    %    MIPS   MIPS
22:       2790   199   1364    2715   |    47828  200   2040   4079
23:       2630   199   1343    2680   |    46641  200   2020   4036
24:       2495   200   1344    2683   |    45505  200   1999   3994
25:       2366   200   1353    2702   |    43998  200   1959   3916
----------------------------------    | ------------------------------
Avr:             199   1351    2695   |          200   2005   4006
Tot:             200   1678    3350
  15. root@odroid:/mnt/MDRAID0# grep dpll_ddr /sys/kernel/debug/clk/clk_summary
root@odroid:/mnt/MDRAID0# cat /sys/kernel/debug/clk/clk_summary | curl -F 'f:1=<-' http://ix.io
http://ix.io/KNS
  16. IO scheduler influence on SATA performance

I tried to add the usual Armbian tweaks to Hardkernel's Debian Stretch image but something went wrong (we usually set cfq for HDDs and noop for flash media from /etc/init.d/armhwinfo -- I simply forgot to load the script so it never got executed at boot):

root@odroid:/home/odroid# cat /sys/block/sd*/queue/scheduler
noop deadline [cfq]
noop deadline [cfq]

So let's use the mdraid0 made of EVO840 and EVO750 (to ensure parallel disk accesses) with an ext4 on top and check for NCQ issues first:

root@odroid:/mnt/MDRAID0# dmesg | grep -i ncq
[    2.007269] ahci 0000:01:00.0: flags: 64bit ncq sntf stag led clo pmp pio slum part ccc sxs
[    2.536884] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[    2.536897] ata1.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    2.537652] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[    3.011571] ata2.00: 234441648 sectors, multi 1: LBA48 NCQ (depth 31/32), AA

No issues, we can use NCQ with maximum queue depth, so let's test through the three available schedulers with the performance cpufreq governor to avoid being influenced by cpufreq scaling behaviour:

cfq                                                 random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7320     7911     8657     8695     5954      8106
102400       4    25883    30470    33159    33169    23205     30464
102400      16    85609    96712   101527   102396    77224     96583
102400     512   311645   312376   301644   303945   289410    308194
102400    1024   345891   338773   329284   330738   329926    332866
102400   16384   382101   379907   383779   387747   386901    383664

deadline                                            random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     6963     8307     8211     8402     5772      8483
102400       4    24701    30999    34728    34653    23160     31728
102400      16    87390    98898   105589    97539    78259     97638
102400     512   306420   304645   298131   302033   286582    303119
102400    1024   345178   345458   329122   333318   329688    340144
102400   16384   381596   374789   383850   387551   386428    381956

noop                                                random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     6995     8589     9340     8498     5763      8246
102400       4    26011    31307    30267    32635    21445     30859
102400      16    88185   100135    97252   105090    79601     91052
102400     512   307553   312609   304311   307922   291425    308387
102400    1024   344472   340192   322881   333104   332405    333082
102400   16384   372224   373183   380530   386994   386273    379506

Well, this looks like result variation, but of course someone interested in this could do a real benchmark run with each scheduler at least 30 times and then generate average values (a sketch follows below). In the past on slower ARM boards with horribly bottlenecked IO capabilities (think of those USB2-only boards that cannot even use USB Attached SCSI due to lacking kernel/driver support) we've seen severe performance impact based on the IO scheduler used, but in this situation it seems negligible. If someone takes the time to benchmark through this it would be interesting to repeat the tests also with the ondemand governor, io_is_busy set to 1 of course, and then play around with different values for up_threshold and sampling_down_factor, since if cpufreq scaling behaviour starts to vary based on the IO scheduler used, performance differences can be massive.
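A minimal sketch of such a repeated test run, assuming the disk is sda and log files in /tmp are fine (averaging the 30 results per scheduler, eg. with awk, is left out here):

#!/bin/bash
for sched in cfq deadline noop ; do
    echo ${sched} > /sys/block/sda/queue/scheduler
    for i in $(seq 1 30) ; do
        iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 \
            > /tmp/iozone-${sched}-${i}.log
    done
done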
I just did a quick check how performance with the ondemand cpufreq governor and the iozone benchmark varies between Stretch / Hardkernel defaults and our usual tweaks: https://github.com/armbian/build/blob/751aa7194f77eabcb41b19b8d19f17f6ea23272a/packages/bsp/common/etc/init.d/armhwinfo#L82-L94 -- makes quite a difference, but again the IO scheduler chosen still doesn't matter that much (while adjusting io_is_busy, up_threshold and sampling_down_factor does):

cfq defaults                                        random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     5965     6656     7197     7173     5107      6586
102400       4    20864    24899    27205    27214    19421     24595
102400      16    68376    79415    85409    85930    66138     77598
102400     512   274000   268473   267356   269046   247424    272822
102400    1024   310992   314672   299571   299065   298518    315823
102400   16384   366152   376293   375176   379202   379123    370254

cfq with Armbian settings                           random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7145     7871     8600     8591     5996      7973
102400       4    25817    29773    32174    32385    23021     29627
102400      16    83848    94665    98502    98857    75576     93879
102400     512   303710   314778   303135   309050   280823    300391
102400    1024   335067   332595   327539   332574   323887    329956
102400   16384   381987   373067   381911   386585   387089    381956

deadline defaults                                   random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     6231     6872     7750     7746     5410      6804
102400       4    21792    25941    28752    28701    20262     25380
102400      16    70078    84209    88703    87375    69296     80708
102400     512   276422   276042   259416   271542   250835    271743
102400    1024   305166   321265   300374   296094   311020    323350
102400   16384   363016   373751   376570   377294   378730    377186

deadline with Armbian settings                      random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7389     8018     9018     9047     6162      8233
102400       4    26526    30799    33487    33603    23712     30838
102400      16    85703    96066   105055   103831    77281     97086
102400     512   302688   297832   292569   288282   278384    294447
102400    1024   343165   340770   317211   320999   329411    330670
102400   16384   380267   375233   388286   390289   391849    375236

noop defaults                                       random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     6301     6900     7766     7779     5350      6841
102400       4    21995    25884    28466    28540    20240     25664
102400      16    69547    81721    88044    88596    68043     81277
102400     512   281386   276749   262216   262762   255387    261948
102400    1024   300716   314233   288672   298921   310456    307875
102400   16384   376137   371625   376620   378136   379143    371308

noop with Armbian settings                          random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7409     8026     9030     9033     6193      8259
102400       4    26562    30861    33494    33649    23676     30870
102400      16    85819    96956   102372   101982    77890     97341
102400     512   310007   303370   293432   297090   281048    301772
102400    1024   330968   352003   328052   318009   333682    337339
102400   16384   373958   375028   384865   386749   389401    376501

(but as already said: to get more insights each test has to be repeated at least 30 times and then average values need to be generated -- 'single shot' benchmarking is useless for generating meaningful numbers)
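For the record, a hedged sketch of those ondemand tweaks (tunable names as in mainline kernels; the concrete values below are only illustrative -- see the linked armhwinfo lines for what Armbian really sets):

echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cd /sys/devices/system/cpu/cpufreq/ondemand
echo 1 > io_is_busy              # count %iowait as busy so storage load raises clockspeeds
echo 25 > up_threshold           # ramp up earlier than the default
echo 10 > sampling_down_factor   # stay longer at high clockspeeds once there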
  17. Correct:

root@odroid:/home/odroid# cat /sys/devices/virtual/thermal/cooling_device0/type
pwm-fan

So everything I've written above about cooling state is BS since it's just showing the fansink starting to work. One should stop thinking while benchmarking: just collect numbers like a robot, check later whether the data makes sense, throw numbers away and test again and again and again. Fortunately I already figured out that the result variation with openssl on the A72 cores has a different reason. But whether these benchmark numbers tell us anything is questionable. It would need some real-world tests with VPN and full disk encryption, then trying to pin the tasks to a little or a big core, to get an idea what's really going on and whether the numbers generated with a synthetic benchmark have any meaning for real tasks.
  18. Openssl and thermal update

I've been wrong before wrt 'cooling state 1' -- the result variation must have had a different reason. I decided to test with AES encryption running on all 6 CPU cores in parallel using a simple script testing only with AES-256:

root@odroid:/home/odroid# cat check-aes.sh
#!/bin/bash
while true; do
	for i in 0 1 2 3 4 5 ; do
		taskset -c ${i} openssl speed -elapsed -evp aes-256-cbc 2>/dev/null &
	done
	wait
done

Results as follows: https://pastebin.com/fHzJ5tJF (please note that the cpufreq governor was set to performance and that especially the A72 scores were lower in the beginning and improved over time: with 16 bytes it was 309981.41k in the beginning and later 343045.14k and even slightly more). Here the armbianmonitor output: https://pastebin.com/1hsmk63i (at '07:07:28' I stopped the huge 5V fan and the small fansink can cope with this load though cooling state 2 is sometimes reached when SoC temperature exceeds 55°C). So for whatever reason we still have a somewhat huge result variation with this single benchmark which needs further investigation (especially whether benchmark behaviour relates to real-world use cases like VPN and full disk encryption)
  19. AES crypto performance, checking for bogus clockspeeds, thermal thresholds

As Armbian user you might already know that almost all currently available 64-bit ARM SoCs licensed ARM's ARMv8 crypto extensions and that AES performance, especially with small data chunks (think of VPN encryption), is something where A72 cores shine: https://forum.armbian.com/topic/4583-rock64/?do=findComment&comment=37829 (the only two exceptions are Raspberry Pi 3 and ODROID-C2 where the SoC makers 'forgot' to license the ARMv8 crypto extensions).

Let's have a look at ODROID N1 and A53@1.5GHz vs. A72@2GHz. I use the usual openssl benchmark that runs in a single thread, once pinned to cpu1 (a little core) and another time pinned to cpu5 (a big core):

for i in 128 192 256 ; do taskset -c 1 openssl speed -elapsed -evp aes-${i}-cbc 2>/dev/null; done | grep cbc
for i in 128 192 256 ; do taskset -c 5 openssl speed -elapsed -evp aes-${i}-cbc 2>/dev/null; done | grep cbc

As usual monitoring happened in another shell and when testing on the A72 I not only got a huge result variation but armbianmonitor also reported 'cooling state' already reaching 1 -- see the last column 'C.St.' (nope, that's the PWM fan, see a few posts below):

Time        big.LITTLE   load %cpu %sys %usr %nice %io %irq   CPU  C.St.
06:00:44: 1992/1512MHz  0.46  16%   0%  16%   0%   0%   0%  51.1°C  1/3

So I added a huge and silent USB powered 5V fan to the setup blowing air over the board at a 45° angle to improve heat dissipation a bit (I hate those small and inefficient fansinks like the one on the XU4 and now on the N1 sample) and tried again. This time the cooling state remained at 0, the internal fan did not start and we had no result variation any more (standard deviation low enough between multiple runs):

Time        big.LITTLE   load %cpu %sys %usr %nice %io %irq   CPU  C.St.
06:07:03: 1992/1512MHz  0.46   0%   0%   0%   0%   0%   0%  30.0°C  0/3
06:07:08: 1992/1512MHz  0.42   0%   0%   0%   0%   0%   0%  30.0°C  0/3
06:07:13: 1992/1512MHz  0.39   0%   0%   0%   0%   0%   0%  30.0°C  0/3
06:07:18: 1992/1512MHz  0.36   0%   0%   0%   0%   0%   0%  30.0°C  0/3
06:07:23: 1992/1512MHz  0.33   0%   0%   0%   0%   0%   0%  30.0°C  0/3
06:07:28: 1992/1512MHz  0.38  12%   0%  12%   0%   0%   0%  32.2°C  0/3
06:07:33: 1992/1512MHz  0.43  16%   0%  16%   0%   0%   0%  32.2°C  0/3
06:07:38: 1992/1512MHz  0.48  16%   0%  16%   0%   0%   0%  32.8°C  0/3
06:07:43: 1992/1512MHz  0.52  16%   0%  16%   0%   0%   0%  33.9°C  0/3
06:07:48: 1992/1512MHz  0.56  16%   0%  16%   0%   0%   0%  33.9°C  0/3
06:07:53: 1992/1512MHz  0.60  16%   0%  16%   0%   0%   0%  33.9°C  0/3
06:07:58: 1992/1512MHz  0.63  16%   0%  16%   0%   0%   0%  34.4°C  0/3
06:08:04: 1992/1512MHz  0.66  16%   0%  16%   0%   0%   0%  34.4°C  0/3
06:08:09: 1992/1512MHz  0.69  16%   0%  16%   0%   0%   0%  34.4°C  0/3
06:08:14: 1992/1512MHz  0.71  16%   0%  16%   0%   0%   0%  35.0°C  0/3

So these are the single threaded PRELIMINARY openssl results for ODROID N1 differentiating between A53 and A72 cores:

A53            16 bytes     64 bytes    256 bytes    1024 bytes    8192 bytes
aes-128-cbc  103354.37k   326225.96k   683938.47k   979512.32k   1119100.93k
aes-192-cbc   98776.57k   293354.45k   565838.51k   760103.94k    843434.67k
aes-256-cbc   96389.62k   273205.14k   495712.34k   638675.29k    696685.91k

A72            16 bytes     64 bytes    256 bytes    1024 bytes    8192 bytes
aes-128-cbc  377879.56k   864100.25k  1267985.24k  1412154.03k   1489756.16k
aes-192-cbc  317481.96k   779417.49k  1045567.57k  1240775.00k   1306637.65k
aes-256-cbc  270982.47k   663337.94k   963150.93k  1062750.21k   1122691.75k

The numbers look somewhat nice but need further investigation:

- When we compared with other A53 and especially A72 SoCs a while ago (especially the A72 numbers made on an RK3399 TV box only clocking at 1.8 GHz) the A72 scores above seem too low with all test sizes (see the numbers here with AES-128 on a H96-Pro)
- Cooling state 1 is entered pretty early (already when zone0 exceeds 50°C) -- this needs further investigation, and further benchmarking especially with multiple threads in parallel is useless until this is resolved/understood

So let's check with Willy Tarreau's 'mhz' tool whether the reported CPU clockspeeds are bogus (I'm still using the performance cpufreq governor so the A72 and A53 cores should run at 2 and 1.5 GHz):

root@odroid:/home/odroid/mhz# taskset -c 1 ./mhz
count=645643 us50=21495 us250=107479 diff=85984 cpu_MHz=1501.775
root@odroid:/home/odroid/mhz# taskset -c 5 ./mhz
count=807053 us50=20330 us250=101641 diff=81311 cpu_MHz=1985.102

All fine, so we need to have a look at memory bandwidth. Here are tinymembench numbers pinned to an A53 and here with an A72. As a reference some numbers made with other RK3399 devices a few days ago on request: https://irclog.whitequark.org/linux-rockchip/2018-02-12#21298744

One interesting observation is throttling behaviour in a special SoC engine affecting crypto: when cooling state 1 was reached the cpufreq still remained at 2 and 1.5 GHz respectively but AES performance dropped a lot. So the ARMv8 crypto engine is part of the BSP 4.4 kernel's throttling strategies and performance in such a case does not scale linearly with the reported cpufreq. In other words: for the next round of tests the thermal thresholds defined in DT should be lifted a lot.

Edit: Wrong assumption wrt openssl numbers on A72 cores -- see next post
  20. Gigabit Ethernet performance

RK3399 has an internal GbE MAC implementation combined with an external RTL8211 GbE PHY. I did only some quick tests which were well above 900 Mbits/sec, but since moving IRQs to one of the A72 cores didn't improve scores it's either my current networking setup (ODROID N1 connected directly to an older GbE switch I don't trust that much any more) or necessary TX/RX delay adjustments. Anyway: the whole process should be well known and is documented, so it's time for someone else to look into it. With RK SoCs it's pretty easy to test for this with DT overlays: https://github.com/ayufan-rock64/linux-build/blob/master/recipes/gmac-delays-test/range-test

And the final result might be some slight DT modifications that allow for 940 Mbits/sec in both directions with as little CPU utilization as possible. Example for RK3328/ROCK64: https://github.com/ayufan-rock64/linux-kernel/commit/2047dd881db53c15a952b1755285e817985fd556

Since RK3399 uses the same Synopsys DesignWare Ethernet implementation as currently almost every other GbE capable ARM SoC around, and since we get maximum throughput on RK3328 with adjusted settings... I'm pretty confident that this will be the same on RK3399.
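For anyone wanting to repeat the quick throughput test, a hedged iperf3 sketch (assumes another GbE host at 192.168.1.10 running 'iperf3 -s'; the address is just an example):

# TX direction, 30 seconds, 3 parallel streams
iperf3 -c 192.168.1.10 -t 30 -P 3

# same test in reverse (RX) direction
iperf3 -c 192.168.1.10 -t 30 -P 3 -R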
  21. BTW: Since checking out a new board without some kind of monitoring is just stupid... here's what it takes to get armbianmonitor to run with Hardkernel's Stretch (or Ubuntu later -- the needed RK3399 tweaks were added long ago):

mkdir -p /etc/armbianmonitor/datasources
cd /etc/armbianmonitor/datasources
ln -s /sys/devices/virtual/thermal/thermal_zone0/temp soctemp
wget https://raw.githubusercontent.com/armbian/build/master/packages/bsp/common/usr/bin/armbianmonitor
mv armbianmonitor /usr/local/sbin/
chmod 755 /usr/local/sbin/armbianmonitor

Then it's just calling 'sudo armbianmonitor -m' to get a clue what's going on (throttling, big.LITTLE stuff, %iowait... everything included):

root@odroid:/home/odroid# armbianmonitor -m
Stop monitoring using [ctrl]-[c]
Time        big.LITTLE   load %cpu %sys %usr %nice %io %irq   CPU  C.St.
23:14:25:  408/1200MHz  0.38   6%   2%   3%   0%   0%   0%  43.9°C  0/3
23:14:30:  408/ 408MHz  0.51   1%   0%   0%   0%   0%   0%  43.9°C  0/3
23:14:35:  600/ 408MHz  0.55   2%   0%   1%   0%   0%   0%  44.4°C  0/3
23:14:41:  408/ 408MHz  0.51   0%   0%   0%   0%   0%   0%  46.9°C  1/3
23:14:46: 1992/ 816MHz  0.63  33%   0%  33%   0%   0%   0%  52.8°C  1/3
23:14:51:  408/ 408MHz  0.74  16%   0%  16%   0%   0%   0%  42.8°C  0/3
23:14:56: 1992/ 600MHz  0.68   5%   4%   0%   0%   0%   0%  44.4°C  0/3
23:15:01:  600/1008MHz  0.86  45%   8%   0%   0%  36%   0%  42.8°C  0/3
23:15:07:  408/ 408MHz  0.95  19%   2%   0%   0%  16%   0%  42.8°C  0/3
23:15:12:  408/ 600MHz  1.04  23%   2%   0%   0%  20%   0%  43.3°C  0/3
23:15:17: 1200/ 600MHz  1.12  18%   4%   0%   0%  14%   0%  43.9°C  0/3
23:15:22: 1992/1512MHz  1.03  51%  18%  23%   0%   8%   0%  52.8°C  1/3
23:15:27: 1992/1512MHz  1.42  88%  20%  34%   0%  32%   0%  51.1°C  1/3
23:15:32: 1992/1512MHz  1.79  72%  16%  34%   0%  20%   0%  51.7°C  1/3
Time        big.LITTLE   load %cpu %sys %usr %nice %io %irq   CPU  C.St.
23:15:37: 1992/1512MHz  2.05  77%  16%  34%   0%  26%   0%  50.0°C  1/3
23:15:42: 1992/1512MHz  2.29  79%  21%  34%   0%  23%   0%  50.0°C  1/3
23:15:47: 1992/1512MHz  2.42  85%  24%  34%   0%  26%   0%  48.8°C  1/3
23:15:52:  408/ 408MHz  2.71  50%   8%  11%   0%  29%   0%  40.6°C  0/3
23:15:57:  408/ 816MHz  2.65  33%   2%   0%   0%  30%   0%  40.6°C  0/3
23:16:03: 1008/ 600MHz  2.60  18%   4%   0%   0%  14%   0%  40.6°C  0/3
23:16:08:  408/ 408MHz  2.79   3%   0%   0%   0%   2%   0%  40.6°C  0/3^C
root@odroid:/home/odroid#
  22. More storage performance: eMMC and SD cards

The N1 has not only 2 SATA ports but also the usual SD card slot and the usual eMMC socket known from other ODROID boards. Hardkernel sells some of the best eMMC modules you can get for this connector and they usually also take care that SD cards can enter higher speed modes. This usually requires switching between 3.3V and 1.8V but at least the released schematics for this (early!) board revision do not mention 1.8V here.

Hardkernel shipped the dev sample with their new Samsung based orange eMMC (16 GB) but since this one is severely limited wrt sequential write performance (as usual -- flash memory modules with low capacity always suffer from this problem) we use the 64GB module to show the performance. Since the use case I'm interested in is 'rootfs' or 'OS drive', sequential performance is more or less irrelevant and all that really matters is random IO performance (especially writes at small block sizes). Test setup as before with the iozone task sent to cpu5:

Orange 64GB eMMC (Samsung):                         random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     2069     1966     8689     8623     7316      2489
102400       4    32464    36340    30699    30474    27776     31799
102400      16    94637   100995    89970    90294    83993     96937
102400     512   147091   151657   278646   278126   269186    146851
102400    1024   143085   148288   287749   291479   275359    143229
102400   16384   147880   149969   306523   306023   307040    147470

If we compare random IOPS at 4K and 16K block size it's as follows (IOPS -- IO operations per second -- means we divide the KB/s numbers above by the block size: eg. 29,400 KB/s at 4 KB record size ≈ 7,350 IOPS). The numbers below are not KB/s but IOPS:

                       4K read   4K write   16K read   16K write
JMS567:                   6000       6300       4925        5500
ASM1061 powersave:        5700       7600       4750        6000
16GB eMMC:                7250       7100       5025        2950
32/64/128GB eMMC:         7450       7350       5200        5700
ASM1061 performance:      9200      15050       6625        9825

(Not so) surprisingly Hardkernel's eMMC modules are faster than an SSD with default settings (and we're talking about ok-ish consumer SSDs and not cheap crap). Some important notes:

- 'JMS567' is the USB3-to-SATA chipset used for my tests. The above is not a 'USB3 number' but one made with a great JMicron chipset and UAS active (UAS == USB Attached SCSI, the basic requirement to get storage performance with USB that does not totally suck). If you don't take care about the chipset you use, your USB3 storage performance can be magnitudes lower
- 'ASM1061' is not a synonym for 'native SATA', it's just PCIe attached SATA and most probably one of the slowest implementations available. There are two numbers above since PCIe power management settings have an influence on both consumption and performance. When /sys/module/pcie_aspm/parameters/policy is set to performance instead of powersave, idle consumption increases by around 250mW but performance with small block sizes also improves a lot

As a reference here iozone numbers for all orange Samsung based eMMC modules tested on the N1 (Hardkernel sent the numbers on request): https://pastebin.com/ePUCXyg6 (as can be seen the 16 GB module already performs great but for full performance better choose one of the larger modules). So what about SD cards?
Update: Hardkernel forgot to include a UHS patch in the kernel they provided with the developer samples, so once this is fixed the SD card performance bottleneck reported below should be gone: https://forum.odroid.com/viewtopic.php?f=153&t=30193#p215915

Update 2: already fixed with those 3 simple lines in the device-tree configuration (therefore the numbers below serve only as 'historical reference' showing what happens with the slowest SD card speed mode -- for current performance with SDR104 mode see here and there)

As Armbian user you already know that 'SD card' is not a performance class but just a form factor and an interface specification. There is all the counterfeit crap, there exist 'reputable brands' that produce SD cards that are slow as hell when it comes to random IO, and there are good performers that show even 100 times better random IO performance than eg. an average Kingston or PNY card: https://forum.armbian.com/topic/954-sd-card-performance/

Unfortunately in the past 'random IO' was not part of the SD Association's speed classes but this changed last year. In the meantime there's the 'A1 speed class' which specifies minimum random IO performance and now these cards even exist. I tried to buy a SanDisk Extreme Plus A1 but was too stupid and ordered a SanDisk Extreme A1 instead (without the 'Plus' which means extra performance and especially extra reliability). But since I saved a few bucks by accident and there was a 'SanDisk Ultra A1' offer... I bought two A1 cards today:

Fresh SanDisk Extreme A1 32GB SD card:              random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1      998      716     4001     3997     3049       740
102400       4     3383     3455    10413    10435     9631      4156
102400      16     8560     8607    17149    17159    17089     11949
102400     512    21199    21399    22447    22457    22464     20571
102400    1024    22075    22168    22912    22922    22919     21742
102400   16384    22415    22417    23357    23372    23372     22460

Fresh SanDisk Ultra A1 32GB SD card:                random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1      683      718     3466     3467     2966       449
102400       4     2788     3918     9821     9805     8763      2713
102400      16     4212     7950    16577    16627    15765      7121
102400     512    10069    14514    22301    22346    22253     13652
102400    1024    14259    14489    22851    22892    22868     13664
102400   16384    15254    14597    23262    23342    23340     14312

Slightly used SanDisk Extreme Plus (NO A1!) 16GB SD card:
                                                    random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1      614      679     3245     3245     2898       561
102400       4     2225     2889     9367     9360     7820      2765
102400      16     8202     8523    16836    16806    16807      7507
102400     512    20545    21797    22429    22465    22485     21857
102400    1024    22352    22302    22903    22928    22918     22125
102400   16384    22756    22748    23292    23323    23325     22691

Oh well, performance is limited to the slowest SD card mode possible (4 bit, 50 MHz --> ~23 MB/s max) which also affects random IO performance slightly (small blocksizes) to severely (large blocksizes). At least the N1 dev samples have a problem here. No idea whether this is a hardware limitation (no switching to 1.8V?) or just a settings problem. But I really hope Hardkernel addresses this since in the past I always enjoyed great performance with SD cards on the ODROIDs (due to Hardkernel being one of the few board makers taking care of such details)
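A hedged way to check which speed mode a card actually negotiated (debugfs must be mounted and the mmc host number may differ per board/kernel):

# the 'timing spec' line shows eg. 'sd high-speed' vs. 'sd uhs SDR104'
cat /sys/kernel/debug/mmc1/ios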
  23. SATA performance

As already said, RK3399 is not SATA capable so in reality we're talking about RK3399 PCIe performance and the performance of the SATA controller Hardkernel chose (ASM1061). I've destroyed the RAID-0 array from before, attached the EVO750 to SATA port 1 and the EVO840 to SATA port 2 (both externally powered), so let's test (same settings as before: IRQ affinity and sending iozone to cpu5):

EVO750 connected to SATA port 1 (ata1.00)           random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7483     8366     8990     8997     5985      8320
102400       4    26895    31233    33467    33536    22688     31074
102400      16    87658    98748   103510   103772    75473     98533
102400     512   319330   320934   309735   311915   283113    322654
102400    1024   332979   338408   321312   321328   306621    336457
102400   16384   343053   346736   325660   327009   318830    341269

EVO840 connected to SATA port 2 (ata2.00)           random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7282     8225     9004     8639     5540      7857
102400       4    25295    29532    31754    32422    22069     30526
102400      16    85907    97049   102244   102615    77170     96130
102400     512   308776   312344   305041   308835   299016    306654
102400    1024   326341   327747   316543   321559   315103    321031
102400   16384   365294   378264   385631   391119   390479    293734

If we compare with the USB3 numbers above we clearly see one of the many 'benchmarking gone wrong' occurrences. How on earth is the EVO750 connected via USB3 faster than when accessed through SATA (look at the sequential performance with 512K, 1M and 16M blocksizes: with USB3 we exceeded 380 MB/s read and are now stuck at ~325 MB/s -- that's impossible?!). The reason is pretty simple: after I destroyed the RAID0 I recreated the filesystems on both SSDs and mkfs.ext4 took ages. Looking at dmesg shows the problem:

[  874.771379] ata1.00: NCQ disabled due to excessive errors

Both SSDs got initialized with NCQ (native command queueing) and a maximum queue depth of 31:

[    2.498063] ata1.00: ATA-9: Samsung SSD 750 EVO 120GB, MAT01B6Q, max UDMA/133
[    2.498070] ata1.00: 234441648 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    2.964660] ata2.00: ATA-9: Samsung SSD 840 EVO 120GB, EXT0BB0Q, max UDMA/133
[    2.964666] ata2.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 31/32), AA

But then there were transmission errors and the kernel decided to give up on NCQ, which is responsible for trashing SATA performance. When I attached the SATA cables to the N1 I already expected trouble (one of the two connections felt somewhat 'loose') so looking into dmesg output was mandatory: http://ix.io/Kzf

Ok, shutting down the board and exchanging the SSDs so that now the EVO840 is on port 1 and the EVO750 on port 2:

EVO750 connected to SATA port 2 (ata2.00)           random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7479     8257     8996     8997     5972      8305
102400       4    26859    31206    33540    33580    22719     31026
102400      16    87690    98865   103442   103715    75507     98374
102400     512   319251   323358   308725   311769   283398    320156
102400    1024   333172   338362   318633   322155   304734    332370
102400   16384   379016   386131   387834   391267   389064    387225

EVO840 connected to SATA port 1 (ata1.00)           random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7350     8238     8921     8925     5627      8167
102400       4    26169    30599    33183    33313    22879     30418
102400      16    85579    96564   102667   100994    76254     95562
102400     512   312950   312802   309188   311725   303605    314411
102400    1024   325669   324499   319510   321793   316649    324817
102400   16384   373322   372417   385662   390987   390181    372922

Now performance is as expected (and with the ASM1061 you can't expect more -- 390 MB/s sequential transfer speed can be considered really great). But still...
both SSDs seem to perform identically, which is just weird since the EVO840 is the much faster one. So let's have a look at a native SATA implementation on another ARM board: the Clearfog Pro. With the same EVO840, partially crappy settings (and not testing 1K block size) it looks like this -- random IO of course way better compared to the ASM1061:

Clearfog Pro with EVO840 connected to a native SATA port of the ARMADA 385:
                                                    random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       4    69959   104711   113108   113920    40591     76737
102400      16   166789   174407   172029   215341   123020    159731
102400     512   286833   344871   353944   304479   263423    269149
102400    1024   267743   269565   286443   361535   353766    351175
102400   16384   347347   327456   353394   389994   425475    379687

(you find all details here. On a side note: the Clearfog Pro can be configured to provide 3 native SATA ports and Solid-Run engineers tested with 3 fast SATA SSDs in parallel and were able to exceed 1,500 MB/s in total. That was in early 2016.)

So now that we have both SSDs running with NCQ and maximum queue depth let's try RAID0 again:

                                                    random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7082     7595     8545     8552     5593      7884
102400       4    25434    29603    31858    31831    21195     29381
102400      16    83270    93265    97376    97138    70859     93365
102400     512   303983   297795   300294   286355   277441    301486
102400    1024   330594   320820   316379   313175   314558    332272
102400   16384   367334   367674   351361   366017   364117    351142

Nope, performance sucks. And the reason is the same: new dmesg output reveals that SATA port 1 still has a problem, so now the EVO840 runs without NCQ and performance has to drop: http://ix.io/KA6

Carefully exchanging cables and checking contacts and another run with the SATA RAID0:

                                                    random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       1     7363     7990     8897     8901     6113      8176
102400       4    26369    30720    33251    33310    23606     30484
102400      16    85555    97111   102577   102953    78091     96233
102400     512   306039   316729   309768   311106   294009    316353
102400    1024   329348   339153   335685   333575   342699    346854
102400   16384   382487   384749   385321   389949   390039    384479

Now everything is fine since we again reach the 390 MB/s. If we look closer at the numbers we see that RAID0 with fast SSDs is just a waste of resources since the ASM1061 is the real bottleneck here. There exists an almost twice as expensive variant called ASM1062 which can make use of 2 PCIe lanes and shows overall better performance. But whether this would really result in higher storage performance is a different question since it could happen that a PCIe device attached with 2 lanes instead of one brings down the link speed to Gen1 (so zero performance gain) or that there exists an internal SoC bandwidth limit. Since we can't test for this with the ODROID N1 samples right now we need to do more tests with other RK3399 devices.

In the meantime I created one RAID0 out of 4 SSDs (as can be seen in the picture above -- 2 x USB3, 2 x SATA) and let the iozone test repeat:

                                                    random    random
    kB  reclen    write  rewrite     read   reread    read     write
102400       4    25565    29387    33952    33814    19793     28524
102400      16    82857    94170   101870   101376    63274     92038
102400     512   283743   292047   292733   293601   275781    270178
102400    1024   312713   312202   311117   311408   275342    320691
102400   16384   469131   458924   616917   652571   619976    454828

We can clearly see that RAID0 is working (see the increased numbers with small blocksizes) but obviously there's an overall bandwidth limitation.
As already said, the SSDs I test with are cheap and crappy, so the write limitation is caused by my SSDs, while the read limitation looks like some sort of bandwidth bottleneck on the board or SoC (or kernel/drivers or the settings currently used!). I repeated the test with a new RAID0 made out of the two fastest SSDs, one connected via USB3, the other via SATA, and this time with the PCIe power management policy set to performance (search for /sys/module/pcie_aspm/parameters/policy below):

                                                    random   random
    kB  reclen    write  rewrite     read   reread     read    write
102400       4    33296    40390    50845    51146    31154    39931
102400      16   105127   120863   139497   140849    97505   120296
102400     512   315177   319535   302748   308408   294243   317566
102400    1024   529760   569271   561234   570950   546556   555642
102400   16384   688061   708164   736293   754982   753050   711708

When testing with sequential transfers only, large block sizes and 500 MB test size we get 740/755 MB/s write/read. Given there is something like a 'per port group' bandwidth limitation, this is as expected. But as already said: this is just a quick try to search for potential bottlenecks and it's way too early to draw any conclusions. We need a lot more time to look into the details. On the bright side: the above numbers confirm that certain use cases like 'NAS box with 4 HDDs' will not be a problem at all (as long as users are willing and able to accept that USB3 SATA with a good, UAS capable SATA bridge is no worse than PCIe attached SATA here). HDDs all show crappy random IO performance, so all that counts is sequential IO, and the current bandwidth limitations of ~400 MB/s for both USB3 ports as well as both SATA ports are perfectly fine. People who want to benefit from ultra fast SSD storage might better look somewhere else.
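For completeness, the ASPM policy switch used above is the kernel's standard sysfs interface, so the following should work on any reasonably recent kernel; a quick sketch:

cat /sys/module/pcie_aspm/parameters/policy                  # active policy is shown in brackets
echo performance >/sys/module/pcie_aspm/parameters/policy    # disable PCIe link power management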
  24. UPDATE: You'll find a preliminary performance overview at the end of the thread. Click here.

This is NOT an ODROID N1 review since it's way too early for that. The following focuses on just a very small number of use cases the board might serve: server stuff and everything related to network, IO and internal limitations. If you want the hype instead, better join Hardkernel's vendor community over there: https://forum.odroid.com/viewforum.php?f=148

All numbers you find below are PRELIMINARY since it's way too early to benchmark this board. This is just an attempt to get some baseline numbers, to better understand which use cases the device might be appropriate for, where to look further, and which settings might need improvement.

Background info first

ODROID N1 is based on the Rockchip RK3399 SoC, so we already know a lot since RK3399 isn't really new (see Chromebooks, countless TV boxes with this chip and dev boards like Firefly RK3399, ROCK960 and a lot of others... and there will be many more devices coming in 2018, like another board from China soon with an M.2 key M slot exposing all PCIe lanes). What we already know: the SoC is one of Rockchip's 'open source SoCs', so software support is already pretty good and the chip vendor itself actively upstreams software support. We also know RK3399 is not the greatest choice for compiling code (a use case bottlenecked by memory bandwidth and only 2 fast cores combined with 4 slow ones; for this use case 4 x A15 or A17 cores perform much better), that ARMv8 crypto extensions are supported (see a few posts below), and that the SoC performs nicely with Android and 'Desktop Linux' stuff (think of GPU and VPU acceleration). We also know that this SoC has 2 USB3 ports and implements PCIe 2.1 with a four lane interface. But so far we don't know what the internal bottlenecks look like, so let's focus on this now.

The PCIe 2.1 x4 interface is said to support both Gen1 and Gen2 link speeds (2.5 vs. 5 GT/s), but there was recently a change in the RK3399 datasheet (a downgrade from Gen2 to Gen1), and some mainline kernel patch descriptions seem to indicate that RK3399 is not always able to train for Gen2 link speeds. On ODROID N1 there's a x1 PCIe link, configured as either Gen1 or Gen2, to which a dual-port SATA adapter is connected. The ASMedia ASM1061 was the obvious choice since, while being a somewhat old design (AFAIK from 2010), it's cheap and 'fast enough', at least when combined with one or even two HDDs. Since the PCIe implementation on these early N1 dev samples is fixed and limited, we need other RK3399 devices to get a clue about PCIe limitations (RockPro64, ROCK960 or the not yet announced other board from China). A quick way to check the trained link speed follows below.
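Whether a link trained to Gen1 or Gen2 can be read directly from lspci by comparing link capability with link status; a quick sketch (run as root):

lspci -vv | grep -E 'LnkCap:|LnkSta:'    # 2.5GT/s = Gen1, 5GT/s = Gen2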
So let's focus on SATA and USB3 instead. While SATA on 'development boards' is nothing new, it's often done with (sometimes really crappy) USB2 SATA bridges, recently sometimes with good USB3 SATA bridges (see ODROID HC1/HC2, Cloudmedia Transformer or Swiftboard), and sometimes it's even 'true' SATA:

- Allwinner A10/A20/R40/V40 (many SBC)
- AM572x Sitara (eg. BeagleBoard-X15 with 1 x eSATA and 1 x SATA on expansion header)
- Marvell Armada 38x (Clearfog Base, Clearfog Pro, Helios4)
- Marvell Armada 37x0 (EspressoBin)
- NXP i.MX6 (Cubox-i, the various Hummingboard versions, same with Wandboard and so on)

All of the above SoC families do 'native SATA' (the SoC itself implements SATA protocols and connectivity), but performance differs a lot, with 'Allwinner SATA' being the worst and only the Marvell implementations performing as expected (+500 MB/s sequential and also very high random IO performance, which is what you're after when using SSDs). As an Armbian user you already know: this stuff is documented in detail, just read through this and that. RK3399 is not SATA capable, so we're talking about PCIe attached SATA here, which has 2 disadvantages: slightly bottlenecked performance and increased overall consumption. N1's SATA implementation and how it's 'advertised' (rootfs on SATA) pose another challenge, but that's something for a later post (the sh*tshow known from 'SD cards' over the last years now arriving at a different product category called 'SSD').

Benchmarking storage performance is challenging, and most 'reviews' done on SBCs use inappropriate tools (see this nice bonnie/bonnie++ example), inappropriate settings (see all those dd and hdparm numbers partially testing filesystem buffers and caches instead of storage) or focus only on irrelevant stuff (eg. sequential performance in 'worst case testing mode', only looking at one direction).

Some USB3 tests first

All SSDs I use for the test are powered externally and not by N1, since more than once I ran into situations with board-powered SSDs where performance dropped a lot when some sort of underpowering occurred. The 2 USB3 enclosures above are powered by a separate 5V rail and the SATA attached SSDs by the dual-voltage PSU behind. As expected, USB3 storage can use the much faster UAS protocol (we know this already from RK3328 devices like ROCK64, which uses the same XHCI controller and most probably a nearly identical kernel) and performance numbers match (with large block and file sizes we get close to 400 MB/s). We chose iozone for the simple reason of being able to compare with previous numbers; a more thorough benchmark would need some fio testing with different test sets. But for now it's only about getting a baseline.

Tests were done with Hardkernel's Debian Stretch image with some tweaks applied. The image relies on Rockchip's 4.4 BSP kernel (4.4.112) with some Hardkernel tweaks, and I adjusted the following: first set both cpufreq governors to performance (see the sketch below) to not be affected by potentially wrong/weird cpufreq scaling behaviour, then apply a static IRQ distribution for USB3 and PCIe on cpu1, cpu2 and cpu3 (all little cores, but while checking CPU utilization none of the cores was fully saturated, so A53 @ 1.5GHz is fine):

# pin the USB3/PCIe interrupts to cpu1, cpu2 and cpu3 (affinity bitmasks 2, 4, 8)
echo 2 >/proc/irq/226/smp_affinity
echo 4 >/proc/irq/227/smp_affinity
echo 8 >/proc/irq/228/smp_affinity

To avoid CPU core collisions the benchmark task itself has been sent to one of the two A72 cores:

taskset -c 5 iozone -e -I -a -s 100M -r 1k -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2

Unfortunately I currently only have crappy SSDs lying around (all cheap consumer SSDs: Samsung EVO 840 and 750, a Samsung PM851 and an Intel 540). So we need to take the results with a grain of salt since these SSDs suck especially at continuous write tests (sequential write performance drops a lot after a short period of time).
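The governor switch mentioned above, as a sketch: RK3399 exposes one cpufreq policy per cluster, usually policy0 for the A53 cores and policy4 for the A72 cores (these path names are an assumption and may differ between kernels):

echo performance >/sys/devices/system/cpu/cpufreq/policy0/scaling_governor   # little cluster
echo performance >/sys/devices/system/cpu/cpufreq/policy4/scaling_governor   # big cluster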
First test: determine whether the two USB3 ports behave differently (AFAIK one of the two can also be configured as an OTG port, and on some SBCs I've seen serious performance drops in such a mode). But nope, they perform identically:

EVO840 behind JMS567 (UAS active) on lower USB3 port (xhci-hcd:usb7, IRQ 228):

                                                    random   random
    kB  reclen    write  rewrite     read   reread     read    write
102400       1     6200     6569     7523     7512     4897     6584
102400       4    23065    25349    34612    34813    23978    25231
102400      16    78836    87689   105249   106777    78658    88240
102400     512   302757   314163   292206   300964   292599   321848
102400    1024   338803   346394   327101   339218   329792   351382
102400   16384   357991   376834   371308   384247   383501   377039

EVO840 behind JMS567 (UAS active) on upper USB3 port (xhci-hcd:usb5, IRQ 227):

                                                    random   random
    kB  reclen    write  rewrite     read   reread     read    write
102400       1     6195     6545     7383     7383     4816     6518
102400       4    23191    25114    34370    34716    23580    25199
102400      16    78727    86695   104957   106634    76359    87610
102400     512   307469   315243   293077   302678   293442   321779
102400    1024   335772   336833   326940   339128   330298   350271
102400   16384   366465   376863   371193   384503   383297   379898

Now attaching an EVO750 (not that fast), which performs pretty much identically behind the XHCI host controller and the JMS567 controller inside the enclosure:

EVO750 behind JMS567 (UAS active) on lower USB3 port (xhci-hcd:usb7, IRQ 228):

                                                    random   random
    kB  reclen    write  rewrite     read   reread     read    write
102400       1     6200     6569     7523     7512     4897     6584
102400       4    23065    25349    34612    34813    23978    25231
102400      16    78836    87689   105249   106777    78658    88240
102400     512   302757   314163   292206   300964   292599   321848
102400    1024   338803   346394   327101   339218   329792   351382
102400   16384   357991   376834   371308   384247   383501   377039

(So USB3 is the bottleneck here: especially with random IO an EVO840 is much, much faster than an EVO750, but here they perform identically due to the massive USB protocol overhead.)

Let's try both USB3 ports at the same time

First quick try was a BTRFS RAID-0 made with 'mkfs.btrfs -f -m raid0 -d raid0 /dev/sda1 /dev/sdb1'. Please note that BTRFS is not the best choice here since all (over)writes with blocksizes lower than btrfs' internal blocksize (4K default) are way slower compared to non-CoW filesystems:

                                                    random   random
    kB  reclen    write  rewrite     read   reread     read    write
102400       1     2659     1680   189424   621860   435196     1663
102400       4    21943    18762    24206    24034    18107    17505
102400      16    41983    46379    62235    60665    52517    42925
102400     512   180106   170002   143494   149187   138185   180238
102400    1024   170757   185623   159296   156870   156869   179560
102400   16384   231366   247201   340649   351774   353245   231721

Those are BS numbers, let's forget about them. Now trying the same with mdraid/ext4, configuring a RAID0 and putting an ext4 on it and... N1 simply powered down when executing mkfs.ext4. Adding 'coherent_pool=2M' to bootargs seems to do the job (and in between I created the mdraid0 with both SSDs connected through SATA):

                                                    random   random
    kB  reclen    write  rewrite     read   reread     read    write
102400       4    25133    29444    38340    38490    23403    27947
102400      16    85036    97638   113992   114834    79505    95274
102400     512   306492   314124   295266   305411   289393   322493
102400    1024   344588   343012   322018   332545   316320   357040
102400   16384   384689   392707   371415   384741   388054   388908
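Whether the enlarged DMA coherent pool is actually in effect after a reboot can be verified from the live kernel command line; a trivial check (2M is the value used above, the kernel default is much smaller):

grep -o 'coherent_pool=[^ ]*' /proc/cmdline    # should print coherent_pool=2M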
Seems we're already talking about one real bottleneck here? We see nice improvements with small blocksizes, which is an indication that RAID0 is doing its job. But with larger blocksizes we're not able to exceed the 400 MB/s barrier, so it seems both USB3 ports have to share bandwidth (comparable to the situation on ODROID XU4, where the two USB3 receptacles are connected to an internal USB3 hub which in turn is connected to a single USB3 port of the Exynos SoC).

Edit: @Xalius used these results to look into the RK3399 TRM (technical reference manual). Quoting ROCK64 IRC:
  25. This is a number or a rating; it tells you nothing about 'quality'. On each of the sub-forums there are two topics linked at the top (except here, for whatever reason) and one of those two is this: https://forum.armbian.com/announcement/1-1-check-power-supply-check-sd-card-and-check-other-people-experiences/

While 1A is a bit low anyway, I would also do a web search for 'psu noise ripple filter'.