tkaiser

Reputation Activity

  1. Like
    tkaiser reacted to zador.blood.stained in Learning from DietPi!   
    I'm starting to regret that we added it in the past, and we need something that works reliably regardless of the daily log size. I have some thoughts, but different ideas have different requirements and expected reliability issues, and anything requires extensive tests.
  2. Like
    tkaiser got a reaction from Dwyt in Learning from DietPi!   
     The nice dashboard screenshot above is used by @Fourdee to explain why DietPi is superior to Armbian: 'With #DietPi, logs and DietPi scripts are mounted to RAM , this reduces SD card write operations vastly' -- while I don't understand the purpose of 'mounting scripts to RAM', the idea of caching logs in RAM is of course great! That's why Armbian has been doing it since 2014 already.
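     
     For readers wondering what 'logs in RAM' means in practice, here is a minimal sketch of the general idea (this is not Armbian's actual log2ram implementation; the paths, tmpfs size and sync interval are just examples):
     
        # keep the persistent copy in /var/log.hdd and put a small tmpfs on /var/log
        mkdir -p /var/log.hdd
        mount -t tmpfs -o size=50M,nosuid,nodev tmpfs /var/log
        rsync -a /var/log.hdd/ /var/log/     # restore logs into RAM at boot
        # ...and write them back periodically (cron/systemd timer) and at shutdown:
        rsync -a /var/log/ /var/log.hdd/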
     
     While the above 'proof' is somewhat questionable (watching a 5 minute period in a dashboard and, once there's activity in one graph, taking a screenshot of numbers without meaning), let's look into what makes DietPi so superior to Armbian, since it's always a good idea to improve even if that means adopting other projects' USPs.
     
     For whatever reasons DietPi dropped support for all Orange and Banana Pis recently (it seems this started with a conversation between @Igor and @Fourdee on Twitter, then continued here and ended up there) so I had to take another board for a direct comparison. The only boards supported by both projects are now Pine64, Rock64, Tinkerboard, some NanoPis and the ODROIDs. I chose the Rock64, mostly to ensure that we use the same kernel and almost the same settings (Armbian's philosophy is to fix as much as possible upstream, so our usual performance fixes went into ayufan's Rock64 build scripts, which DietPi in this case relies on by accident, so even DietPi users continue to benefit from our work  )
     
     I took the latest official DietPi image for Rock64 and the first surprise was that the rootfs was pretty small and entirely full, so there was no way to proceed:
     /dev/mmcblk1p7 466M 453M 0 100% /
     
     For whatever reasons DietPi chose to take over ayufan's partition layout (for users new to DietPi: this is always just someone else's Debian image processed manually and by some scripts until it becomes 'DietPi') but their 'dietpi-drive_manager', responsible for resizing the rootfs, seems unable to cope with this (I wanted to report it to DietPi but there's already a report that gets ignored and it seems I can't comment there).
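     
     For reference, such a manual resize typically boils down to growing the partition and then the filesystem; a minimal sketch assuming this Rock64 image's device and partition numbers (growpart comes from the cloud-guest-utils package, and an online resize2fs works on a mounted ext4):
     
        # grow partition 7 to fill the card, then grow the ext4 filesystem on it
        growpart /dev/mmcblk1 7
        resize2fs /dev/mmcblk1p7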
     
     Edit: Ah, it seems @Fourdee blocked me from helping them entirely. I wanted to assist the DietPi folks over at https://github.com/Fourdee/DietPi/issues/1550 but can't point them to a fix for the thermal issues they're running into again, or explain why it's a bit weird to reintroduce the 'rootmydevice' issue, or why the new Allwinner BSP code is not such a great idea due to non-existent dvfs/thermal support  
     
     Fortunately our scripts below /usr/local/sbin/ were not deleted by DietPi so I simply called /usr/local/sbin/resize_rootfs.sh, which instantly resized the rootfs partition, and was then able to continue. For whatever reasons it took 3 whole reboots to get DietPi upgraded to their latest version v6.2, but then I was able to do some measurements:
     
     I then downloaded our Rock64 nightly image (based on Ubuntu Xenial but that doesn't matter much -- as we all know the userland stuff is close to irrelevant since kernel and settings matter) and did the same thing. But no reboot was needed, and since for whatever reasons DietPi remained on the pretty outdated 4.4.77 kernel I chose not to update Armbian's kernel to our 4.4.115 but to remain at 4.4.77 too:
     
     Let's look at the results, leaving aside the various performance and security issues DietPi suffers from since they're not relevant if we want to look at areas where DietPi outperforms Armbian. First 'idle behaviour':
                         DietPi       Armbian
     DRAM used:          39 MB (2%)   44 MB (2%)
     processes:          120          134
     cpufreq lowest:     97.5%        99.8%
     cpufreq highest:    2.0%         0.1%
     idle temp:          46°C         43.5°C
     %idle percent:      99.95%       99.98%
     
     So we're talking more or less about identical numbers. 'Used' memory after booting is 2% of the available 2GB (anyone thinking 'free' RAM would be desirable on Linux... please try to educate yourself: https://www.linuxatemyram.com), the count of processes reported by ps is almost the same, cpufreq behaviour, %idle percentage and temperatures are also the same (DietPi temperature readout is somewhat flawed since their 'cpu' tool affects system behaviour negatively).
     
     Even if Armbian ships with almost twice as many packages installed by default, the process count doesn't differ that much (and idling processes really don't hurt anyway) and used memory after booting also doesn't differ significantly. But this 'boot and sit there in idle' use case isn't that relevant anyway, and in situations where RAM is really needed I would assume Armbian users are in a much better position since we ship with zram active, allowed to use half of the physical DRAM (see here for a brief introduction to zram).
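     
     For the curious, setting up such a zram swap device by hand looks roughly like this (a sketch only -- Armbian's init scripts do this automatically, and the device count, compression algorithm and size here are examples):
     
        modprobe zram num_devices=1
        echo lz4 > /sys/block/zram0/comp_algorithm    # if the kernel offers lz4
        echo 1G > /sys/block/zram0/disksize           # e.g. half of 2 GB DRAM
        mkswap /dev/zram0
        swapon -p 5 /dev/zram0                        # higher priority than disk swap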
     
     So far I don't see that many advantages (none, to be honest), but most probably I missed something?
     
    Anyway: let's continue focussing on storage utilization and 'use':
                          DietPi    Armbian
     size img.7z:         104 MB    223 MB (x 2.1)
     size img:            668 MB    1.6 GB (x 2.5)
     rootfs size:         457 MB    1.2 GB (x 2.7)
     packages:            229       436 (x 1.9)
     commit interval:     5 s       600 s
     kB_wrtn:             156 KB    448 KB (x 2.9)
     kB_read:             1008 KB   5912 KB (x 5.9)
     
     So both compressed and uncompressed image sizes are much larger with Armbian, and the same goes for used space on the rootfs, which is understandable given that Armbian does not try to be as minimalistic as possible (see the count of pre-installed packages). I don't think going minimalistic is something desirable, though we could think about removing development related packages from default installations as @zador.blood.stained suggested already. Maybe it's worth adjusting the rootfs partition size calculation to use slightly less space so the uncompressed image size can be a little bit smaller?
     
     Anyway: for people concerned about the smallest possible image size, even without leaving packages out of the default install, simply building your own image and switching from ext4 to btrfs does the job, reducing image size to around 60% (one of Armbian's advantages is that our images are not hand-crafted unique 'gems' but the fully automated result of our build system, so everyone on this earth can simply build their own Armbian images suiting their own needs).
     
     And besides that I really see no benefit in trying to get the rootfs smaller since we surely don't want to start encouraging users to write Armbian images to old and crappy SD cards below 4GB in size (though I already consider 4GB cards nothing anyone should use these days since almost all those cards are insanely slow). Let's better continue to educate our users about the importance of choosing good and reliable SD cards!
     
     Now looking at the last 3 lines above. I executed an 'iostat -y 3600' to query the kernel about the total amount of data read and written at the block device layer within one whole hour. With DietPi/Stretch 156KB/1008KB (write/read) were reported and with Armbian/Xenial 448KB/5912KB (write/read). All numbers are too low for further investigations, though one thing is worth a look: the default rootfs 'commit interval'. DietPi seems to use the ext4 default (sync to SD card every 5 seconds) while in Armbian we chose a somewhat high 10 minute value (commit=600).
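     
     For reference, the commit interval is just an ext4 mount option and the measurement is a single iostat call; a minimal sketch (the device name is only an example):
     
        # /etc/fstab -- flush the journal to the card every 10 minutes instead of every 5 s
        /dev/mmcblk0p1  /  ext4  defaults,noatime,commit=600  0  1
        
        # report what actually hit the block devices during one hour (skip the since-boot stats)
        iostat -y 3600 1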
     
     So while with Armbian and 448 KB written in one hour almost three times as much data has been written at the block device layer, it might be possible that the 156 KB written by the DietPi installation caused more wear at the flash layer below due to a phenomenon called Write Amplification (TL;DR version: writes at the flash layer happen at 'page sizes', usually 8K, and with a high commit interval somewhat larger data chunks are written only every few minutes, which can result in significantly fewer page writes at the flash layer compared to writing smaller chunks of data every few seconds. Adding to the problem: once a card is 'full' we're talking about much higher Write Amplification, since then not just pages are written but usually whole Erase Blocks, which are much larger, are affected. So please choose your SD card wisely and always use a much larger capacity than needed since there's no TRIM with SD cards in Linux!)
     
     It would need a lot more detailed analysis of this write behaviour but IMO it's not worth the effort and Armbian's 10 min commit interval does a great job reducing further SD card wearout (anyone with too much spare time? Grab 'iostat 5' and 'iotop -o -b -d5 -q -t -k | grep -v Total' and start analysing what's happening at the block device and application layers, forgetting about the filesystem layer in between!)
     
    So where's some room for improvement when comparing our defaults with DietPi's?
     
    Maybe removing development related packages from default package list? Maybe tuning rootfs partition creation to use slightly less space? Mostly unrelated but an issue: improving our log2ram behaviour as already discussed?
  3. Like
    tkaiser got a reaction from Tido in Learning from DietPi!   
  4. Like
    tkaiser got a reaction from Tido in /var easily get full with log2ram   
     Well, it's the distro's decision to use this log2ram mechanism and to stay with defaults that in the meantime look somewhat problematic to me.
     
    We're syncing at boot the whole /var/log contents including rotated/compressed logs back into RAM: https://github.com/armbian/build/blob/master/packages/bsp/common/usr/sbin/log2ram#L40-L44
     
     If we excluded rotated/compressed logs they would afterwards exist only in /var/log.hdd and be missing from /var/log, which might be confusing for users, break log analysis software and so on. Using symlinks might work but opens up another can of worms.
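     
     Just to illustrate what such an exclusion would look like (the patterns are only examples, this is not the current log2ram code):
     
        # copy logs back into the tmpfs but leave rotated/compressed ones on disk only
        rsync -aX --exclude='*.gz' --exclude='*.xz' --exclude='*.[0-9]' /var/log.hdd/ /var/log/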
     
     Isn't there another option that works somewhat like an overlayfs? @zador.blood.stained IIRC we already talked about this a year ago but I can't remember the details  
     
    Edit: quick google check: https://github.com/azlux/log2ram/commit/e88f67ab23a91bb1482f0f2063b990585b27730c
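     
     The overlayfs approach from the commit linked above boils down to something like this (a rough sketch only, directory names are examples): the persistent logs become the read-only lower layer, a tmpfs holds the writable upper layer, and /var/log shows the merged view:
     
        mkdir -p /var/log.hdd /var/hdd.tmpfs
        mount -t tmpfs -o size=50M tmpfs /var/hdd.tmpfs
        mkdir -p /var/hdd.tmpfs/upper /var/hdd.tmpfs/work
        mount -t overlay overlay \
              -o lowerdir=/var/log.hdd,upperdir=/var/hdd.tmpfs/upper,workdir=/var/hdd.tmpfs/work \
              /var/log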
     
  5. Like
    tkaiser got a reaction from manuti in ODROID N1 -- not a review (yet)   
    USB3 anomalies / problems
     
     
     When I tested this almost 2 weeks ago I did not pay close enough attention to the crappy write performance: 470 MB/s with 4 SSDs in parallel attached to all SATA and USB3 ports is just horribly low given that we have a 'per port' and a 'per port group' limitation of around 390 MB/s. What we should've seen is 650+ MB/s taking the overhead into account. But 470 MB/s was already an indication that something was wrong.
     
     Fortunately in the meantime an ODROID community member tested various mirror attempts with 2 Seagate USB3 disks and reported 'RAID 0 doubles disk IO' while in reality showing exactly the opposite: none of his three mirror attempts (mdraid, lvm and btrfs) reported write performance exceeding 50 MB/s, which is insanely low for a RAID0 made out of two 3.5" disks (such lousy numbers are usually not even possible with 2 USB2 disks on separate USB2 ports).
     
    So let's take a look again: EVO840 and EVO750 both in JMS567 enclosures connected to each USB3 port. I simply created an mdraid RAID0 and measured sequential performance with 'taskset -c 5 iozone -e -I -a -s 500M -r 16384k -i 0 -i 1':
     kB       reclen    write   rewrite      read    reread
     512000    16384    85367     85179    312532    315012
     
     Yep, there's something seriously wrong when accessing two USB3 disks in parallel. Only 85 MB/s write and 310 MB/s read is way too low especially for rather fast SSDs. 'iostat 1' output shows that each disk when writing remains at ~83 tps (transactions per second): https://pastebin.com/CvgA3ggQ
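     
     For reference, the whole test setup is just a few commands (device names are this setup's, treat them as examples):
     
        # create a RAID0 across both USB3 attached SSDs and test it
        mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb
        mkfs.ext4 /dev/md0
        mount /dev/md0 /mnt/raid0
        cd /mnt/raid0 && taskset -c 5 iozone -e -I -a -s 500M -r 16384k -i 0 -i 1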
     
    Ok, let's try to get a clue what's bottlenecking. I removed the RAID0 and formatted both SSDs as ext4. First tests with only one SSD active at a time:
               kB      reclen     write   rewrite      read    reread
     EVO840    512000   16384    378665    382100    388932    392917
     EVO750    512000   16384    386473    385902    377608    383549
     
     Now trying to start the iozone runs at the same time (of course iozone tasks sent to different CPU cores to avoid CPU bottlenecks, same applies to IRQs: that's /proc/interrupts after test execution):
               kB      reclen     write   rewrite      read    reread
     EVO840    512000   16384    243482    215862    192638    160677
     EVO750    512000   16384    214356    252474    168322    195164
     
     So there is still some sort of a limitation but at least it's not as severe as in the mirror modes when all accesses to the two USB connected disks happen exactly in parallel.
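     
     In case anybody wants to reproduce this: the parallel run with CPU and IRQ pinning looks roughly like the following (mountpoints and IRQ numbers are examples, check /proc/interrupts for the real xhci ones):
     
        # run both iozone instances on different big cores at the same time
        (cd /mnt/evo840 && taskset -c 4 iozone -e -I -a -s 500M -r 16384k -i 0 -i 1) &
        (cd /mnt/evo750 && taskset -c 5 iozone -e -I -a -s 500M -r 16384k -i 0 -i 1) &
        # and spread the two xhci controller IRQs over those cores as well (hex CPU masks)
        echo 10 > /proc/irq/226/smp_affinity   # CPU4
        echo 20 > /proc/irq/227/smp_affinity   # CPU5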
     
    When looking closer we see another USB3 problem long known from N1's little sibling ROCK64 (any RK3328 device is a much nearer relative to N1 than any of the other ODROIDs):
     [   3.433165] xhci-hcd xhci-hcd.7.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
     [   3.433183] xhci-hcd xhci-hcd.7.auto: @00000000efc59440 00000000 00000000 1b000000 01078001
     [   3.441152] xhci-hcd xhci-hcd.8.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
     [   3.441171] xhci-hcd xhci-hcd.8.auto: @00000000efc7e440 00000000 00000000 1b000000 01078001
     [  11.363314] xhci-hcd xhci-hcd.7.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
     [  11.376118] xhci-hcd xhci-hcd.7.auto: @00000000efc59e30 00000000 00000000 1b000000 01078001
     [  11.385567] xhci-hcd xhci-hcd.8.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
     [  11.395145] xhci-hcd xhci-hcd.8.auto: @00000000efc7ec30 00000000 00000000 1b000000 01078000
     [ 465.710783] usb 8-1: new SuperSpeed USB device number 3 using xhci-hcd
     [ 465.807944] xhci-hcd xhci-hcd.8.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
     [ 465.817503] xhci-hcd xhci-hcd.8.auto: @00000000efc7ea90 00000000 00000000 1b000000 01078001
     [ 468.601895] usb 6-1: new SuperSpeed USB device number 3 using xhci-hcd
     [ 468.671876] xhci-hcd xhci-hcd.7.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
     [ 468.671881] xhci-hcd xhci-hcd.7.auto: @00000000efc591f0 00000000 00000000 1b000000 01078001
     
     I updated bootloader and kernel this morning and have no idea whether this was introduced (again?) just recently or existed already before:
     root@odroid:~# dpkg -l | egrep "odroid|bootini"
     ii  bootini          20180226-8   arm64   boot.ini and its relatives for ODROID-N1
     ii  linux-odroidn1   4.4.112-16   arm64   Linux kernel for ODROID-N1
     But I guess we're still talking about a lot of room for improvement when it comes to XHCI/USB3, the BSP kernel and RK3399.
     
     Edit: Strangely, when I tested USB3 after receiving the N1 two weeks ago, the RAID0 results weren't that low. Now I remembered what happened back then: I immediately discovered the coherent pool size being too low and increased it to 2MB (this gets removed every time the 'bootini' package is updated). And guess what: that does the trick. I added 'coherent_pool=2M' to the kernel cmdline and we're back at normal performance, though there's still a ~390 MB/s overall limitation.
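     
     For anyone who wants to replicate this: the parameter goes into the kernel command line set in boot.ini (the exact variable name depends on the bootini package version, so take the snippet as illustrative) and can be verified after a reboot:
     
        # boot.ini (u-boot script): append coherent_pool=2M to the kernel arguments, e.g.
        #   setenv bootargs "${bootargs} coherent_pool=2M"
        # then after a reboot verify it reached the kernel:
        grep -o 'coherent_pool=2M' /proc/cmdline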
     
  6. Like
    tkaiser got a reaction from manuti in Banana Pi R64   
     Just for the record: the Banana people are working on another MediaTek based board: https://github.com/BPI-SINOVOIP/BPI-files/commit/a3c53c233fd2059a43763a78b13ca1c5fd0b0f50
     
    SoC is a MT7622A (dual core ARM Cortex A53 processor with some 'dedicated network accelerator', RAID/XOR engine, SATA and PCIe 2.0), latest bootloader commit suggests that the board will be equipped with 802.11ac (AC2600) Wi-Fi.
  7. Like
    tkaiser got a reaction from TonyMac32 in ODROID N1 -- not a review (yet)   
    Storage performance update... what to use to store the rootfs on?
     
     In the following I compare 4 good SD cards, 4 different eMMC modules Hardkernel sells for the N1, and 4 different SSD setups. As some background on why I chose to measure random IO with 1k, 4k and 16k block sizes please read the 'SD card performance 2018 update' first.
     
     The following are IOPS numbers (IO operations per second), which matter if we want to know how fast storage performs when used as an 'OS drive' (random IO performance is the most important factor here):
                                    1K w/r           4K w/r           16K w/r
     SanDisk Extreme Plus 16GB      566 / 2998       731 / 2738       557 / 2037
     SanDisk Ultra A1 32GB          456 / 3171       843 / 2791       548 / 1777
     SanDisk Extreme A1 32GB        833 / 3289       1507 / 3281      1126 / 2113
     Samsung Pro 64GB               1091 / 4786      1124 / 3898      478 / 2296
     Orange eMMC 16GB               2450 / 7344      7093 / 7243      2968 / 5038
     Orange eMMC 32GB               2568 / 7453      7365 / 7463      5682 / 5203
     Orange eMMC 64GB               2489 / 7316      7950 / 6944      6059 / 5250
     Orange eMMC 128GB              2498 / 8337      7064 / 7197      5459 / 4909
     Intel 540 USB3                 7076 / 4732      7053 / 4785      5342 / 3294
     Samsung EVO750 USB3            8043 / 6245      7622 / 5421      6175 / 4481
     Samsung EVO840 powersave       8167 / 5627      7605 / 5720      5973 / 4766
     Samsung EVO840 performance     18742 / 10471    16156 / 9657     10390 / 7188
     
     The SD cards I chose for this comparison all perform very well (an average no-name, Kingston, PNY, Verbatim or whatever other 'reputable' brand performs way lower wrt random IO!). But it can clearly be seen that Hardkernel's eMMC modules are a lot more performant. Regardless of size they all perform pretty similarly, though the small 16GB module is bottlenecked by a write performance limitation that also affects 16k random write IO.
     
     With SSDs it depends: I chose somewhat ok-ish consumer SSDs for the test, so in case you want to buy used SSDs or some 'great bargains' on Aliexpress or eBay be prepared that your numbers will look way worse. The SATA connected EVO840 is listed twice since performance with small blocksizes heavily depends on PCIe power management settings (default is powersave -- switching to performance increases idle consumption by around 250mW, but only then is a SATA connected SSD able to outperform Hardkernel's eMMC. That's important to know and also only applies to really performant SSDs; cheap SSDs, especially with small capacities, perform way lower)
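     
     Assuming the knob in question is the kernel's ASPM policy (the exact sysfs path may differ depending on kernel config), switching it for a quick test looks like this:
     
        # show the available policies and the active one (in brackets)
        cat /sys/module/pcie_aspm/parameters/policy
        # trade ~250mW idle consumption for full small-blocksize performance
        echo performance > /sys/module/pcie_aspm/parameters/policy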
     
    Now let's look at sequential performance with large blocksizes (something that does NOT represent the 'OS drive' use case even remotely and is pretty irrelevant for almost all use cases except creation of stupid benchmark graphs):
                                    MB/s write   MB/s read
     SanDisk Extreme Plus 16GB      63           67
     SanDisk Ultra A1 32GB          20           66
     SanDisk Extreme A1 32GB        59           68
     Samsung Pro 64GB               61           66
     Orange eMMC 16GB               48           298
     Orange eMMC 32GB               133          252
     Orange eMMC 64GB               148          306
     Orange eMMC 128GB              148          302
     Intel 540 USB3                 325          370
     Samsung EVO750 USB3            400          395
     Samsung EVO840 powersave       375          385
     Samsung EVO840 performance     375          385
     
     We can see that N1's SD card interface seems to bottleneck sequential read performance of all tested cards to around ~67 MB/s. Write performance depends mostly on the cards (all cheap cards like the tested SanDisk Ultra A1 32GB you get currently for $12 on Amazon are limited here). The Hardkernel eMMC modules perform very well with sustained read performance at around 300MB/s and write performance depending on module size at up to ~150 MB/s.
     
     With SSDs it depends -- we have an interface limitation of around ~395 MB/s on the USB3 ports and a little bit lower on the SATA ports, but unless you buy rather expensive SSDs you won't be able to reach the board's bottleneck anyway. Please also keep in mind that the vast majority of consumer SSDs implements some sort of write caching and write performance drops drastically once a certain amount of data has been written (my Intel 540 then gets as slow as 60MB/s, IIRC the EVO750 can achieve ~150 MB/s and the EVO840 180 MB/s).
     
     Why aren't HDDs listed above? Since they're useless here. Even enterprise HDDs show way too low random IO performance. These things are good to store 'cold data' on but never ever put your rootfs on them. They're outperformed by a factor of at least 5 by any recent A1 rated SD card, even crappy SSDs are at least 10 times faster and Hardkernel's eMMC performs at least 50 times better.
     
    So how to interpret results above? If you want energy efficient and ok-ish performing storage for your rootfs (OS drive) then choose any of the currently available A1 rated SD cards from reputable vendors (choose more expensive ones for better performance/resilience, choose larger capacities than needed if you fear your flash memory wearing out too fast). If you want top performance at lowest consumption level choose Hardkernel's eMMC and keep in mind that the smallest module is somewhat write performance bottlenecked. Again: if you fear your flash memory wearing out too fast simply choose larger capacities than 'needed'.
     
     If you want to waste huge amounts of energy while still being outperformed by Hardkernel's eMMC, buy a cheap SSD. Keep in mind that you need to disable PCIe power management (further increasing idle consumption) to be able to outperform eMMC storage, otherwise N1's SATA/PCIe implementation will bottleneck too much. So when do SSDs start to make sense? If you either really need higher performance than Hardkernel's eMMC modules and are willing to spend a serious amount of money on a good SSD, or the '1k random IO' use case really applies to you (e.g. trying to run a database with insanely small record sizes that constantly updates at the storage layer).
     
     But always keep in mind: unless you really choose a more expensive and high performing SSD you'll always get lower performance than eMMC while consumption is at least 100 times higher. And always use SSDs at the SATA ports, since only there can you get higher random IO performance compared to eMMC, and being able to benefit from TRIM is essential (for details on why TRIM is a problem on USB ports see above). But keep in mind that internal SATA ports are rated for 50 matings max, so be prepared to destroy connectors easily if you constantly change cables on those SATA ports   
     
    But what if you feel that any SATA attached storage (the cheapest SSD around and even HDDs) must be an improvement compared to eMMC or SD cards? Just use it, all of the above is about facts and not feelings. You should only ensure to never ever test your storage performance since that might hurt your feelings (it would be as easy as 'cd $ssd-mountpoint ; iozone -e -I -a -s 100M -r 1k -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2' but really don't do this if you want to believe in one of the most common misbeliefs with consumer electronics today)
     
    As a reference all IO benchmark results for SD cards, Hardkernel's eMMC modules and the SSD tests:
    https://pastebin.com/2wxPWcWr https://pastebin.com/ePUCXyg6 https://pastebin.com/N5wEghn3  
     
  8. Like
    tkaiser got a reaction from manuti in ODROID N1 -- not a review (yet)   
  9. Like
    tkaiser got a reaction from iav in SD card performance   
    2018 SD card update
     
     It's 2018 now, the SD Association's A1 'performance class' spec has been out for over a year, and in the meantime we can buy products trying to be compliant with this performance class. SD cards carrying the A1 logo must be able to perform at least 1500 random read input-output operations per second (IOPS) with 4KB block size, 500 random write IOPS and 10 MB/s sustained sequential performance (see here for more details and background info).
     
     Why is this important? Because what we do on SBCs, at least for the rootfs, is mostly random IO and not sequential IO as is common in cameras or video recorders (the stuff SD cards were originally invented for). As SBC (or Android) users we're mostly interested in high random IO performance with smaller blocksizes since this is what 'real world' IO patterns mostly look like. Prior to the A1 and A2 performance classes there was no way to know how SD cards perform in this area before buying. Fortunately this has changed now.
     
     Last week an ODROID N1 dev sample arrived, so I bought two SanDisk A1 cards with 32GB capacity each: an el cheapo 'Ultra A1' for 13€ (~$15) and an 'Extreme A1' for 23€. I wanted to buy a slightly more expensive 'Extreme Plus A1' (even more performance and especially reliability/longevity) but ordered the wrong one  Please keep in mind that the 'Extreme Plus' numbers shown below were made with an older card missing the A1 logo.
     
     Let's look at how these things perform, this time on a new platform: RK3399 with an SD card interface that supports higher speed modes (requires kernel support and switching from 3.3V to 1.8V at the hardware layer). So results aren't comparable with the numbers we generated over the last two years in this and other threads, but that's not important any more... see at the bottom.
     
     A1 conformance requires at least 10 MB/s sequential performance and 500/1500 (write/read) IOPS with 4K blocksize. I also tested with 1K and 16K blocksizes, simply to get an idea whether 4K results are useful for determining performance with smaller or larger blocksizes (we already know that the vast majority of cheap SD cards out there shows a severe 16K random write performance drop, which is the real reason so many people consider all SD cards crap from a performance point of view).
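     
     The numbers below were generated with iozone; an invocation along the lines of the one quoted in the ODROID N1 thread produces exactly these random IOPS and sequential columns (run it inside the card's mountpoint -- the mountpoint here is an example, the full command lines used are in the pastebin linked at the bottom):
     
        cd /mnt/sdcard    # wherever the card under test is mounted
        iozone -e -I -a -s 100M -r 1k -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2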
     
     I tested 7 cards, 4 of them SanDisk, two Samsung, and the 'Crappy card' entry being a mixture of results from a 4GB Kingston I started testing with and old results from a 4GB Intenso from two years ago (see the first post of this thread). The Kingston died when testing with 4K blocksize, and the performance of all these crappy 'noname class' cards doesn't vary that much:
                             1K w/r          4K w/r          16K w/r
     Crappy card 4GB         32 / 1854       35 / 1595       2 / 603
     Samsung EVO+ 128GB      141 / 1549      160 / 1471      579 / 1161
     Ultra A1 32GB           456 / 3171      843 / 2791      548 / 1777
     Extreme A1 32GB         833 / 3289      1507 / 3281     1126 / 2113
     Samsung Pro 64GB        1091 / 4786     1124 / 3898     478 / 2296
     Extreme Plus 16GB       566 / 2998      731 / 2738      557 / 2037
     Extreme Pro 8GB         304 / 2779      323 / 2754      221 / 1821
     
     (All results in IOPS --> IO operations per second) For A1 compliance we only need to look at the middle column and have to expect at least 500/1500 IOPS minimum here. The 'Crappy card' fails as expected, the Samsung EVO+ too (but we already knew that for whatever reasons newer EVO+ or those with larger capacity perform worse than the 32GB and 64GB variants we tested two years ago), the Samsung Pro shows the best performance here while one of the 4 SanDisks also fails. But my Extreme Pro 8GB is now 3 years old, the other one I had showed signs of data corruption a few months ago, and when testing 2 years ago (see the 1st post in this thread) random write performance was at 800. So most probably this card is about to die soon and the numbers above are partially irrelevant.
     
     What about sequential performance? Well, 'Crappy card' is also not able to meet the specs and all the better cards are 'bottlenecked' by the ODROID N1 (some of these cards reach 80 MB/s in my MacBook's card reader but Hardkernel chose to use some safety headroom for good reasons and limits the maximum speed for improved reliability):
                             MB/s write   MB/s read
     Crappy card 4GB         9            15
     Samsung EVO+ 128GB      21           65
     Ultra A1 32GB           20           66
     Extreme A1 32GB         59           68
     Samsung Pro 64GB        61           66
     Extreme Plus 16GB       63           67
     Extreme Pro 8GB         50           67
     
     Well, sequential transfer speeds are close to irrelevant with single board computers or Android but it's good to know that boards that allow for higher SD card speed modes (e.g. almost all ODROIDs and the Tinkerboard) also show an improvement in random IO performance if the card is a good one. The ODROID N1 was limited to DDR50 (slowest SD card mode) until today when Hardkernel unlocked UHS capabilities so that my cards (except 'Crappy card') could all use SDR104 mode. With DDR50 mode sequential performance is limited to 22.5/23.5MB/s (write/read) but more interestingly random IO performance also differs. See IOPS results with the two SanDisk A1 cards, one time limited to DDR50 and then with SDR104:
                             1K w/r         4K w/r          16K w/r
     Ultra A1 DDR50          449 / 2966     678 / 2191      445 / 985
     Ultra A1 SDR104         456 / 3171     843 / 2791      548 / 1777
     
     Extreme A1 DDR50        740 / 3049     1039 / 2408     747 / 1068
     Extreme A1 SDR104       833 / 3289     1507 / 3281     1126 / 2113
     
     We can clearly see that the larger the blocksize, the more the interface speed also influences random IO performance (look especially at the 16K random reads that double with SDR104).
     
    Some conclusions:
     • When comparing the results above the somewhat older Samsung Pro performs pretty similarly to the Extreme A1. But great random IO performance is only guaranteed with cards carrying the A1 logo (or A2 soon), so it might happen to you that buying another Samsung Pro today results in way lower random IO performance (see Hardkernel's results with a Samsung Pro Plus showing 224/3023 4k IOPS, which is way below the 1124/3898 my old Pro achieves, with write performance in particular 5 times worse and below the A1 criteria).
     • We still need to focus on the correct performance metrics. Sequential performance is more or less irrelevant ('Class 10', 'UHS' and so on), all that matters is random IO (A1 and A2 soon). Please keep in mind that you can buy a nice looking UHS card from 'reputable' brands like Kingston, Verbatim, PNY and the like that might achieve theoretical 80MB/s or even 100MB/s sequential performance (which you're not able to benefit from anyway since your board's SD card interface will be the bottleneck) but simply sucks at random IO performance. We're talking about up to 500 times worse performance when trusting in 'renowned' brands and ignoring performance reality (see 16k random writes comparing 'Crappy card' and 'Extreme A1').
     • Only a few vendors on this planet run NAND flash memory fabs, only a few companies produce flash memory controllers and have the necessary know-how in house. And only a few combine their own NAND flash with their own controllers into their own retail products. That's the simple reason why at least I only buy SD cards from these 4 brands: Samsung, SanDisk, Toshiba, Transcend.
     • The A1 performance speed class is a great and necessary improvement since now we can rely on getting decent random IO performance. This also helps in fighting counterfeit flash memory products since even if fraudsters in the meantime produce fake SD cards that look real and show the same capacity, usually these fakes suck at random IO performance. So after testing new cards with either F3 or H2testw it's now another iozone or CrystalDiskMark run to check overall performance including random IO (!), and if performance sucks you simply return the cards asking for a refund (a quick sketch of such a check follows below).
     • TL;DR: If you buy new SD cards choose those carrying an A1 or A2 logo. Buy only good brands (their names start with either S or T). Don't trust in getting genuine products but always expect counterfeit stuff. That's why you should only buy at sellers with a 'no questions asked' return/refund policy and why you have to check your cards immediately after purchase. If you also care about reliability/resilience buy more expensive cards (e.g. the twice as expensive Extreme Plus A1 instead of Ultra A1) and choose larger capacities than needed.
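     
     Such a post-purchase check could look like this (the mountpoint is an example; f3write/f3read come from the f3 package, on Windows H2testw does the same job):
     
        # 1) rule out counterfeit capacity
        f3write /mnt/sdcard && f3read /mnt/sdcard
        # 2) check random IO performance (A1 cards must reach at least 500/1500 4k write/read IOPS)
        cd /mnt/sdcard
        iozone -e -I -a -s 100M -r 4k -i 0 -i 2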
     
    Finally: All detailed SD card test results can be found here: https://pastebin.com/2wxPWcWr As a comparison performance numbers made with same ODROID N1, same settings but vendor's orange eMMC modules based on Samsung eMMC and varying only in size: https://pastebin.com/ePUCXyg6
  10. Like
    tkaiser got a reaction from lomady in SD card performance   
  11. Like
    tkaiser got a reaction from vlad59 in ODROID N1 -- not a review (yet)   
    UPDATE: You'll find a preliminary performance overview at the end of the thread. Click here.
     
     
    This is NOT an ODROID N1 review since it's way too early for this and the following will focus on just a very small amount of use cases the board might be used for: server stuff and everything that focuses on network, IO and internal limitations. If you want the hype instead better join Hardkernel's vendor community over there: https://forum.odroid.com/viewforum.php?f=148
     
     All numbers you find below are PRELIMINARY since it's way too early to benchmark this board. This is just an attempt to get some baseline numbers to better understand which use cases the device might be appropriate for, where to look further and which settings might need improvements.
     
    Background info first
     
ODROID N1 is based on the Rockchip RK3399 SoC so we already know a lot since RK3399 isn't really new (see Chromebooks, countless TV boxes with this chip and dev boards like Firefly RK3399, ROCK960 and a lot of others... and there will be many more devices coming in 2018, like another board from China soon with an M.2 key M slot exposing all PCIe lanes).
     
What we already know is that the SoC is one of Rockchip's 'open source SoCs', so software support is already pretty good and the chip vendor itself actively upstreams its code. We also know RK3399 is not the greatest choice for compiling code (a use case bottlenecked by memory bandwidth and only 2 fast cores combined with 4 slow ones; for this use case 4 x A15 or A17 cores perform much better), that ARMv8 crypto extensions are supported (see a few posts below) and that the SoC performs nicely with Android and 'Desktop Linux' stuff (think about GPU and VPU acceleration). We also know that this SoC has 2 USB3 ports and implements PCIe 2.1 with a four lane interface. But so far we don't know what the internal bottlenecks look like, so let's focus on this now.
     
The PCIe 2.1 x4 interface is said to support both Gen1 and Gen2 link speeds (2.5 vs. 5GT/s) but there was recently a change in the RK3399 datasheet (a downgrade from Gen2 to Gen1) and some mainline kernel patch descriptions seem to indicate that RK3399 is not always able to train for Gen2 link speeds. On ODROID N1 a single-lane (x1) PCIe link, configured as either Gen1 or Gen2, is used to connect a dual-port SATA adapter. The ASMedia ASM1061 was the obvious choice: while being a somewhat old design (AFAIK from 2010) it's cheap and 'fast enough', at least when combined with one or even two HDDs.
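When a PCIe device is attached, the actually negotiated link width and speed can be checked from userspace; a small sketch (the device address is just an example):

lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'
# LnkCap lists what the link supports, LnkSta what was really negotiated (2.5GT/s = Gen1, 5GT/s = Gen2)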
     
Since the PCIe implementation on these early N1 dev samples is fixed and limited we need to choose other RK3399 devices to get a clue about PCIe limitations (RockPro64, ROCK960 or the not yet announced other board from China). So let's focus on SATA and USB3 instead. While SATA on 'development boards' is nothing new, it's often done with (sometimes really crappy) USB2 SATA bridges, recently sometimes with good USB3 SATA bridges (see ODROID HC1/HC2, Cloudmedia Transformer or Swiftboard) and sometimes it's even 'true' SATA:
     
- Allwinner A10/A20/R40/V40 (many SBC)
- AM572x Sitara (eg. BeagleBoard-X15 with 1 x eSATA and 1 x SATA on expansion header)
- Marvell Armada 38x (Clearfog Base, Clearfog Pro, Helios4)
- Marvell Armada 37x0 (EspressoBin)
- NXP i.MX6 (Cubox-i, the various Hummingboard versions, same with Wandboard and so on)
All the above SoC families do 'native SATA' (the SoC itself implements SATA protocols and connectivity) but performance differs a lot, with 'Allwinner SATA' being the worst and only the Marvell implementations performing as expected (+500 MB/s sequential and also very high random IO performance, which is what you're after when using SSDs). As an Armbian user you already know: this stuff is documented in detail, just read through this and that.
     
RK3399 is not SATA capable and we're talking here about PCIe attached SATA, which has two disadvantages: it slightly bottlenecks performance while increasing overall consumption. N1's SATA implementation and how it's 'advertised' (rootfs on SATA) pose another challenge but this is something for a later post (the sh*tshow known from 'SD cards' over the last years now arriving at a different product category called 'SSD').
     
Benchmarking storage performance is challenging and most 'reviews' done on SBCs use inappropriate tools (see this nice bonnie/bonnie++ example), inappropriate settings (see all those dd and hdparm numbers partially testing filesystem buffers and caches instead of storage) or focus only on irrelevant stuff (eg. sequential performance in 'worst case testing mode' only looking at one direction).
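fio is mentioned below as the more thorough option; as a sketch of what a meaningful raw random IO test could look like (parameters here are illustrative assumptions, not the ones used in this post):

fio --name=randread --filename=/dev/sda --readonly --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=4 --runtime=60 --time_based
# --direct=1 bypasses the page cache, --readonly avoids destroying data on the raw device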
     

     
    Some USB3 tests first
     
All SSDs I use for the test are powered externally and not by the N1 since I ran more than once into situations with board-powered SSDs where performance dropped a lot when some sort of underpowering occurred. The 2 USB3 enclosures above are powered by a separate 5V rail and the SATA attached SSDs by the dual-voltage PSU behind. As expected USB3 storage can use the much faster UAS protocol (we know this from RK3328 devices like ROCK64 already, which use the same XHCI controller and most probably a nearly identical kernel) and the performance numbers match too (with large block and file sizes we get close to 400 MB/s).
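Whether an enclosure really negotiated UAS or fell back to usb-storage can be checked quickly (a sketch):

lsusb -t                               # UAS capable devices show up with 'Driver=uas'
dmesg | grep -iE 'uas|usb-storage'     # shows which driver attached to the bridge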
     
We chose iozone simply to be able to compare with previous numbers, but a more thorough benchmark would need some fio testing with different test sets. For now it's only about getting a baseline. Tests were done with Hardkernel's Debian Stretch image with some tweaks applied. The image relies on Rockchip's 4.4 BSP kernel (4.4.112) with some Hardkernel tweaks and I adjusted the following: first set both cpufreq governors to performance to not be affected by potentially wrong/weird cpufreq scaling behaviour, then do static IRQ distribution for USB3 and PCIe on cpu1, cpu2 and cpu3 (all little cores, but while checking CPU utilization none of the cores was fully saturated so A53@1.5GHz is fine):
echo 2 >/proc/irq/226/smp_affinity
echo 4 >/proc/irq/227/smp_affinity
echo 8 >/proc/irq/228/smp_affinity

To avoid CPU core collisions the benchmark task itself has been sent to one of the two A72 cores:
taskset -c 5 iozone -e -I -a -s 100M -r 1k -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2

Unfortunately I currently only have crappy SSDs lying around (all cheap consumer models: Samsung EVO 840 and 750, a Samsung PM851 and an Intel 540). So the results need to be taken with a grain of salt since these SSDs suck especially at continuous write tests (sequential write performance drops a lot after a short period of time).
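The IRQ numbers used above are specific to this image/kernel; on another setup they would be looked up first, for example:

grep -E 'xhci|pcie' /proc/interrupts   # identify the USB3 and PCIe IRQ numbers
# then pin each one to a dedicated little core via /proc/irq/<number>/smp_affinity as shown above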
     
The first test is to determine whether the USB3 ports behave differently (AFAIK one of the two could also be configured as an OTG port and on some SBCs I've seen serious performance drops in such a mode). But nope, they perform identically:
EVO840 behind JMS567 (UAS active) on lower USB3 port (xhci-hcd:usb7, IRQ 228):
                                                    random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       1     6200     6569     7523      7512     4897      6584
102400       4    23065    25349    34612     34813    23978     25231
102400      16    78836    87689   105249    106777    78658     88240
102400     512   302757   314163   292206    300964   292599    321848
102400    1024   338803   346394   327101    339218   329792    351382
102400   16384   357991   376834   371308    384247   383501    377039

EVO840 behind JMS567 (UAS active) on upper USB3 port (xhci-hcd:usb5, IRQ 227):
                                                    random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       1     6195     6545     7383      7383     4816      6518
102400       4    23191    25114    34370     34716    23580     25199
102400      16    78727    86695   104957    106634    76359     87610
102400     512   307469   315243   293077    302678   293442    321779
102400    1024   335772   336833   326940    339128   330298    350271
102400   16384   366465   376863   371193    384503   383297    379898

Now attaching an EVO750 (not that fast) that performs pretty identical behind the XHCI host controller and the JMS567 controller inside the enclosure:
EVO750 behind JMS567 (UAS active) on lower USB3 port (xhci-hcd:usb7, IRQ 228):
                                                    random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       1     6200     6569     7523      7512     4897      6584
102400       4    23065    25349    34612     34813    23978     25231
102400      16    78836    87689   105249    106777    78658     88240
102400     512   302757   314163   292206    300964   292599    321848
102400    1024   338803   346394   327101    339218   329792    351382
102400   16384   357991   376834   371308    384247   383501    377039

(so USB3 is the bottleneck here, especially with random IO an EVO840 is much much faster than an EVO750 but here they perform identical due to the massive USB protocol overhead)
     
    Let's try both USB3 ports at the same time

The first quick try was a BTRFS RAID-0 made with 'mkfs.btrfs -f -m raid0 -d raid0 /dev/sda1 /dev/sdb1'. Please note that BTRFS is not the best choice here since all (over)writes with blocksizes lower than btrfs' internal blocksize (4K default) are way slower compared to non-CoW filesystems:
                                                    random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       1     2659     1680   189424    621860   435196      1663
102400       4    21943    18762    24206     24034    18107     17505
102400      16    41983    46379    62235     60665    52517     42925
102400     512   180106   170002   143494    149187   138185    180238
102400    1024   170757   185623   159296    156870   156869    179560
102400   16384   231366   247201   340649    351774   353245    231721

That's BS numbers, let's forget about them. Now trying the same with mdraid/ext4 configuring a RAID 0 and putting an ext4 on it and... N1 simply powered down when executing mkfs.ext4. Adding 'coherent_pool=2M' to bootargs seems to do the job (and I created the mdraid0 in between with both SSDs connected through SATA)
                                                    random    random
    kB  reclen    write  rewrite    read    reread    read     write
102400       4    25133    29444    38340     38490    23403     27947
102400      16    85036    97638   113992    114834    79505     95274
102400     512   306492   314124   295266    305411   289393    322493
102400    1024   344588   343012   322018    332545   316320    357040
102400   16384   384689   392707   371415    384741   388054    388908

Seems we're talking here already about one real bottleneck? We see nice improvements with small blocksizes which is an indication that RAID0 is doing its job. But with larger blocksizes we're not able to exceed the 400MB/s barrier so it seems both USB3 ports have to share bandwidth (comparable to the situation on ODROID XU4 where the two USB3 receptacles are connected to an internal USB3 hub which is connected to one USB3 port of the Exynos SoC)
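For reference, the mdraid/ext4 setup mentioned above boils down to something like this (a sketch; device nodes and mount point are examples):

mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1   # striped array across both SSDs
mkfs.ext4 /dev/md0    # this step powered the board down until coherent_pool=2M was added to bootargs
mount /dev/md0 /mnt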
     
    Edit: @Xalius used these results to look into RK3399 TRM (technical reference manual). Quoting ROCK64 IRC:
  12. Like
    tkaiser got a reaction from chwe in ODROID N1 -- not a review (yet)   
  13. Like
    tkaiser got a reaction from TonyMac32 in ODROID N1 -- not a review (yet)   
  14. Like
    tkaiser reacted to Icenowy in UAS   
I didn't blacklist the NS1066 because my NS1066 HDD case doesn't report UAS at all (with an Ivy Bridge PC's USB3).
 
P.S. NS106X will only report UAS on USB3 ports.
  15. Like
    tkaiser got a reaction from manuti in Kickstarter: Allwinner VPU support in the official Linux kernel   
The first stretch goal has been reached (that was the 'try to cover not only horribly outdated SoCs from half a decade ago but also some of the just outdated ones'). Two stretch goals are still open and they estimate the same amount of work/money needed (another 22 man days or 22,000€ in total)
  16. Like
    tkaiser got a reaction from Magnets in ROCK64   
    Hmm... to summarize the 'OpenSSL 1.0.2g  1 Mar 2016' results for the 3 boards/SoC tested above with some more numbers added (on all A53 cores with crypto extensions enabled performance is directly proportional to CPU clockspeeds -- nice):
ODROID N1 / RK3399 A72 @ 2.0GHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     377879.56k   864100.25k  1267985.24k  1412154.03k   1489756.16k
aes-192-cbc     325844.85k   793977.30k  1063641.34k  1242280.28k   1312189.10k
aes-256-cbc     270982.47k   721167.51k   992207.02k  1079193.94k   1122691.75k

ODROID N1 / RK3399 A53 @ 1.5GHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     103350.94k   326209.49k   683714.13k   979303.08k   1118808.75k
aes-192-cbc      98758.18k   291794.65k   565252.01k   759266.99k    843298.13k
aes-256-cbc      96390.77k   273654.98k   495746.99k   638750.04k    696857.94k

MacchiatoBin / ARMADA 8040 @ 1.3GHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     360791.31k   684250.01k   885927.34k   943325.18k    977362.94k
aes-192-cbc     133711.13k   382607.98k   685033.56k   786573.31k    854780.59k
aes-256-cbc     314631.74k   553833.58k   683859.97k   719003.99k    738915.67k

Orange Pi One Plus / H6 @ 1800 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     226657.97k   606014.83k  1013054.98k  1259576.66k   1355773.27k
aes-192-cbc     211655.34k   517779.82k   809443.75k   963041.96k   1019251.37k
aes-256-cbc     202708.41k   470698.97k   692581.21k   802039.13k    840761.34k

NanoPi Fire3 / Nexell S5P6818 @ 1400 MHz (4.14.40 64-bit kernel):
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc      96454.85k   303549.92k   637307.56k   909027.59k   1041484.46k
aes-192-cbc      91930.59k   274220.78k   527673.43k   705704.40k    785708.37k
aes-256-cbc      89652.23k   254797.65k   460436.75k   594723.84k    648388.61k

ROCK64 / Rockchip RK3328 @ 1296 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     163161.40k   436259.80k   729289.90k   906723.33k    975929.34k
aes-192-cbc     152362.85k   375675.22k   582690.99k   693259.95k    733563.56k
aes-256-cbc     145928.50k   337163.26k   498586.20k   577371.48k    605145.77k

PineBook / Allwinner A64 @ 1152 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     144995.37k   387488.51k   648090.20k   805775.36k    867464.53k
aes-192-cbc     135053.95k   332235.56k   516605.95k   609853.78k    650671.45k
aes-256-cbc     129690.99k   300415.98k   443108.44k   513158.49k    537903.10k

Espressobin / Marvell Armada 3720 @ 1000 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc      68509.24k   216097.11k   453277.35k   649243.99k    741862.06k
aes-192-cbc      65462.17k   194529.30k   375030.70k   503817.22k    559303.34k
aes-256-cbc      63905.67k   181436.03k   328664.06k   423431.51k    462012.42k

OPi PC2 / Allwinner H5 @ 816 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     102568.41k   274205.76k   458456.23k   569923.58k    613422.42k
aes-192-cbc      95781.66k   235775.72k   366295.72k   435745.79k    461294.25k
aes-256-cbc      91725.44k   211677.08k   313433.77k   362907.31k    380482.90k

Banana Pi R2 / MediaTek MT7623 @ 1040 MHz and MTK Crypto Engine active:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc        519.15k     1784.13k     6315.78k    25199.27k    124499.22k
aes-192-cbc        512.39k     1794.01k     6375.59k    25382.23k    118693.89k
aes-256-cbc        508.30k     1795.05k     6339.93k    25042.60k    112943.10k

MiQi / RK3288 @ 2000 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128 cbc      87295.72k    94739.03k    98363.39k    99325.95k     99562.84k

ODROID-HC1 / Samsung Exynos 5422 (A15 core @ 2000 MHz):
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc      78690.05k    89287.85k    94056.79k    95104.34k     95638.87k
aes-192-cbc      69102.10k    77545.47k    81156.61k    81964.71k     82351.45k
aes-256-cbc      61715.85k    68172.80k    71120.73k    71710.72k     72040.45k

ODROID-C2 / Amlogic S905 @ 1752 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc      51748.63k    59348.22k    62051.33k    62763.35k     62963.71k
aes-192-cbc      46511.57k    52507.95k    54599.08k    55151.27k     55312.38k
aes-256-cbc      42094.22k    46302.95k    47941.46k    48372.74k     48513.02k

NanoPi M3 / Nexell S5P6818 @ 1400 MHz (3.4.39 32-bit kernel):
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc      44264.22k    54627.49k    58849.88k    59756.35k     60257.62k
aes-192-cbc      39559.11k    47999.32k    51095.30k    51736.15k     52158.46k
aes-256-cbc      35803.41k    42665.24k    44926.47k    45733.21k     45883.39k

Clearfog Pro / Marvell Armada 38x @ 1600 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc      47352.87k    54746.43k    57855.57k    58686.12k     58938.71k
aes-192-cbc      41516.52k    47126.91k    49317.55k    49932.63k     50151.42k
aes-256-cbc      36960.26k    41269.63k    43042.65k    43512.15k     43649.71k

Raspberry Pi 3 / BCM2837 @ 1200 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc      31186.04k    47189.70k    52744.87k    54331.73k     54799.02k
aes-192-cbc      30170.93k    40512.11k    44541.35k    45672.11k     45992.62k
aes-256-cbc      27073.50k    35401.37k    38504.70k    39369.39k     39616.51k

Banana Pi M3 / Allwinner A83T @ 1800 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc      36122.38k    43447.94k    45895.34k    46459.56k     46713.51k
aes-192-cbc      32000.05k    37428.74k    39234.30k    39661.91k     39718.95k
aes-256-cbc      28803.39k    33167.72k    34550.53k    34877.10k     35042.65k

Banana Pi R2 / MediaTek MT7623 @ 1040 MHz:
type             16 bytes     64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc      22082.67k    25522.92k    26626.22k    26912.77k     26995.37k
aes-192-cbc      19340.79k    21932.39k    22739.54k    22932.82k     23008.60k
aes-256-cbc      17379.62k    19425.11k    20058.03k    20223.66k     20267.01k

Edit: Added results for Pinebook and ODROID-HC1 ensuring both were running at max cpufreq
     
    Edit 2: Added cpufreq settings for each tested device. Please note throttling dependencies and multi-threaded results below
     
Edit 3: Added Banana Pi M3 single thread performance above. Performance with 8 threads sucks since the A83T throttles down to 1.2GHz within 10 minutes and the overall AES256 score is below 190000k.
     
    Edit 4: Added EspressoBin numbers from here. Another nice example for the efficiency of ARMv8 crypto extensions.
     
    Edit 5: Added NanoPi M3 numbers from there.
     
    Edit 6: Added Clearfog Pro numbers (Cortex-A9 -- unfortunately OpenSSL currently doesn't make use of CESA crypto engine otherwise numbers would be 3 to 4 times higher)
     
    Edit 7: Added Banana Pi R2 numbers from here (Cortex-A7, cpufreq scaling broken since ever so SoC only running with 1040 MHz, numbers might slightly improve once MTK manages to fix cpufreq scaling)
     
    Edit 8: Added numbers for ARMADA8040 (A72) from CNX comment thread.
     
    Edit 9: Added RK3288 (Cortex A17) numbers from here.
     
    Edit 10: Added RPI 3 (BCM2837) numbers. Please be aware that these are not Raspbian numbers but made with 64-bit kernel and Debian arm64 userland. When using Raspbian you get lower numbers!
     
Edit 11: Added Allwinner H6 numbers from here.
     
    Edit 12: Added RK3399 numbers from here.
     
    Edit 13: Added new S5P6818 numbers since now with mainline 64-bit kernel ARMv8 crypto extensions are available
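For reference, single-threaded numbers like the ones above come from OpenSSL's built-in speed benchmark; a minimal sketch (the exact invocation behind these tables is an assumption):

openssl speed -elapsed -evp aes-128-cbc
openssl speed -elapsed -evp aes-192-cbc
openssl speed -elapsed -evp aes-256-cbc
# multi-threaded runs (mentioned in the edits) would add e.g. '-multi 8'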
  17. Like
    tkaiser got a reaction from manuti in Debian Builds for Orange Pi Win in the pipeline?   
No, there are a lot more but all of this doesn't really matter, especially here in this forum. And the whole issue (people using this crappy tool called sysbench trying to benchmark hardware) is not even related to 32-bit vs. 64-bit but to different compiler switches. Raspbian packages are built for the ARMv6 ISA so they can be executed on the horribly outdated single core RPis as well. Normal Debian/Ubuntu armhf packages are built for ARMv7 and you would need to switch to arm64 packages since only these packages are built with support for ARMv8 CPU cores (that's what's inside RPi 3 and 2 in the meantime).
     
So comparing an OPi Win running an arm64 Ubuntu or Debian distro with an RPi 3 running any recent Arch, Gentoo, Fedora, OpenSuSE or even an arm64 Armbian (see my link -- I did this) you will see sysbench numbers that are pretty close. Numbers between the different distros will vary since the distro packages are built with different compiler versions and switches. And this is all the lousy sysbench tool in 'cpu test' mode is able to report, since this whole test just calculates prime numbers inside the CPU caches (and as soon as an ARMv8 CPU is allowed to run ARMv8 code this gets orders of magnitude faster). I don't know of a single real-world use case that would correlate with this pseudo benchmark (except of course if your job is calculating prime numbers -- then you can rely on sysbench, and if you're running on an RPi 3 you'd better stay away from Raspbian, DietPi and the other ARMv6 dinosaurs)
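To illustrate what is actually being measured, this is the kind of invocation in question (a sketch using the old sysbench 0.4.x syntax shipped with Debian/Ubuntu at the time; the thread count is just an example):

sysbench --test=cpu --cpu-max-prime=20000 --num-threads=4 run
# the reported execution time depends almost entirely on which ISA the sysbench
# binary was compiled for (ARMv6/ARMv7 vs. ARMv8), not on the hardware itself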
     
But while sysbench wrongly reports that an RPi 3 (or an OPi Win if you choose Xunlong's Raspbian images!) would be magnitudes slower than any of the recent ARMv8 boards, with one specific workload the RPi 3 really is magnitudes slower: AES encryption (think about VPN and disk encryption). Broadcom forgot to license the ARMv8 crypto extensions so any other 64-bit (ARMv8) SBC is a better choice than the RPi 3 if it's about AES (except ODROID-C2 and NanoPi K2 since their Amlogic S905 suffers from the same problem). See the numbers here: https://forum.armbian.com/topic/4583-rock64/?do=findComment&comment=37829 (OPi Win scores the same as Pinebook)
     
  18. Like
    tkaiser got a reaction from sgjava in ArmbianIO API proposal   
    See 8 posts above or https://tech.scargill.net/banana-pi-m2/#comment-27947
     
Edit: When I asked @jernej back then whether he's fine with his code being used in other projects, IIRC he had no objections.
  19. Like
    tkaiser got a reaction from StuxNet in ArmbianIO API proposal   
I really hope ArmbianIO spreads widely. And relying alternatively on /proc/device-tree/* might help with user adoption. For example once DietPi users start to use ArmbianIO it could be surprising that ArmbianIO only works on approximately half of the boards DietPi 'supports' (since DietPi relies on Debian OS images found somewhere or uses Armbian's build system to create crippled Armbian images with the DietPi menu stuff on top, then sold as 'DietPi' to their users -- so if their OS images started as Armbian there will be /var/run/machine.id... otherwise not).
     
BTW: In Armbian for all the Allwinner boards that run the legacy kernel we tried to use exactly the same string as /proc/device-tree/model to let this method work regardless of legacy or mainline kernel. Since other projects out there (H3Droid, RetrOrangePi, Lakka) also use our fex files they should become compatible with this 'other fallback' too (at least that was my intention behind these adjustments made a while ago). Some details: https://tech.scargill.net/banana-pi-m2/#comment-27947
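A board detection fallback along those lines could look like this from a shell (just a sketch of the idea, not ArmbianIO's actual code):

# prefer the device-tree model string, fall back to Armbian's machine id
if [ -r /proc/device-tree/model ]; then
    tr -d '\0' < /proc/device-tree/model
elif [ -r /var/run/machine.id ]; then
    cat /var/run/machine.id
fi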
     
     
  20. Like
    tkaiser got a reaction from Naguissa in Support of Raspberry Pi   
Exactly. The average RPi user often has really no clue at all why he bought an SBC (and you should please stop spreading this 'charity/education' BS since the 'Pi for education' reality looks different). Users choosing any of the alternatives did at least some research and thought about what they want to achieve and how. And this (the user base) is the real reason why Armbian should never start to support Raspberries since once these people start to arrive here in the forum Armbian is finally dead (but maybe the current approach of semi-supporting as many SBCs as possible will already kill the project).
  21. Like
    tkaiser got a reaction from Tido in Support of Raspberry Pi   
  22. Like
    tkaiser got a reaction from Larry Bank in ArmbianIO API proposal   
  23. Like
    tkaiser reacted to Larry Bank in ArmbianIO API proposal   
    I'll explore fixing the board name situation using your suggestion. Another reason I shared ArmbianIO as open source was for other people to contribute. @sgjava has already made some significant contributions such as Python and Java wrappers for it.
  24. Like
    tkaiser reacted to Larry Bank in ArmbianIO API proposal   
    ArmbianIO is not a fork of WiringOP. I wrote ArmbianIO because of the terrible situation with WiringOP and WiringNP. Using the BCM numbering on Allwinner boards is understandable for RPI compatibility, but limits what you can do with non-BCM chips. I thought a fresh start which treats all boards as unique and allows more than 40-pins of GPIO header would be wiser than another "crutch" of a hacked up WiringPi copy. I have a wide variety of boards and ArmbianIO (even running on Raspberry Pi boards) allows a consistent way to work with GPIO/I2C/SPI. I know that when I hook an LED or switch to a GPIO pin, I can run the same code on any of my boards and connect it to the same header pin and it will work without modifying my code.
     
chwe: split from https://forum.armbian.com/topic/6197-hardware-line-is-missing-on-proccpuinfo/ because I think it's better to keep this in this thread.
     
  25. Like
    tkaiser reacted to zador.blood.stained in Hardware line is missing on /proc/cpuinfo   
Or, as I suggested before, read /proc/device-tree/compatible (a null-separated array) which contains both the board name and the SoC name, so even if the library doesn't support a new board it can at least fall back to a SoC-specific pin map.
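Reading that property from a shell shows the idea (a sketch; the null separators are just translated to newlines):

tr '\0' '\n' < /proc/device-tree/compatible
# the most specific (board) compatible strings come first, the SoC compatible string last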