Everything posted by tkaiser

  1. Note: discussion started here: https://forum.armbian.com/topic/6650-toolchain-banana-pi-m2-zero-h3/?do=findComment&comment=50530 Igor, I was not talking about the board in question. I'm talking about something else, as I have for the last 18 months already: policies and processes and why this stuff is ignored (by you). And how we can end the horrible user experience of u-boot/kernel updates constantly breaking user installations. When part of 'adding a CSC board' is a patch that tries to modify drivers/mmc/sunxi_mmc.c in a general way that affects all our sunxi boards I consider this dangerous. Currently the patch does not apply; without the (fixed) patch the resulting OS image will not work. If the patch gets 'fixed' it affects all other sunxi boards. Why is the whole stuff there in the first place? Especially since, for whatever bizarre reasons, we still have nothing like a testing or beta branch to test out stuff without breaking installations. Do you think https://github.com/armbian/build/blob/0df0209db90855355abab5326ef075f8feb89ba5/patch/u-boot/u-boot-sunxi/fix-sdcard-detect-bpi-m2z.patch would have been accepted by mainline u-boot folks as part of adding support for one single new board?
  2. Why has the board been added in the first place? Isn't this part of your commit a real problem? https://github.com/armbian/build/blob/0df0209db90855355abab5326ef075f8feb89ba5/patch/u-boot/u-boot-sunxi/fix-sdcard-detect-bpi-m2z.patch Without this ugly hack the Zero will not be able to boot anyway (see post #1 of this thread). The patch is expected to fail since the path definitions are wrong (v2017.09/drivers/mmc/sunxi_mmc.c and v2017.09-bpi-m2z/drivers/mmc/sunxi_mmc.c instead of drivers/mmc/sunxi_mmc.c). So nobody will be able to produce Armbian images for the board using the build system unless the patch is 'fixed'. Now imagine a Zero user figures out how to 'fix' this and then sends his modified code (probably breaking u-boot for every other sunxi device?) as a PR and you or someone else merges it for whatever reasons... We still have no testing branch, we still have neither PR/patch policies nor discussions about them, and the attempts to establish a 'Board Bring Up' policy get ignored. But we constantly have an awful lot of trouble with u-boot and kernel updates breaking installations. Why?
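For comparison: a conventional git-formatted patch touching that file would carry headers relative to the source tree root, roughly like this (just a sketch of the header lines, not the actual fix; how the build system's patch step strips path prefixes is a separate question):

     --- a/drivers/mmc/sunxi_mmc.c
     +++ b/drivers/mmc/sunxi_mmc.c

The patch above uses v2017.09/... and v2017.09-bpi-m2z/... prefixes instead, which is why it fails to apply.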
  3. Just a quick note: this sys_config.fex stuff is how Allwinner has done hardware description for years, even now that the kernels they use rely on something different called 'device tree'. Starting with their kernel 3.10, sys_config.fex is processed by an Allwinner tool to create .dts and .dtb files (do a web search for 'A64 dev tree&sysconfig使用文档.pdf' to get a 'nice' PDF describing the process). In case you want to adjust pin configuration (eg. which pins are used to attach the external Gigabit Ethernet PHY -- I hope you know that the real Ethernet controller is part of the SoC) you either edit sys_config.fex and pipe the file through Allwinner's converter, or you search for the already created .dtb file, convert it back into .dts, adjust things and convert back to .dtb (requires a Linux installation and the dtc tool which in Debian/Ubuntu is part of the device-tree-compiler package). It would be great if a moderator could split off all this 'Using Android on H6 devices' stuff my post included into its own thread below 'peer to peer support' since it really doesn't fit well in this thread. Wrt getting Ethernet to work you need to
  • edit the right file where pin mappings are defined
  • sometimes deal with proprietary hacks (at least this was the case with Pine64)
  • take care about something called tx/rx delays that are board specific
You'll find a lot of the related info when searching around the same issues with Pine64 two years ago. While I was partially part of the process back then, for obvious reasons I consider this just a waste of time, like everyone else who already went through this (dealing with this Allwinner BSP stuff in general). So from a Linux / Armbian point of view we need the relevant stuff mainlined (community work) before we could consider starting to support any H6 device. Based on Allwinner's BSP this for sure will not happen.
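The decompile/edit/recompile round trip mentioned above looks like this with dtc (file names are just examples):

     dtc -I dtb -O dts -o board.dts board.dtb    # decompile the binary device tree
     # edit the pin configuration in board.dts, then:
     dtc -I dts -O dtb -o board.dtb board.dts    # recompile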
  4. For reasons unknown to me Igor decided 3 weeks ago to add this Zero as a CSC config. Those patches are also part of the build system but since the paths are bogus (v2017.09-bpi-m2z/drivers/mmc/) they don't work, which I consider a feature and not a bug given the insane sh*t show we have here with Armbian and u-boot updates constantly breaking installations. https://github.com/armbian/build/commit/0df0209db90855355abab5326ef075f8feb89ba5#diff-38c994f09106d3b8d9bc91cd2e61a7a8 This was just a hint that the BPi people, in order to provide their own fake Armbian images that harm our reputation, use an ugly patch for SD card detection. Sorry, can't help further... but maybe others jump in and explain how to fix the patch (which might then break all other sunxi boards?)
  5. https://github.com/BPI-SINOVOIP/BPI-files/tree/master/others/armbian/build/patch/u-boot/u-boot-sunxi
  6. Please be careful since this is an entirely different workload compared to 'using SD cards for the rootfs': storing images and video is more or less only sequential IO coming with a write amplification close to the optimum of 1. This means that the amount of data written at the filesystem layer, the block device layer and the flash layer is almost identical. With our use cases, when running a Linux installation on the SD card, this looks totally different since the majority of writes are of very small sizes, and write amplification with such small chunks of data is way higher, which can result in 1 byte changed at the filesystem or block device layer generating a whole 8K write at the flash layer (so we have a worst case 1:8192 ratio of 'data that has changed' vs. 'real writes to flash cells'). Please see here for the full details why this is important, how it matters and what it affects: https://en.wikipedia.org/wiki/Write_amplification
Our most basic take on that in Armbian happens at the filesystem layer due to mount options since we use 'noatime,nodiratime,commit=600' by default. What do they do?
  • noatime prevents the filesystem from generating writes when data is only accessed (the default is that access times are logged in filesystem metadata, which leads to updated filesystem structures and therefore unnecessary writes every time filesystem objects are only read)
  • nodiratime is the same for directories (not that relevant though)
  • commit=600 is the most important one since it tells the filesystem to flush changes back to disk/card only every 600 seconds (10 min)
Increasing the commit interval from the default 5 to 600 seconds results in the majority of writes waiting in DRAM to be flushed to disk only every 10 minutes. Those changes sit in the Page Cache (see here for the basics) and show up as so-called 'dirty pages'. So the amount of dirty pages increases and is reset to 0 every 10 minutes after the changes are flushed to disk. This can be watched nicely with monitoring tools or something as simple as:
watch -n 5 grep Dirty /proc/meminfo
While @Fourdee tries to explain that 'dirty pages' are something bad or even an indication of degraded performance, it's exactly the opposite and just how Linux basics work, with a tunable set for a specific use case (rootfs on SD card). To elaborate on the effect: let's think about small changes affecting only 20 bytes every minute. With filesystem defaults (commit interval = 5 seconds) this will result in 80KB written within 10 minutes (each write affects at least a whole flash page and that's AFAIK 8K or 16K on most cards, so at least 8K * 10) while with a 10 minute commit interval only 8KB will be written. Ten times less wear.
But unfortunately it's even worse with installations where users run off low capacity SD cards. To my knowledge in Linux we still have no TRIM functionality with MMC storage (SD cards, eMMC), so once the total amount of data written to the card exceeds its native capacity the card controller has no clue how to distinguish between free and occupied space and therefore has to start erasing (there's no overwrite with flash storage, see for example this explanation). So new writes now might affect not just pages but whole so-called 'Erase Blocks' that can be much larger (4MB or 16MB for example on all the cards I use). This is for example explained here. In such a case (amount of writes exceeds the card's native capacity) we're now talking about writes affecting Erase Blocks that might be 4MB in size.
With the above example of changing 20 bytes every minute, with the default commit interval of 5 seconds even 40 MB would now be written at the flash layer, while with a 10 min commit interval it's 4MB (all with just 200 bytes having changed in reality). So if you really care about the longevity of your card you buy good cards with capacities much 'larger than needed', clone them from time to time to another card and then perform a TRIM operation manually by using your PC or Mac and SD Association's 'SD Formatter' to do a quick erase there. This will send ERASE (CMD38) for all flash pages to the card's controller which then treats all pages as really empty, so new writes to the card from now on do NOT generate handling of whole Erase Blocks but happen at the page size level again (until the card's capacity is fully used, then you would need to repeat the process). As usual there's a downside to an increased commit interval, and it affects unsafe power-offs / crashes: everything that sits in the Page Cache and is not yet flushed back to disk/card is lost in case a power loss or something similar occurs. On the bright side this higher commit interval makes it less likely that you run into filesystem corruption since filesystem structures on disk are also updated only every 10 minutes. Besides that we try to cache other stuff in RAM as much as possible (eg. browser caches and user profiles using 'profile sync daemon') and the same goes for log files, which are amongst the candidates showing the worst write amplification possible when logs are allowed to be updated on 'disk' every few seconds (unfortunately we can't just throw logs away as for example DietPi does by default, so we have to fiddle around with stuff like our log2ram implementation which shows lots of room for improvement)
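To make the mount options above concrete: the relevant /etc/fstab entry would look like this (device name and the remaining fields are just an assumed example):

     /dev/mmcblk0p1  /  ext4  defaults,noatime,nodiratime,commit=600  0  1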
  7. I would call the price/performance today not just good but simply awesome given we get ultra performant cards like the 32GB SanDisk Ultra A1 for as low as 12 bucks currently: https://www.amazon.com/Sandisk-Ultra-Micro-UHS-I-Adapter/dp/B073JWXGNT/ (I got mine 2 weeks ago for 13€ at a local shop though). And the A1 logo is important since cards compliant with the A1 performance class perform magnitudes faster with random IO and small blocksizes (which pretty much describes the majority of IO happening with Linux on our boards). As can be seen in my '2018 A1 SD card performance update' the random IO performance at small blocksizes is magnitudes better compared to an average/old/slow/bad SD card with low capacity:

                   average 4GB card   SanDisk Ultra A1
     1K read       1854               3171
     4K read       1595               2791
     16K read      603                1777
     1K write      32                 456
     4K write      35                 843
     16K write     2                  548

With pretty common writes at 4K block size the A1 SanDisk shows 843 vs. 35 IOPS (IO operations per second) and with 16K writes it's 548 vs. 2 IOPS. So that's over 20 or even 250 times faster (I don't know the reason but so far all average SD cards I tested with up to 8 GB capacity show this same weird 16KB random write bottleneck -- even the normal SanDisk Ultra with just 8GB). This might be one of the reasons why 'common knowledge' amongst SBC users seems to be to try to prevent writing to the SD card at all: the majority doesn't take care which SD cards they use, tests them wrongly (looking at irrelevant sequential transfer speeds instead of random IO and IOPS) and therefore chooses pretty crappy ones. BTW: the smallest A1 rated cards available start at 16GB capacity. But for obvious reasons I would rather buy those with 32GB or even 64GB: the price/performance ratio is much better and it should be common knowledge that buying larger cards 'than needed' leads to SD cards wearing out later.
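Testing a card the right way is easy. A minimal iozone run focusing on random IO with small blocksizes (a sketch; package iozone3 on Debian/Ubuntu, run inside a directory on the card -- the -i 0 pass only exists to create the test file the random pass needs):

     cd /path/to/sdcard ; iozone -e -I -a -s 100M -r 1k -r 4k -r 16k -i 0 -i 2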
  8. Well, as already explained above: since Armbian is a build system everyone really keen on using really small installation media can do so (512 MB is not a great idea but possible), simply by switching from ext4 to btrfs and deleting the linux-firmware package from the default package list. When choosing btrfs the rootfs size requirement drops to ~60% since we use zlib compression at image creation time: https://github.com/armbian/build/commit/b14da27a4181e8e232bd8f526e71d2a931a8252f#commitcomment-19484855 When leaving out the linux-firmware package 200 MB less garbage gets installed (no idea why this package is included in the first place but I already stopped joining any firmware related discussions since they're too frustrating; it seems it's up to @Igor to decide about anything firmware related). Using btrfs works quite well (all the thousands of Armbian based OMV installations run this way) and we support the same when transferring Armbian from SD card to eMMC or a disk with nand-sata-install.
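As a sketch of how such an image would be built (ROOTFS_TYPE selects the filesystem at image creation time; board, branch and release here are just examples, and dropping linux-firmware means editing the package lists in the build configuration):

     ./compile.sh BOARD=rock64 BRANCH=default RELEASE=stretch ROOTFS_TYPE=btrfs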
  9. Oh, I'm also very much interested in reliable operation and one of the key aspects with SBC is reducing wear on flash media like SD cards since at least I (being a low consumption fetishist) want to put the rootfs on energy efficient flash storage and not on more 'wasteful' SSD or HDD storage. @Fourdee's 'Why DietPi is better' list http://dietpi.com/phpbb/viewtopic.php?f=9&t=2794 shows at least one very dangerous misunderstanding of Linux' VM / page cache implementation, which at least explains his strange conclusions in his Tests 1) and 4). While this is only dangerous for DietPi users it's a bit tough not to explain why 'dirty pages' do NOT 'show wait times, before the requested memory data can be written to disk. This generally means bottle-necking has occurred and the overall system performance will be effected until the data is written'. On the contrary: dirty pages are exactly what we want to reduce wear on flash memory, and they're the result of our much larger commit interval at the fs / block device layer. Same with the comparison of the logging approaches... once DietPi users switch to full logging (rsyslog + logrotate) their flash media is at risk, while it's perfectly possible to combine rsyslog + logrotate + logs in RAM. Speaking of that: do you have an opinion on improving log2ram? https://forum.armbian.com/topic/6626-var-easily-get-full-with-log2ram/?tab=comments#comment-50286
  10. Not really since I tested in the beginning with a Seagate Barracuda that shows an insanely high standby/sleep consumption. I updated the 1st post with this...
  11. Nope. The problem is that such a SATA controller (regardless of which highspeed bus between CPU and controller is used... read as: this also applies to PCIe SATA controllers) usually has 2 high-speed PHYs active:
  • USB3 or PCIe PHY to the host
  • SATA PHY to the HDD/SSD
Each PHY usually consumes up to half a W so we can easily do the math. Sometimes it's possible to get slight consumption savings by slowing down interfaces (in your case switching from SuperSpeed to Hi-Speed, or with SATA controllers from 6Gbps to 3Gbps or just 1.5Gbps) but it's not always possible to control the behaviour (at least with USB attached SATA; with PCIe controllers it's usually just telling the kernel driver what to do). Edit: Just found it. Copy&paste from my report to Cloudmedia/Pine folks about the above issue with GPIO 2 not being able to toggle from userspace any more: 'Between hdparm -y and setting the sysfs node to 1 there's a consumption difference of 1.2W in my setup (and the SSD used has a standby consumption of 0.4W so the remaining 800mW are in my case the ASM1153 bridge inside the external USB enclosure). 1.2W less in idle (or 1.8W vs 3.0W) can be considered a huge difference :)'
  12. You can't with current Rock64 releases since ayufan defined the GPIO pin in the device tree and so it's not accessible from userspace any more. With his older images (0.5.1 and before) the following was possible:

     GPIO=2
     USB_Power_Node=/sys/class/gpio/gpio${GPIO}
     [ -d ${USB_Power_Node} ] || echo ${GPIO} >/sys/class/gpio/export
     echo out >${USB_Power_Node}/direction
     hdparm -y /dev/sda
     sleep 2
     echo 1 > ${USB_Power_Node}/value

BTW: what I didn't know back then is that this GPIO controls power to all USB ports (USB3 included), but the way the pin is currently defined it's not possible to toggle power to the USB ports any more.
  13. EspressoBin: https://forum.armbian.com/topic/6371-espressobin-wont-boot-after-upgrade-to-538/
ODROID-C2: https://forum.armbian.com/topic/6336-odroid-c2-cannot-start-with-u-boot-default-538/
sunxi/eMMC/bootscript: https://forum.armbian.com/topic/6299-problems-with-538-update/
sunxi performance: https://forum.armbian.com/topic/6386-lime2-mainline-kernel-with-debian-9-stretch-becomes-unresponsive-forced-reboot-required/
ODROID XU4/HC1/HC2: https://forum.armbian.com/topic/6299-problems-with-538-update/?do=findComment&comment=48200
How can we avoid this happening again?
  14. Nope, you're right, and apologies. While I tried hard to focus on 'doing it better' you nailed it: I was annoyed by being confronted again with bizarre 'performance metrics' like those from their Twitter conversation. Trying to calm down now and focus on our problems (the many 5.38 update issues being almost enough for me to throw in the towel)
  15. I was more talking about bundling all this compiler stuff with default installations. Just a quick check of which packages consume the most space, as already done a year ago: Edit: This is how it looks with DietPi on Rock64:
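Such a check is a one-liner (a sketch; dpkg reports Installed-Size in KiB):

     dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -rn | head -20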
  16. Oh, just found this: http://dietpi.com/phpbb/viewtopic.php?f=9&t=2794 (archived version) -- this is a really funny compilation
  17. The nice dashboard screenshot above is used by @Fourdee to explain why DietPi is superior to Armbian: 'With #DietPi, logs and DietPi scripts are mounted to RAM, this reduces SD card write operations vastly' -- while I don't understand the purpose of 'mounting scripts to RAM', of course the idea to cache logs in RAM is great! That's why Armbian has done it since 2014 already. While the above 'proof' is somewhat questionable (watching a 5 min period in a dashboard and, once there's activity in one graph, taking a screenshot with numbers without meaning) let's look into what makes DietPi that superior compared to Armbian, since it's always a great idea to improve even if that means adopting another project's USPs.
For whatever reasons DietPi dropped support for all Orange and Banana Pis recently (it seems this started with a conversation between @Igor and @Fourdee on Twitter, then continued here and ended up there) so I had to take another board to do a direct comparison. The only boards supported by both projects are now Pine64, Rock64, Tinkerboard, some NanoPis and the ODROIDs. I chose Rock64 mostly to ensure that we use the same kernel and almost the same settings (Armbian's philosophy is to fix as much as possible upstream, so our usual performance fixes went into ayufan's Rock64 build scripts, which DietPi in this case relies on by accident, so even DietPi users can continue to benefit from our work). I took the latest official DietPi image for Rock64 and the first surprise was the rootfs being pretty small and entirely full, so no way to proceed:
/dev/mmcblk1p7 466M 453M 0 100% /
For whatever reasons DietPi chose to adopt ayufan's partition layout (for users new to DietPi: a DietPi image is always just someone else's Debian image processed manually and by some scripts until it becomes 'DietPi') but their 'dietpi-drive_manager', responsible for resizing the rootfs, seems unable to cope with this (I wanted to report it to DietPi but there's already a report that gets ignored and it seems I can't comment there). Edit: Ah, it seems @Fourdee has blocked me from helping them entirely. I wanted to assist the DietPi folks over at https://github.com/Fourdee/DietPi/issues/1550 but can't point them to how to fix the thermal issues they're running into again, or why it's a bit weird to reintroduce the 'rootmydevice' issue, or why the new Allwinner BSP code is not such a great idea due to non-existing dvfs/thermal support.
Fortunately our scripts below /usr/local/sbin/ were not deleted by DietPi so I simply called /usr/local/sbin/resize_rootfs.sh which instantly resized the rootfs partition, and I was then able to continue. For whatever reasons it took 3 whole reboots to get DietPi upgraded to their latest version v6.2 but then I was able to do some measurements. I then downloaded our Rock64 nightly image (based on Ubuntu Xenial but that doesn't matter that much -- as we all know the userland stuff is close to irrelevant since kernel and settings matter) and did the same thing. No reboot was needed, and since for whatever reasons DietPi remained on the pretty outdated 4.4.77 kernel I chose not to update Armbian's kernel to our 4.4.115 but to remain at 4.4.77 too. Let's look at the results, leaving aside the various performance and security issues DietPi suffers from, since they're not relevant if we want to look at stuff where DietPi outperforms Armbian.
First 'idle behaviour':

                        DietPi       Armbian
     DRAM used:         39 MB (2%)   44 MB (2%)
     processes:         120          134
     cpufreq lowest:    97.5%        99.8%
     cpufreq highest:   2.0%         0.1%
     idle temp:         46°C         43.5°C
     %idle percent:     99.95%       99.98%

So we're talking more or less about identical numbers. 'Used' memory after booting is 2% of the available 2GB (anyone thinking 'free' RAM would be desirable on Linux... please try to educate yourself: https://www.linuxatemyram.com), the count of processes reported by ps is almost the same, and cpufreq behaviour, %idle percentage and temperatures are also the same (DietPi's temperature readout is somewhat flawed since their 'cpu' tool affects system behaviour negatively). Even if Armbian ships with almost twice as many packages installed by default the process count doesn't differ that much (and idling processes really don't hurt anyway) and used memory after booting also doesn't differ significantly. But this 'boot and sit there in idle' use case isn't that relevant anyway, and in situations when RAM is really needed I would assume Armbian users are in a much better position since we ship with zram active, allowed to use half of the physical DRAM (see here for a brief introduction to zram). So far I don't see that many advantages (none to be honest) but most probably I missed something? Anyway: let's continue focusing on storage utilization and 'use':

                        DietPi       Armbian
     size img.7z:       104 MB       223 MB (x 2.1)
     size img:          668 MB       1.6 GB (x 2.5)
     rootfs size:       457 MB       1.2 GB (x 2.7)
     packages:          229          436 (x 1.9)
     commit interval:   5 s          600 s
     kB_wrtn:           156 KB       448 KB (x 2.9)
     kB_read:           1008 KB      5912 KB (x 5.9)

So both compressed and uncompressed image sizes are much larger with Armbian, and the same goes for used space on the rootfs, which is understandable given that Armbian does not try to be as minimalistic as possible (see the count of pre-installed packages). I don't think going minimalistic is desirable, though we could think about removing development related packages from default installations as @zador.blood.stained already suggested. Maybe it's also worth adjusting the rootfs partition size calculation to use slightly less space so the uncompressed image size can be a little bit smaller? Anyway: for people concerned about the smallest image size possible, even without leaving out packages from the default install, simply building an own image and switching from ext4 to btrfs does the job since it reduces image size to around ~60% (one of Armbian's advantages is that our images are not hand-crafted unique 'gems' but the fully automated result of our build system, so everyone on this earth can simply build his own Armbian images suiting his own needs). And besides that I really see no benefit in trying to get the rootfs size smaller since we surely don't want to start encouraging users to write Armbian images to old and crappy SD cards below 4GB (though I already consider 4GB cards nothing anyone should use these days since almost all those cards are insanely slow). Let's better continue to educate our users about the importance of choosing good and reliable SD cards!
Now looking at the last 3 lines above: I executed 'iostat -y 3600' to query the kernel about the total amount of data read and written at the block device layer within one whole hour. With DietPi/Stretch 156KB/1008KB (write/read) were reported and with Armbian/Xenial 448KB/5912KB (write/read). All numbers are too low for further investigation, though one thing is worth a look: the default rootfs commit interval.
DietPi seems to use the ext4 default (sync every 5 seconds to SD card) while in Armbian we chose a somewhat high 10 minute value (commit=600). So while with Armbian 448 KB written in one hour is almost three times as much data at the block device layer, it might well be that the 156 KB written by the DietPi installation caused more wear at the flash layer below due to a phenomenon called Write Amplification (TL;DR version: writes at the flash layer happen at 'page sizes', usually 8K, and by using a high commit interval somewhat larger data chunks will be written only every few minutes, which can result in significantly fewer page writes at the flash layer compared to writing smaller chunks of data every few seconds. Adding to the problem: once a card is 'full' we're talking about much higher Write Amplification since not just pages are written but usually whole Erase Blocks are affected, which are much larger. So please choose your SD card wisely and always use a much larger capacity than needed since there's no TRIM with SD cards in Linux!) It would need a lot more detailed analysis of this write behaviour but IMO it's not worth the effort, and Armbian's 10 min commit interval does a great job further reducing SD card wear (anyone with too much spare time? Grab 'iostat 5' and 'iotop -o -b -d5 -q -t -k | grep -v Total' and start to analyse what's happening at the block device and application layer, forgetting about the filesystem layer in between!) So where's some room for improvement when comparing our defaults with DietPi's?
  • Maybe removing development related packages from the default package list?
  • Maybe tuning rootfs partition creation to use slightly less space?
  • Mostly unrelated but an issue: improving our log2ram behaviour as already discussed?
  18. Well, it's the distro's decision to use this log2ram mechanism and to stay with defaults that in the meantime look somewhat problematic to me. At boot we're syncing the whole /var/log contents including rotated/compressed logs back into RAM: https://github.com/armbian/build/blob/master/packages/bsp/common/usr/sbin/log2ram#L40-L44 If we would exclude rotated/compressed logs they would afterwards exist only in /var/log.hdd and be missing in /var/log, which might be confusing for users and break log analysis software and so on. Using symlinks might work but opens up another can of worms. Isn't there another option that works somewhat like an overlayfs? @zador.blood.stained IIRC we already talked about this a year ago but I can't remember the details. Edit: quick google check: https://github.com/azlux/log2ram/commit/e88f67ab23a91bb1482f0f2063b990585b27730c
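The overlay idea in a minimal sketch (assuming /var/log.hdd on disk as the lower layer and a tmpfs for the upper layer; upperdir and workdir have to live on the same filesystem):

     mount -t tmpfs -o size=50M tmpfs /var/log.tmp
     mkdir -p /var/log.tmp/upper /var/log.tmp/work
     mount -t overlay overlay -o lowerdir=/var/log.hdd,upperdir=/var/log.tmp/upper,workdir=/var/log.tmp/work /var/log

Reads would fall through to the rotated logs on disk while all new writes land in RAM until explicitly synced back.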
  19. In case 'TK' is a reference to me: sorry, I'm neither on twitter nor member of any marketing department.
  20. USB3 anomalies / problems. When I tested this almost 2 weeks ago I did not pay close enough attention to the crappy write performance: 470 MB/s with 4 SSDs in parallel attached to all SATA and USB3 ports is just horribly low given that we have a 'per port' and a 'per port group' limitation of around 390 MB/s. What we should've seen is 650+ MB/s taking the overhead into account. But 470 MB/s was already an indication that there's something wrong. Fortunately in the meantime an ODROID community member tested various mirror attempts with 2 Seagate USB3 disks and reported 'RAID 0 doubles disk IO' while in reality showing exactly the opposite: none of his three mirror attempts (mdraid, lvm and btrfs) reported write performance exceeding 50 MB/s, which is insanely low for a RAID0 made out of two 3.5" disks (such lousy numbers are usually not even possible with 2 USB2 disks on separate USB2 ports). So let's take a look again: EVO840 and EVO750, both in JMS567 enclosures, connected to each USB3 port. I simply created an mdraid RAID0 and measured sequential performance with 'taskset -c 5 iozone -e -I -a -s 500M -r 16384k -i 0 -i 1':

                     kB  reclen   write  rewrite    read  reread
                 512000   16384   85367    85179  312532  315012

Yep, there's something seriously wrong when accessing two USB3 disks in parallel. Only 85 MB/s write and 310 MB/s read is way too low, especially for rather fast SSDs. 'iostat 1' output shows that each disk when writing remains at ~83 tps (transactions per second): https://pastebin.com/CvgA3ggQ Ok, let's try to get a clue what's bottlenecking. I removed the RAID0 and formatted both SSDs as ext4. First tests with only one SSD active at a time:

                     kB  reclen   write  rewrite    read  reread
     EVO840      512000   16384  378665   382100  388932  392917
     EVO750      512000   16384  386473   385902  377608  383549

Now starting the iozone runs at the same time (of course iozone tasks sent to different CPU cores to avoid CPU bottlenecks; same applies to IRQs: that's /proc/interrupts after test execution):

                     kB  reclen   write  rewrite    read  reread
     EVO840      512000   16384  243482   215862  192638  160677
     EVO750      512000   16384  214356   252474  168322  195164

So there is still some sort of limitation, but at least it's not as severe as in the mirror modes when all accesses to the two USB connected disks happen exactly in parallel.
When looking closer we see another USB3 problem long known from N1's little sibling ROCK64 (any RK3328 device is a much closer relative of the N1 than any of the other ODROIDs):

[    3.433165] xhci-hcd xhci-hcd.7.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[    3.433183] xhci-hcd xhci-hcd.7.auto: @00000000efc59440 00000000 00000000 1b000000 01078001
[    3.441152] xhci-hcd xhci-hcd.8.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[    3.441171] xhci-hcd xhci-hcd.8.auto: @00000000efc7e440 00000000 00000000 1b000000 01078001
[   11.363314] xhci-hcd xhci-hcd.7.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[   11.376118] xhci-hcd xhci-hcd.7.auto: @00000000efc59e30 00000000 00000000 1b000000 01078001
[   11.385567] xhci-hcd xhci-hcd.8.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[   11.395145] xhci-hcd xhci-hcd.8.auto: @00000000efc7ec30 00000000 00000000 1b000000 01078000
[  465.710783] usb 8-1: new SuperSpeed USB device number 3 using xhci-hcd
[  465.807944] xhci-hcd xhci-hcd.8.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  465.817503] xhci-hcd xhci-hcd.8.auto: @00000000efc7ea90 00000000 00000000 1b000000 01078001
[  468.601895] usb 6-1: new SuperSpeed USB device number 3 using xhci-hcd
[  468.671876] xhci-hcd xhci-hcd.7.auto: ERROR Transfer event for disabled endpoint or incorrect stream ring
[  468.671881] xhci-hcd xhci-hcd.7.auto: @00000000efc591f0 00000000 00000000 1b000000 01078001

I updated bootloader and kernel this morning and have no idea whether this was introduced (again?) just recently or existed already before:

root@odroid:~# dpkg -l | egrep "odroid|bootini"
ii  bootini         20180226-8   arm64  boot.ini and its relatives for ODROID-N1
ii  linux-odroidn1  4.4.112-16   arm64  Linux kernel for ODROID-N1

But I guess we're still talking about a lot of room for improvement when it's about XHCI/USB3, the BSP kernel and RK3399. Edit: Strangely, when I tested USB3 after receiving the N1 two weeks ago, the RAID0 results weren't that low. Now I remembered what happened back then: I immediately discovered the coherent pool size being too low and increased it to 2MB (this gets removed every time the 'bootini' package is updated). And guess what: that does the trick. I added 'coherent_pool=2M' to the kernel cmdline and we're back at normal performance, though there's still a ~390 MB/s overall limitation.
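The workaround is a one-line change to the 'setenv bootargs' line in /media/boot/boot.ini (sketched here following the pattern shown in the next post; ${bootrootfs} is defined in Hardkernel's stock boot.ini):

     setenv bootargs "${bootrootfs} coherent_pool=2M"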
  21. Just a miniature SATA/ASM1061 related material collection:
  • multiple disks behind ASM1061 problem with Turris Omnia
  • Suggested 'fix' by Turris folks (slowing down PCIe): https://gitlab.labs.nic.cz/turris/turris-os-packages/merge_requests/48/diffs -- please note that the ASM106x firmware matters: their ASM1061 registers itself as class '0x010601' (AHCI 1.0) while the ASM1061 Hardkernel put on the N1 dev samples uses a firmware that reports class '0x010185' (IDE 1.0) instead. Doesn't matter wrt performance since there the chosen driver is important, but if code wants to differentiate based on PCIe device classes this of course has to match. Same with device ids: can be either '0x0611' (ASM1061) or '0x0612' (ASM1062) based on firmware and not hardware (the Turris ASM1061 shows up as an ASM1062).
  • To disable NCQ and/or set link speed negotiation limits you could adjust the 'setenv bootargs' line in /media/boot/boot.ini, for example: setenv bootargs "${bootrootfs} libata.force=1.5,noncq" (see kernel cmdline parameters; could be interesting for SSD users in case NCQ and TRIM interfere)
  • To check SATA relevant dmesg output: dmesg | egrep -i "ahci|sata| ata|scsi|ncq" (mandatory prior to and after any benchmarks!)
  • There's a newer firmware for the ASM1061 available -- to be able to use the included binary a few steps are needed but even then the update operation fails: dpkg --add-architecture armhf ; apt install binutils:armhf ; ./106flash ahci420g.rom (Hardkernel put an SPI flash for the ASM1061 on the PCB but the flash program stops with 'ASM106X SPI Flash ROM Write Linux V2.6.4 / Find 1 ASM106X Controller / Read_RomID Failed!!')
  22. No, you're using a crippled Armbian image called DietPi. Support is over there: http://dietpi.com/phpbb/index.php
  23. Just for the record: Banana people work on another MediaTek based board: https://github.com/BPI-SINOVOIP/BPI-files/commit/a3c53c233fd2059a43763a78b13ca1c5fd0b0f50 SoC is a MT7622A (dual core ARM Cortex A53 processor with some 'dedicated network accelerator', RAID/XOR engine, SATA and PCIe 2.0), latest bootloader commit suggests that the board will be equipped with 802.11ac (AC2600) Wi-Fi.
  24. Storage performance update... what to use to store the rootfs on? In the following I compare 4 good SD cards with 4 different eMMC modules Hardkernel sells for the N1 and with 4 different SSD setups. As some background on why I chose to measure random IO with 1k, 4k and 16k block sizes please read the 'SD card performance 2018 update' first. The following are IOPS numbers (IO operations per second), important if we want to know how fast storage performs when used as an 'OS drive' (random IO performance is the most important factor here):

                                  1K w/r        4K w/r        16K w/r
     SanDisk Extreme Plus 16GB    566/2998      731/2738      557/2037
     SanDisk Ultra A1 32GB        456/3171      843/2791      548/1777
     SanDisk Extreme A1 32GB      833/3289      1507/3281     1126/2113
     Samsung Pro 64GB             1091/4786     1124/3898     478/2296
     Orange eMMC 16GB             2450/7344     7093/7243     2968/5038
     Orange eMMC 32GB             2568/7453     7365/7463     5682/5203
     Orange eMMC 64GB             2489/7316     7950/6944     6059/5250
     Orange eMMC 128GB            2498/8337     7064/7197     5459/4909
     Intel 540 USB3               7076/4732     7053/4785     5342/3294
     Samsung EVO750 USB3          8043/6245     7622/5421     6175/4481
     Samsung EVO840 powersave     8167/5627     7605/5720     5973/4766
     Samsung EVO840 performance   18742/10471   16156/9657    10390/7188

The SD cards I chose for this comparison all perform very well (an average no-name, Kingston, PNY, Verbatim or whatever other 'reputable' brand performs way lower wrt random IO!). But it can be clearly seen that Hardkernel's eMMC modules are a lot more performant. Regardless of size they all perform pretty similar, though the small 16GB module is bottlenecked by a write performance limitation that also affects 16k random write IOPS. With SSDs it depends: I chose somewhat ok-ish consumer SSDs for the test, so in case you want to buy used SSDs or some 'great bargains' on Aliexpress or eBay be prepared that your numbers will look way worse. The SATA connected EVO840 is listed twice since performance with small blocksizes heavily depends on PCIe power management settings (default is powersave -- switching to performance increases idle consumption by around ~250mW, but only then is a SATA connected SSD able to outperform Hardkernel's eMMC. That's important to know and also only applies to really performant SSDs. Cheap SSDs, especially with small capacities, perform way lower). Now let's look at sequential performance with large blocksizes (something that does NOT represent the 'OS drive' use case even remotely and is pretty irrelevant for almost all use cases except the creation of stupid benchmark graphs):

                                  MB/s write   MB/s read
     SanDisk Extreme Plus 16GB    63           67
     SanDisk Ultra A1 32GB        20           66
     SanDisk Extreme A1 32GB      59           68
     Samsung Pro 64GB             61           66
     Orange eMMC 16GB             48           298
     Orange eMMC 32GB             133          252
     Orange eMMC 64GB             148          306
     Orange eMMC 128GB            148          302
     Intel 540 USB3               325          370
     Samsung EVO750 USB3          400          395
     Samsung EVO840 powersave     375          385
     Samsung EVO840 performance   375          385

We can see that N1's SD card interface seems to bottleneck sequential read performance of all tested cards at around ~67 MB/s. Write performance depends mostly on the cards (all cheap cards like the tested SanDisk Ultra A1 32GB you currently get for $12 on Amazon are limited here). The Hardkernel eMMC modules perform very well with sustained read performance at around 300 MB/s and write performance, depending on module size, at up to ~150 MB/s. With SSDs it depends -- we have an interface limitation of around ~395 MB/s on the USB3 ports and a little lower on the SATA ports, but unless you buy rather expensive SSDs you won't be able to reach the board's bottleneck anyway.
Please also keep in mind that the vast majority of consumer SSDs implements some sort of write caching, and write performance drops drastically once a certain amount of data has been written (my Intel 540 then gets as slow as 60 MB/s; IIRC the EVO750 can achieve ~150 MB/s and the EVO840 180 MB/s). Why aren't HDDs listed above? Since useless. Even enterprise HDDs show way too low random IO performance. These things are good for storing 'cold data' but never ever put your rootfs on them. They're outperformed at least 5 times by any recent A1 rated SD card, even crappy SSDs are at least 10 times faster and Hardkernel's eMMC performs at least 50 times better. So how to interpret the results above?
  • If you want energy efficient and ok-ish performing storage for your rootfs (OS drive) then choose any of the currently available A1 rated SD cards from reputable vendors (choose more expensive ones for better performance/resilience, choose larger capacities than needed if you fear your flash memory wearing out too fast).
  • If you want top performance at the lowest consumption level choose Hardkernel's eMMC and keep in mind that the smallest module is somewhat write performance bottlenecked. Again: if you fear your flash memory wearing out too fast simply choose larger capacities than 'needed'.
  • If you want to waste huge amounts of energy while still being outperformed by Hardkernel's eMMC buy a cheap SSD. Keep in mind that you need to disable PCIe power management, further increasing idle consumption, to be able to outperform eMMC storage; otherwise N1's SATA/PCIe implementation will bottleneck too much.
So when do SSDs start to make sense? If you either really need higher performance than Hardkernel's eMMC modules and are willing to spend a serious amount of money on a good SSD, or the '1k random IO' use case really applies to you (e.g. trying to run a database with insanely small record sizes that constantly updates at the storage layer). But always keep in mind: unless you choose a more expensive and high performing SSD you'll always get lower performance than eMMC while consumption is at least 100 times higher. And always use SSDs on the SATA ports since only there can you get higher random IO performance compared to eMMC, and being able to benefit from TRIM is essential (for details why TRIM is a problem on USB ports see above). But keep in mind that internal SATA ports are rated for 50 matings max, so be prepared to destroy connectors easily if you permanently change cables on those SATA ports.
But what if you feel that any SATA attached storage (the cheapest SSD around and even HDDs) must be an improvement compared to eMMC or SD cards? Just use it; all of the above is about facts and not feelings. You should only ensure to never ever test your storage performance since that might hurt your feelings (it would be as easy as 'cd $ssd-mountpoint ; iozone -e -I -a -s 100M -r 1k -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2' but really don't do this if you want to keep believing in one of the most common misbeliefs with consumer electronics today). As a reference, all IO benchmark results for SD cards, Hardkernel's eMMC modules and the SSD tests: https://pastebin.com/2wxPWcWr https://pastebin.com/ePUCXyg6 https://pastebin.com/N5wEghn3
  25. 2018 SD card update. It's 2018 now, SD Association's A1 'performance class' spec has been out for over a year and in the meantime we can buy products trying to be compliant with this performance class. SD cards carrying the A1 logo must be able to perform at least 1500 random read IOPS (input/output operations per second) with 4KB block size, 500 random write IOPS and 10 MB/s sustained sequential performance (see here for more details and background info). Why is this important? Since what we do on SBC, at least for the rootfs, is mostly random IO and not sequential IO as is common in cameras or video recorders (the stuff SD cards were invented for in the beginning). As an SBC (or Android) user we're mostly interested in high random IO performance with smaller blocksizes since this is how 'real world' IO patterns mostly look. Prior to the A1 and A2 performance classes there was no way to know how SD cards perform in this area before buying. Fortunately this has changed now.
Last week an ODROID N1 dev sample arrived, so I bought two SanDisk A1 cards with 32GB capacity each: an el cheapo 'Ultra A1' for 13€ (~$15) and an 'Extreme A1' for 23€. I wanted to buy a slightly more expensive 'Extreme Plus A1' (since even more performance and especially reliability/longevity) but ordered the wrong one. Please keep in mind that the 'Extreme Plus' numbers shown below are made with an older card missing the A1 logo. Let's look how these things perform, this time on a new platform: RK3399 with an SD card interface that supports higher speed modes (requires kernel support and switching from 3.3V to 1.8V at the hardware layer). So results aren't comparable with the numbers we generated over the last two years in this and other threads, but that's not important any more... see at the bottom.
A1 conformance requires at least 10 MB/s sequential performance and 500/1500 (write/read) IOPS with 4K blocksize. I also tested with 1K and 16K blocksizes for the simple reason of getting an idea whether 4K results are useful to determine performance with smaller or larger blocksizes (we already know that the vast majority of cheap SD cards out there shows a severe 16K random write performance drop, which is the real reason so many people consider all SD cards crap from a performance point of view). I tested 7 cards, 4 of them SanDisk, two Samsung, and the 'Crappy card' being a mixture of results from a 4GB Kingston I started testing with and old results from a 4GB Intenso from two years ago (see the first post of this thread). The Kingston died when testing with 4K blocksize, and the performance of all these crappy 'noname class' cards doesn't vary that much:

                           1K w/r       4K w/r       16K w/r
     Crappy card 4GB       32/1854      35/1595      2/603
     Samsung EVO+ 128GB    141/1549     160/1471     579/1161
     Ultra A1 32GB         456/3171     843/2791     548/1777
     Extreme A1 32GB       833/3289     1507/3281    1126/2113
     Samsung Pro 64GB      1091/4786    1124/3898    478/2296
     Extreme Plus 16GB     566/2998     731/2738     557/2037
     Extreme Pro 8GB       304/2779     323/2754     221/1821

     (All results in IOPS --> IO operations per second)

For A1 compliance we only need to look at the middle column and have to expect at least 500/1500 IOPS minimum here. The 'Crappy card' fails as expected, the Samsung EVO+ too (but we already knew that for whatever reasons newer EVO+ or those with larger capacity perform worse than the 32GB and 64GB variants we tested two years ago), the Samsung Pro shows the best performance here, while one of the 4 SanDisks also fails.
But my Extreme Pro 8GB is now 3 years old, the other one I had showed signs of data corruption a few months ago, and when testing 2 years ago (see 1st post in this thread) random write performance was at 800. So most probably this card is about to die soon and the numbers above are partially irrelevant. What about sequential performance? Well, 'Crappy card' is also not able to meet the specs, and all the better cards are 'bottlenecked' by the ODROID N1 (some of these cards show 80 MB/s in my MacBook's card reader but Hardkernel chose to use some safety headroom for good reasons and limits the maximum speed for improved reliability):

                           MB/s write   MB/s read
     Crappy card 4GB       9            15
     Samsung EVO+ 128GB    21           65
     Ultra A1 32GB         20           66
     Extreme A1 32GB       59           68
     Samsung Pro 64GB      61           66
     Extreme Plus 16GB     63           67
     Extreme Pro 8GB       50           67

Well, sequential transfer speeds are close to irrelevant with single board computers or Android, but it's good to know that boards that allow for higher SD card speed modes (e.g. almost all ODROIDs and the Tinkerboard) also show an improvement in random IO performance if the card is a good one. The ODROID N1 was limited to DDR50 (slowest SD card mode) until today when Hardkernel unlocked UHS capabilities, so that my cards (except 'Crappy card') could all use SDR104 mode. With DDR50 mode sequential performance is limited to 22.5/23.5 MB/s (write/read) but more interestingly random IO performance also differs. See the IOPS results for the two SanDisk A1 cards, one time limited to DDR50 and then with SDR104:

                           1K w/r       4K w/r       16K w/r
     Ultra A1 DDR50        449/2966     678/2191     445/985
     Ultra A1 SDR104       456/3171     843/2791     548/1777

                           1K w/r       4K w/r       16K w/r
     Extreme A1 DDR50      740/3049     1039/2408    747/1068
     Extreme A1 SDR104     833/3289     1507/3281    1126/2113

We can clearly see that the larger the blocksize the more the interface speed influences random IO performance too (look especially at the 16K random reads that double with SDR104). Some conclusions:
  • When comparing the results above the somewhat older Samsung Pro performs pretty similar to the Extreme A1. But great random IO performance is only guaranteed with cards carrying the A1 logo (or A2 soon), so it might happen that buying another Samsung Pro today results in much lower random IO performance (see Hardkernel's results with a Samsung Pro Plus showing 224/3023 4K IOPS, which is way below the 1124/3898 my old Pro achieves, with write performance especially 5 times worse and below the A1 criteria).
  • We still need to focus on the correct performance metrics. Sequential performance is more or less irrelevant ('Class 10', 'UHS' and so on); all that matters is random IO (A1 and A2 soon). Please keep in mind that you can buy a nice looking UHS card from 'reputable' brands like Kingston, Verbatim, PNY and the like that might achieve theoretical 80 MB/s or even 100 MB/s sequential performance (which you can't benefit from anyway since your board's SD card interface will be the bottleneck) but simply sucks at random IO performance. We're talking about up to 500 times worse performance when trusting 'renowned' brands and ignoring performance reality (see 16K random writes comparing 'Crappy card' and 'Extreme A1').
  • Only a few vendors on this planet run NAND flash memory fabs, only a few companies produce flash memory controllers and have the necessary know-how in house. And only a few combine their own NAND flash with their own controllers into their own retail products.
That's the simple reason why at least I only buy SD cards from these 4 brands: Samsung, SanDisk, Toshiba, Transcend. The A1 performance speed class is a great and necessary improvement since now we can rely on getting decent random IO performance. This also helps in fighting counterfeit flash memory products since even if fraudsters in the meantime produce fake SD cards that look real and show the same capacity, these fakes usually suck at random IO performance. So after testing new cards with either F3 or H2testw it's now another iozone or CrystalDiskMark test to check overall performance including random IO (!), and if performance sucks you simply return the card asking for a refund.
TL;DR: If you buy new SD cards choose those carrying an A1 or A2 logo. Buy only good brands (their names start with either S or T). Don't trust in getting genuine products but always expect counterfeit stuff. That's why you should only buy from sellers with a 'no questions asked' return/refund policy and why you have to check your cards immediately after purchase. If you also care about reliability/resilience buy more expensive cards (e.g. the twice as expensive Extreme Plus A1 instead of Ultra A1) and choose larger capacities than needed.
Finally: All detailed SD card test results can be found here: https://pastebin.com/2wxPWcWr As a comparison, performance numbers made with the same ODROID N1 and same settings but with the vendor's orange eMMC modules (based on Samsung eMMC and varying only in size): https://pastebin.com/ePUCXyg6
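The 'check immediately after purchase' step is quick; an iozone invocation like the one quoted in the storage update above does the job (the mount point is just an example):

     cd /path/to/sdcard
     iozone -e -I -a -s 100M -r 1k -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2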