tkaiser

Everything posted by tkaiser

  1. Advertised as Android 7 capable, so it must be running Allwinner's 4.4 kernel, which is bad news since that kernel no longer supports voltage regulation (same with the Libre Computer Tritium boards): https://github.com/Allwinner-Homlet/H3-BSP4.4-linux/issues/1 -- be prepared for a maximum cpufreq lower than on any of the older H3 boards. Device tree settings from this Allwinner BSP kernel (and most probably u-boot too) are incompatible with mainline, so good luck extracting settings and transforming them into something suitable for mainline Linux (this also applies to DRAM timings). How fast is the 16 GB eMMC? Which wireless chip? In the past it was rather easy to get Allwinner H3 based TV boxes supported in Armbian since it was just a matter of reading out the proprietary fex stuff and adjusting a few bits. That could be done for the legacy Allwinner kernel even without having the device in front of you (see Sunvell R69), but the situation with boards based on the new BSP is entirely different. And I don't know who is still interested in Allwinner H3 here (at least definitely not me)
  2. Well, that depends solely on the SoC used. I just did some benchmarks with 4.19 on one of my RK3399 devices yesterday: https://forum.armbian.com/topic/8161-swap-on-sbc/?do=findComment&comment=61637 Wrt the features I'm not that much interested in (anything that requires adding a display to the boards), I wouldn't count on every feature being available with mainline Linux (or Linux at all), regardless of SoC. But honestly I don't know enough about these areas and have even managed to spread BS already.
  3. Ok, so the same observation yet another time. Maybe we should switch back to lzo then. I feared my test, always using the same task, was somewhat flawed. At least the algorithm is configurable as SWAP_ALGORITHM in /etc/default/armbian-zram-config. But starting with the best default is for sure a good thing prior to the next major release when this stuff gets rolled out. Nope, I simply used default kernel settings (and only tested with Rockchip 4.4, mainline 4.14 on NanoPi Fire3 and 4.19 on RK3399). How to configure the specific algorithm? Edit: another interesting observation: https://bugs.chromium.org/p/chromium/issues/detail?id=584437#c15 I really wonder whether the compression algorithms on ARM use NEON optimizations or not (the performance boost can be huge)
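Wrt configuring the specific algorithm: independent of Armbian's SWAP_ALGORITHM setting, the kernel exposes the compression algorithm per zram device via sysfs and marks the active one with brackets. A minimal sketch (the helper function name is mine; the available algorithms depend on the kernel build):

```shell
# Parse the active algorithm out of a comp_algorithm listing such as
# "lzo [lz4] lz4hc zstd" (the kernel marks the current one with brackets).
current_zram_algo() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^\[\(.*\)\]$/\1/p'
}

# On a live system (zram module loaded):
#   cat /sys/block/zram0/comp_algorithm          # list available algorithms
#   echo lzo > /sys/block/zram0/comp_algorithm   # select one
```

Note that the algorithm has to be selected before the device is initialized (i.e. before writing disksize and running mkswap/swapon on it).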
  4. I did just that: the thick thermal pad has been removed and replaced with a 15x15mm copper shim (0.8mm height). The grey stuff around the SoC is leftover from another thermal pad (came with the RockPro64), just there to prevent shorting things with the heatsink. I tried to use sbc-bench in thermal testing mode. The NanoPi Fire3 is in upright position (so convection might help a little bit) in a room with 24°C ambient temperature:

         sbc-bench.sh -T 80 ; sbc-bench.sh -t 60

     Problem N° 1: with the stock thermal pad I would have to wait endlessly for the SoC to cool down below 60°C (that's what 'sbc-bench.sh -t 60' is supposed to do). The board, once hot, remained at 62°C to 63°C. I had to temporarily switch the cpufreq governor to let the CPU cores clock down from 1400 to 400 MHz to get an idle temperature below 60°C again. Then numbers were as follows:

         1400 MHz: 146.17 sec
         1300 MHz: 148.42 sec
         1200 MHz:   0 sec
         1100 MHz:   0 sec
         1000 MHz:   0 sec
          900 MHz:   0 sec
          800 MHz:   0 sec
          700 MHz:   0 sec
          600 MHz:   0 sec
          500 MHz:   0 sec
          400 MHz:   5.80 sec

     (the 6 seconds at 400 MHz are not the result of throttling but of me manually switching back to the performance cpufreq governor once the real benchmark run started). Full output where you can see how long it took to cool down from 80°C to 60°C: https://pastebin.com/raw/1DDt8yGk Now with the copper shim instead of the thermal pad. Problem N° 2: now it's impossible for the 'pre-heat' run to get above 80°C. I would have to wait endlessly for this, so I stopped and ran 'sbc-bench.sh -T 78 ; sbc-bench.sh -t 60' instead, to only wait until temperature exceeds 78°C. Full output: https://pastebin.com/raw/a0vmQJ0b No throttling happened and temperature remained below 80°C. Again it's obvious that thermal pads simply suck when it's about efficient heat dissipation.
     But I have to admit I've no idea whether the contact area with my copper shim is sufficient, or whether thermal pads are the recommended way, since with the copper shim only the very small die area and the shim are in contact while the area around it is not. @mindee can you please shed some light? Edit: haha, today I realized that I ran with just 4 CPU cores active yesterday when I made my tests, due to 'maxcpus=4' set in /boot/armbianEnv.txt (to test for swap efficiency of various approaches). That doesn't change anything wrt thermal conductivity, but of course now with all 8 CPU cores active the NanoPi Fire3 starts to throttle severely when running heavy stuff; at least with the copper shim clockspeeds are much higher compared to the thermal pad.
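For anyone repeating such tests without sbc-bench: the kernel reports SoC temperature in millidegrees via its thermal zones, so a tiny watch loop is enough to correlate temperature and clockspeed. A sketch, assuming the usual sysfs paths (the zone number varies per SoC; the helper name is mine):

```shell
# Millidegrees (as read from /sys/class/thermal/thermal_zone*/temp) to °C.
millic_to_c() {
    awk -v t="$1" 'BEGIN { printf "%.1f", t / 1000 }'
}

# Example loop for a live board; adjust thermal_zone0 to your SoC:
#   while true; do
#       t=$(cat /sys/class/thermal/thermal_zone0/temp)
#       f=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)
#       echo "$(millic_to_c "$t")°C @ $((f / 1000)) MHz"
#       sleep 5
#   done
```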
  5. Well, if Libre Computer is interested in getting Armbian on their boards at least some developers need samples. And wrt testing: it's a RK3399 thing so almost everything is already known. Only interesting: how fast is memory access (are different DRAM initialization BLOBs used or not?), how fast is the eMMC and how well heat dissipation works. So iozone and sbc-bench and done. To me the Renegade Elite looks more like an ARM device for a couple of professional use cases (mainly dealing with cameras and imaging) running Android. But it seems enthusiasts are not scared away by the costs and want to run Linux for whatever reasons... Almost forgot: of course compatibility testing with various USB-C PSUs might be interesting as well. @gounthar which USB-C chargers do you own? 'Good' ones from this list?
  6. Little update: in the meantime I also tested with the really fast Samsung 16GB eMMC 5.1 on NanoPC-T4 (again crippled down to a quad-core A53 at 800 MHz). The board runs off the NVMe SSD; I mounted the eMMC as an ext4 partition, put ARM's ComputeLibrary install and the swapfile there and fired up the test again. First post above is updated. With 4 GB RAM and no swap 100:39 minutes, with swapping on the fast NVMe SSD 118:47 and 133:34 on the eMMC:

                  w/o     nvme    lzo     lz4     emmc    usb2    usb3    hdd     sd card
         real    100m39  118m47  125m26  127m46  133m34  146m49  154m51  481m19  1151m21

     That's impressive. But this Samsung eMMC 5.1 on NanoPC-T4 (and also on ODROID-N1) is most probably the fastest eMMC we get on SBCs today (see benchmark numbers). And still zram is faster and we get 'more RAM' for free (while swap on flash media of course contributes to the medium wearing out)
  7. All those tests I did before were done with Rockchip's 4.4 kernel. Since stuff in the kernel improves over time, now let's test with brand new 4.19.0-rc1. I just did a quick build (with the default device tree that limits maximum cpufreq to 1.8 GHz on the big and 1.4 GHz on the little cores) and only tested performance without swapping and with zram based swap using the available algorithms (more recent kernels provide more compression algorithms to choose from):

                  w/o     lzo     lz4     zstd    lz4hc
         real    29m11   35m58   36m59   48m38   58m55
         user   167m59  177m24  175m22  182m02  173m57
         sys      5m32   21m10   22m59   69m35  123m46

     Results:
       • More recent kernel --> better results. Even with lower clockspeeds (1.8/1.4 GHz vs. 2.0/1.5 GHz) the test with kernel 4.19 runs 8% faster. So at the same clockspeed this would even result in ~10% better performance
       • Performance drop with zram/lzo compared to no swap was 31% with 4.4. With 4.19 it's just 23%. So efficiency/performance of the zram implementation itself also improved a lot
       • Again lzo is slightly faster than lz4; both zstd and lz4hc are no good candidates for this use case (but zstd is a great candidate for Armbian's new ramlog approach since it provides higher compression -- more on this later)

     In other words: with mainline kernel it makes even more sense to swap to a compressed block device in RAM since performance further increased. With this specific use case (large compile job) the performance drop when running out of memory and the kernel starting to swap to zram is below the 25% margin, which is just awesome
  8. Well, the first thing I would try to improve is replacing the thermal pad with something with much better thermal conductivity, see https://forum.armbian.com/topic/8125-quick-review-of-nanopi-k1-plus/?do=findComment&comment=61417
  9. It's all there, just use the search functionality this forum provides: https://forum.armbian.com/topic/1925-some-storage-benchmarks-on-sbcs/?do=findComment&comment=51350 Allwinner H6 (preliminary -- things might improve): https://forum.armbian.com/topic/7118-trying-to-compile-pine-h64/?do=findComment&comment=59603
  10. Honestly: if I have a RK3399 device and want 128 GB storage I choose something different. I ordered a cheap DRAM-less NVMe SSD for 40€ (VAT/shipping included), inserted it into the M.2 slot, transferred the rootfs as easily as always with nand-sata-install, and now my NanoPC-T4 runs directly off the NVMe SSD. I've no doubt that the eMMC performs nicely. But I doubt it's as fast as an NVMe SSD, and what I like most about SSDs: the good ones tell you when they start to wear out:

          root@nanopct4:/home/tk# smartctl -x /dev/nvme0n1
          smartctl 6.6 2016-05-31 r4324 [aarch64-linux-4.4.153-rk3399] (local build)
          Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

          === START OF INFORMATION SECTION ===
          Model Number:                       TS128GMTE110S
          Serial Number:                      AC2153AAE531202E0165
          Firmware Version:                   R0515A0
          PCI Vendor/Subsystem ID:            0x126f
          IEEE OUI Identifier:                0x000000
          Controller ID:                      1
          Number of Namespaces:               1
          Namespace 1 Size/Capacity:          128,035,676,160 [128 GB]
          Namespace 1 Formatted LBA Size:     512
          Local Time is:                      Mon Sep 10 18:38:24 2018 UTC
          Firmware Updates (0x14):            2 Slots, no Reset required
          Optional Admin Commands (0x0006):   Format Frmw_DL
          Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat *Other*
          Maximum Data Transfer Size:         64 Pages
          Warning  Comp. Temp. Threshold:     70 Celsius
          Critical Comp. Temp. Threshold:     80 Celsius

          Supported Power States
          St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
           0 +     9.00W       -        -    0  0  0  0        0       0

          Supported LBA Sizes (NSID 0x1)
          Id Fmt  Data  Metadt  Rel_Perf
           0 +     512       0         0

          === START OF SMART DATA SECTION ===
          SMART overall-health self-assessment test result: PASSED

          SMART/Health Information (NVMe Log 0x02, NSID 0x1)
          Critical Warning:                   0x00
          Temperature:                        28 Celsius
          Available Spare:                    100%
          Available Spare Threshold:          10%
          Percentage Used:                    0%
          Data Units Read:                    110,342 [56.4 GB]
          Data Units Written:                 289,050 [147 GB]
          Host Read Commands:                 11,897,227
          Host Write Commands:                22,892,268
          Controller Busy Time:               126
          Power Cycles:                       10
          Power On Hours:                     1
          Unsafe Shutdowns:                   4
          Media and Data Integrity Errors:    0
          Error Information Log Entries:      0
          Warning  Comp. Temperature Time:    0
          Critical Comp. Temperature Time:    0

      The 'Percentage Used' line is the important one. As long as it stays below 90 everything's fine. An eMMC will die suddenly without any prior warning...
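That 'Percentage Used' value is easy to watch from a script; a sketch (the helper name is mine) that pulls just the number out of 'smartctl -x' output:

```shell
# Extract the NVMe wear indicator (0-100+, as defined by the NVMe spec)
# from smartctl output; reads stdin, prints just the number.
percentage_used() {
    sed -n 's/^Percentage Used:[[:space:]]*\([0-9][0-9]*\)%.*/\1/p'
}

# Live usage (needs smartmontools and root):
#   smartctl -x /dev/nvme0n1 | percentage_used
```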
  11. Not a good idea, as Zador already explained: https://forum.armbian.com/topic/6104-network-manager-woes/?do=findComment&comment=46901 -- people not able to cope with NM should better remove it entirely. Interesting way to 'spend' your own time. One of the first things I always do at new customers is to move away from statically configured addresses on each server machine to centrally configured static IP addresses handed out via DHCP. If name resolution is set up correctly not even this would be necessary, since names are consistent and addresses don't matter anyway...
  12. In fact I bought cheap since I got the NVMe SSD for less than 40 bucks on sale. My small TS128GMTE110S has only 2 flash chips soldered on it; if the maximum (8) were present it would be much, much faster, since all modern SSD controllers make heavy use of parallelism (the more flash chips the faster). I did a quick iozone test with kernel 4.4 and the results don't look that great compared to an EVO 960, for example. But that's not relevant since the protocol makes the difference. NVMe was invented in this century, unlike all the other storage protocols we use today. And this makes a real difference, since NVMe has been developed with efficient access to flash storage in mind. All the other protocols we might use (including SATA) were designed decades ago for way slower storage and all bottleneck access to fast flash. With swap on the NVMe SSD the maximum %iowait percentage according to iostat monitoring was 0.37%. That's 70 times less compared to up to 25.07% with USB3!
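Those %iowait peaks came from watching iostat while the jobs ran; a small filter like this (a sketch, assuming sysstat's default CPU line layout where %iowait is the fourth field after the avg-cpu header) pulls the peak out of a recorded log:

```shell
# Print the highest %iowait seen in an 'iostat' log. Each avg-cpu
# header line is followed by a line of six percentages; field 4
# is %iowait in sysstat's default layout.
max_iowait() {
    awk '/avg-cpu/ { getline; if ($4 > max) max = $4 } END { printf "%.2f", max }'
}

# Live usage, e.g. against a log recorded with 'iostat 1800 > iostat.log':
#   max_iowait < iostat.log
```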
  13. Next test: RK3399 with unlocked performance (all 6 CPU cores active at the usual clockspeeds: 1.5/2.0 GHz):

                  w/o     nvme    lzo/2   lzo/6   lz4/2   lz4/6
         real    31m55   40m32   41m56   41m38   43m57   44m26
         user   184m16  194m58  200m37  202m20  195m17  197m51
         sys      6m04   16m17   25m02   23m14   40m59   42m15

     Full test output: For obvious reasons I did not test the crappy variants again (HDD, SD card, USB attached anything). So we're only looking at performance without swap, swap on NVMe SSD and zram. The RK3399, when allowed to run at full speed, finishes the same compile job in less than 32 minutes. Swap on NVMe SSD increases the time by almost 30% now. I also compared whether the count of zram devices makes a difference (still on RK's 4.4 kernel). Still lzo outperforms lz4 (which is irritating since everyone tells you lz4 would be an improvement over lzo), but there is no clear answer about the count of zram devices (in fact the kernel uses 1-n streams to access each device, so with modern kernels even a single zram device should suffice since the kernel takes care of distributing the load across all CPU cores)
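The percentages quoted above can be recomputed from the 'real' row; converting time(1)'s XmY notation to seconds makes the comparison a one-liner (helper names are mine):

```shell
# '31m55' -> 1915 seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN { split(t, p, "m"); print p[1] * 60 + p[2] }'
}

# Percentage by which a variant is slower than the baseline.
slowdown_pct() {
    awk -v b="$(to_seconds "$1")" -v v="$(to_seconds "$2")" \
        'BEGIN { printf "%.0f", (v / b - 1) * 100 }'
}

slowdown_pct 31m55 40m32   # NVMe swap vs. no swap -> 27 ('almost 30%')
slowdown_pct 31m55 41m56   # zram/lzo vs. no swap  -> 31
```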
  14. This is some more research based on prior efforts. The goal is to make more efficient use of available RAM. If the system runs low on memory, only two options exist: either the kernel invokes the oom-killer to kill tasks to free memory (oom --> out of memory) or it starts to swap. Swap is a problem if it happens on slow media, and 'slow' media usually describes the situation on SBCs. 'Average' SD cards (not A1 rated) are slow as hell when it's about random IO performance, so swapping is usually something that should be avoided. But... technology improves over time. In Linux we're able to swap not only to physical storage but, since a few years, also to compressed memory. If you want the details simply do a web search for zram or check Wikipedia first. Test setup is a NanoPC-T4 equipped with 4 GB RAM (RK3399 based, so a big.LITTLE design with 2xA72 and 4xA53). I crippled the board down to a quad-core A53 running at 800 MHz, where I can easily switch between 4 GB RAM and lower amounts: adding 'mem=1110M maxcpus=4' to the kernel cmdline results in the A72 cores being inactive, the kernel only using 1 GB DRAM and, for whatever reason, cpufreq scaling not working, so the RK3399 is statically clocked at 808 MHz. All tests were done with RK's 4.4 kernel (4.4.152). This test setup is meant as 'worst case possible': a quad-core A53 at 800 MHz is more or less equivalent to a quad-core A7 running at ~1000-1100 MHz, so we're testing at the lower limit. I used a compile job that requires up to 2.6 GB RAM (based on this blog post). The task is to build ARM's Compute Library, which involves swapping on systems with less than 3 GB memory.
     Let's have a look. In the following I tried a couple of different scenarios, swap on physical media and also two different zram algorithms:
       • w/o: no swapping happened since the board booted with the full 4GB RAM active
       • nvme: Transcend TS128GMTE110S SSD in the M.2 slot, link established at x4 Gen2
       • emmc: the 16GB ultra fast Samsung eMMC 5.1 on NanoPC-T4
       • usb2: Samsung EVO840 SSD in a JMS567 disk enclosure, attached to a USB2 port (UAS works)
       • usb3: Samsung EVO840 SSD in a JMS567 disk enclosure, attached to a USB3 port (UAS works)
       • hdd: Samsung HM500JI 2.5" HDD in a JMS567 disk enclosure, attached to a USB2 port (UAS works)
       • sd card: 'average' SanDisk 8 GB SD card (not A1 rated so horribly low random IO performance)
       • lzo: zram with lzo as compression algorithm
       • lz4: zram with lz4 as compression algorithm

     And the numbers are:

                  w/o     nvme    lzo     lz4     emmc    usb2    usb3    hdd     sd card
         real    100m39  118m47  125m26  127m46  133m34  146m49  154m51  481m19  1151m21
         user    389m48  415m38  405m39  402m52  415m38  415m29  407m18  346m28   342m49
         sys      11m05   29m37   36m14   60m01   34m35   66m59   65m44   23m05   216m25

     You need to look at the 1st row: that's the time the whole job took. For more details consult the 'time' manual page. In other words: when limiting the RK3399 on NanoPC-T4 to just the four A53 cores running at 800 MHz, the compile job takes 100 minutes with 4 GB RAM. As soon as we limit the available RAM to 1 GB, swapping has to occur, so it gets interesting how efficient the various approaches are:
       • NVMe SSD is the fastest option. Performance drop only 18%. That's due to NVMe being a modern storage protocol suited for modern (multi-core) CPUs. Problem: there's no PCIe and therefore no NVMe on the majority of SBCs
       • Zram with both lzo and lz4 algorithms performs more or less the same (interestingly lzo is slightly faster)
       • Slightly slower: the fast Samsung eMMC 5.1
       • Surprisingly the EVO840 SSD connected via USB2 performs better than connected via USB3 (some thoughts on this)
       • Using a HDD for swap is BS (and was BS already for the last 4 decades, but we had no alternative until SSDs appeared). The compile job needs almost 5 times longer to complete since all HDDs suck at random IO
       • Using an average SD card for swap is just horrible. The job that finished within 100 minutes with 4 GB DRAM available took over 19 HOURS with swap on an average SD card (please note that today's usual A1 rated SD cards are magnitudes faster and easily outperform HDDs)

     Summarizing: NVMe SSDs are no general option (since only available on some RK3399 boards). Swap on HDD or SD card is insane. Swap on USB connected SSDs performs ok-ish (~1.5 times slower), so the best option is to use compressed DRAM. We get a performance drop of just 25% at no additional cost. That's amazing. The above numbers were 'worst case'; that's why I crippled the RK3399 to a slowly performing quad-core A53. You get the idea how 'worse' zram might be on the slowest SBCs Armbian runs on (I know there are still the boring Allwinner A20 boards around -- yep, they're too slow for this). When I did all this boring test stuff I always recorded the environment using 'iostat 1800' (reports every 30 minutes what really happened and shows in detail how much data has been transferred and where the CPU cores spent their time). Quite interesting to compare %user, %sys and especially %iowait percentages:
  15. These 'open source' claims are BS almost everywhere. You need to pick one of the few Open Source Hardware protagonists, for example Olimex: https://www.olimex.com/Products/SOM204/ (an RK3328 variant is in the pipeline, so if you're able to wait until 2019 maybe that's something for you). Apart from this I can not follow, since RK3328, RK3329, RK3229 and RK3399 are different beasts (only two of them receive good 'open source' support from Rockchip, and RK3329 doesn't even exist). Also worth mentioning: assumptions (like USB3 == great performance) should be questioned/checked.
  16. Oh, there's something we also need to take care of with upgrades: the existing 128MB swap file on SD card present in /etc/fstab:

                        total        used        free      shared  buff/cache   available
         Mem:            494M         66M        269M        2.7M        158M        415M
         Swap:           375M          0B        375M

         NAME       ALGORITHM DISKSIZE  DATA  COMPR TOTAL STREAMS MOUNTPOINT
         /dev/zram0               50M 11.1M   2.9M  3.3M       4 /var/log
         /dev/zram1 lz4         61.8M    4K    63B    4K       4 [SWAP]
         /dev/zram2 lz4         61.8M    4K    63B    4K       4 [SWAP]
         /dev/zram3 lz4         61.8M    4K    63B    4K       4 [SWAP]
         /dev/zram4 lz4         61.8M    4K    63B    4K       4 [SWAP]

     With vm.swappiness=100 it might now be possible that the system decides to start swapping to SD card even if the swap priority is pretty low. @Igor what do you think? Checking the size of the swap file and, if it's 'our' 128MB file, removing the fstab entry and the swap file on upgrade?
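The upgrade check suggested above could look roughly like this; a sketch only (the /var/swap path and the exact fstab line format are assumptions that would need verifying against the actual images):

```shell
# Remove the legacy 128 MiB swap file on upgrade, but only if the file
# is exactly the size our old images created. The fstab path is a
# parameter so the logic can be exercised against a copy.
remove_legacy_swapfile() {
    f="$1"; fstab="${2:-/etc/fstab}"
    [ -f "$f" ] || return 0
    [ "$(stat -c %s "$f")" -eq $((128 * 1024 * 1024)) ] || return 0
    swapoff "$f" 2>/dev/null || true   # may not be active; ignore errors
    sed -i "\\|$f|d" "$fstab"          # drop its fstab line
    rm -f "$f"
}

# Intended call on upgrade (assumed path):
#   remove_legacy_swapfile /var/swap
```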
  17. Prepared a potential method to work around that: https://github.com/armbian/build/commit/098a391996a4cd56d6e493d94decc85a53254ff1 Needs a lot of testing of course...