Everything posted by tkaiser

  1. Ah, now I understand it. You could solder a standard 15-pin SATA power connector to the little board and then provide GND, 3.3V, 5V and 12V (optional) on the 4 pins.
  2. My personal opinion on this: Linux sucks. While zram is a nice way to make better use of the physically available DRAM, the whole approach still sucks since we would also need to take care of our attempts to store browser profiles and caches in RAM (uncompressed -- what a waste). Since there's nothing like a globally acting memory compressor task running in the background (as in macOS), it would need some more work to enhance the psd/cache behavior (using compressed RAM, of course). And then the only reasonable way to run a full-blown desktop environment on those boards with low RAM is still adding a fast UAS connected SSD, putting the rootfs on it and using properly configured zswap instead of zram (see the sketch below). But why? Adding all the costs together, a properly sized board with eMMC is the cheaper variant that sucks less.
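     For reference, 'properly configured zswap' needs a real swap device behind it (e.g. a swapfile on that SSD) plus a few kernel command line parameters; a minimal sketch, the values below are just an illustration and not a recommendation:

     ```bash
     # append to the kernel command line, e.g. via the bootloader config:
     #   zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20
     # then verify after reboot:
     grep -r . /sys/module/zswap/parameters/
     ```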
  3. Wrong way, since we implemented our own zram-control mechanism in the meantime, already available in nightlies and supposed to be rolled out with the next major update. For anyone coming across this: do NOT follow the above recipe, it's deprecated in Armbian and causes more harm than good. @Igor: IMO we need to make the amount of zram configurable. Currently it's set as 'half of available RAM' in line 35 of the initialization routine. But since updates will overwrite this, users who want to benefit from massive zram overcommitment (since it just works brilliantly) are forced to edit this script over and over again. I propose to define the amount used as $ZRAM_PERCENTAGE, defaulting to 50 and overridable in a yet to be created /etc/defaults/armbian-zram-config file (see the sketch below). Any opinions?
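     A minimal sketch of how the override could work, assuming the proposed file name and variable; this is an illustration, not the actual initialization routine:

     ```bash
     # /etc/defaults/armbian-zram-config (proposed, does not exist yet):
     #   ZRAM_PERCENTAGE=150

     ZRAM_PERCENTAGE=50   # default: use half of the available RAM for zram
     [ -f /etc/defaults/armbian-zram-config ] && . /etc/defaults/armbian-zram-config
     mem_kb=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
     echo "$(( mem_kb * ZRAM_PERCENTAGE / 100 ))K" > /sys/block/zram0/disksize   # device name assumed
     ```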
  4. Known problem, currently this simply doesn't work.
  5. NanoPI M4: Yep. Check the link to the review and compare @hjc's tinymembench numbers from his 2GB M4 (4 x DDR3) with e.g. RockPro64 (2 x LPDDR4). It's dual-channel DRAM both times, but it could be that we get more recent DRAM initialization BLOBs from Rockchip and then RockPro64 with LPDDR4 might be slightly faster (I don't think this will change anything with the larger 4GB M4 configuration using LPDDR3). BTW: For most use cases memory bandwidth is pretty much irrelevant.
  6. Yes, scaling_cur_freq is just some number compared to cpuinfo_cur_freq: https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt Querying the correct sysfs node is also more 'expensive' and therefore only allowed for root (see below). Please see also @jeanrhum's adventure with the very same Atom and an obviously similar kernel:
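     A quick way to compare both sysfs nodes (cpu0 picked arbitrarily; cpuinfo_cur_freq is readable by root only):

     ```bash
     cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq        # what cpufreq thinks the frequency is
     sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq   # frequency read back from the hardware
     ```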
  7. [x] done (most probably not in exactly the way as expected )
  8. Since you claim you have to wait 30 seconds, obviously these two services (collecting debug information and setting up efficient logging to RAM) are NOT the culprit, true? You better provide the output of 'systemd-analyze critical-chain' and 'armbianmonitor -u'.
  9. I had the SSH session window still open and collected the relevant logging portions from 'iostat 1800' while running the test with USB3, USB2 and then again zram/lzo (which surprisingly again outperformed lz4):

     USB3:
     %user   %nice  %system  %iowait  %steal  %idle
     82.31    0.00    12.56     4.68    0.00   0.45
     74.77    0.00    16.80     8.25    0.00   0.18
     55.24    0.00    19.84    24.44    0.00   0.48
     72.22    0.00    16.94    10.43    0.00   0.41
     50.96    0.00    22.24    26.09    0.00   0.71

     USB2:
     %user   %nice  %system  %iowait  %steal  %idle
     81.77    0.00    11.95     5.30    0.00   0.99
     75.99    0.00    16.95     6.71    0.00   0.35
     66.50    0.00    19.19    13.81    0.00   0.49
     77.64    0.00    18.31     3.97    0.00   0.08
     44.17    0.00    12.99    13.09    0.00  29.74

     zram/lzo:
     %user   %nice  %system  %iowait  %steal  %idle
     84.83    0.00    14.68     0.01    0.00   0.48
     82.94    0.00    17.06     0.00    0.00   0.00
     81.51    0.00    18.49     0.00    0.00   0.00
     78.33    0.00    21.66     0.00    0.00   0.01

     That's an interesting point and clearly something I forgot to check. But I was running with the latest IRQ assignment settings (USB2 on CPU1 and USB3 on CPU2, see below) so there shouldn't have been a problem with my crippled setup (hiding CPUs 4 and 5). But the iostat output above reveals that %iowait with USB3 was much higher compared to USB2, so this is clearly something that needs more investigation.
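     For reference, those IRQ assignments are done by writing CPU bitmasks to /proc/irq/<nr>/smp_affinity; the IRQ numbers below are made up, look up the real ones in /proc/interrupts first:

     ```bash
     echo 2 > /proc/irq/220/smp_affinity   # USB2 (ehci) -> CPU1 (bitmask 1<<1), IRQ number assumed
     echo 4 > /proc/irq/221/smp_affinity   # USB3 (xhci) -> CPU2 (bitmask 1<<2), IRQ number assumed
     ```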
  10. Hopefully fixed with https://github.com/armbian/build/commit/dc9ad0e1e5993cbe92bbbe25417a419fa1fc1123
  11. This was 'swap with SSD connected to USB3 port'. Now a final number. I was curious how long the whole build orgy would take if I use the same UAS attached EVO840 SSD but connect it to a USB2 port. Before and after (lsusb -t):

      /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
          |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=uas, 5000M
      /:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M
          |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=uas, 480M

      The SSD is now connected via Hi-Speed but UAS is still usable. Here are the (somewhat surprising) results:

      tk@nanopct4:~/ComputeLibrary-18.03$ time taskset -c 0-3 scons Werror=1 -j8 debug=0 neon=1 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a build=native
      ...
      real    145m37.703s
      user    410m38.084s
      sys     66m56.026s

      tk@nanopct4:~/ComputeLibrary-18.03$ free
                    total        used        free      shared  buff/cache   available
      Mem:        1014192       67468      758332        3312      188392      869388
      Swap:       3071996       31864     3040132

      That's almost 10 minutes faster compared to USB3 above. Another surprising result is the amount of data written to the SSD, this time only 49.5 GB:

      Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
      sda             905.22      3309.40      6821.28    5956960   12278368
      sda            1819.48      4871.02      5809.35    8767832   10456832
      sda            2505.42      6131.65      6467.18   11036972   11640928
      sda            1896.49      5149.54      4429.97    9269216    7973988
      sda            1854.91      3911.03      5293.68    7039848    9528616

      And this time I also queried the SSD via SMART before and after about 'Total_LBAs_Written' (that's 512 byte units with Samsung SSDs):

      241 Total_LBAs_Written   0x0032   099   099   000   Old_age   Always   -   16901233973
      241 Total_LBAs_Written   0x0032   099   099   000   Old_age   Always   -   17004991437

      Same 49.5 GB number, so unfortunately my EVO840 doesn't expose the amount of data written at the flash layer but just at the block device layer. Well, the result is surprising (a storage relevant task performing faster with the same SSD connected to USB2 compared to USB3) but most probably I did something wrong. No idea, and no time to dig any further. I checked my bash history: I repeated the test exactly as all the times before, and the iozone results also look as expected:

      39  cd ../
      40  rm -rf ComputeLibrary-18.03/
      41  tar xvf v18.03.tar.gz
      42  lsusb -t
      43  cd ComputeLibrary-18.03/
      44  grep -r lala *
      45  time scons Werror=1 -j8 debug=0 neon=1 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a build=native

      EVO840 / USB3:
                                                    random   random
              kB  reclen   write  rewrite    read   reread     read    write
          102400       4   16524    20726   19170    19235    19309    20479
          102400      16   53314    64717   65279    66016    64425    65024
          102400     512  255997   275974  254497   255720   255696   274090
          102400    1024  294096   303209  290610   292860   288668   299653
          102400   16384  349175   352628  350241   353221   353234   350942
         1024000   16384  355773   362711  354363   354632   354731   362887

      EVO840 / USB2:
                                                    random   random
              kB  reclen   write  rewrite    read   reread     read    write
          102400       4    5570     7967    8156     7957     8156     7971
          102400      16   19057    19137   21165    21108    20993    19130
          102400     512   32625    32660   32586    32704    32696    32642
          102400    1024   33121    33179   33506    33467    33573    33226
          102400   16384   33925    33953   35436    35500    34695    33923
         1024000   16384   34120    34193   34927    34935    34933    34169
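      BTW, the 49.5 GB figure follows directly from the two Total_LBAs_Written readings above (Samsung counts 512-byte units here):

      ```bash
      awk 'BEGIN { printf "%.1f GiB\n", (17004991437 - 16901233973) * 512 / 1024^3 }'   # prints 49.5 GiB
      ```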
  12. Now the tests with the RK3399 crippled down to a quad-core A53 running at 800 MHz are done. One time with 4 GB DRAM and no swapping, the other time again with zram/lz4 and just 1 GB DRAM assigned to provoke swapping.

      Without swapping:

      tk@nanopct4:~/ComputeLibrary-18.03$ time taskset -c 0-3 scons Werror=1 -j8 debug=0 neon=1 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a build=native
      ...
      real    99m39.537s
      user    385m51.276s
      sys     11m2.063s

      tk@nanopct4:~/ComputeLibrary-18.03$ free
                    total        used        free      shared  buff/cache   available
      Mem:        3902736      102648     3124104       13336      675984     3696640
      Swap:       6291440           0     6291440

      Vs. zram/lz4:

      tk@nanopct4:~/ComputeLibrary-18.03$ time taskset -c 0-3 scons Werror=1 -j8 debug=0 neon=1 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a build=native
      ...
      real    130m3.264s
      user    403m18.539s
      sys     39m7.080s

      tk@nanopct4:~/ComputeLibrary-18.03$ free
                    total        used        free      shared  buff/cache   available
      Mem:        1014192       82940      858740        3416       72512      859468
      Swap:       3042560       27948     3014612

      This is a 30% performance drop. Still great given that I crippled the RK3399 to a quad-core A53 running at just 800 MHz. Funnily lzo again outperforms lz4:

      real    123m47.246s
      user    401m20.097s
      sys     35m14.423s

      As a comparison: swap done the probably fastest way possible on all common SBC (except those RK3399 boards that can interact with NVMe SSDs). Now I tested with a USB3 connected EVO840 SSD (I created a swapfile on an ext4 FS on the SSD and deactivated zram based swap entirely, see the sketch below):

      tk@nanopct4:~/ComputeLibrary-18.03$ time taskset -c 0-3 scons Werror=1 -j8 debug=0 neon=1 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a build=native
      ...
      real    155m7.422s
      user    403m34.509s
      sys     67m11.278s

      tk@nanopct4:~/ComputeLibrary-18.03$ free
                    total        used        free      shared  buff/cache   available
      Mem:        1014192       66336      810212        4244      137644      869692
      Swap:       3071996       26728     3045268

      tk@nanopct4:~/ComputeLibrary-18.03$ /sbin/swapon
      NAME                 TYPE SIZE USED PRIO
      /mnt/evo840/swapfile file   3G  26M   -1

      With ultra fast swap on SSD the execution time further increases by 25 minutes, so clearly zram is the winner. I also let 'iostat 1800' run in parallel to get a clue how much data has been transferred between board and SSD (at the block device layer -- below that, at the flash layer, the amount of writes could have been significantly higher):

      Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
      sda             965.11      3386.99      7345.81    6096576   13222460
      sda            1807.44      4788.42      5927.86    8619208   10670216
      sda            2868.95      7041.86      7431.29   12675496   13376468
      sda            1792.79      4770.62      4828.07    8587116    8690528
      sda            2984.65      7850.61      9276.85   14131184   16698424

      I stopped a bit too early but what these numbers tell us is that this compile job swapping on SSD resulted in +60 GB writes and +48 GB reads to/from flash storage. Now imagine running this on a crappy SD card. It would take ages and maybe the card would die in between.

      @Igor: IMO we can switch to the new behaviour. We need to take care of two things when upgrading/replacing packages:

      apt purge zram-config
      grep -q vm.swappiness /etc/sysctl.conf
      case $? in
          0)
              sed -i 's/vm\.swappiness.*/vm.swappiness=100/' /etc/sysctl.conf
              ;;
          *)
              echo vm.swappiness=100 >>/etc/sysctl.conf
              ;;
      esac
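      For reference, the SSD swapfile was set up roughly like this (a sketch; paths as used above, the zram device name is the usual default and may differ):

      ```bash
      fallocate -l 3G /mnt/evo840/swapfile   # preallocating a swapfile works fine on ext4
      chmod 600 /mnt/evo840/swapfile
      mkswap /mnt/evo840/swapfile
      swapon /mnt/evo840/swapfile
      swapoff /dev/zram0                     # disable the zram based swap device(s), name assumed
      ```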
  13. http://forum.banana-pi.org/t/bpi-w2-sources-for-linux-etc/5780/ 'We’re working on it, and when it’s ready, it’s updated to github', 'we will update code and image soon' (just the usual blabla as always). @Nora Lee: Is this all unfortunate W2 customers can expect: no sources but only pre-compiled BLOBs? If you already got u-boot and kernel sources from RealTek, why don't you share them as the GPL requires anyway? Do you understand that you're violating the GPL?
  14. It's also lower compared to the numbers I got with my first RK3399 device some time ago: ODROID-N1 (Hardkernel built RK's 4.4 just like ayufan, without CONFIG_ARM_ROCKCHIP_DMC_DEVFREQ). But on NanoPi M4 there's always the internal VIA VL817 hub sitting between SuperSpeed devices and the USB3 host controller and this usually affects performance as well. Update: EVO840 behind a JMS567 attached to NanoPC-T4 with the same 4.4 kernel and dmc governor set to performance (but CPU crippled down to a quad-core A53 clocked at 800 MHz):

                                                    random   random
              kB  reclen   write  rewrite    read   reread     read    write
         1024000   16384  396367   402479  372088   373177   373097   402228

      373/400 MB/s read/write (the iozone call is quoted below). So the M4 numbers above need a second test anyway.
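      The iozone invocation behind these numbers is most likely the usual one from Armbian storage testing; this is an assumption, quoted here only to document file and record sizes:

      ```bash
      iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
      iozone -e -I -a -s 1000M -r 16384k -i 0 -i 1 -i 2   # the 1 GB / 16 MB record run shown above
      ```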
  15. Since I was not entirely sure whether the 'test has been executed appropriately' I went a bit further and tested no swap vs. zram on a RK3399 device directly. I had to move from RockPro64 to NanoPC-T4 since with ayufan's OS image on RockPro64 I didn't manage to restrict available DRAM in extlinux.conf. So I did my test with Armbian on a NanoPC-T4. One time I let the build job run with 4 GB DRAM available and no swapping, the next time I limited available physical memory to 1 GB via extraargs="mem=1110M" in /boot/armbianEnv.txt and swapping happened with lz4 compression. We're talking about a 12% difference in performance: 4302 seconds without swapping vs. 4855 seconds with zram/lz4:

      tk@nanopct4:~/ComputeLibrary-18.03$ time taskset -c 0-3 scons Werror=1 -j8 debug=0 neon=1 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a build=native
      ...
      real    71m42.193s
      user    277m55.787s
      sys     8m7.028s

      tk@nanopct4:~/ComputeLibrary-18.03$ free
                    total        used        free      shared  buff/cache   available
      Mem:        3902736      105600     3132652        8456      664484     3698568
      Swap:       6291440           0     6291440

      And now with zram/lz4:

      tk@nanopct4:~/ComputeLibrary-18.03$ time taskset -c 0-3 scons Werror=1 -j8 debug=0 neon=1 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a build=native
      ...
      real    80m55.042s
      user    293m12.371s
      sys     27m48.478s

      tk@nanopct4:~/ComputeLibrary-18.03$ free
                    total        used        free      shared  buff/cache   available
      Mem:        1014192       85372      850404        3684       78416      853944
      Swap:       3042560       27608     3014952

      Problem is: this test is not that representative for real-world workloads since I artificially limited the build job to CPUs 0-3 (the little cores) and therefore all the memory compression stuff happened on the two free A72 cores. So next test: disabling the two big cores of the RK3399 entirely. For whatever reason setting extraargs="mem=1110M maxcpus=4" in /boot/armbianEnv.txt didn't work (obviously a problem with the boot.cmd used for the board) so I ended up with:

      extraargs="mem=1110M"
      extraboardargs="maxcpus=4"

      After a reboot /proc/cpuinfo confirms that only the little cores are available anymore and we're running with just 1 GB DRAM (a quick way to verify both constraints is shown below). Only caveat: cpufreq scaling is also gone and the little cores are now clocked at ~806 MHz:

      root@nanopct4:~# /usr/local/src/mhz/mhz 3 100000
      count=330570 us50=20515 us250=102670 diff=82155 cpu_MHz=804.747
      count=330570 us50=20540 us250=102614 diff=82074 cpu_MHz=805.541
      count=330570 us50=20542 us250=102645 diff=82103 cpu_MHz=805.257

      So this test will then answer a different question: how much overhead does zram based swapping add on much slower boards. That's OK too. To be continued...
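      A quick way to verify after the reboot that both constraints actually took effect (commands only, output omitted):

      ```bash
      free -m   # total memory should now be roughly 1 GB
      nproc     # should report 4 since only the A53 cores are left online
      ```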
  16. Just a quick note about DRAM latency effects. We noticed that with default kernel settings, when the dmc code is active, the memory controller increases memory access latency a lot (details). When testing the efficiency of zram swap compression I more or less by accident tested another use case that is highly affected by higher memory latency. When trying to build ARM's Compute Library on a NanoPC T4 limited to 1 GB DRAM (adding extraargs="mem=1110M" to /boot/armbianEnv.txt) my first run was with default settings (/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/governor set to dmc_ondemand). Execution time with the build job relying heavily on zram based swap: 107m9.612s. Next try with the governor set to performance: 80m55.042s (see the commands below). A massive difference, only due to some code trying to save some energy and thereby increasing memory latency (details). In Armbian we'll use an ugly hack to 'fix' this but this is something board makers who provide their own OS images should also care about (@mindee for example).
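      The two settings compared above, as shell commands (run as root; the sysfs path is the one quoted above):

      ```bash
      DMC=/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/governor
      cat $DMC                   # dmc_ondemand is the default
      echo performance > $DMC    # pin the memory controller to its highest frequency
      ```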
  17. As a comparison, now the same task (building ARM's Compute Library on a SBC) on a device where swapping does not occur. The purpose of the original test was to check the efficiency of different swapping implementations on a device running low on memory (NanoPi Fire3 with 8 Cortex-A53 cores @ 1.4GHz but just 1 GB DRAM). Results back then when running on all 8 CPU cores (full details):

      zram lzo                    46m57.427s
      zram lz4                    47m18.022s
      SSD via USB2               144m40.780s
      SanDisk Ultra A1 16 GB     247m56.744s
      HDD via USB2               570m14.406s

      I used my RockPro64 with 4 GB DRAM and pinned execution of the compilation to the 4 Cortex-A53 cores running at 1.4 GHz just like the Fire3:

      time taskset -c 0-3 scons Werror=1 -j8 debug=0 neon=1 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a build=native

      A quick htop check (htop itself pinned to an A72 core) confirmed that only the 4 A53 cores were busy. On NanoPi Fire3, when limited to 4 CPU cores and with just 1 GB DRAM, we got the following execution times (slightly faster with lzo, in contrast to the 'common knowledge' telling us lz4 would always be the better choice):

      Sun May 20 16:05:17 UTC 2018   100 4 lzo [lz4] deflate lz4hc   real 86m55.073s
      Mon May 21 11:41:36 UTC 2018   100 4 [lzo] lz4 deflate lz4hc   real 85m24.440s

      Now on RockPro64 without any swapping happening we get 73m27.934s. So given the test has been executed appropriately we're talking about a performance impact of below 20% when swapping to a compressed block device with a quad-core A53 @ 1.4 GHz: 5125 seconds with lzo zram on NanoPi Fire3 vs. 4408 seconds without any swapping at all on RockPro64, i.e. roughly 16% longer execution time (see the arithmetic below). I looked at the free output and the maximum I observed was 2.6GB RAM used:

      root@rockpro64:/home/rock64# free
                    total        used        free      shared  buff/cache   available
      Mem:        3969104     2666692      730212        8468      572200     1264080
      Swap:             0           0           0

      'Used' DRAM over the whole benchmark run was almost always well above 1 GB and often in the 2 GB region.
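      The 16% figure is simply the ratio of the two wall-clock times:

      ```bash
      awk 'BEGIN { printf "%.1f%% longer with lzo zram swap\n", (5125/4408 - 1) * 100 }'   # prints 16.3%
      ```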
  18. Thank you for the detailed test and especially for also covering the underpowering situation (which a lot of users might run into). I don't know whether it was available from the beginning but FriendlyELEC now lists these options in their web shop: 5V 4A Power Adapter (+$8.99) and German Plug Adapter (applies to: France, Germany, Portugal, Spain, Korea) (+$5.99). The PSU as well as the heatsink seem like mandatory accessories to me.
  19. You already got a satisfying answer to your very same question: https://forum.armbian.com/topic/7746-is-your-sd-card-a-fake-check-with-sd-insight-android-app/?do=findComment&comment=60946 Kingston cards aren't good for the SBC use case anyway and your card is not even 'genuine' Kingston but a fake.
  20. Which doesn't change much wrt the names of the Linux kernel modules that provide SMB3 client functionality. Speaking about Linux naming conventions: 'mount_smb' is deprecated, 'mount_cifs' is the variant that should be used today. And this most probably originated from the history of SMB/CIFS support in Linux. Two decades ago SMB could best be described as a pile of rotten protocols that were pretty much useless since Microsoft's implementations differed in almost every detail. Compared to that, CIFS was an advancement (read through the Linux manual pages -- a lot of this historical stuff is still there). SMB2 and SMB3 have nothing in common with the SMB we knew two decades ago: robust protocols with lots of cool features and specifications worth the name. Anyway: CONFIG_CIFS is not set on just 5 kernel variants (by accident) so @sunxishan please send a PR with them enabled as module (a mount example follows below).
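     For completeness, with CONFIG_CIFS enabled as a module, mounting an SMB3 share with the in-kernel client looks like this (server, share, user and mount point are made-up examples):

     ```bash
     modprobe cifs
     mount -t cifs -o vers=3.0,username=someuser //nas.local/share /mnt/share
     ```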
  21. 20x20x1mm. Ordered them 18 months ago on Aliexpress for 2 bucks (5 pieces) but the link is dead. Anyway: I don't think such a copper shim is a good solution for end users. A heatsink that can be attached directly to the SoC is better. I will try again with my next RK3399 board using thermal glue between heatsink and copper shim and normal thermal paste between shim and SoC. Currently I fear a bit that the shim could move when vibrations occur.
  22. And here's the result for the flash-image-1g-2cs-1200_750_boot_sd_and_usb.bin BLOB: http://ix.io/1lCe (if you look closely you see that between 23:01:08 and 23:06:24 some background activity happened -- one of my Macs backing up to the EspressoBin -- so I had to repeat the OpenSSL test on an idle system later). With 'working' cpufreq my EspressoBin idles at 200 MHz drawing 5.8W (measured at the wall) and the SoC is hot like hell. Now without CONFIG_ARM_ARMADA_37XX_CPUFREQ and the 1200 MHz settings the board idles at 6.5W while running all the time at 1190 MHz, still hot like hell. A difference of 0.7W is a joke given that these idle numbers are way too high anyway. There's only one SATA HDD connected (in standby) and one LAN connection. A RockPro64 with the same 12V PSU measures below 3.7W. Update: with flash-image-1g-2cs-800_800_boot_sd_and_usb.bin the board idles at 5.9W.