Jump to content

tkaiser

Members
  • Posts

    5462
  • Joined

Everything posted by tkaiser

  1. Hi, in an attempt to add new information and correct conflicting information over at the bottom of https://linux-sunxi.org/SID_Register_Guide#Currently_known_SID.27s it would be great if as many owner of Allwinner boards could execute most recent sbc-bench version and post/share the result link here (ix.io/sprunge.us) https://github.com/ThomasKaiser/sbc-bench#execution The new sbc-bench version always lists the SID identifier next to the SoC so that it reads as follows for example: SoC guess: Allwinner H6 (SID: 82c00007) Especially boards with 64-bit SoCs like H618, H616, H6, H5, A64 but also R40/V40 and exotic variants like S3, V3s, V853 are of interest. Hopes are low though since the forum structure has changed a lot, posting directly in supported sunxi-forum seems impossible and most probably nobody will ever find this post
  2. Why? What are the benefits over the Radxa provided OS images?
  3. Today I had the chance to retest again. The good news: there's no real difference whether ASPM is set to default or performance. The bad news is that the idle consumption difference is quite higher. With dmc governor enabled (and therefore idling at 528 MHz DRAM clock) the difference between powersave and default/performance is almost 250 mW, at the highest DRAM clock it's around 150mW (back then when measuring the consumption difference there was no dmc governor and most probably the 100mW I was talking about were the result of rounding down). More details here: https://forum.radxa.com/t/rock-5b-debug-party-invitation/10483/508?u=tkaiser From an energy efficiency point of view a script line in armbian-hardware-optimization checking for PCIe devices being present at boot, then setting either powersave or performance would be perfect but as I recently learned Armbian isn't about such optimisations any more
  4. https://github.com/ThomasKaiser/Knowledge/blob/master/articles/Quick_Preview_of_ROCK_5B.md#important-insights-and-suggested-optimisations BTW: The device is called Rock 5B as such congratulations on the subforum name...
  5. All of this (maybe except USB PD negotiations) applies to M3 as well: https://github.com/ThomasKaiser/Knowledge/blob/master/articles/Quick_Preview_of_ROCK_5B.md#important-insights-and-suggested-optimisations
  6. https://forum.radxa.com/t/rock-5b-debug-party-invitation/10483/472?u=tkaiser Latest sbc-bench version 0.9.9 has also an additional service for ignorant people: reporting probably performance relevant governors...
  7. Well, I was talking about 3 different issues: switching with ASPM from powersave to default (you chose performance now, no idea what this does wrt consumption, switching from powersave to defailt is 100mW) io_is_busy (that's addressed now though in a redundant way but doesn't matter since only delaying the armbian-hardware-optimization service by a few ms). Both these changes should help with NVMe (and PCIe device) performance in general setting the dmc_governor to performance instead of dmc_ondemand (results in 600mW higher consumption though). This is reponsible for better overall performance since the dmc_ondemand governor often does not ramp up memory clock fast enough. A comparison between clocking LPDDR4 with only 528 MHz vs. the current maximum of 2112 MHz is here: https://browser.geekbench.com/v5/cpu/compare/17009700?baseline=17009078 The latter isn't addressed at all but I posted multiple times how this can be changed from userspace: echo performance >/sys/class/devfreq/dmc/governor @balbes150 did (only) two changes and the io_is_busy tweak will help with every storage device (USB and SATA too). No need for a new image since you can simply edit /usr/lib/armbian/armbian-hardware-optimization. Or overwrite it with the contests of http://ix.io/4aFe
  8. It's not about mainline kernel vs. BSP kernel but the latter having settings that might fit for Android (use cases like watching video, playing games) but not for Linux. You seem to be using these defaults without questioning them. Rockchip's BSP kernel defaults to powersave for ASPM (Active State Power Management) which of course negatively affects NVMe performance. As such you need to either eliminate CONFIG_PCIEASPM_POWERSAVE=y from kernel config or need to execute somewhere after booting: echo default >/sys/module/pcie_aspm/parameters/policy Also the BSP kernel when the dmc/dfi device-tree nodes are enabled (seems to be the case with your RK3588 kernel fork since the 7-zip scores you and @blondu are sharing are below 14800 while they could be around 16500) defaults to dmc_ondemand governor which can be changed by doing this: echo performance >/sys/class/devfreq/dmc/governor (similar way as I've done this for RK3399 years ago: https://github.com/armbian/build/blob/fdf73a025ba56124523baefaf705792b74170fb8/packages/bsp/common/usr/lib/armbian/armbian-hardware-optimization#L241-L244 ) And this here: prefix="/sys/devices/system/cpu" CPUFreqPolicies=($(ls -d ${prefix}/cpufreq/policy? | sed 's/freq\/policy//')) if [ ${#CPUFreqPolicies[@]} -eq 1 -a -d "${prefix}/cpufreq" ]; then # if there's just a single cpufreq policy ondemand sysfs entries differ CPUFreqPolicies=${prefix} fi for i in ${CPUFreqPolicies[@]}; do affected_cpu=$(tr -d -c '[:digit:]' <<< ${i}) echo ondemand >${prefix}/cpu${affected_cpu:-0}/cpufreq/scaling_governor echo 1 >${i}/cpufreq/ondemand/io_is_busy echo 25 >${i}/cpufreq/ondemand/up_threshold echo 10 >${i}/cpufreq/ondemand/sampling_down_factor echo 200000 >${i}/cpufreq/ondemand/sampling_rate done is the exact replacement for lines 81-89 in armbian-hardware-optimization: https://github.com/armbian/build/blob/fdf73a025ba56124523baefaf705792b74170fb8/packages/bsp/common/usr/lib/armbian/armbian-hardware-optimization#L81-L89 Without this I/O performance with Armbian sucks on a variety of boards for example ODROID N2/N2+, VIM3 or now the RK3588/RK3588S based boards. Unfortunately @lanefu seems to be way too biased or limited in his thinking to understand this when he creates bizarre tickets that are rotting around somewhere: https://armbian.atlassian.net/browse/AR-1262 I really don't care whether these fixes will be incorporated into Armbian. But if you benchmark stuff the settings should be adjusted accordingly. And we (we as a broader community – not this place here) already know how ASPM settings negatively affect performance of PCIe devices (like for example NVMe SSDs): https://forum.radxa.com/t/rock-5b-debug-party-invitation/10483/86?u=tkaiser, we know which role io_is_busy has and what the benefits and drawbacks of the chosen dmc governor are. A quality NVMe SSD even when just connected with a single Gen2 lane should always outperform any SATA SSD if it's about what really matters: random I/O. If the SSD is cheap garbage or the settings are garbage it might look differently.
  9. But your image or lets better say the kernel you're using is made for Android and lacks optimisations. As for NVMe what about echo default >/sys/module/pcie_aspm/parameters/policy or removing CONFIG_PCIEASPM_POWERSAVE=y from kernel config? And for I/O performance in general this would be needed since with my code fragments from half a decade ago that are still part of Armbian the 3rd CPU cluster isn't adjusted properly: prefix="/sys/devices/system/cpu" CPUFreqPolicies=($(ls -d ${prefix}/cpufreq/policy? | sed 's/freq\/policy//')) if [ ${#CPUFreqPolicies[@]} -eq 1 -a -d "${prefix}/cpufreq" ]; then # if there's just a single cpufreq policy ondemand sysfs entries differ CPUFreqPolicies=${prefix} fi for i in ${CPUFreqPolicies[@]}; do affected_cpu=$(tr -d -c '[:digit:]' <<< ${i}) echo ondemand >${prefix}/cpu${affected_cpu:-0}/cpufreq/scaling_governor echo 1 >${i}/cpufreq/ondemand/io_is_busy echo 25 >${i}/cpufreq/ondemand/up_threshold echo 10 >${i}/cpufreq/ondemand/sampling_down_factor echo 200000 >${i}/cpufreq/ondemand/sampling_rate done And based on sbc-bench results shared here the dmc/dfi nodes are enabled in device-tree (defaulting to dmc_ondemand) and as such this would restore 'full performance': echo performance >/sys/class/devfreq/dmc/governor
  10. How did you perform this comparison? Looking only at sequential transfer speeds? Or checking random I/O (which matters way more on an OS drive or when building software)? What does /sys/module/pcie_aspm/parameters/policy look like? powersave, right? And for a quick comparison of schedutil, performance, ondemand and ondemand with io_is_busy see the comments below https://github.com/radxa/kernel/commit/55f540ce97a3d19330abea8a0afc0052ab2644ef
  11. Really low numbers. 1) hdparm uses 128K block size to test which was a lot last century when whoever hardcoded this in hdparm but today it's just a joke. Use iozone or fio with larger block sizes 2) Armbian doesn't care since years about low-level optimizations. Better search for 'io_is_busy' and 'dmc/governor' to get full speed (both storage and CPU performance or at least a 2000 higher 7-zip score): https://github.com/ThomasKaiser/Knowledge/blob/master/articles/Quick_Preview_of_ROCK_5B.md 3) check 'cat /sys/devices/system/cpu/cpufreq/policy?/ondemand/io_is_busy' – if I/O is processed by cpu6 or cpu7 it might be a lot slower compared even to the little cores 4) A PCIe Gen2 lane allows for 5 GT/s, SATA 6Gbps has 120% the data rate and both use 8b10b coding. As such SATA should be faster with sequential transfer speeds 5) Sequential transfer speeds are BS if it's about an OS drive since here random I/O matters. And due to SATA relying on AHCI made for spinning rust last century NVMe should outperform SATA even in a single Gen2 lane config and even if silly 'benchmarks' like hdparm draw a different picture. 6) What is 'cat /sys/module/pcie_aspm/parameters/policy' telling?
  12. Overview (Disclaimer: The following is for techies only that like to dig a bit deeper. And if you're not interested in energy-efficient servers then probably this is just a waste of time ) EDIT: Half a year after this poorly designed SBC has been released just one of the many design flaws has been fixed: Micro USB for DC-IN has been replaced by the barrel jack that was present on the pre-production batches. If you were unfortunate to get a Micro USB equipped M3 please have a look here how to fix this. Apart from that check the Banana forums what to expect regarding software/support first since this is your only source) SinoVoip sent me a review sample of the recently shipped so called "Banana Pi M3" yesterday. It's a SBC sharing name and form factor of older "Banana Pi" models but is of course completely incompatible to them due to a different SoC, an A83T (octa-core Cortex-A7 combined with a PowerVR SGX544 GPU). For detailed and up-to-date informations please always refer to the linux-sunxi wiki. This new model distinguishes itself from the Banana Pi M2 with twice as much CPU cores and DRAM (LPDDR3), 8 GB eMMC onboard and BT4.0. And compared to the "M1" (the original Banana Pi) it features also 802.11 b/g/n Wi-Fi. Unlike the M2 the M3 is advertised as being SATA capable. But that's not true, it's just an onboard GL830 USB-to-SATA bridge responsible for horribly slow disk access. Unfortunately the GL830 and both externally available USB ports are behind an internal USB hub therefore all ports have to share bandwidth this way and use just one single USB connection to the SoC. Since my use cases for ARM boards are rather limited you won't find a single word about GPIO stuff (should work if pin mappings are defined correctly), GPU performance, BT, Wi-Fi or Android. Simply because I don't care Getting Started: The board arrived without additional peripherals (no PSU) therefore you need an USB cable using a Micro-USB connector to power the board. Both DC-IN and USB-OTG feature an Micro-USB connector which is bad news since pre-production samples had a real DC-In connector (4.0mm/1.7mm barrel plug, centre positive like the M2). I suffered from several sudden shutdowns under slight load until I realized that I used a crappy cable. Many (most?) USB cables lead to voltage drops and when the board demands more power it gets in an undervoltage situation and the PMU shuts off. Same will happen to you unless you can verify that you've a good cable. I did not succeed querying the M3's powermanagement unit (PMU) regarding available voltage (/sys/devices/platform/axp81x_board/axp81x-supplyer.47/power_supply/ac/voltage_now shows always 0). This was a lot easier with the older Banana Pi M1: Here you can watch my cable being responsible for voltage drops under high load (I accidentally used this again with the M3). To avoid the crappy Micro-USB connector (limited to 1.8A maximum by specs and tiny contacts) you can desolder it and solder a cable or a barrel plug -- the PCB is already prepared for the latter. Or ask SinoVoip if they can fix this mistake with the next batch of PCBs. On the bottom side of the PCB there are also solder pads for a Li-Ion battery. It has to be confirmed whether the AXP813 PMU can also be fed with 5V through the Li-Ion connector since this is the preferred way to fix the faulty power design other SinoVoip products show. One final word regarding power: It seems currently something's wrong with power initialisation in the early boot stages (u-boot). With a connected bus-powered USB disk the board won't start or immediately shut down when the disk is connected within the first 10 seconds. I didn't verify when exactly because if you've a look at SinoVoip's commit log it seems they began to fix many obvious bugs just right now after they already started shipping the board (we've seen that with the M2 also). First Showstoppers: Since the board came with an unpopulated eMMC (why the heck?) I had to try out the available OS images from the banana-pi.org download site. Unlike everyone else on this planet they don't provide MD5/SHA1 checksums to be able to check integrity of downloads and even if you tell them that they've uploaded corrupted images they don't care. From 4 OS images 3 are corrupted (according to unzip) and all failed soon after boot with kernel panics. I tried the Android image to verify FEL mode works. But since Android is of zero use for me, I decided to build an own OS image from an Ubuntu distro running on the Orange Pi where I had the SD-card inserted. Since details are boring just as a reference. From then on I used this Ubuntu image and exchanged only the freshly built stuff from SinoVoip's BSP Github repo (3.4.39 kernel, modules, bootloader and also simple things like hardware initialisation since kernel/u-boot they prefer does NOT support script.bin) First Impressions: Heat (dissipation): The A83T needs a heatsink otherwise you won't be able to benefit from its performance. Allwinner's 3.4.39 kernel provides 'budget cooling' using 2 techniques: thermal throttling and shutting down CPU cores. You can define this 'thermal configuration' in sysconfig.fex and have to take care that you understand what you're doing since if throttling doesn't help your CPU cores will be deactivated and you have to can't bring them back online manually the usual way since Allwinner's kernel doesn't allow so: echo 1 >/sys/devices/system/cpu/cpuX/online Therefore it's better to stay with the thermal defaults to allow throttling and improve heat dissipation instead. I used a $0.5 heatsink that performs ok. Without heatsink when running CPU intensive jobs throttling limited clockspeed to 1.2 GHz but with the heatsink I was able to run most of the times at ~1.6Ghz under full load. With heatsink and an annoying fan I managed to let the SoC run constantly at 1.8GHz and achieved a 7-zip score close to 6000 and finished "sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8" in less than 53 seconds. This is an example for wrong throttling values (too high) so that the kernel driver does not limit clockspeeds but starts to drop CPU cores instead: CPU performance: Since the H3 (used on the more recent Orange Pis) and the A83T seem to use much of the same kernel sources (especially the 'thermal stuff') I did a few short tests. When running with identical clockspeed and the same amount of cores they perform identical (that means they're slower than older Cortex-A7 SoCs like eg. the A20 when running at identical clockspeed -- a bit strange). Obviously the difference between H3 and A83T is the process. Both already made in 28nm but the A83T as 'tablet SoC' in the more energy efficient HPC process allowing less voltage and higher clockspeeds. According to sysconfig.fex the SoC should be able to clock above 2.1 GHz but since exceeding 1.6 Ghz already needs a fan this is pretty useless on a SBC (might be different inside a tablet where the back cover could be used as a large heatsink). Network throughput: I used my usual set of iperf testings and tried GBit Ethernet performance (with and without network tunables it remains the same -- reason below): BPi-M3 --> Client: [ 4] 0.0-10.0 sec 671 MBytes 563 Mbits/sec [ 4] 0.0-10.0 sec 673 MBytes 564 Mbits/sec [ 4] 0.0-10.0 sec 870 MBytes 729 Mbits/sec [ 4] 0.0-10.0 sec 672 MBytes 564 Mbits/sec [ 4] 0.0-10.0 sec 675 MBytes 566 Mbits/sec Client --> BPi-M3: [ 4] 0.0-10.0 sec 714 MBytes 599 Mbits/sec [ 5] 0.0-10.0 sec 876 MBytes 734 Mbits/sec [ 4] 0.0-10.0 sec 598 MBytes 501 Mbits/sec [ 5] 0.0-10.0 sec 690 MBytes 578 Mbits/sec [ 4] 0.0-10.0 sec 604 MBytes 506 Mbits/sec When I used longer test periods (-t 120) then the "Client --> BPi-M3" performance increased up to the theoretical limit: 940 Mbits/s. Then a second iperf thread jumped in, both utilising a single CPU core fully. And that's the problem: Networking is CPU bound, a single client-server connection will not exceed 500-600 Mbits/sec as it was the case when I started with A20 based boards 2 years ago. Since all we have now with the A83T is an outdated 3.4.39 kernel and since I/O bandwidth on the M3 is so low, I stopped here since it's way too boring to try to improve network throughput and also useless (disk access is so slow that it simply doesn't matter when Ethernet is limited to half of the theoretical GBit Ethernet speed... at least for me ) Accessing disks: Since there's a SATA connector on the board I gave it a try. Important: the SATA-power connector uses the same polarity as older Banana Pis and Orange Pis (keep that in mind since combined SATA data/power cables from LinkSprite and Cubietech that share exactly the same connector use inverted polarity!). I started with the Samsung EVO I always use for tests (but due to the old 3.4 kernel using ext4 instead of btrfs) and was shocked: 13.5/23 MB/s is the worst result I ever measured. I then realised that I limited maximum cpufreq to 480 MHz and tried with 1800 MHz again. A bit better but far away from acceptable: GL830 USB-to-SATA performance: 480 MHz: kB reclen write rewrite read reread 4096000 4 13529 13466 22393 22516 4096000 1024 13588 13411 22717 26115 1.8 GHz: 4096000 4 15090 15082 30968 30316 4096000 1024 15174 15131 30858 29441 I disconnected the SSD from the 'SATA port' and put it in an enclosure with a JMicron JMS567 USB-to-SATA bridge and measured again: Now sequential transfer speeds @ 1800 MHz exceeded 35/34 MB/s. The GL830 is responsible for low throughput -- especially writes are slow as hell. I made then a RAID-1 through mdadm consisting of an external 3TB HDD (good news: the GL830 can deal with partitions larger than 2 TB) and the SSD. First test with the HDD connected to the M3's GL830 bridge (GL) and the SSD connected to the JMS567 (JM). Then I disconnected the HDD from the GL830 and put it in another external enclosure with an ASMedia 1053 (ASM). Obviously SinoVoip's decision to use an internal USB hub and only one host port of the SoC leads in both situations to limited (shared) bandwidth. But in case the internal USB-to-SATA bridge is involved performance is even worse: GL/JM: kB reclen write rewrite read reread 4096000 4 17800 17140 14382 16807 4096000 1024 17741 17258 14493 14368 JM/ASM: 4096000 4 19307 18458 22855 26241 4096000 1024 19231 18518 21995 22362 If SinoVoip would've saved the GL830 USB-to-SATA bridge and wired both SoC's host ports to the 2 type-A USB ports directly without the internal hub in between overall performance would be twice as good. And obviously the M3's 'SATA port' is the worst choice to connect a disk to. Any dirt-cheap external USB enclosure will perform better. SD-card and eMMC: Just a quick check with the usual iozone settings running @ 1.8 GHz: kB reclen write rewrite read reread eMMC: 4096000 4 26572 27014 59187 59239 4096000 1024 25875 26614 56587 56667 SD-card: 4096000 4 20483 20855 22473 22892 4096000 1024 20526 19948 22285 22660 LOL, eMMC twice as fast as 'SATA'. The performance numbers of the SD-card (SanDisk "Extreme Pro") are irrelevant since I can not provide performance numbers from a known fast reference implementation. But since I might be able to provide this the next few days, I decided to give it a try. On older Allwinner SoCs there's a hard limitation regarding SDIO/SD-card speed. Maybe this applies here too. EDIT: Yes, it's a board/SoC limitation. When reading/writing the SD-card on a MacBook Pro I achieve ~80 MB/s. It seems SDIO on A83T is limited to ~20MB/s Other issues: If you want to try out the M3 you'll have to stay on the bleeding edge. Don't expect that any of the available OS images are close to useable. They just recently started to fix a lot of essential bugs in code and hardware initialisation. If you want to test the M3 be prepared to compile the BSP daily and exchange the bootloader/kernel/initialisation stuff on your SD-card/eMMC Currently average load is always 1 or above. When we started over 2 years ago with Cubieboards (and an outdated kernel 3.4.x) there was a similar issue. Maybe it's related. I just opened a Github issue Mainline kernel support in very early stage. Don't count on this that soon (situation with Banana Pi M2 was a bit different. All the OS images from SinoVoip based on kernel 3.3 weren't useable but the community provided working distros backed by the work of the linux-sunxi community and existing mainline kernel support for the M2's A31s) Always keep in mind that hardware without appropriate software is somewhat useless. SinoVoip has a long history of providing essential parts of software way too late or not at all (still applies to the M2 -- before you buy any SinoVoip product better have a look into their forums to get the idea which level of support you can expect: zero). Even worse: For the M2 and its A31s SoC there exists mainline kernel support (everything developed by the community while the vendor held back necessary informations). This does not apply to the A83T used on the M3. At the moment you're somewhat lost since you've to rely on the manufacturer's OS images (all of them currently being broken) Conclusion: Still no idea what to do with such a device. Integer performance is great when you use a heatsink and even greater with an annoying fan. But where's the use case? If I would use the M3 with Android then everything that's relevant for performance does not depend on CPU (but instead CedarX for HW accelerated video decoding and GPU for 2D/3D acceleration -- BTW: the A83T is said to contain only a single core SGX544MP1 but the fex file's contents let me believe it's a faster MP2 instead). Due to limited I/O and network bandwidth the integer performance is also irrelevant for nearly all kinds of server tasks. If it's just about 'SBC stuff' why wasting so much money? Triggering GPIO pins works also with cheap H3 based boards like Orange Pi PC or Orange Pi One that also have 4 times more I/O bandwidth compared to the M3 (due to 4 available USB ports instead of one). And if I would really need a performant ARM SoC then I would buy such a thing and not an outdated Cortex-A7 design. I still have no idea what the M3 is made for. Except of selling something under the "Banana" brand to clueless people. Don't know. For my use cases the Banana Pi M1 outperforms the M3 easily -- both regarding price and performance (sufficient CPU power, 3 x USB and real SATA not 'worst USB-to-SATA implementation ever'). As usual: YMMV Maybe the worst design decision (next to choosing the crappy Micro-USB connector for DC-IN) on the M3 is the 'SATA port'. If they would've saved both internal USB hub and GL830 and instead use the two available USB host ports then achievable I/O bandwidth would be way higher. Now both USB ports and the 'SATA port' have to share the bandwidth of a single USB 2.0 connection. Almost as bad as with the Raspberry Pis. But most importantly: Check software und support situation first and don't rely on 'hardware features'. Remember: SinoVoip shipped the M2 with OS images where not a single GPIO pin was defined and Ethernet worked only with 100Mb/s since they 'forgot' to define GMAC pins. They fixed that months later but still not for every OS image (the Android image they provide is corrupted since months but they don't care even if users complain several times). Visit their forums first to get an idea what to expect. It's important! Armbian support: Not to be expected soon. It's worthless when having to rely on Allwinner's old 3.4.39 kernel. I combined loboris' H3 Debian image with kernel/bootloader stuff for the A83T and it worked as expected (even my RPi-Monitor setup matched almost perfectly). Unless the linux-sunxi community improves mainline support for the A83T this situation won't change. But maybe someone interested in M3 (definitely not me) teaches SinoVoip how to escape from u-boot/kernel without support for script.bin in the meantime. Would be a first step.
×
×
  • Create New...

Important Information

Terms of Use - Privacy Policy - Guidelines