Posts posted by tkaiser

  1. What happened? I unplugged the battery.

    A simple reboot doesn't seem to help.


    Interesting. This looks like a cold/warm/hot boot issue (there seem to be a couple of related issues, e.g. memory calibration: http://linux-sunxi.org/A10_DRAM_Controller_Calibration)


    Maybe the whole problem is related to DRAM (initialisation). IIRC Igor already included the a10-meminfo tool, so it would be worth comparing the DQS gating delay settings in both cases (please compare with http://irclog.whitequark.org/linux-sunxi/2015-04-27)
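
    A minimal sketch of such a comparison, assuming a10-meminfo is in $PATH and prints the DRAM settings to stdout. The function names and the /root/meminfo-*.txt file names are made up for illustration:

```shell
#!/bin/sh
# Sketch only: save one a10-meminfo dump per boot type and diff them later.
# Assumption: a10-meminfo is in $PATH and writes the DRAM settings to stdout.

# save_dump LABEL [DIR]: store the current DRAM settings under a label.
save_dump() {
	a10-meminfo >"${2:-/root}/meminfo-$1.txt"
}

# compare_dumps FILE_A FILE_B: print the differing lines, or say there are none.
compare_dumps() {
	if diff "$1" "$2"; then
		echo "no differences"
	fi
}
```

    Run "save_dump good" after a boot that works and "save_dump bad" after one that does not, then "compare_dumps /root/meminfo-good.txt /root/meminfo-bad.txt" to see whether any DRAM setting changed.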

  2. The kernel version and settings might be the same. But since the hardware is different, some low-level drivers will differ, and that can also affect behaviour. The same applies to board initialisation from within u-boot. But you can switch between different boards by simply exchanging SPL+u-boot+kernel and using an otherwise unmodified image with all the boards.


    I would think about what you can expect if you're after the cheapest device possible (it's neither stability nor reliability  :P )


    I like the Olimex boards (due to being real OSHW) but still experience worse network and overall performance with my Lime2 compared to the original Banana Pi (no need for the Pro -- if I needed onboard Wi-Fi I would give the BPi M1+ a try -- nearly the same connector layout and identical hardware, but with the chips that might overheat on the upper side of the PCB: http://kaiser-edv.de/tmp/Nyiuha/)


    There's another A20 board that's worth mentioning: pcDuino3 Nano. Feature-wise it is comparable to the Banana Pi, and regarding NAS usage it is faster than my Lime2: http://forum.lemaker.org/forum.php?mod=redirect&goto=findpost&ptid=12167&pid=66487&fromuid=33332

  3. That means that nearly all Lime2 boards have packet loss of up to 25%?!


    I don't know how to interpret that correctly since your ping test is some sort of flooding, isn't it?


    Regarding performance: As already mentioned yesterday evening I also let my Banana Pi run through the tests. The best settings for TX/RX delay were 3/0 (defaults):

    TX 3, RX 0:
    TX: 700 Mbits/sec, 5000 packets transmitted, 3593 received, 28% packet loss, time 27440ms
    rtt min/avg/max/mdev = 0.364/0.534/0.834/0.051 ms
    RX: 800 Mbits/sec, 5000 packets transmitted, 5000 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.363/0.568/0.753/0.045 ms
    TX 3, RX 2:
    TX: 650 Mbits/sec, 5000 packets transmitted, 3583 received, 28% packet loss, time 27449ms
    rtt min/avg/max/mdev = 0.369/0.520/0.972/0.062 ms
    RX: 700 Mbits/sec, 5000 packets transmitted, 4999 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.362/0.542/0.687/0.067 ms
    TX 1, RX 0:
    TX: 680 Mbits/sec, 5000 packets transmitted, 3596 received, 28% packet loss, time 27386ms
    rtt min/avg/max/mdev = 0.374/0.530/0.892/0.058 ms
    RX: 830 Mbits/sec, 5000 packets transmitted, 5000 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.362/0.557/1.713/0.057 ms
    TX 1, RX 4:
    TX: 650 Mbits/sec, 5000 packets transmitted, 0 received, 100% packet loss, time 50824ms
    RX: 30.8 Kbits/sec, 5000 packets transmitted, 0 packets received, 100.0% packet loss
    TX 2, RX 2:
    TX: 640 Mbits/sec, 5000 packets transmitted, 3592 received, 28% packet loss, time 27469ms
    rtt min/avg/max/mdev = 0.366/0.536/0.878/0.059 ms
    RX: 800 Mbits/sec, 5000 packets transmitted, 4999 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.364/0.569/0.710/0.037 ms

    The RX ping reports are from OS X 10.9.5 on a MacBook Pro. In the TX direction there are more lost packets but also more throughput compared to the Lime2.
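
    To put the Linux and OS X summary lines on the same footing, the loss can be recomputed from the packet counts. A small sketch (the function name is made up; the percentage is truncated to an integer, which matches the Linux ping figures quoted above):

```shell
#!/bin/sh
# summarise_ping: read ping output on stdin and recompute the loss from the
# "N packets transmitted, M received" / "M packets received" summary line.
# In both the Linux and the OS X format the counts are fields 1 and 4.
summarise_ping() {
	awk '/packets transmitted/ {
		tx = $1; rx = $4
		if (tx > 0)
			printf "%d sent, %d received, %d lost (%d%%)\n", tx, rx, tx - rx, int((tx - rx) * 100 / tx)
	}'
}
```

    For example, piping the first TX summary line above through summarise_ping gives 1407 lost packets out of 5000, i.e. 28%.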

  4. I also ran the tests in between. There are just 2 combinations that work somewhat reliably with my Lime2:


    0/0: 600 Mbits/sec TX, 370 Mbits/sec RX, packet losses: 41% 27.0% (TX/RX)

    3/0: 580 Mbits/sec TX, 355 Mbits/sec RX, packet losses: 39% 25.5% (TX/RX)


    The results vary a lot, which suggests some sort of mismatch. Right now I'm testing a Banana Pi (the 'old' model, not the Pro/M1+) and with its default delay setting (3/0, performance governor at 960 MHz and DRAM clocked at 480 MHz) it looks like:


    TX: 700 Mbits/sec, 5000 packets transmitted, 3593 received, 28% packet loss, time 27440ms

    rtt min/avg/max/mdev = 0.364/0.534/0.834/0.051 ms
    RX: 800 Mbits/sec, 5000 packets transmitted, 5000 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.363/0.568/0.753/0.045 ms
  5. r-0-0.txt:5000 packets transmitted, 3479 received, 30% packet loss, time 66099ms


    r-2-0.txt:5000 packets transmitted, 3540 received, 29% packet loss, time 193752ms


    r-4-0.txt:5000 packets transmitted, 3601 received, 27% packet loss, time 117639ms


    All other combinations didn't work at all (no network)


    So 0-0 seems to fit, unfortunately.


    Interesting. But TX4/RX0 seems to be even better?


    I will test with my Lime2 maybe this evening or tomorrow. I just ran another automated test with the Lamobo R1 (three 10-second iperf runs with the performance CPU governor): http://pastebin.com/VzWxpepX


    The results on the Lamobo R1 seem to be:

    • TX delays 0, 1 and 2 don't work at all.
    • TX throughput maxes out at 370 Mbits/sec (100% CPU utilisation @ 960 MHz)
    • RX throughput maxes out at 460 Mbits/sec (100% CPU utilisation @ 960 MHz)
    • Manipulating RX delay does matter regarding TX throughput
    • clock speed as well as cpufreq settings (especially governor + scaling_min_freq) directly influence performance/benchmarks

    Really looking forward to testing with the Lime2 or Banana Pi. Maybe I'll increase the possible clock speeds from 960 MHz to 1008 or even 1200 MHz in arch/arm/boot/dts/sun7i-a20.dtsi when the results of the combination GMAC+RTL8211 look promising, to get closer to the limits.

  6. Testing done with Lamobo R1 since it's useless. I tried a few different combinations of RX delays (all with TX_DELAY 4):

    RX 0: 458 Mbits/sec, 25% packet loss
    RX 1: 458 Mbits/sec, 25% packet loss
    RX 2: 456 Mbits/sec, 25% packet loss
    RX 4: 457 Mbits/sec, 25% packet loss
    RX 6: 455 Mbits/sec, 25% packet loss

    The results are identical and the obvious reason is that iperf is CPU bound: one core was always at 100% utilisation for the iperf server thread in question. So while different RX delay settings still might make a difference, they won't show any practical difference due to the CPU being the bottleneck (or the stuff I patched does not work :) ).


    Time to test again with a Banana Pi or Lime2 which use a somewhat different driver framework.


    I always experienced 25% packet loss when pinging the directly connected MacBook using

    ping -s 9000 -i 0.0001 -c 5000 macbook.local

    BTW: This is the test script I used. Set up the prerequisites as outlined in the comment and then call it from /etc/rc.local. It exchanges u-boot 64 times, rebooting after each install and testing with the newly applied settings:

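    root@lamobo:~# cat /usr/local/bin/gmac-delay-test.sh
    #!/bin/bash
    # gmac-delay-test.sh
    # to revert:
    # rm /root/stop ; echo -n 0 >/root/rx ; echo -n 0 >/root/tx ; dpkg -i /root/uboot/linux-u-boot-lamobo-r1_3.1_0_0_armhf.deb ; shutdown -r now
    # Assumptions: the log path and the iperf server name were lost from this
    # post; the values below are placeholders (the server reuses the ping target).
    Testlog=/root/gmac-delay-test.log
    Server=MacBook.local
    if [ -f /root/stop ]; then
    	exit 0
    fi
    Main() {
    	read rx </root/rx
    	read tx </root/tx
    	echo "testing tx: ${tx}, rx: ${rx}"
    	if [ $rx -eq 8 -a $tx -eq 0 ]; then
    		# all 64 combinations have been tested
    		touch /root/stop
    		exit 0
    	else
    		DoTest
    		# advance to the next combination: tx runs 0..7, then rx is incremented
    		if [ ${tx} -eq 7 ]; then
    			rx=$(( ${rx} + 1 ))
    			tx=0
    		else
    			tx=$(( ${tx} + 1 ))
    		fi
    		echo -n ${tx} >/root/tx
    		echo -n ${rx} >/root/rx
    		dpkg -i /root/uboot/linux-u-boot-lamobo-r1_3.1_${tx}_${rx}_armhf.deb
    		shutdown -r now
    	fi
    } # Main
    DoTest() {
    	# only run iperf if the peer answers a single ping
    	ping -c 1 -W 1 "${Server}" >/dev/null 2>/dev/null
    	case $? in
    		0)
    			MbitTX=$(iperf -c "${Server}" -t 30 | awk -F" " '/Mbits/ {print $7}')
    			echo -e "$(date)\t$tx\t$rx\t${MbitTX}" >>"${Testlog}"
    			;;
    		*)
    			# no network with this TX/RX combination
    			echo -e "$(date)\t$tx\t$rx\t-" >>"${Testlog}"
    			;;
    	esac
    } # DoTest
    Main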
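
    The order in which such a script walks through the combinations can be checked without rebooting anything. The sketch below assumes that TX wraps to 0 after 7 while RX advances, which is what the stop condition (rx 8, tx 0) and the 64 reboots suggest; the function names are made up:

```shell
#!/bin/sh
# next_pair TX RX: print the TX/RX pair that would be tried next.
next_pair() {
	tx=$1 rx=$2
	if [ "$tx" -eq 7 ]; then
		tx=0
		rx=$(( rx + 1 ))
	else
		tx=$(( tx + 1 ))
	fi
	echo "$tx $rx"
}

# count_steps: walk from 0/0 until the stop condition (tx 0, rx 8) is reached
# and print how many combinations were visited on the way.
count_steps() {
	tx=0 rx=0 n=0
	while ! [ "$rx" -eq 8 -a "$tx" -eq 0 ]; do
		n=$(( n + 1 ))
		set -- $(next_pair "$tx" "$rx")
		tx=$1 rx=$2
	done
	echo "$n"
}
```

    count_steps visits every pair from 0/0 to 7/7 exactly once, so it prints 64.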
  7. Next observation: Without adjusting the cpufreq stuff, measuring anything related to performance is just crap. The default settings with 4.x scale the CPU frequency between 144 and 960 MHz using the ondemand governor. So if you start a test when the CPU has been idle for a while you will get completely different results, since the CPU will stay in its lowest speed state for a few seconds before clocking up:

    root@lamobo:/sys/devices/system/cpu/cpu0/cpufreq# cat stats/time_in_state 
    144000 22814
    312000 1807
    528000 612
    720000 784
    864000 302
    912000 76
    960000 21377

    Without something like 

    echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

    applied prior to testing you can expect random results. So I put that now into /etc/rc.local for the tests (since they showed exactly what's to be expected: First iperf run after the board has been idle: 100 Mbits/sec less compared to the subsequent tests when the CPU has been clocked at 960 MHz).
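
    A slightly more general sketch that sets the governor on every core exposing a cpufreq directory and prints what was actually applied. The function name and the optional second argument (an alternative sysfs root, only useful for dry runs) are made up:

```shell
#!/bin/sh
# set_governor GOVERNOR [CPU_ROOT]: write GOVERNOR to every writable
# scaling_governor file below CPU_ROOT and print the value read back.
set_governor() {
	gov=$1
	root=${2:-/sys/devices/system/cpu}
	for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
		[ -w "$f" ] || continue
		echo "$gov" >"$f"
		echo "${f%/cpufreq/scaling_governor}: $(cat "$f")"
	done
}
```

    Calling "set_governor performance" from /etc/rc.local before the first benchmark keeps the clock-up delay out of the measurement.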

  8. Ok, I tried all 64 combinations using the 64 created u-boot Debian packages with different TX/RX delay settings. Results of a single iperf TX run to a directly connected MacBook Pro here: http://pastebin.com/xqJN5Kpp (Igor's default settings with eth0 IRQs dedicated to cpu1 -- no further tuning applied).


    What seems to be obvious: When TX delay is set to 0, 1 or 2 then no network connection can be established at all. The results measured might depend on other stuff like load and do not show any real difference regarding different TX or RX delay settings (one probable exception: RX delay set to 5). When running the tests iperf utilized one single CPU core to 100% so there might be the chance that different TX/RX delay settings make a real difference but you won't measure this due to driver problems:

    root@lamobo:~# time iperf -c -t 30                                                                      
    Client connecting to, TCP port 5001
    TCP window size: 43.8 KByte (default)
    [  3] local port 45384 connected with port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-30.0 sec  1.18 GBytes   336 Mbits/sec
    real    0m30.067s
    user    0m0.150s
    sys     0m28.850s

    (nearly all time spent in sys). The overall CPU utilisation when running the test was around 130%-150% so there's not that much room for improvement. Will try next with an Olimex Lime2 and do in the meantime some iperf tests in the other direction with TX delay 4 and different RX delays.
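
    How close the iperf client itself came to saturating one core follows directly from the three figures printed by time. A small sketch (the function name is made up; it only covers the iperf process, not the interrupt handling running on the other core):

```shell
#!/bin/sh
# cpu_share REAL USER SYS: share of wall-clock time the process spent on the
# CPU, and the share spent in the kernel alone (all arguments in seconds).
cpu_share() {
	awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN {
		printf "user+sys: %.0f%% of wall time, sys alone: %.0f%%\n", (u + s) * 100 / r, s * 100 / r
	}'
}
```

    With the values above, "cpu_share 30.067 0.150 28.850" reports about 96% for both, so the client thread was busy in the kernel for practically the whole run.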


    BTW: clock speeds matter. I used the default settings (maximum operating point in 4.0.4 960 MHz per default) and 'verified' using 

    sysbench --test=cpu --cpu-max-prime=5000 run --num-threads=2

    Since execution time was around 54.5 secs, the kernel apparently clocked up to 960 MHz after a short period of time (on kernel 3.4 with 912 MHz and CPU governor performance the very same test finishes in 55-56 secs)

  9. Nearly all the stuff around the horrible power situation of the Lamobo R1 can be found in this otherwise pretty useless and crappy forum: http://bananapi.com/index.php/forum/general/391-why-the-sata-disk-doesnt-work-on-bpi-r1?start=12


    I would have a look for undervoltage issues (very likely).


    @Patcher: If you power the board using the LiPo socket, does a connected HDD/SSD still work? And how did you solve the mechanical challenge of inserting a plug into the LiPo connector while also using a disk (bending the connector?).


    JFR: I used the board with an older image (3.4.106 or even older) and both a connected 2.5" HDD and an HDMI display. Since the AXP209 also has to power the disk on the Lamobo R1, you can simply read out the power requirements using sysfs. And due to the crappy Micro USB connector it was not possible to boot the board when a power-hungry USB keyboard and mouse were also connected (peak consumption when trying to spin up the disk exceeded the overall power maximum). Without the USB peripherals it worked even with unpatched u-boot (rootfs on SATA). And when using the stress utility to produce some load, the consumption of the board sometimes reached 9 W (the maximum, since Micro USB allows 5V/1.8A max.)
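
    A minimal sketch of such a readout, assuming the PMU driver exposes the standard power_supply class attributes voltage_now (in µV) and current_now (in µA). The function name is made up, and the directory to pass (for example /sys/class/power_supply/ac) is an assumption that depends on the kernel in use:

```shell
#!/bin/sh
# supply_watts DIR: compute the power draw in watts from the voltage_now (µV)
# and current_now (µA) files found in a power_supply directory.
supply_watts() {
	awk -v uv="$(cat "$1/voltage_now")" -v ua="$(cat "$1/current_now")" \
		'BEGIN { printf "%.2f W\n", uv * ua / 1e12 }'
}
```

    A reading of 5000000 µV and 1800000 µA yields 9.00 W, which is the 5 V / 1.8 A ceiling of the Micro USB input.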


    And never ever use the original acrylic enclosure, especially not with the board lying flat. Both the disk and the AXP209 power management unit might overheat easily due to bad placement and no airflow being possible.

  10. Sorry, no time to test (been busy in the kitchen). But if you want to give different TX/RX delay parameters a try you could play with the 64 different u-boot packages my script created: http://kaiser-edv.de/tmp/lime2-u-boot.tgz


    It contains 64 debs with the following naming scheme: linux-u-boot-lime2_1.9_$TX_$RX_armhf.deb. To use an unmodified RX delay and e.g. 4 as TX delay simply install

    dpkg -i linux-u-boot-lime2_1.9_4_0_armhf.deb

    (I completely rely on Igor's compile_uboot function and no testing has been done regarding RX delay! Be warned: you might end up with a corrupted bootloader!). JFR: Another u-boot patch was necessary:

    diff --git a/board/sunxi/Kconfig b/board/sunxi/Kconfig
    index 2fcab60..4623de6 100644
    --- a/board/sunxi/Kconfig
    +++ b/board/sunxi/Kconfig
    @@ -451,4 +451,10 @@ config GMAC_TX_DELAY
            Set the GMAC Transmit Clock Delay Chain value.
    +config GMAC_RX_DELAY
    +        int "GMAC Receive Clock Delay Chain"
    +        default 0
    +        ---help---
    +        Set the GMAC Receive Clock Delay Chain value.

    Both u-boot patches combined for GMAC RX DELAY: http://pastebin.com/adiWjzya

    Regarding the problems you experience: Do the Lime2 boards showing problems have a different board revision? I cannot remember having packet losses with my Lime2. Unfortunately I cannot test immediately since I have to prioritise the Lamobo-R1, which is dedicated to a customer and shows really bad performance right now. Maybe on Thursday I'll have a look.
  11. Well, I doubt that TX DELAY is related to the problems you experience (these delay settings should only matter when you compare different boards or board revisions).


    But anyway. Now I have a patchset to create 64 u-boot variants with all possible TX/RX delay variations and will give them a try later (with the Lamobo-R1 first; I will try the very same stuff with my Lime2 if I succeed with testing and this brute-force approach shows good results). I just added two more lines to gmac.c and hope that they will work:

    diff --git a/board/sunxi/gmac.c b/board/sunxi/gmac.c
    index 8849132..1bce3ce 100644
    --- a/board/sunxi/gmac.c
    +++ b/board/sunxi/gmac.c
    @@ -26,6 +26,8 @@ int sunxi_gmac_initialize(bd_t *bis)
    +        setbits_le32(&ccm->gmac_clk_cfg,
    +                     CCM_GMAC_CTRL_RX_CLK_DELAY(CONFIG_GMAC_RX_DELAY));
            setbits_le32(&ccm->gmac_clk_cfg, CCM_GMAC_CTRL_TX_CLK_SRC_MII |

    The diff for the modifications of Igor's scripts is here: http://pastebin.com/ZMF89Y57 and you would still need to define 'GMAC_DELAY_TEST="yes"' in compile.sh
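
    To see which bits a given TX/RX pair would set in gmac_clk_cfg, the two delay macros can be mimicked with shell arithmetic. This is a sketch under an assumption: the shifts used here (10 for CCM_GMAC_CTRL_TX_CLK_DELAY, 5 for CCM_GMAC_CTRL_RX_CLK_DELAY) are not shown in the patch above and should be verified against the sunxi clock header of the u-boot tree in use. The function name is made up:

```shell
#!/bin/sh
# gmac_delay_bits TX RX: print the delay bits that would be OR-ed into
# gmac_clk_cfg. ASSUMPTION: TX delay is shifted left by 10 and RX delay by 5;
# check the macro definitions in your u-boot tree before relying on this.
gmac_delay_bits() {
	printf '0x%04x\n' $(( ($1 << 10) | ($2 << 5) ))
}
```

    Under that assumption "gmac_delay_bits 3 0" prints 0x0c00 and "gmac_delay_bits 4 2" prints 0x1040. Since setbits_le32 only sets bits, a delay value left behind by earlier code would not be cleared by it.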

  12. BTW: Since I want to play around with both TX and RX delay parameters I started to modify Igor's build scripts for this purpose. The build system creates a .deb package for u-boot that overwrites the SPL/u-boot on the SD card, so my idea was to create all 64 variants of possible TX/RX delay values and then let a script automatically install a u-boot variant, reboot, test network performance using ping/iperf, and move on to the next u-boot .deb until all combinations are tested.

    Currently I got the first step finished: Automatically creating 8 different u-boot .deb packages by adjusting Igor's script.
    1) Add GMAC_DELAY_TEST="yes" to compile.sh
    2) Modify one line in lib/common.sh. Exchange 
    3) Add the following lines in lib/main.sh after "grab_kernel_version"
    # check whether we should just build a bunch of u-boot versions to
    # brute-force all available GMAC TX/RX delay variations.
    if [ "X${GMAC_DELAY_TEST}" == "Xyes" ]; then
            for TX in 0 1 2 3 4 5 6 7 ; do
                    for RX in 0 ; do
                            # search defconfig file for $BOARD
                            Defconfig="$(grep -i -- "-${BOARD}.dtb" ${DEST}/u-boot/configs/*_defconfig | cut -d: -f1)"
                            if [ ! -f "${Defconfig}" ]; then
                                    case ${BOARD} in
                                            # (the board-specific fallbacks are missing from this post, see the patch linked below)
                                    esac
                            fi
                            # patch defconfig with appropriate tx/rx values
                            MyTmpFile="$(mktemp /tmp/gmac_test.XXXXXX || exit 1)"
                            trap "cd /tmp; rm -f \"${MyTmpFile}\"; exit 0" 0 1 2 3 15
                            grep -v CONFIG_GMAC_TX_DELAY "${Defconfig}" | grep -v CONFIG_GMAC_RX_DELAY >"${MyTmpFile}"
                            cat "${MyTmpFile}" >"${Defconfig}"
                            echo -e "CONFIG_GMAC_TX_DELAY=${TX}\nCONFIG_GMAC_RX_DELAY=${RX}" >>"${Defconfig}"
                            # (the u-boot build/packaging call for this TX/RX pair is missing from this post)
                    done
            done
            exit 0
    fi

    Step 2) and 3) as patch: http://pastebin.com/RC5WptYB


    After execution of compile.sh you will end up with 8 different u-boot.debs in output/output/u-boot containing different GMAC TX DELAY settings that can be installed using "dpkg -i".  Execution will stop afterwards.
    Next step is to patch gmac.c so that different definitions of "CONFIG_GMAC_RX_DELAY" might lead to different results.
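
    The defconfig patching step can also be tried in isolation on any file. A minimal sketch of the same idea (drop existing delay lines, append the new pair); the function name is made up and it works on whatever path is passed in:

```shell
#!/bin/sh
# set_gmac_delays DEFCONFIG TX RX: remove any existing CONFIG_GMAC_TX_DELAY /
# CONFIG_GMAC_RX_DELAY lines from DEFCONFIG and append the requested values.
set_gmac_delays() {
	tmp=$(mktemp /tmp/gmac_test.XXXXXX) || return 1
	grep -v -e CONFIG_GMAC_TX_DELAY -e CONFIG_GMAC_RX_DELAY "$1" >"$tmp"
	cat "$tmp" >"$1"
	rm -f "$tmp"
	printf 'CONFIG_GMAC_TX_DELAY=%s\nCONFIG_GMAC_RX_DELAY=%s\n' "$2" "$3" >>"$1"
}
```

    Because the old lines are removed first, calling it repeatedly on the same file leaves exactly one TX and one RX entry.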