
[wildcat_paris] yet another Lamobo-R1 config thread


wildcat_paris


As tkaiser would say, the Lamobo-R1 has a terrible design but the concept is good (multiple uses are possible: networking, misc services)

 

But let's be honest, perfection is when you are dead. Like sleeping & coffee ;)
 
Some weeks ago, I was wondering (rightly or wrongly) whether a DMA config for the Lamobo-R1 GMAC would help with the bandwidth issues. One CPU core sits at 100% handling the gazillions of interrupts.
 
Now DMA support for the A20 is in the mainline kernel, but I am missing some technical glue.
 
I have read an example in the kernel 4.3 doc with DMA config (DTS) for DWMAC (*if* applicable, not only for STMMAC, but also for A20-GMAC ???)

links:


 
I have tried the recipe provided by the Lamobo-R1 OpenWRT support crew, but the FIFO buffer is not enough. Also, the STMMAC code changed between 4.1 and 4.3, so the patch is not portable. I have tried the coe/roe fix too, with no luck.
 
links:


 
At least I was able to set the clock to 1 GHz. Since the CPU handles all the interrupts, this gives a small bandwidth gain (the CPU is able to handle more interrupts).
 

 

NOTE: use the "performance" governor as default (and get a fan, even if the board is not heating much; as of 2015/01/03 one is available at http://www.voc-electronics.com/a-37420681/gpio-extensions/picoolfan/)

--- v4.3.3/arch/arm/boot/dts/sun7i-a20.dtsi     2015-12-24 19:45:36.704310828 +0100
+++ v4.3.3/arch/arm/boot/dts/sun7i-a20.dtsi     2015-12-25 22:59:41.876408694 +0100
@@ -98,9 +98,11 @@
                        device_type = "cpu";
                        reg = <0>;
                        clocks = <&cpu>;
+                       #clock-frequency = <960000000>;
                        clock-latency = <244144>; /* 8 32k periods */
                        operating-points = <
                                /* kHz    uV */
+                               1008000 1450000
                                960000  1400000
                                912000  1400000
                                864000  1300000
@@ -117,6 +119,8 @@
                cpu@1 {
                        compatible = "arm,cortex-a7";
                        device_type = "cpu";
+                       #clock-frequency = <960000000>;
+                       clocks = <&cpu>;
                        reg = <1>;
                };
        };

 


 
I would appreciate a hand with the possible DTS (U-Boot? kernel?) DWMAC DMA glue, as in the kernel example (no link at hand for now, see later)

http://lxr.free-electrons.com/source/Documentation/devicetree/bindings/net/stmmac.txt

 

Patches to look at:

 

 

Edited by wildcat_paris

I have read an example in the kernel 4.3 doc with DMA config (DTS) for DWMAC (*if* applicable, not only for STMMAC, but also for A20-GMAC ???)

About A20 DMA engine (quote from here):

AFAIK, the GMAC, USB, and SATA subsystems use their own DMA system, so they already use DMA and aren't affected by the dmaengine patches.

The dmaengine patches are useful for the audio support, and could be useful for the security (encrypt/decrypt) chip support, and a few other such things.

 

As tkaiser would say, the Lamobo-R1 has a terrible design but the concept is good

IMHO slapping a cheap switch on a board with an SoC originally designed for tablets, with a single Ethernet interface that doesn't support HW checksum offloading, and calling it a "router" is not exactly a good concept  :)


IMHO slapping a cheap switch on a board with an SoC originally designed for tablets, with a single Ethernet interface that doesn't support HW checksum offloading, and calling it a "router" is not exactly a good concept  :)

 

True :) That's why I wrote 'The idea the R1 is based on is good'. But you're absolutely right: the SoC, the board and the single layer 2 switch for both WAN and LAN ports are wrong. The new Marvell ARMADAs with 2 or 3 independent GbE interfaces seem to be way more suited. 

 

13 days left: https://www.indiegogo.com/projects/turris-omnia-hi-performance-open-source-router#/

 

You get just the board for $99 + shipping -- given the state of R1's Wi-Fi (unusable crap) it's simply a no-brainer to throw the R1 into the bin and pledge. I believe the 500K stretch goal will also be reached, so you also get a *good* metal case for this board for the same price.

 

Apart from that: Thanks for all the useful links. Will look through them (next year ;) ) but not with the R1 in mind but more focused on A20's GMAC and SATA performance in general (still hoping for the quad-core A20 successor Olimex spread rumours about)


US$ 99.- =  WITHOUT: case, power supply, antennas, Wi-Fi cards and cooler. 

 

= totally useless

 

C'mon Tido, just think about it. The R1 too comes without usable Wi-Fi (the module wasting one USB port is simply crap), PSU and enclosure. On top of that you have to solder a sane DC-IN solution or get the right cables, since no PSU on this planet features an appropriate connector (for the battery connector). And all commercially available enclosures ignore the thermal problems, so you have to build your own.

 

Also: different people, different use cases.

 

I'm not interested in Wi-Fi right now; I want to combine the board with a 3.5" SATA disk (using this adapter) and therefore also need a special PSU solution (5V/12V). It's a no-brainer NOT to spend 70 bucks on the Lamobo crapboard but to invest in the Omnia for a few bucks more instead.


A 5% to 15% improvement in network bandwidth with the Lamobo-R1: small, but still something.

 

I was wondering why only one CPU/softirq thread was working; now both threads/CPUs are working (thread 1 at 90%, thread 2 at 5%)
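For anyone wanting to check this on their own board, here is a minimal sketch, assuming a Linux /proc/softirqs with the usual layout (a header row of CPU names, then one row per softirq type). The helper name net_rx_per_cpu is made up for illustration; it only prints the NET_RX counter for each core.

```sh
# Print the per-CPU NET_RX softirq counters, one "CPUn count" pair per line.
# The optional first argument is a path to read instead of the live
# /proc/softirqs (it exists only so a saved copy can be inspected).
net_rx_per_cpu() {
    awk '$1 == "NET_RX:" {
        for (i = 2; i <= NF; i++) printf "CPU%d %s\n", i - 2, $i
    }' "${1:-/proc/softirqs}"
}
```

Calling net_rx_per_cpu twice a few seconds apart during a transfer and comparing the counters shows which core is doing the receive work.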

 

This comes from reading the article "Linux: scaling softirq among many CPU cores" http://natsys-lab.blogspot.fr/2012/09/linux-scaling-softirq-among-many-cpu.html

 

echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 2 > /sys/class/net/eth0/queues/tx-0/xps_cpus
 

+ the A20 is patched to run @1008MHz with "performance" governor
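The value written to rps_cpus and xps_cpus in the two echo commands above is a hexadecimal CPU bitmask, with bit n standing for CPU n, so 2 selects CPU1 only. A small sketch for building such a mask from a list of CPU numbers (the helper name cpu_mask is hypothetical, not part of any tool):

```sh
# Build the hexadecimal CPU bitmask for the CPU numbers given as arguments.
# Bit n of the mask stands for CPU n.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}
```

cpu_mask 1 prints 2 (the value used above) and cpu_mask 0 1 prints 3, which would allow both A20 cores. The result can be written as root, e.g. echo "$(cpu_mask 1)" > /sys/class/net/eth0/queues/rx-0/rps_cpus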

 

SoC temp is 37-39°C (fan is running @40°C => PWM @25%),

AXP209 +48.0°C +5.02 V/+0.99 A,

/dev/sda: SanDisk SDSSDP128G: 42°C due to B53 chip

 

 

Bandwidth with the performance governor is better: there is no need to wait for the CPU to scale the frequency up (as with ondemand/conservative), and the SoC doesn't heat up much more
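For reference, a minimal sketch of switching the governor at runtime through the standard cpufreq sysfs files. The function name set_governor and its optional second argument are made up for this example; it has to run as root, and the setting does not survive a reboot.

```sh
# Write a cpufreq governor name to every CPU that exposes one.
# $1: governor name, e.g. "performance"
# $2: CPU sysfs directory (optional; defaults to the real one, and is a
#     parameter only so the function can be tried on a scratch directory)
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without a writable governor file.
        if [ -w "$f" ]; then
            echo "$gov" > "$f"
        fi
    done
}
```

Usage: set_governor performance, then cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor to confirm.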

 

from PC (http://beta.speedtest.net/result/4968734194) going through the lamobo-r1 to the Internet

 

 

RX 199 Mbit/s => 230 Mbit/s, around 15% better (the Internet link RX is 500 Mbit/s max)

TX 195 Mbit/s => 214 Mbit/s, so about 10% better (but the Internet link TX is a little above 200 Mbit/s max)
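The percentages are plain relative gains, (after - before) / before. A small sketch for checking them, with a made-up helper name (pct_gain) and awk doing the floating-point work:

```sh
# Relative gain, in percent, when going from $1 to $2 (same unit for both).
pct_gain() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}
```

pct_gain 199 230 prints 15.6 for the RX figures; the TX pair can be checked the same way.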

 

 

 

 

On the Lamobo-R1 itself, the move is from 37 MB/s to 50 MB/s (RX = 438 Mbit/s; the Internet link RX is nominally 500 Mbit/s, usually 450-470 Mbit/s)

 

 

gr@bpi:~$ wget -O /dev/null ftp://ftp.oleane.net/ubuntu-cd/wily/ubuntu-15.10-desktop-amd64.iso
--2016-01-04 23:10:46--  ftp://ftp.oleane.net/ubuntu-cd/wily/ubuntu-15.10-desktop-amd64.iso
           => ‘/dev/null’
Resolving ftp.oleane.net (ftp.oleane.net)... 194.2.0.36, 2a01:c910:0:1::c202:24
Connecting to ftp.oleane.net (ftp.oleane.net)|194.2.0.36|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /ubuntu-cd/wily ... done.
==> SIZE ubuntu-15.10-desktop-amd64.iso ... 1178386432
==> PASV ... done.    ==> RETR ubuntu-15.10-desktop-amd64.iso ... done.
Length: 1178386432 (1.1G) (unauthoritative)

100%[================================================================================================>] 1,178,386,432 56.7MB/s   in 21s

2016-01-04 23:11:08 (53.6 MB/s) - ‘/dev/null’ saved [1178386432]

 

 

 

from Odroid XU4 through lamobo-r1, RX max @224MBits/s

 

 

gr@odroid:~$ wget -O /dev/null ftp://ftp.oleane.net/ubuntu-cd/wily/ubuntu-15.10-desktop-amd64.iso
--2016-01-04 23:20:57--  ftp://ftp.oleane.net/ubuntu-cd/wily/ubuntu-15.10-desktop-amd64.iso
           => ‘/dev/null’
Resolving ftp.oleane.net (ftp.oleane.net)… 194.2.0.36, 2a01:c910:0:1::c202:24
Connecting to ftp.oleane.net (ftp.oleane.net)|194.2.0.36|:21… connected.
Logging in as anonymous… Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /ubuntu-cd/wily ... done.
==> SIZE ubuntu-15.10-desktop-amd64.iso ... 1178386432
==> PASV ... done.    ==> RETR ubuntu-15.10-desktop-amd64.iso ... done.
Length: 1178386432 (1.1G) (unauthoritative)

ubuntu-15.10-desktop-amd64.iso     100%[==================================================================>]   1.10G  28.2MB/s   in 40s

2016-01-04 23:21:38 (27.8 MB/s) - ‘/dev/null’ saved [1178386432]
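To compare the wget rates with the Mbit/s figures, here is a rough conversion sketch. It assumes wget's "MB/s" means 2^20 bytes per second, which is consistent with 1178386432 bytes being displayed as 1.1G in the output above; the helper name mib_to_mbit is made up.

```sh
# Convert a rate in MiB/s (2^20 bytes per second) to Mbit/s (10^6 bits per second).
mib_to_mbit() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", r * 1048576 * 8 / 1000000 }'
}
```

Under that assumption, the 53.6 MB/s average on the Lamobo-R1 is about 450 Mbit/s, and the 27.8 MB/s average from the Odroid XU4 is about 233 Mbit/s.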

 

 


Tested the DTS and driver patch today on a Cubietruck, kernel 4.4-rc8, without the A20 speed patch.

Maybe there is a small improvement, ~800 Mbps -> ~900 Mbps, with some extra tweaks left over from before.

Distributing TX and RX interrupts helps in synthetic tests, but in real-world scenarios (e.g. a Samba file transfer that hogs a single CPU core at 100% usage) it won't help much; for me it even made file transfer speeds worse when I tested it before.

 

Edit: jumbo frames are still broken for me


Now, after I thought more about my test results and @wildcat_paris' test methods:

  • iperf3 TCP test is not the best way to test raw Ethernet performance, I'll try to do more iperf3 tests with UDP later on fresh Armbian images;
  • Enabling jumbo frames with "bugged_jumbo=1" didn't cause driver lockup like it did before, so I'll have to check other things;

 

from PC going through the lamobo-r1 to the Internet

 

Depending on your firewall setup, your speed improvements may be the result of the higher CPU frequency and not of the A20 GMAC patches, even though it counts as a "real world scenario" test.

 

Edit: did some tests with iperf3.

For me these are the top results, stmmac patches didn't have any noticeable effect.

 

 

TCP, from Win8.1

c:\Program Files\Tools>iperf3 -4 -c cubietruck.lan -i 0 -b 1000M
Connecting to host cubietruck.lan, port 5201
[  4] local 192.168.1.101 port 1985 connected to 192.168.1.105 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  1.06 GBytes   915 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  1.06 GBytes   915 Mbits/sec                  sender
[  4]   0.00-10.00  sec  1.06 GBytes   915 Mbits/sec                  receiver

iperf Done.

UDP, from Ubuntu Wily. Didn't bother to increase iperf buffer sizes to decrease packet loss.

➜  armbian  % _ iperf3 -4 -c cubietruck.lan -i 0 -b 1000M -u
Connecting to host cubietruck.lan, port 5201
[  4] local 192.168.1.102 port 34763 connected to 192.168.1.105 port 5201
[ ID] Interval           Transfer     Bandwidth       Total Datagrams
[  4]   0.00-10.00  sec  1.10 GBytes   949 Mbits/sec  144760
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-10.00  sec  1.10 GBytes   949 Mbits/sec  0.053 ms  53575/144760 (37%)
[  4] Sent 144760 datagrams

iperf Done.

 

 

 

Edit 2: Feel like UDP testing with such high packet loss may not be useful, will try to redo it with higher buffer sizes.
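To put the loss figure in perspective, the loss rate and a rough estimate of what actually arrived can be worked out from the iperf3 summary line. This is only a sketch: it assumes the reported bandwidth is the sender's rate and that all datagrams have the same size, and the helper name udp_received is made up.

```sh
# $1: sender rate in Mbit/s, $2: lost datagrams, $3: total datagrams
# Prints "<loss in %> <approximate received rate in Mbit/s>".
udp_received() {
    awk -v rate="$1" -v lost="$2" -v total="$3" 'BEGIN {
        loss = lost / total
        printf "%.1f %.1f\n", loss * 100, rate * (1 - loss)
    }'
}
```

udp_received 949 53575 144760 prints 37.0 597.8, i.e. with 37% loss only roughly 600 Mbit/s of the 949 Mbit/s sent was received.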


I tried this patch from OpenWRT. It (and other tweaks like increasing stmmac buffer sizes) may reduce CPU load, but that is harder to measure than network speed. These patches are not present in the mainline kernel.

 

Just to try to identify bottlenecks in your setup, I would recommend testing network performance with iperf3 between the lamobo-r1 and any device in the LAN; the same without any iptables rules; the same with an unconfigured switch (no VLANs); and the same with the simple tweak "sudo ethtool -K eth0 gso off" (capital -K changes offload settings; lowercase -k only displays them). Since TCP window autoscaling is enabled by default and affects test results, I would recommend running each test at least 3 times.
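A small sketch of the "run each test at least 3 times" part. The helper name repeat_run is made up, and HOST below is a placeholder for whichever LAN device runs the iperf3 server (iperf3 -s).

```sh
# Run a command N times in a row, printing a header before each run.
# $1: number of runs; remaining arguments: the command and its arguments
repeat_run() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        echo "== run $i of $n =="
        "$@"
        i=$((i + 1))
    done
}
```

Usage: repeat_run 3 iperf3 -4 -c HOST, then change one thing at a time from the list above and run the same command again, so each round of three differs from the previous one by a single setting.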


@zador

 

Yes, thanks for the idea: testing with "iperf3" (with different TCP window size values) gives the values quoted for the OpenWRT patches

 

I have also tested the public servers

 

 

Tweaks to STMMAC and U-boot - stmmac driver I have tweaked the driver to enable RX checksum and improve TX rate (still not full gigabit speed) to maintain a 400Mbit/s rate for TX and 900Mbit/s rate for RX.

 

But that only holds for Internet <=> Lamobo-R1 traffic.

Using the L-R1 as a "router" for other machines on the LAN limits the bandwidth to 236/205 Mbit/s (as in my previous tests) = more or less 400 Mbit/s / 2 (+ iptables in the middle)

 

I will soon be testing the XU4 with an extra USB3/GMAC to act as a simple router. The XU4 was my plan B for routing when I ordered it.

 

(OK, my AMD Phenom2 965 PC with 2 GMACs works very well as a router, but... its power consumption is terrible for a 24/7 machine)

 

Private joke: with your spoiler, you made "coffee pour out of ..." my nose while I was laughing out loud :)
