Hauke

Le Potato: Network stack crash on huge file writes


Dear all,

 

First, since I'm new here: many thanks for all the work that went into this!

 

Now, here's my problem on Le Potato, which I have with both Armbian and LibreELEC: as soon as I copy anything large to any medium, some services (SSH, Samba, Kodi, tvheadend, all of them network-dependent) crash after somewhere between 200 and 700 MB have been written. The point varies from run to run. So anything like

 

cp /some/huge/file /home/me

 

crashes the services. Something like this:

 

cp /some/huge/file /dev/null

 

is no problem. I tried writing to the SD card as well as to USB, and I tried both over the network (scp from another machine to Le Potato) and locally (copying from USB to SD, but still running the command over SSH). It always crashes. To make sure it's not a LibreELEC issue, I switched to Armbian (Ubuntu from here: https://www.armbian.com/lepotato/). And instead of copying, I ran this script:

 

#!/bin/bash
LARGEFILE="/home/potato/largefile.txt"
# Clean up any old file (-f: no error if it does not exist yet)
rm -f "$LARGEFILE"
# Create a string of 4096 bytes
MAKE4K=""
for i in $(seq 1 4096); do
  MAKE4K="${MAKE4K}A"
done
# Write the 4096-byte string a million times --> ~4 GB file
# (printf instead of echo, so no extra newline byte is added per write)
for j in $(seq 1 1000000); do
  printf '%s' "$MAKE4K" >> "$LARGEFILE"
done

I SSH'd into the machine, started the script, and *boom*: SSH went down at about 250 MB written, in all three sessions I had open. BUT: the script seems to keep running. The local console login survived, and the file kept growing until it finally reached the target 4 GB. Closer inspection showed:

 

ping 192.168.1.1  --> destination unreachable

ifdown eth0

ifup eth0

ping 192.168.1.1 --> Success

 

And SSH works again. So my conclusion is that the network stack somehow gets corrupted.
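Until a fixed kernel is available, the manual ifdown/ifup recovery above could be automated as a stopgap. This is a minimal sketch, not something from the thread: it assumes eth0 is managed by ifupdown (so ifdown/ifup behave as in the post), that it runs as root, and that 192.168.1.1 is the gateway. The function names and the SETTLE_SECS delay are hypothetical. It only hides the symptom; it does not fix the driver.

```shell
#!/bin/bash
# Stopgap watchdog sketch: if the gateway stops answering, bounce the
# interface (ifdown/ifup) the same way it was recovered by hand.
# Assumes ifupdown manages the interface and the script runs as root.

# gateway_reachable HOST: succeeds if HOST answers one ping within 2 s.
gateway_reachable() {
  ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# bounce_iface IFACE: take the interface down and up again.
# Separate commands, so ifup still runs even if ifdown complains.
bounce_iface() {
  ifdown "$1"
  ifup "$1"
}

# check_once HOST IFACE: prints "ok", "recovered" or "still down".
check_once() {
  local host=$1 iface=$2
  if gateway_reachable "$host"; then
    echo "ok"
    return 0
  fi
  bounce_iface "$iface" > /dev/null 2>&1
  # Give link negotiation / DHCP a moment before re-testing
  # (SETTLE_SECS is a hypothetical knob, default 5 seconds).
  sleep "${SETTLE_SECS:-5}"
  if gateway_reachable "$host"; then
    echo "recovered"
    return 0
  fi
  echo "still down"
  return 1
}

# watch_link HOST IFACE INTERVAL: run check_once every INTERVAL seconds.
watch_link() {
  while true; do
    echo "$(date '+%F %T') $(check_once "$1" "$2")"
    sleep "$3"
  done
}
```

For example, watch_link 192.168.1.1 eth0 30 from a root shell on the local console would log one status line every 30 seconds and restart eth0 whenever the gateway becomes unreachable.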

 

/var/log/syslog does not show anything helpful.

 

Any idea? Is this reproducible by others?
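For anyone trying to reproduce this, a variant of the script above that records roughly when the link drops might help. This is a sketch, not the original poster's method: it appends zeros in fixed-size chunks with head -c and pings the gateway after each chunk. The function names, the 1-second ping timeout and the choice of 192.168.1.1 as the gateway are assumptions.

```shell
#!/bin/bash
# Sketch: append zeros to FILE in CHUNK_MIB pieces and ping HOST after each
# piece, to record roughly how much had been written when the link dropped.

# probe_link HOST: succeeds if HOST answers one ping within 1 second
# (-W is the iputils ping reply timeout in seconds).
probe_link() {
  ping -c 1 -W 1 "$1" > /dev/null 2>&1
}

# write_until_link_drops FILE TOTAL_MIB CHUNK_MIB HOST
# Returns 1 and reports the MiB written as soon as the probe fails,
# or returns 0 once TOTAL_MIB has been written with the link up.
write_until_link_drops() {
  local file=$1 total=$2 chunk=$3 host=$4
  local written=0 n
  rm -f "$file"
  while (( written < total )); do
    # The last chunk may be smaller, so the file ends up exactly TOTAL_MIB.
    n=$(( total - written < chunk ? total - written : chunk ))
    head -c $(( n * 1024 * 1024 )) /dev/zero >> "$file"
    written=$(( written + n ))
    if ! probe_link "$host"; then
      echo "link down after ${written} MiB"
      return 1
    fi
  done
  echo "wrote ${written} MiB, link stayed up"
  return 0
}
```

For example, write_until_link_drops /home/potato/largefile.bin 4096 16 192.168.1.1, started from the local console rather than over SSH, would stop and print the amount written as soon as the gateway stops answering.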

 

Cheers

 


On Armbian, please give me the link from the armbianmonitor -u output. This was a significant problem on older images; it should be fixed now, but if there's something new...


Does not work:

root@lepotato:~# armbianmonitor -u
System diagnosis information will now be uploaded to <html>
 <head>
  <title>500 Internal Server Error</title>
 </head>
 <body>
  <h1>500 Internal Server Error</h1>
  The server has either erred or is incapable of performing the requested operation.<br /><br />

 </body>
</html>Please post the URL in the forum where you've been asked for.

Will try again later...


Did a manual paste from "armbianmonitor -U":

 

https://pastebin.com/ty5ePBg3

 

Please note: I started 'armbianmonitor -c "$HOME"' (as suggested here: https://docs.armbian.com/User-Guide_Fine-Tuning/) but stopped it when I realized it would fill up my SD card, so there's no output for that in the paste.

 

EDIT: I just noticed that I may have used an older image... I had quite some problems with the monitor setup and tried a few images. Sorry if I caused unnecessary work. Still, since the LibreELEC builds suffer from this problem: could you point me to some details the LibreELEC developers could benefit from?

 

Thanks!

Sun Feb 25 22:17:58 UTC 2018 | Le potato | 5.34.171121 | arm64 | aarch64 | 4.13.14-meson64

That is one of the early experimental kernels; the 4.14 kernels using the BayLibre patchset have solved the issue. I doubt LibreELEC is based on mainline Linux; it's probably on either 4.9 or 3.14, since the hardware acceleration isn't currently in mainline. Part of the problem is linked to LPA issues and the other part to EEE:

 

 

https://patchwork.kernel.org/patch/10142531/

https://patchwork.kernel.org/patch/10102343/

 

For Armbian, an apt update/upgrade should fix it, as long as you can stay connected; I'd use a Wi-Fi dongle if you have one. There are quite a few other updates in there as well, such as drivers for the audio subsystem.
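To confirm the upgrade actually took effect, the running kernel can be compared against 4.14 after rebooting (having run sudo apt update && sudo apt upgrade first). A small sketch, assuming GNU sort with -V; the function name is hypothetical. A version number alone doesn't prove that the two patches are in a given build, it only rules out an older kernel like the 4.13.14-meson64 one in the diagnostics above.

```shell
#!/bin/bash
# Sketch: report whether the running kernel is 4.14 or newer.

# version_ge A B: succeeds if version A >= version B (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running=$(uname -r)      # e.g. 4.13.14-meson64
base=${running%%-*}      # drop the local suffix -> 4.13.14
if version_ge "$base" 4.14; then
  echo "kernel $running is 4.14 or newer"
else
  echo "kernel $running is older than 4.14; run apt update && apt upgrade, then reboot"
fi
```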


Hi TonyMac32,

 

I can confirm that the problem is gone with the current kernel - thanks a lot and sorry again for bothering you with an old issue. For me that helps a lot, since I now know it's not my board but a known issue I can ask the LibreELEC team to have a look at.

 


Hi TonyMac32,

 

may I ask for your assistance again? I've read up on compiling LibreELEC myself and was able to identify the corresponding code in the old kernel drivers. The patches you pointed me to could go in nearly unaltered, but unfortunately they do not solve the issue. As far as I understand, the first patch you mention addresses problems with the EEE feature, while the second addresses unsuccessful autonegotiation. I switched off both features on my network switch and also forced the adapter into a fixed mode using ethtool, but it still fails every time. This is why I suspect that the patches are not the complete solution. Still, the current Armbian kernel is free of this problem, so a solution must exist. Do you have any idea where I might look for other modifications?
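The post doesn't give the exact ethtool commands used. With standard ethtool options, turning off EEE and forcing a fixed link mode on the board's side would look roughly like this. It's a sketch: eth0, 100 Mbit/s and full duplex are assumptions, the function names are hypothetical, and the commands need root.

```shell
#!/bin/bash
# Sketch: rule out EEE and autonegotiation on the board's side.
# IFACE, speed and duplex are assumptions; needs root and ethtool.
IFACE=${IFACE:-eth0}

# Show current EEE status and link settings, for before/after comparison.
show_link_state() {
  ethtool --show-eee "$IFACE"
  ethtool "$IFACE"
}

# Turn off Energy-Efficient Ethernet on this interface.
disable_eee() {
  ethtool --set-eee "$IFACE" eee off
}

# force_link [SPEED] [DUPLEX]: disable autonegotiation and force a mode.
# The switch port must be forced to the same mode; forcing only one end
# typically leads to a duplex mismatch.
force_link() {
  local speed=${1:-100} duplex=${2:-full}
  ethtool -s "$IFACE" speed "$speed" duplex "$duplex" autoneg off
}
```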

 

Thanks!


Those are the two patches that solved the issue on mainline. It is possible the code isn't as directly portable as it appears. Have you raised this as an issue with the LibreELEC team (or whoever maintains the fork that directly supports this board)?


OK, thanks for clarifying. So I'll go deeper and start debugging. Fortunately, I get crash output on the debug UART, which should make it possible to locate the code that actually causes the problem. I've never done this kind of debugging before, but since none of the few Le Potato LibreELEC developers is currently working on this bug, I think I need to learn it myself :-)
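A sketch of capturing the debug UART output to a file and pulling the faulting function out of a kernel oops, assuming a Linux host with a USB-serial adapter on /dev/ttyUSB0 and the 115200 8N1 settings commonly used on Amlogic boards. The function names are hypothetical; the pattern covers both the older "PC is at ..." oops format and the newer "pc : ..." one.

```shell
#!/bin/bash
# Sketch: log the debug UART to a file, then extract the faulting
# function(s) from any kernel oops in that log.
# Assumes a Linux host, USB-serial adapter on /dev/ttyUSB0, 115200 8N1.
PORT=${PORT:-/dev/ttyUSB0}
BAUD=${BAUD:-115200}

# capture_uart: configure the port and copy everything to a timestamped log.
capture_uart() {
  local log
  log="uart-$(date +%Y%m%d-%H%M%S).log"
  # raw mode, 8 data bits, no parity, 1 stop bit (GNU stty syntax)
  stty -F "$PORT" "$BAUD" raw -echo cs8 -parenb -cstopb
  echo "logging $PORT at $BAUD baud to $log (Ctrl-C to stop)" >&2
  tee "$log" < "$PORT"
}

# extract_crash_sites LOG: print "PC is at f+0x../0x.." / "LR is at ..."
# lines (older oops format) or "pc : f+..." / "lr : f+..." (newer format).
extract_crash_sites() {
  sed -n -E 's/.*(PC is at|LR is at|pc :|lr :) *([^ ]+).*/\1 \2/p' "$1"
}
```

The function+offset it prints could then be resolved to a source line with an aarch64-capable gdb on a vmlinux built with debug info, e.g. gdb vmlinux followed by list *(function+0xoffset), with the placeholder replaced by the reported symbol.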

On 3/11/2018 at 8:16 AM, Hauke said:

OK, thanks for clarifying. So I'll go deeper and start debugging. Fortunately, I get crash output on the debug UART, which should make it possible to locate the code that actually causes the problem. I've never done this kind of debugging before, but since none of the few Le Potato LibreELEC developers is currently working on this bug, I think I need to learn it myself :-)

Most of the code used by LibreELEC and others is based on Amlogic's SDK code, which has numerous bugs and sits on a really old kernel (3.14). There are efforts to move everything to mainline so that it includes fixes like this. Currently, mainline is missing some critical features, such as video codec support, which are being worked on.


Hi Da Xue,

thanks for pointing this out. Indeed, LibreELEC for Le Potato is currently on 3.14.29, and I'm really looking forward to the mainline kernel being ready, or at least ready enough to support LibreELEC. In my eyes, video hardware acceleration is what matters most, since that's the key part of a media center.

My biggest fear is that Le Potato will share the fate of many SBCs and never reach maturity.

6 minutes ago, Hauke said:

Hi Da Xue,

thanks for pointing this out. Indeed, LibreELEC for Le Potato is currently on 3.14.29, and I'm really looking forward to the mainline kernel being ready, or at least ready enough to support LibreELEC. In my eyes, video hardware acceleration is what matters most, since that's the key part of a media center.

My biggest fear is that Le Potato will share the fate of many SBCs and never reach maturity.

Libre Computer Project is one of my hobby projects/money pits. Le Potato is only the first of 3-5 products planned so rest assured it won't be an issue.
