eth1 (2.5) vanished: Upgrade to 20.11.3 / 5.9.14-rockchip64


Posted

After upgrading to Armbian 20.11.3 Buster with Linux 5.9.14-rockchip64, eth1 has vanished.

 

❯ sudo ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 64:62:66:d0:06:18 brd ff:ff:ff:ff:ff:ff

 

Any ideas?

 

http://ix.io/2I2U

Posted (edited)

Yeah, that happens once in a while. No idea why. I'm on Armbian_21.02.0-trunk.8_Helios64_buster_current_5.9.12.img

I tried the usbreset and device unbind tricks, but none of them worked. Maybe I didn't do it right, I don't know. So I wrote a script to reboot when it happens. Just make sure your RAID, cache and whatnot get gracefully suspended before a reboot. This problem may occur at any time.

 

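#!/bin/bash
# Reboot if the eth1 USB NIC is missing on two consecutive runs.
# sysfs directory that only exists while the USB NIC is enumerated
DEVICEFILE="/sys/devices/platform/usb@fe900000/fe900000.usb/xhci-hcd.0.auto/usb4/4-1/4-1.4/4-1.4:1.0"
# Marker file left behind by a run that found the device missing
MISSINGTOKEN="/home/user/eth1_missing"

if [ -d "$DEVICEFILE" ]; then
        # Device is present: clear any stale marker
        rm -f "$MISSINGTOKEN"
elif [ -f "$MISSINGTOKEN" ]; then
        # Missing on the previous run as well: log it and reboot
        rm -f "$MISSINGTOKEN"
        printf 'eth1 missing for 20 seconds on %s.\n' "$(date)" >> /home/user/eth1missing_reboot_log
        /sbin/reboot
else
        # First miss: leave a marker for the next run
        touch "$MISSINGTOKEN"
fi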

I added this to root's crontab to run every 10 seconds. It saved me from getting up and connecting the USB cable a few times.
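A note on the schedule: a crontab entry cannot fire more often than once a minute, so a 10-second cadence needs a small wrapper around the check. One possible sketch follows; run_every and the paths in the comments are hypothetical names, not part of the script above.

```shell
# run_every CMD INTERVAL COUNT: call CMD COUNT times, INTERVAL seconds apart.
run_every() {
    cmd=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        "$cmd"
        i=$((i + 1))
        # No sleep after the last run, so the loop finishes within its minute
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Hypothetical usage: a wrapper script started once a minute from root's crontab
#   * * * * * /usr/local/bin/eth1_watchdog_loop.sh
# which defines run_every and then calls, for six checks ten seconds apart:
#   run_every /home/user/check_eth1.sh 10 6
```

With six runs and five sleeps the wrapper takes roughly 50 seconds plus the time of the checks, so consecutive cron invocations should not overlap.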

 

plz dont 'bash' my if statements, was half asleep when i wrote this.

Edited by clostro
Posted

Apparently @Igor removed the driver in this commit.

The driver in the mainline kernel only supports RTL8152/RTL8153.

 


 

Posted

This has been a real pain the last few days. Glad I found this post. Fortunately there is the 1G port as a backup.

1. When is the fix coming?

2. And can it then be installed via the regular apt update & upgrade?

Posted
  On 12/16/2020 at 8:40 AM, toti said:

This has been a real pain the last few days. Glad I found this post. Fortunately there is the 1G port as a backup.

1. When is the fix coming?

2. And can it then be installed via the regular apt update & upgrade?


1. Should be there already

2. yes

Posted

I can confirm that on "5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux" both eth0 and eth1 show up in the output of sudo ip link show.
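For anyone who wants to script that check rather than read the ip link output by eye, here is a minimal sketch. It assumes the interface is exposed under /sys/class/net as usual on Linux; has_iface and SYSFS_NET are hypothetical names introduced for illustration.

```shell
# has_iface NAME: succeed if the kernel currently exposes interface NAME.
# SYSFS_NET can be overridden for testing; it defaults to /sys/class/net.
has_iface() {
    [ -e "${SYSFS_NET:-/sys/class/net}/$1" ]
}

# Example: report eth1's state without parsing "ip link" output
if has_iface eth1; then
    echo "eth1 present"
else
    echo "eth1 missing"
fi
```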

Thanks......

Posted

I did a fresh install of Armbian_20.11.4_Helios64_buster_current_5.9.14.img on the SD card and I'm still having the same eth1 missing problem. Any ideas?

 

It was working OK for the most part, but when I ran a couple of crystaldisk benchmarks back to back on the helios64 share, everything went blank. I connected the USB cable, ran ifconfig, and eth1 was missing.

Posted (edited)
  On 12/15/2020 at 8:34 AM, Igor said:

Fix is on the way. Sorry for messing things up


 

Rare mistake by Igor!

 

In the few years I have been lurking here, this is the only time I can recall such a thing happening. Of course everyone makes mistakes, and there were probably some I was not aware of, but this is the first time I can remember such a thing happening.

Edited by TRS-80
Posted
  On 12/16/2020 at 2:27 PM, clostro said:

I did a fresh install of Armbian_20.11.4_Helios64_buster_current_5.9.14.img on the SD card and I'm still having the same eth1 missing problem. Any ideas?


 

Are you sure you are running the latest image? Can you share the output of your dmesg command?

Posted

I'm currently running the Armbian_20.11.4_Helios64_buster_current_5.9.14 image, with OMV5, and rsyncing my media files from my Qnap server.

It seems to be syncing at a sustained rate of 60MB/sec at the moment via the eth1 2.5Gb link. I'm not sure whether 60MB/sec is optimal; does anybody else have findings to compare with? My Qnap 469L is set up with 2 x 1Gb Ethernet aggregated to my Zyxel switch, and the Helios64 is connected to the same switch on one of its 2.5Gb ports. It looks like the job (6.7TB) will take about 13 hours, which is quite a bit quicker than the same sync from my Qnap (2x1Gb aggregated link) to my Qnap (100Mb), which took about 3-4 days :-)

So far so good.

Posted
  On 12/17/2020 at 4:17 AM, DBwpg said:

I'm currently running the Armbian_20.11.4_Helios64_buster_current_5.9.14 image, with OMV5, and rsyncing my media files from my Qnap server.

It seems to be syncing at a sustained rate of 60MB/sec at the moment via the eth1 2.5Gb link. I'm not sure whether 60MB/sec is optimal; does anybody else have findings to compare with? My Qnap 469L is set up with 2 x 1Gb Ethernet aggregated to my Zyxel switch, and the Helios64 is connected to the same switch on one of its 2.5Gb ports. It looks like the job (6.7TB) will take about 13 hours, which is quite a bit quicker than the same sync from my Qnap (2x1Gb aggregated link) to my Qnap (100Mb), which took about 3-4 days :-)

So far so good.


OK. I guess I spoke too soon regarding the status of the eth1 2.5Gb link. It appears the link is dropping. It ran fine for about 5 hours, then failed. I noticed the link speed dropped to about 2-3MB/sec and rsync finally failed. I rebooted the unit and restarted the rsync job. It seemed to start up fine and ran for a few minutes with speeds hitting 60MB/sec, and then failed. My ssh connection also timed out. Oh well. Not sure what my next options are. For the time being I'm shutting her down :-(

Posted
  On 12/17/2020 at 3:42 AM, gprovost said:

 

Are you sure you are running the latest image? Can you share the output of your dmesg command?


 

I think so, I did a fresh install on the sd card from this file - Armbian_20.11.4_Helios64_buster_current_5.9.14.img.xz

 

Here is the dmesg output from after reboot. Can't access the previous boot records though. This one shows a lot of eth1 errors and warnings, but it hasn't crashed yet.

 


 

I tried to put the output in both spoiler and code. Hope it works.

Posted
  On 12/17/2020 at 10:09 AM, DBwpg said:

How best to create logs?


OK. I've rebooted my Helios64 and restarted the rsync job. It has been running for about 30 minutes. Will monitor. I'm assuming I should have a look at either the messages log or kern.log when the job fails?

Posted
  On 12/17/2020 at 8:03 PM, DBwpg said:

OK. I've rebooted my Helios64 and restarted the rsync job. It has been running for about 30 minutes. Will monitor. I'm assuming I should have a look at either the messages log or kern.log when the job fails?


 

Ideally you have a serial console open with the dmesg -w command running, in order to catch any exception in case the system freezes.

Posted
  On 12/18/2020 at 4:43 AM, gprovost said:

 

Ideally you have a serial console open with the dmesg -w command running, in order to catch any exception in case the system freezes.


Thanks. I've set up a PuTTY serial connection with dmesg -w running. Will see what happens. The re-established rsync session has been running for the last 14 hours without a hitch so far. The job is now 44% complete. I will sit tight to see whether it completes and, if not, hopefully find something in the logs.

Posted
  On 12/18/2020 at 5:55 AM, DBwpg said:

Thanks. I've set up a PuTTY serial connection with dmesg -w running. Will see what happens. The re-established rsync session has been running for the last 14 hours without a hitch so far. The job is now 44% complete. I will sit tight to see whether it completes and, if not, hopefully find something in the logs.


OK. The sync job failed after about 16 hours. My Qnap reported:

Type    Date    Time    Users    Source IP    Computer name    Content    
Error    2020/12/18    01:42:14    System    127.0.0.1    localhost    [Hybrid Backup Sync] Failed to complete Sync job: "One-way Sync 1". Remote host does not exist.    

 

I have attached the dmesg log from the Helios64. Not sure what to make of it.

dmesg-w output.txt

Posted
usb 4-1.4: reset SuperSpeed Gen 1 USB device number 3 using xhci-hcd
r8152 4-1.4:1.0 eth1: Tx status -2
r8152 4-1.4:1.0 eth1: Tx timeout

 

I am getting a lot of these in the logs. Sometimes 5 or 6 times in a row. Entire network activity pauses for a few seconds when I get one of these.

Could it be because of a bad cable or something? Helios64 is connected to a Zyxel XGS1010-12 switch on its 2.5gbps ports via a CAT5E 26AWG cable. Cable has metal jacket jacks and everything. I read somewhere that this could be caused by a cable, but this cable was working ok before.
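To compare how often this happens between cables or kernel versions, the events in a saved log can be counted instead of eyeballed. The following is only a sketch: it matches the message texts quoted above, and summarize_eth1_errors is a hypothetical helper name.

```shell
# Read a kernel log on stdin and count USB resets and eth1 Tx timeouts.
# The patterns match the messages quoted in this thread.
summarize_eth1_errors() {
    awk '
        /reset SuperSpeed/  { resets++ }
        /eth1: Tx timeout/  { timeouts++ }
        END { printf "resets=%d timeouts=%d\n", resets, timeouts }
    '
}

# Hypothetical usage against the running kernel log or a saved file:
#   sudo dmesg | summarize_eth1_errors
#   summarize_eth1_errors < saved-dmesg.txt
```

Running it before and after swapping the cable gives two numbers to compare over the same test duration.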

Posted
  On 12/18/2020 at 8:03 AM, DBwpg said:

OK. The sync job failed after about 16 hours. My Qnap reported:

Type    Date    Time    Users    Source IP    Computer name    Content    
Error    2020/12/18    01:42:14    System    127.0.0.1    localhost    [Hybrid Backup Sync] Failed to complete Sync job: "One-way Sync 1". Remote host does not exist.    

 

I have attached the dmesg log from the Helios64. Not sure what to make of it.

dmesg-w output.txt 22.56 kB · 8 downloads


I was able to complete my media file copy from my Qnap server yesterday, but thought I would try copying files from the Qnap server to the Helios64 today via Qnap's FileStation service (SMB file copy). It was an attempt to copy a directory structure of approx. 95GB. It appeared to be working but failed halfway through, causing the Helios64 to reboot. I've included the dmesg -w output. Not sure what's happening. It looks like I'll have to consider delaying my migration to the Helios64 until there is more stable firmware, unless someone has other ideas about what I should be doing to troubleshoot these issues.

dmesq-w.txt

Posted
  On 12/20/2020 at 9:33 PM, DBwpg said:

I was able to complete my media file copy from my Qnap server yesterday, but thought I would try copying files from the Qnap server to the Helios64 today via Qnap's FileStation service (SMB file copy). It was an attempt to copy a directory structure of approx. 95GB. It appeared to be working but failed halfway through, causing the Helios64 to reboot. I've included the dmesg -w output. Not sure what's happening. It looks like I'll have to consider delaying my migration to the Helios64 until there is more stable firmware, unless someone has other ideas about what I should be doing to troubleshoot these issues.

dmesq-w.txt 6.33 kB · 4 downloads


OK. I thought I would include the latest dmesg -w log. The Helios64 appeared to reboot, and I wasn't able to access it via the serial/USB console or via ssh. I had to hard reboot the unit. I've shut it down for now. Will wait to see what transpires over the next few firmware updates.

Posted

I've been having the same issue of eth1 just not working right on a 2.5Gbps link on kernel 5.9.x. The same applies for a self-built 5.10.1 kernel and also when going back to r8152 2.13.0 instead of the default 2.14.0 (by adapting the revision in build/lib/compilation-prepare.sh). With a 1Gbps connection, everything is perfectly fine. This connection speed was "forced" by running "ethtool -s enp6s0 advertise 0x3f" on the desktop side, taking 2.5Gbps out of the advertised speeds for autonegotiation. I have a direct 5m Cat6 connection from my Helios64 to my desktop machine. The latter has a 2.5Gbps RTL8125 on the mainboard, using the vanilla kernel's r8169 module. MTUs were kept on 1500 for all tests.
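For readers wondering where 0x3f comes from: as far as I can tell from the ethtool man page, the advertise value is a bitmask of link modes, and 0x3f is the OR of the 10/100/1000baseT half- and full-duplex bits, which leaves 2.5G out. A small sketch of how the value is composed (the variable and function names are made up for illustration):

```shell
# Link-mode bits as listed for "advertise" in the ethtool(8) man page
ADV_10_HALF=0x001
ADV_10_FULL=0x002
ADV_100_HALF=0x004
ADV_100_FULL=0x008
ADV_1000_HALF=0x010
ADV_1000_FULL=0x020

# OR the bits together; the 2500baseT bit is deliberately not included
advertise_mask() {
    printf '0x%x\n' $(( $ADV_10_HALF | $ADV_10_FULL | $ADV_100_HALF | $ADV_100_FULL | $ADV_1000_HALF | $ADV_1000_FULL ))
}

advertise_mask
```

The result can then be passed on the desktop side as in the post, e.g. sudo ethtool -s enp6s0 advertise "$(advertise_mask)".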

 

Running the Helios64 on kernel 4.4.213 (via Armbian_20.11.4_Helios64_buster_legacy_4.4.213.img.xz), I was able to transfer 4TiB back and forth in parallel without any issue: Helios64 -> desktop was running at ~1.7Gbps; desktop -> Helios64 was running at ~2.1Gbps. 1.7Gbps isn't great, but it's OK with tx offload disabled, and certainly better than "just" 1Gbps (~933Mbps, realistically).

 

One surefire way to kill the Helios64's eth1 is to just start a simple iperf3 run with it sending data to my desktop: "iperf -c <desktop-ip> -t 600". After a few seconds, the speed goes down to 0, the kernel watchdog resets the USB device (as per r8152.c's rtl8152_tx_timeout), it recovers for a few more seconds, and then eth1 is absolutely dead until I reboot the entire NAS. No ping, no more connection, nothing.

 

Here's what I can see via journalctl -f from a parallel connection when such a crash happens:


 

If there are any more logs you'd need or software changes to try and apply, I'd be eager to dig deeper.

Posted (edited)
  On 12/19/2020 at 4:20 PM, clostro said:

usb 4-1.4: reset SuperSpeed Gen 1 USB device number 3 using xhci-hcd
r8152 4-1.4:1.0 eth1: Tx status -2
r8152 4-1.4:1.0 eth1: Tx timeout

 

I am getting a lot of these in the logs. Sometimes 5 or 6 times in a row. Entire network activity pauses for a few seconds when I get one of these.

Could it be because of a bad cable or something? Helios64 is connected to a Zyxel XGS1010-12 switch on its 2.5gbps ports via a CAT5E 26AWG cable. Cable has metal jacket jacks and everything. I read somewhere that this could be caused by a cable, but this cable was working ok before.


 

user@helios64:~$ uname -a
Linux helios64 5.9.14-rockchip64 #20.11.4 SMP PREEMPT Tue Dec 15 08:52:20 CET 2020 aarch64 GNU/Linux

Well it isn't the cable... tried over the network disk benchmark again with a new cable (the supplied short black network cable, no metal jacket), and temporarily lost eth1 during the test again. Excuse my ignorance for I'm not an expert at any rate but could this behavior be due to any overheating of RTL8156 or VL815 chips? Could be that the thermal paste/pad isn't making enough contact on some units, which would be a very easy fix. 

It only starts happening under heavy load and not right away.

 

Unless I'm way off base and the devs have already figured it out as a software issue.

 

 

edit: upon further testing i can say that temporary disconnects appear to happen less often with the new cable. i don't know what to think anymore

 

 

Edited by clostro
