Upgrading cubox-i Buster from Kernel 5.7.y to 5.8.y breaks ethernet


Armbianmonitor:

Upgrading cubox-i Armbian Buster from kernel 5.7.y to kernel 5.8.y breaks ethernet, and I'm unable to get it connected again. It works fine again after downgrading back to 5.7.y. This occurs on multiple cubox-i devices.

 

System diagnosis information will now be uploaded to http://ix.io/2zH6

 

dmesg | grep eth0
[    4.667327] fec 2188000.ethernet eth0: registered PHC device 0
[   22.369472] fec 2188000.ethernet eth0: Unable to connect to phy


nmcli
wlan0: connected to sketch-wlan
        "Broadcom BCM4330"
        wifi (brcmfmac), 6C:AD:F8:1D:36:25, hw, mtu 1500
        ip4 default
        inet4 10.1.0.41/22
        route4 0.0.0.0/0
        route4 10.1.0.0/22
        route4 169.254.0.0/16
...
eth0: unavailable
        "eth0"
        ethernet (fec), D0:63:B4:00:87:DD, hw, mtu 1500
...

nmcli con
NAME              UUID                                  TYPE      DEVICE          
br-4ba53c1beb78   580cd8d0-4b87-432d-b072-7b2191fc3dc8  bridge    br-4ba53c1beb78
br-667404d55215   21ae2dad-0462-4880-98f2-8b56ae09dafa  bridge    br-667404d55215
br-7d46b681eda0   3dc1b651-2795-45fe-9821-7b33111d038c  bridge    br-7d46b681eda0
my-wlan           5d87696c-418b-404b-a6be-a85fd10c89cf  wifi      wlan0           
Armbian ethernet  0a5bb1f6-799d-476f-9fe4-2ecc7f4fe055  ethernet  --

nmcli con up 'Armbian ethernet'
Error: Connection activation failed: No suitable device found for this connection (device eth0 not available because device has no carrier).

sudo ip a
...
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether d0:63:b4:00:87:dd brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 6c:ad:f8:1d:36:25 brd ff:ff:ff:ff:ff:ff
    inet 10.1.0.41/22 brd 10.1.3.255 scope global dynamic noprefixroute wlan0
       valid_lft 85910sec preferred_lft 85910sec
...

sudo ip link set eth0 up
RTNETLINK answers: No such device

 

EDIT: this is the link to the armbianmonitor output after downgrading back to 5.7.15, with ethernet (eth0) working once again

http://ix.io/2zHa

Edited by Armbian_User
Add link to diagnostics after downgrade back to working version

On 10/4/2020 at 7:23 PM, Igor said:

I would suspect a bug in the Network Manager app, since our Cubox-i in the auto-testing facility (with Ubuntu Focal based Armbian) shows no trouble, see no. 14:

 

https://beta.armbian.com/autotest.html

 

Did you try upgrading Network Manager by hand, or switching to some other network tooling? Did you try a clean build?

Hello Igor,

 

Thanks for your response. I've had a chance to try your suggestions. None worked for me.

 

Here is a link to the armbianmonitor from a new, clean install using a fresh download of Armbian_20.08.1_Cubox-i_buster_current_5.8.5.img.xz

http://ix.io/2AvJ

and another after apt-get update && apt-get upgrade -y

http://ix.io/2AvL

 

Just to clarify, all my cubox-i's are the cubox-i4x4 variant and all exhibit this issue.

 

Do you have any further suggestions I could try?

 

Thanks


Apparently the issue is still not solved in kernel 5.9.14. After installing this kernel and rebooting, ethernet was broken again. :( I'm back on the old 5.6 kernel.

Hope there will be a permanent solution soon because I would like my buster to be up to date.

 

 

Edited by Kaaf
wrong version
14 minutes ago, Kaaf said:

Hope there will be a permanent solution

 

Probably the only permanent solution is constant maintenance, which will not come just like that: you need dedication, a lot of free time and cash. We can't cover anything to the degree users would want. Most of the project costs already come out of our private budgets, and it is always a choice between dealing with our own goals / problems / families and solving the never-ending problems that bug users.

 

On 11/16/2020 at 9:11 AM, Kaaf said:

Thanks for the good work!


Feels good! :) A (rare) compliment usually helps to move things along, but we are too low on resources and too swamped with core work to move things around and fix things after some random problem in upstream Linux kills support for a certain device. It happens all the time ...

 

We are asking people / you for help all the time so that we can build things up. Both are coming slowly, but significantly slower than, and totally out of sync with, what you / users would want to have.

 

How about helping the people who could perhaps look into this and fix it someday?


this break happens when you update from

linux-dtb-current-imx6_20.08_armhf.deb             29-Aug-2020 20:02

linux-image-current-imx6_20.08_armhf.deb           29-Aug-2020 20:03

(still working, linux-5.7.15-imx6)

to

linux-dtb-current-imx6_20.08.1_armhf.deb           30-Aug-2020 20:34

linux-image-current-imx6_20.08.1_armhf.deb         30-Aug-2020 20:34

(breaking on *some* cubox-i,  linux-5.8.5-imx6 )

 

the obvious reason for breaking is that kernel driver 

./kernel/drivers/net/phy/at803x.ko

is no longer loading: the driver file exists, but loading it manually via modprobe now results in an error message

 

comparing the two packages shows a difference in file "modules.alias"

/lib/modules/5.7.15-imx6
alias mdio:00000000010011011101000001000001 at803x
alias mdio:000000000100110111010000011?0010 at803x
alias mdio:000000000100110111010000011?0100 at803x
alias mdio:000000000100110111010000011?0110 at803x

(working)

to

/lib/modules/5.8.5-imx6
alias mdio:00000000010011011101000001000001 at803x
alias mdio:00000000010011011101000001110010 at803x
alias mdio:00000000010011011101000000100011 at803x
alias mdio:00000000010011011101000001110100 at803x
alias mdio:000000000100110111010000011?0110 at803x

(not working on *some* cubox)

Not only does the number of lines differ; also note the question marks in the working config!

It seems that the driver checks these hardware keys to make sure the hardware is really there. If the keys do not match, the driver does not load and hence eth0 does not work.

Depending on which exact type of cubox you have, it will load or not, which might explain why it works on Igor's cubox while others have issues.

Unfortunately just editing modules.alias won't fix it; we also need to generate a "modules.alias.bin" after the change.

Does anyone know how to do this (so I can resume testing)?
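
For anyone comparing the two modules.alias files: each "mdio:" alias is the 32-bit PHY ID written out in binary, with "?" standing for a bit that may be 0 or 1. Below is a minimal sketch in plain POSIX shell that turns a PHY ID into that bit string and checks it against an alias pattern. The function names phy_id_to_bits and mdio_alias_matches are made up for illustration, and 0x004dd072 is simply what the second 5.8.5 alias decodes to, not a value read from an affected box.

```shell
# Convert a PHY ID (e.g. 0x004dd072) into the 32-character bit string
# used after the "mdio:" prefix in modules.alias.
phy_id_to_bits() {
    n=$(($1))
    bits=""
    i=0
    while [ "$i" -lt 32 ]; do
        bits="$((n & 1))$bits"   # prepend the lowest bit
        n=$((n >> 1))
        i=$((i + 1))
    done
    printf '%s\n' "$bits"
}

# Return 0 if the PHY ID ($2) matches the alias pattern ($1, without "mdio:").
# The pattern is used as a shell glob, so '?' matches either bit value.
mdio_alias_matches() {
    case "$(phy_id_to_bits "$2")" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

phy_id_to_bits 0x004dd072
# → 00000000010011011101000001110010
```

Checked this way, 0x004dd072 matches both the wildcard line in the 5.7.15 list (000000000100110111010000011?0010) and the exact line in the 5.8.5 list.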

 


I'm currently in contact with the maintainers of the "at803x" driver. The first assumption, that just a single line regarding the device ID was missing in the driver source (see my post above about modules.alias, which by the way is created by depmod), was unfortunately wrong. The driver for this device has been widely reworked from 5.7.x (rather simple, only recognizing AT8030 and AT8035) to 5.8.x (now rather complex and supporting many more devices from that family beyond 8030/8035).

Another finding was that NOT ALL cuboxes are impacted; Igor already reported that his regression box is still ok.

From my collection, 2 out of 3 also work fine with kernel 5.8.x and higher

This one has issues (stopped working after upgrade to 5.8.x)

SolidRun i4P TV-300-D

while these are still working

SolidRun 4x4 300-D 
SolidRun i2EXW 300-D 

If the MAC address (its last 3 bytes) is simply incremented during production, then the one box with issues sits in between the two which are working.

The two which are working have WiFi and the one which has issues does not, but that does not fit the post above from "Armbian User", who reported issues on a WiFi version

 

There might be one more thing

find /sys -name phy_id
/sys/devices/platform/soc/2100000.bus/2188000.ethernet/mdio_bus/2188000.ethernet-1/2188000.ethernet-1:00/phy_id

reported on the systems which work (on 5.8.x and beyond)

while my system which breaks reports

/sys/devices/soc0/soc/2100000.bus/2188000.ethernet/mdio_bus/2188000.ethernet-1/2188000.ethernet-1:04/phy_id

which has two differences:

after /devices/ we see /soc0/ instead of /platform/

and further in the path

/2188000.ethernet-1:04/ instead of /2188000.ethernet-1:00/

The device tree seems to be the same for all of them.

There are small differences, and these seem to matter.

 

if someone still reads this post ... and wants to contribute ... and has a system which broke from upgrading 5.7.x > 5.8.x *kindly check on your system* and post results here - for

find /sys -name phy_id

 

Sorting the systems which still work from those which break, and finding reliable criteria for which belongs to which group, is always an important part of troubleshooting.
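
To make results from different boxes easier to compare, the PHY address can be cut straight out of the phy_id path, since it is the part after the last ":" in the directory name. A minimal sketch in POSIX shell; phy_addr_from_path is a name invented here, and the commented usage line assumes the phy_id file is readable on your system.

```shell
# Print the MDIO address from a sysfs phy_id path,
# e.g. .../2188000.ethernet-1:04/phy_id  ->  04
phy_addr_from_path() {
    dir=${1%/phy_id}     # drop the trailing /phy_id
    addr=${dir##*:}      # keep what follows the last ':'
    printf '%s\n' "$addr"
}

# Possible usage on a live system (prints address and PHY ID per PHY):
#   find /sys -name phy_id 2>/dev/null | while read -r p; do
#       echo "addr $(phy_addr_from_path "$p")  id $(cat "$p")"
#   done
```

For the two paths quoted above this gives 00 for the working systems and 04 for the one which breaks.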

18 hours ago, chrismade said:

if someone still reads this post ... and wants to contribute ... and has a system which broke from upgrading 5.7.x > 5.8.x *kindly check on your system* and post results here - for

find /sys -name phy_id

 

I have this issue on 2 out of 3 cubox-i4x4-300-D models
All 3 have WiFi working successfully

Here is the output from the working (5.9.14) system:
 

find /sys -name phy_id
/sys/devices/platform/soc/2100000.bus/2188000.ethernet/mdio_bus/2188000.ethernet-1/2188000.ethernet-1:00/phy_id
find: ‘/sys/module/at803x/drivers’: Input/output error

Here is the output from one of the broken systems (working on 5.7.15):
 

find /sys -name phy_id
/sys/devices/soc0/soc/2100000.bus/2188000.ethernet/mdio_bus/2188000.ethernet-1/2188000.ethernet-1:04/phy_id
find: ‘/sys/module/at803x/drivers’: Input/output error

 


Hi,

 

I'm trying to investigate this issue, but TBH it's hard to follow the results.

 

Quote

I have this issue on 2 out of 3 cubox-i4x4-300-D models

 

So you have exactly the same model and one of them is not working? Could you please share the output of:
 

find /sys -name phy_id
cat /proc/device-tree/model
dmesg

 

for these models when booting the 5.9.14 image?

 

-michael

Edited by mwalle

There is another observation: those who started early on Armbian Debian Buster for cubox don't get any kernel updates, which might be another reason why the number of Armbian users reporting this issue is low. Early Armbian Debian Buster images receive updates for other packages and will be updated to 10.7 after "apt update && apt upgrade", but the kernel remains at 5.3.1 (forever, or until updated manually). The hardware effect reported in this thread, that the PHY is usually on addr 00 while on _some_ hardware it seems to be on addr 04, was either ignored or somehow handled by the old driver until 5.7.x, so all these users usually won't experience this issue (however, remaining on older kernels has risks, too).

Regarding the issue, it looks like during a period of production (neither at the beginning nor at the end) the PHY got a different addr (04 instead of 00), and the new driver from 5.8.x onwards only expects addr 00 and hence does not work if the PHY is on addr 04.

I wonder if there is a serial number or similar. I haven't found anything like this yet. If there really is none, maybe the MAC address printed on the bottom helps to identify which cuboxes have the PHY on addr 04?

 

My only cubox having this addr on 04 instead of 00 has MAC = D0 63 B4 - 00 77 BB

@Armbian_User- or anyone else - can you pls check if your affected systems are near that range (the last 3 bytes matter) ? 

root@mgmt:~# cat /etc/armbian-release
# PLEASE DO NOT EDIT THIS FILE
BOARD=cubox-i
BOARD_NAME="Cubox i2eX/i4"
BOARDFAMILY=cubox
BUILD_REPOSITORY_URL=https://github.com/armbian/build
BUILD_REPOSITORY_COMMIT=b9adf0ea-dirty
VERSION=5.98
LINUXFAMILY=cubox
BRANCH=next
ARCH=arm
IMAGE_TYPE=stable
BOARD_TYPE=conf
INITRD_ARCH=arm
KERNEL_IMAGE_TYPE=zImage
root@mgmt:~# uname -a
Linux mgmt 5.3.1-cubox #5.98 SMP Fri Sep 27 23:11:49 CEST 2019 armv7l armv7l armv7l GNU/Linux
root@mgmt:~# find /sys -name phy_id
/sys/devices/soc0/soc/2100000.aips-bus/2188000.ethernet/mdio_bus/2188000.ethernet-1/2188000.ethernet-1:04/phy_id
root@mgmt:~# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.1.0.1  netmask 255.255.255.0  broadcast 10.1.0.255
        inet6 fe80::4eee:3b87:e07b:7419  prefixlen 64  scopeid 0x20<link>
        ether d0:63:b4:00:83:30  txqueuelen 1000  (Ethernet)
        RX packets 3342175  bytes 606622753 (606.6 MB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 2607316  bytes 244851799 (244.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 


Just switched to 5.10.0. I noticed that the DTB has changed multiple node names. As a result, my user space could no longer apply all configuration quirks due to the changed paths in sysfs. Using a DTB from before 5.7.x resolved all observed regressions for me. Perhaps it is also worth trying in your case to run the current kernel with a pre-5.7.x DTB as a test.


Here are the details about my cubox i4, which also had network issues with 5.8.x. It is currently running smoothly with an older version. I hope this information helps somebody, as such driver issues go beyond my knowledge.
 

cat /etc/armbian-release

# PLEASE DO NOT EDIT THIS FILE
BOARD=cubox-i
BOARD_NAME="Cubox i2eX/i4"
BOARDFAMILY=imx6
BUILD_REPOSITORY_URL=https://github.com/armbian/build
BUILD_REPOSITORY_COMMIT=b9814056
DISTRIBUTION_CODENAME=buster
DISTRIBUTION_STATUS=supported
VERSION=20.11.3
LINUXFAMILY=imx6
BRANCH=current
ARCH=arm
IMAGE_TYPE=stable
BOARD_TYPE=conf
INITRD_ARCH=arm
KERNEL_IMAGE_TYPE=Image
uname -a

Linux <hostname> 5.7.15-imx6 #20.08 SMP Mon Aug 17 07:36:36 CEST 2020 armv7l GNU/Linux
find /sys -name phy_id

/sys/devices/soc0/soc/2100000.bus/2188000.ethernet/mdio_bus/2188000.ethernet-1/$
find: ‘/sys/module/at803x/drivers’: Input/output error
ifconfig eth0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet <ip router>  netmask 255.255.255.0  broadcast <ip cubox>
        inet6 2001:9e8:208c:6800:84e5:8994:7642:4da2  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::3721:95e0:891b:d0ef  prefixlen 64  scopeid 0x20<link>
        ether d0:63:b4:00:32:d9  txqueuelen 1000  (Ethernet)
        RX packets 17508632  bytes 3075857589 (2.8 GiB)
        RX errors 0  dropped 4  overruns 0  frame 0
        TX packets 6162802  bytes 803661074 (766.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 

 

 


I haven't heard back from kernel developers if/when we could expect a fix. I assume they are currently busy fixing other issues. If you are impacted, or afraid to update your cubox because you might be impacted, here is how you can fix it yourself by modifying the required device-tree file. Luckily the tool "dtc" should already be in our Armbian image, and it works in both directions: it can make binaries out of source and vice versa. Here is how:

 

check that "dtc" (the device-tree-compiler) is available, and that it is the required version:

cubox-i:~$ dtc -v
Version: DTC 1.4.7
cubox-i:~$

next...

read device-tree-binary (as non-root) from /boot/dtb/ and write source back into current directory:

cubox-i:~$ dtc -I dtb -O dts -o imx6q-cubox-i.dts  /boot/dtb/imx6q-cubox-i.dtb

ignore warning:
imx6q-cubox-i.dts: Warning (unique_unit_address): /soc/bus@2000000/iomuxc-gpr@20e0000: duplicate unit-address (also used in node /soc/bus@2000000/pinctrl@20e0000)
cubox-i:~$

next...

open the file in an editor of your choice and find this sequence (it should appear at approximately line 1250):

 ethernet@2188000 {
         compatible = "fsl,imx6q-fec";
         reg = < 0x2188000 0x4000 >;
         interrupt-names = "int0\0pps";
         interrupts = < 0x00 0x76 0x04 0x00 0x77 0x04 >;
         clocks = < 0x02 0x75 0x02 0x75 0x02 0xbe >;
         clock-names = "ipg\0ahb\0ptp";
         fsl,stop-mode = < 0x01 0x34 0x1b >;
         status = "okay";
         pinctrl-names = "default";
         pinctrl-0 = < 0x2e >;
         phy-handle = < 0x2f >;

delete the last line "phy-handle = < 0x2f >;"

a few lines below you will find this block

 ethernet-phy@0 {
         reg = < 0x00 >;
         qca,clk-out-frequency = < 0x7735940 >;
         phandle = < 0x2f >;
 };

again... delete line "phandle = < 0x2f >;"

next duplicate this block

 ethernet-phy@0 {
         reg = < 0x00 >;
         qca,clk-out-frequency = < 0x7735940 >;
 };

and modify the first two lines in the 2nd block to

 ethernet-phy@4 {
         reg = < 0x04 >;
         qca,clk-out-frequency = < 0x7735940 >;
 };


both blocks then look like

 ethernet-phy@0 {
         reg = < 0x00 >;
         qca,clk-out-frequency = < 0x7735940 >;
 };

 ethernet-phy@4 {
         reg = < 0x04 >;
         qca,clk-out-frequency = < 0x7735940 >;
 };

save the file and compile to binary

cubox-i:~$ dtc -I dts -O dtb -o imx6q-cubox-i.dtb  imx6q-cubox-i.dts

unfortunately you will (again) see lots of warnings (really ... a lot):

 


imx6q-cubox-i.dtb: Warning (unique_unit_address): /soc/bus@2000000/iomuxc-gpr@20e0000: duplicate unit-address (also used in node /soc/bus@2000000/pinctrl@20e0000)
imx6q-cubox-i.dtb: Warning (clocks_property): /ldb:clocks: cell 0 is not a phandle reference
imx6q-cubox-i.dtb: Warning (clocks_property): /ldb:clocks: cell 2 is not a phandle reference
imx6q-cubox-i.dtb: Warning (clocks_property): /ldb:clocks: cell 4 is not a phandle reference

ignore these warnings. I usually take warnings very seriously, but not in this case

now the fixed device tree is in file

imx6q-cubox-i.dtb


in your current directory

before you copy it (now as root) into /boot/dtb you may want to _rename_ the old file, so that you can mount the SD card of your cubox on another computer and restore the original file if required

 

with this modified (self-fixed) device tree, newer kernels should have ethernet regardless of whether your cubox has the PHY on addr #0 or #4

 

Note: device-tree files can be used on various kernels from the same generation; between generations there might be breaking changes
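
The rename-then-copy step can be wrapped in a small helper so the original file is never lost. This is only a sketch in POSIX shell: install_dtb and the ".orig" suffix are choices made here for illustration, and on the cubox it would have to run as root with /boot/dtb/imx6q-cubox-i.dtb as the target.

```shell
# Copy a newly built .dtb over the target, keeping the very first
# original as <target>.orig (an existing backup is never overwritten).
install_dtb() {
    new=$1       # e.g. ./imx6q-cubox-i.dtb from the steps above
    target=$2    # e.g. /boot/dtb/imx6q-cubox-i.dtb
    [ -s "$new" ] || { echo "missing or empty: $new" >&2; return 1; }
    if [ -e "$target" ] && [ ! -e "$target.orig" ]; then
        mv "$target" "$target.orig" || return 1
    fi
    cp "$new" "$target"
}

# Example (as root):
#   install_dtb ./imx6q-cubox-i.dtb /boot/dtb/imx6q-cubox-i.dtb
```

To go back, mount the SD card on another computer and rename the ".orig" file to the original name again.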


alternatively ... and more elegant ... you can build that device-tree from kernel sources - e.g. download a 5.x.y kernel from kernel.org (I tested it with linux-5.10.6 ),  unpack it and enter the source directory ( cd linux-5.10.6 in my case )

 

This is the diff which I got from the kernel developer

 

diff --git a/arch/arm/boot/dts/imx6qdl-sr-som.dtsi b/arch/arm/boot/dts/imx6qdl-sr-som.dtsi
index b06577808ff4..3db08363d3fb 100644
--- a/arch/arm/boot/dts/imx6qdl-sr-som.dtsi
+++ b/arch/arm/boot/dts/imx6qdl-sr-som.dtsi
@@ -53,7 +53,6 @@
 &fec {
     pinctrl-names = "default";
     pinctrl-0 = <&pinctrl_microsom_enet_ar8035>;
-    phy-handle = <&phy>;
     phy-mode = "rgmii-id";
     phy-reset-duration = <2>;
     phy-reset-gpios = <&gpio4 15 GPIO_ACTIVE_LOW>;
@@ -63,10 +62,15 @@
         #address-cells = <1>;
         #size-cells = <0>;
 
-        phy: ethernet-phy@0 {
+        ethernet-phy@0 {
             reg = <0>;
             qca,clk-out-frequency = <125000000>;
         };
+
+        ethernet-phy@4 {
+            reg = <4>;
+            qca,clk-out-frequency = <125000000>;
+        };
     };
 };

 

as you can see ... changes are applied to include-file  "imx6qdl-sr-som.dtsi" in directory  "arch/arm/boot/dts/"

 

getting the device-tree binary requires a step in between because the device-tree source files make use of "#include" - a directive which the device tree compiler does NOT understand, so the gcc preprocessor must help here

 

cpp -nostdinc -I include -I arch  -undef -x assembler-with-cpp  arch/arm/boot/dts/imx6q-cubox-i.dts imx6q-cubox-i.dts.preprocessed

 

then you can start "dtc" next

 

dtc -I dts -O dtb -p 0x1000 imx6q-cubox-i.dts.preprocessed -o imx6q-cubox-i-new.dtb

 

the fixed device-tree-binary is now in file "imx6q-cubox-i-new.dtb", which needs to replace "imx6q-cubox-i.dtb" in /boot/dtb (again, consider renaming the original file instead of deleting or overwriting it)
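
Before rebooting, it may be worth checking that the new binary really contains both PHY nodes. A minimal sketch in POSIX shell: it reads device-tree source on stdin (for example what "dtc -I dtb -O dts" prints) and only looks for the two node names, nothing more. The function name has_both_phy_nodes is invented for this example.

```shell
# Succeeds only if the device-tree source on stdin contains
# both an ethernet-phy@0 and an ethernet-phy@4 node.
has_both_phy_nodes() {
    dts=$(cat)
    printf '%s\n' "$dts" | grep -q 'ethernet-phy@0 {' &&
    printf '%s\n' "$dts" | grep -q 'ethernet-phy@4 {'
}

# Example with the file built above:
#   dtc -I dtb -O dts imx6q-cubox-i-new.dtb 2>/dev/null | has_both_phy_nodes \
#       && echo "both PHY nodes present"
```

This does not prove that ethernet will come up; it only catches the case where the wrong or unmodified file was compiled.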

On 1/18/2021 at 11:41 PM, chrismade said:

I haven't heard back from kernel developers if/when we could expect a fix

The fix has landed in 5.11, but as I don't see any stable tag, you have to wait till Armbian moves to 5.11 or compose an Armbian PR for the patch to have it backported for older kernels.
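
A quick way to tell whether a given kernel is at or past 5.11 is to compare the first two numbers of its version string. A rough sketch in POSIX shell; kernel_at_least is a hypothetical helper, and it only looks at the version number, so it cannot know about a patch that was backported into an older Armbian kernel.

```shell
# Succeeds if version string $1 (e.g. 5.10.6-imx6) is >= $2.$3
kernel_at_least() {
    ver=${1%%-*}         # strip a suffix such as -imx6
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Example on a running system:
#   kernel_at_least "$(uname -r)" 5 11 && echo "5.11 or newer"
```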
