System crash with network manager being the cause


Solved by Sonikku


Greetings everyone.
I have successfully used Armbian Bionic in a production environment. It's really great.
I am now evaluating Armbian Buster and I am seeing a strange issue. Perhaps you can direct me where to look so I can troubleshoot it.

With the board connected to our corporate network, it takes about 48 hours before the whole board (OrangePi+ 2E) crashes with the CPU burning up.
The second time it happened I had htop running to try and diagnose the problem.

 

Investigation revealed that rngd became unstable due to the nm-applet. This led me to re-run the known conditions under which this happens, but with the network disconnected, and now it doesn't crash.

Below is a screenshot of the crashed state.


[Image: screenshot of the crashed state]

 

Any ideas of where to look would be appreciated. My gut feeling tells me this is network related and indeed, disabling the network interface makes the problem go away.


For the CPU speed, I keep everything as standard as possible. I haven't done that, no.

To put it this way, this is stock Armbian with one item installed... RabbitMQ. We thought we had a problem with RabbitMQ and we found that Buster addressed it somehow, so that's unrelated.


I have had a similar experience with NetworkManager.

The solution was to set a fixed IP.

All problems solved!

 

Some history.

=============

My Orange PI PC was running an old Armbian 5.11 (Debian 8.11).

It had been running rock solid since the installation date, with a fixed IP.

A WD 500GB SATA disk is attached (USB docking station: JMicron, JMS579) to store files (approx. 50 GB/day). 

 

I did a fresh install: "Armbian 20.05.2 Orangepipc Debian buster" with DHCP enabled (IP reservation on the router).

Result: random halting of the system (runs headless in a remote location).

Sometimes after an hour, sometimes after a few hours, sometimes a bit longer.

Even checksums of the files were sometimes (1 to 2%) calculated wrong!

Files sometimes got corrupted (5-10%) when transferred over the wired connection.
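
A quick way to check for this kind of silent corruption is to hash the same file several times and compare the results. The following is only a minimal Python sketch, not something used in the original setup: the function names `sha256_of` and `hash_is_stable` are made up for illustration, and SHA-256 is an assumed choice of checksum.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_is_stable(path, rounds=5):
    """Hash the same file several times and report whether all digests agree.

    'rounds' is an arbitrary illustrative value, not a recommendation.
    """
    digests = {sha256_of(path) for _ in range(rounds)}
    return len(digests) == 1
```

For a file that is not being written to, `hash_is_stable` should return True every time. A False result would suggest that the fault lies with the hardware or I/O path rather than with the file itself.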

Recent syslog entries etc. were missing when the system halted.

Installing watchdog was not a real solution, of course, but it would reboot the system most of the time (not always).

Setting a fixed IP solved all the problems.

 

The Orange PI PC was and is running at max 1008MHz (SoC runs between 480 and 1008MHz using conservative governor).

I also installed a fan to keep things cooler:  max 54°C under full load (stress @ 4cpus).

But this did not solve the problems.

Only setting a fixed IP solved all other issues.

 


Hi

I have some feedback regarding this issue.

 

I am using Armbian Focal (current version) and the issue persisted until I found out where it was coming from.

I have been able to reproduce the system crash at will. It takes a few hours, but it is guaranteed to happen.

 

The cause is a particular Windows 7 host that has not been updated in years. It runs Windows 7 SP1 and Windows Update was last run 3 years ago.

Other Windows 7 machines have been updated and they have no effect on this board running Armbian Focal.

So I guess there is some network bug in old Windows 7 versions (which Microsoft has most likely since patched): when the host browses the SMB share, or even does a host lookup on the Armbian host, it sends something that upsets the network stack. In this condition the OrangePi is technically unresponsive, yet surprisingly it still responds to ICMP requests.
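
One way to confirm this "answers ping but nothing else" state from another machine is to try opening a TCP connection to a service on the board. This is a minimal Python sketch, assuming that a service such as SSH (port 22) or SMB (port 445) normally listens on the OrangePi. The function name `tcp_port_open` and the address in the usage comment are hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage; 192.168.1.50 stands in for the board's address:
# print(tcp_port_open("192.168.1.50", 445))
```

Keep in mind that the kernel can complete a TCP handshake even when the listening process itself is stuck, so a successful connect does not prove the service is healthy. A failed connect while ping still works is the more telling result.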

What I have done is to remove the particular Windows 7 host from the network (unplugged the LAN cable) and the problem has disappeared. I am now on 6 days of uptime on Armbian. Therefore I am fully convinced the Windows 7 host has a network stack bug, sends garbage to the OrangePi board and this somehow upsets the TCP stack in Armbian.

 

Neither another machine running the same Windows 7 copy (but updated until Microsoft stopped rolling out patches in March 2020), nor a Windows 10 machine, nor several Ubuntu machines, nor a MacBook Pro affects this OrangePi.
Solid as a rock.

I will investigate the issue further if I have time, to pinpoint the exact cause, but honestly, it's easier to get rid of Windows at this point. The machine in question is used for DTP, i.e. Corel Draw and other software that's not available on Mac or Linux.
 


I think I know what the problem is.

The Windows machine also runs NordVPN, which is used for accessing censored content outside of my country (certain necessary software updates are blocked by our government).
Recently, NordVPN changed the underlying network transport driver, which has broken a few things on the machine itself. On top of that, it seems that things go horribly wrong when NordVPN is allowed to run on the machine and the machine has access to the OrangePi. Other devices accessed in a similar way have also shown odd symptoms: an Android media player locked up and froze. So either some combination of my software is no good or there's a bug in Windows 7 that has since been patched.

I am enjoying 9 days of uptime on the OrangePi and have elected to format the Windows 7 machine instead, reinstall all the software and keep it updated this time. NordVPN stuff will be done on a different machine now.

On 7/29/2020 at 12:21 PM, Sonikku said: "I think I know what the problem is [...]"

CVE-2020-10730 in Samba is possibly the culprit.

  • Solution

Final feedback and solution
The issue happened again, this time while I was seated at my desk.
Setup:
MacBook Pro w/ Kensington Desktop Hub to provide DisplayPort and Ethernet
HP Desktop PC with Windows 7
Polycom Desktop Phone
HP LaserJet Pro
Wi-Fi access point

All the above are on the same network switch.

 

Event

I disconnected the Kensington device from my Mac (USB-C).
I noticed something was wrong when my phone reported no network (it makes a chime when this happens).
My Windows 7 box was doing a download and then said the internet was disconnected, with high CPU usage... hmmm...
My phone complained about no internet (WhatsApp went offline). I had to use the regular mobile LTE network.

Root Cause
When the Kensington unit is disconnected, it goes into a state where it jams the upstream network switch. It appears to flood the LAN with rubbish packets, and the switch falls over or locks up. Other devices also see (and receive) these packets.
Left to happen long enough, the OrangePi will crash as I have described. My colleagues have been able to reproduce this, and we've decided to throw the Kensington unit out.
It is not a bug! The desktop phone also eventually crashes, and eventually it takes out my Ubuntu 18.04 LTS file server too. The HP printer locks up completely and I have to power cycle it at the mains socket.

So there we have it... finally. Rogue device.
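
A packet flood like this can be spotted on a Linux box by watching the receive counters in /proc/net/dev. Below is a rough Python sketch, assuming the standard /proc/net/dev layout (two header lines, then one line per interface with received packets as the second counter). The function names are hypothetical, and what counts as an abnormal rate is something you would have to judge for your own LAN.

```python
import time


def parse_rx_packets(proc_net_dev_text):
    """Map interface name -> received packet count from /proc/net/dev text."""
    counters = {}
    # The first two lines are column headers, so skip them.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Receive columns start with: bytes, packets, errs, drop, ...
        counters[name.strip()] = int(fields[1])
    return counters


def rx_packet_rates(interval=1.0, path="/proc/net/dev"):
    """Sample the counters twice and return received packets/second per interface."""
    with open(path) as f:
        before = parse_rx_packets(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_rx_packets(f.read())
    return {
        name: (after[name] - before[name]) / interval
        for name in after
        if name in before
    }
```

Running `rx_packet_rates()` in a loop and logging the result to disk would at least leave a trace of a sudden jump in incoming packets before the board locks up.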
