Hi,
We've built a network experimentation system using NanoPi R1 modules, but when load testing the system with iperf3, we are seeing the on-board RTL8152 interface enter a state where it does not seem to be responding until the board is power cycled.
The system consists of 32 NanoPi modules connected in a torus, with each module's 1Gig interface connected to its neighbour's 100Mbit interface. An additional USB Ethernet dongle functions as a control interface. The modules are powered from a Ubiquiti PoE switch, using this type of isolated PoE splitter. Eyeballing the switch UI, power consumption per node seems to hover between 2.3W and 3.5W, depending on load. We're using the latest Ubuntu-based image, with kernel 5.4.28.
Under light loads, the system works fine, but when performing load tests, we're seeing the RTL8152 enter some kind of lock-up state. As can be seen from the system logs, first the netdev watchdog triggers, then the kernel tries to reset the USB device, and finally it reports that it is unable to enumerate the device. After encountering this problem, if the node is rebooted, the USB device still cannot be enumerated, and it remains unresponsive until the board is power cycled. The external USB Ethernet dongle keeps working through all of this, so it seems that the USB host controller and subsystem work OK. We haven't seen any issues with the 1Gig interface.
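For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that looks for the three stages described above, in order: watchdog timeout, USB reset attempt, enumeration failure. The message substrings are assumptions based on the usual wording of these kernel messages and may need adjusting for a given kernel; the names `classify` and `lockup_sequence_seen` are made up for illustration.

```python
import re

# Assumed substrings for each stage; adjust them to match your kernel's wording.
PATTERNS = [
    ("watchdog", re.compile(r"NETDEV WATCHDOG", re.IGNORECASE)),
    ("usb_reset", re.compile(r"reset \S+-speed USB device", re.IGNORECASE)),
    ("enum_fail", re.compile(r"unable to enumerate USB device", re.IGNORECASE)),
]


def classify(line):
    """Return the stage name a log line belongs to, or None if it matches none."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return None


def lockup_sequence_seen(lines):
    """Return True if all three stages appear in the given order."""
    stage = 0
    for line in lines:
        if classify(line) == PATTERNS[stage][0]:
            stage += 1
            if stage == len(PATTERNS):
                return True
    return False
```

It takes any iterable of log lines, so it could be fed an open file holding saved `dmesg` output, for example.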
Searching the net, I have found reports with similar backtraces, but they seem to be either several years old, or related to the RTL8153 chip, which shares the same driver. In either case, none of them seem to offer any answers as to what's going on, or if there is any fix.
Has anyone else seen this? Any ideas if it's a driver or hardware issue?
Relevant bit from the kernel log:
Full diagnostic dump: