
sick and tired of my Armbian desktop locking and crashing


AxelFoley


@pfry

 

Looking at the power specifications for PCIe x4, a slot can be powered from 3.3V (9.9W) and 12V (25W). However, the RockPro64 power schematic suggests they neglected to feed the 12V supply rail to the PCIe slot.

 

Instead, Pine feeds the PCIe interface on the board only from the 3.3V (3A) rail (9.9W).

From the power schematic it also looks like the board designers feed PCIe from the 5.1V rail, converted by the RK808 PMU to 1.8V on vcc1v8_pcie (not sure how this is intended to be used on PCIe).

 

 

I am using the Pine PCIe ver 3.0 NVMe card with a Samsung 970 EVO 500GB on a RockPro64 v2.1 board. The PSU is a 102W 12V LRS-100-12.

 

The max power draw of the 970 EVO NVMe is 5.8W (1.76A at 3.3V), which should be within spec for that 3.3V rail.
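The "within spec" arithmetic can be checked in a few lines of Python. This is a minimal sketch that uses only the figures quoted in this post (3.3V rail, 3A limit, 5.8W NVMe maximum); it covers steady peak draw only and says nothing about inrush current or short transients, which would need a scope to observe.

```python
# Rough steady-state power budget for the RockPro64 PCIe slot.
# Figures are the ones quoted in this post; inrush and transient
# spikes are NOT modelled here.

RAIL_VOLTAGE_V = 3.3         # PCIe 3.3 V rail as described above
RAIL_CURRENT_LIMIT_A = 3.0   # 3 A limit quoted for that rail
NVME_MAX_POWER_W = 5.8       # Samsung 970 EVO maximum power as quoted


def rail_capacity_w(voltage_v: float, current_a: float) -> float:
    """Maximum power a rail can deliver: P = V * I."""
    return voltage_v * current_a


def current_draw_a(power_w: float, voltage_v: float) -> float:
    """Current a load draws from a rail: I = P / V."""
    return power_w / voltage_v


def headroom_w(capacity_w: float, load_w: float) -> float:
    """Remaining power budget once the load is subtracted."""
    return capacity_w - load_w


if __name__ == "__main__":
    capacity = rail_capacity_w(RAIL_VOLTAGE_V, RAIL_CURRENT_LIMIT_A)
    draw = current_draw_a(NVME_MAX_POWER_W, RAIL_VOLTAGE_V)
    print(f"3.3 V rail capacity: {capacity:.1f} W")
    print(f"NVMe peak current:   {draw:.2f} A of {RAIL_CURRENT_LIMIT_A:.1f} A")
    print(f"Headroom:            {headroom_w(capacity, NVME_MAX_POWER_W):.1f} W")
```

On these numbers the drive uses a bit over half of the rail's nominal capacity, so a steady-state overload looks unlikely; transient behaviour is a separate question.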

 

But this bit worries me: "vcc_sys: could not add device link regulator.6 err -2"

 

vcc_sys is the 3.3V rail that feeds vcc3v3_pcie into the PCIe socket.

However, dmesg later says it can enable a regulator for the 3.3V vcc3v3_pcie rail hanging off vcc3v3_sys.

 

I also see this warning:

 


"Apr  8 19:09:24 localhost kernel: [    2.010352] pci_bus 0000:01: busn_res: can not insert [bus 01-ff] under [bus 00-1f] (conflicts with (null) [bus 00-1f])"

 

I may have to get out a JTAG debugger to work this one out :-( 

 

Not sure if this is a kernel driver issue or a hardware power design issue.

 

I'll see if I can get any Gerber files to scope out the voltage and current spikes.
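To pull the warnings quoted above out of a saved kernel log in one pass, a small filter like the following could help. It is only a sketch: the patterns are the two messages from this post plus a generic case-insensitive "pcie" match, and the function name `find_suspect_lines` is made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Patterns based on the messages quoted above; extend for your own board.
SUSPECT_PATTERNS = [
    re.compile(r"could not add device link regulator"),
    re.compile(r"busn_res: can not insert"),
    re.compile(r"pcie", re.IGNORECASE),
]


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every log line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


# Example usage against a log saved with `dmesg > boot.log` (hypothetical file name):
#     with open("boot.log") as log:
#         for number, text in find_suspect_lines(log):
#             print(f"{number}: {text}")
```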

 

 


19 hours ago, AxelFoley said:

 

@AndrewDB .... looks like you may be correct, it was power all along, but it looks like it's a kernel issue with the PCIe power management?

Well, I think the one conclusion one can draw here is that it's a bad idea to rely on experimental hardware with a not fully debugged kernel for real, serious development work. As chwe wrote, perhaps a more "conservative" approach would yield better results?

 

Unfortunately, I am of the humble opinion that by the time you get your cluster to function reliably and are able to begin using it for development work, the RK3399 will be considered obsolete, and you will have invested a lot of time, energy and moolah in what will essentially have become a paperweight.

 

Still, I hope you can sort it out asap. Best of luck! :thumbup:


8 hours ago, AxelFoley said:

@pfry

 

Looking at the power specifications for PCIe x4, a slot can be powered from 3.3V (9.9W) and 12V (25W). However, the RockPro64 power schematic suggests they neglected to feed the 12V supply rail to the PCIe slot.

 

Scroll down to sheet 27 of the schematic.

 

8 hours ago, AxelFoley said:

Instead, Pine feeds the PCIe interface on the board only from the 3.3V (3A) rail (9.9W).

From the power schematic it also looks like the board designers feed PCIe from the 5.1V rail, converted by the RK808 PMU to 1.8V on vcc1v8_pcie (not sure how this is intended to be used on PCIe).

 

The 1.8V power is for the PCI-e phy (power/enable/both? - I didn't look that closely), not the slot itself. If the RK808 wiring is board-specific it could explain a lot of the difficulties with individual boards (I haven't compared 'em).


58 minutes ago, TonyMac32 said:

Some regulators can be/are used differently for different boards. For instance, the RK808 is also used on the MiQi and Tinker board RK3288 boards.

I was just referring to the RK3399 boards (given the extra regulators for the A72s and GPU, it's apparent that the RK808/818 was designed for a less power-hungry SoC), but yeah, it's pretty obvious to me now (!) that the implementations vary. A lot, considering the relatively minor variances, mostly in the peripherals (audio codec, Wi-Fi, eMMC, etc.). The Realtek Ethernet PHY is pretty much a standard, at least. What a tangle. Nothing that can't be solved with enough time and money. It's too bad both are tough to come by.


I have been monitoring the voltage and current draw of a single RockPro64, booting to the console with the PCIe NVMe connected.

I then launched the desktop both as root (Armbian desktop) and as the pico user (looks like XFCE) and looked for any power spikes.

There were none!

Both tests caused a lockup and hardware freeze.

On boot the peak current draw was 0.7A with a steady 12V, after which the current draw settled at a steady 0.3A.

Then I launched Xorg as root (which in the past tended to be more stable) and as user pico (which always locked up immediately).

I did notice a difference when I ran startx as the pico user and it launched XFCE: the current draw peaked at 0.7A and the voltage dropped to 11.96V.

The desktop immediately locked up the board on launch.

When I ran startx as root and it launched the Armbian desktop, it only peaked at 0.6A and the voltage never dipped below 12V.

The desktop was responsive until I loaded the Armbian forum in Chrome, which triggered the board lockup... and there was no current spike or voltage drop! The board locked up while drawing only 0.29A.
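Converting those readings to watts makes the point concrete: nothing measured at the 12V input comes close to the supply's rating. A minimal sketch using the values from this post; where the post only says the voltage stayed steady, 12.0V is assumed. These are input-side figures only and do not show what happens on the board's internal rails.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    label: str
    volts: float
    amps: float

    @property
    def watts(self) -> float:
        # P = V * I at the 12 V input
        return self.volts * self.amps


PSU_RATED_W = 102.0  # LRS-100-12 rating quoted earlier in the thread

# Values from the post; 12.0 V is assumed wherever the voltage was reported as steady.
READINGS = [
    Reading("boot peak", 12.0, 0.7),
    Reading("console, steady", 12.0, 0.3),
    Reading("startx as pico (XFCE), peak", 11.96, 0.7),
    Reading("startx as root, peak", 12.0, 0.6),
    Reading("at lockup (browser)", 12.0, 0.29),
]

if __name__ == "__main__":
    for r in READINGS:
        share = r.watts / PSU_RATED_W
        print(f"{r.label:30} {r.watts:5.2f} W  ({share:.1%} of PSU rating)")
```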

 

I checked the Pine forum and other people are reporting exactly the same issue as myself with the PCIe/NVMe setup.

One person in 2018 reported he fixed the issue by booting a different kernel.

 

I have logged the problem on the Pine forum and got this response:

 

"Currently working on the issue. It seems, as odd as it sounds, that the problem is somehow linked to pulseaudio. If you uninstall pulseaudio and use alsa instead, the issue will just vanish. We have tried blacklisting PCIe for pulse in udev, and it prevents the issue from happening, but it also returns a segmentation error (SATA card / other adapter not accessible). It's very, very strange."

 

Should I stop posting here and move the discussion to Pine?

Attached images: XFCEDesktopPowerDraw.jpg, ArmbianDesktopPowerDraw.jpg, ArmbianDesktopLockup.jpg


On 4/10/2019 at 12:33 AM, AxelFoley said:

Should I stop posting here and move the discussion to Pine?

Some people think the PCIe problem is related to a bug in the Rockchip U-Boot.

U-Boot from 2020 seems to fix the problem. I haven't tried it.

 

With the "standard" card you can buy with the RockPro64 I could not see any SATA disks; in OMV I got the famous:

ata1: hard resetting link

The SATA card was not visible at all, and I tried almost everything. Nothing positive happened at all.

 

I was getting tired and bought a cheap StarTech 2x eSATA + 2x SATA III 6Gb/s card, still with the ASMedia ASM1061 chipset. I was surprised it worked, because everyone was sure the chipset was at fault; I had been told the Marvell 88SE9215 chipset was way better.

I have now written 6TB in a row, no errors yet.

10 downloads in progress at the same time, each at around 8.7-9.3 MB/s, with the cheap StarTech ASMedia ASM1061 card.
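For scale, the aggregate rate is easy to work out. This sketch assumes the quoted figures are megabytes per second and that the downloads arrive over the board's gigabit Ethernet port; the comparison is against the raw 1 Gbit/s line rate, before protocol overhead.

```python
# Aggregate throughput of parallel downloads, compared with gigabit Ethernet.
# Assumes the quoted rates are MB/s and that traffic arrives over a 1 Gbit/s port.

GIGABIT_LINE_RATE_MB_S = 1000 / 8  # 125 MB/s raw, before protocol overhead


def aggregate_throughput(per_stream_mb_s: float, streams: int) -> float:
    """Total rate when `streams` transfers each run at `per_stream_mb_s`."""
    return per_stream_mb_s * streams


if __name__ == "__main__":
    for rate in (8.7, 9.3):  # per-download range from the post
        total = aggregate_throughput(rate, 10)
        share = total / GIGABIT_LINE_RATE_MB_S
        print(f"10 x {rate} MB/s = {total:.0f} MB/s ({share:.0%} of gigabit line rate)")
```

Roughly 87-93 MB/s in total is well below what a SATA III link can carry, so under that assumption the network rather than the ASM1061 would be the limiting factor in this test.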

 

Tested with an Armbian image with OMV installed; very pleased with the result.

 

ROCKPro64 board ver. 2.0, dated 2018-04-11, both 2GB and 4GB versions:

On these models PCIe 3.3V power is not supplied, and the PCIe slot is therefore not functional.

(Fix: remove resistor R89538, which affects the U7109 TCS4484B power regulator; it will then supply 3.3V to the PCIe bus.) Not tested by myself.

Just to be clear, this fix is ONLY for the ROCKPro64 board ver. 2.0, dated 2018-04-11, both 2GB and 4GB versions.

 


3 hours ago, Salvador Liébana said:

How can we track what's going on, guys?

You could raise the verbosity. By default it is set to 1; the higher the value, the more messages you get. It's best to use a TTL debug cable.
Verbosity is set in /boot/armbianEnv.txt.
Have you set the governor to performance? That is one known issue with some RK3399 boards, though I don't think it's the same one you're having.
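The verbosity setting is a plain `key=value` line in that file, so it can be edited by hand or with a tiny helper. A minimal sketch; the helper names are hypothetical, and the example value 7 is an assumption, so check which values your image's boot script actually accepts and keep a backup of the file.

```python
from pathlib import Path


def update_env_text(text: str, key: str, value) -> str:
    """Return `text` with `key=value` set, replacing an existing line or appending one."""
    lines = text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        # Split on the first '=' only, since values such as rootdev=UUID=... contain '='.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def set_env_value(path: str, key: str, value) -> None:
    """Rewrite a key=value file such as /boot/armbianEnv.txt in place (needs root)."""
    env_file = Path(path)
    env_file.write_text(update_env_text(env_file.read_text(), key, value))


# Example (run as root, keep a backup, then reboot with the serial console attached):
#     set_env_value("/boot/armbianEnv.txt", "verbosity", 7)
```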


Setting the governor to performance should not change this situation. I did not have these hangs on MATE, but right now Armbian on RK3399 under XFCE, the official desktop environment of Armbian, is extremely unstable. It hangs randomly, most of the time while browsing. This should be a priority: the OS doesn't work on mainline. And yes, maybe it's platform specific, but RK3399 is the most widely used SoC, so this requires a fix. I don't have a UART, so it's hard for me to give any real debug info on what's happening, but it's not just me; anyone should be able to reproduce it. I know I received this OS for free, but this should not happen.


There has been a lot of development on desktop lately.  I am not sure how much has been pushed anywhere you can download it yet, though.  Lanefu was reporting really good results the other night in IRC about his testing with Mate on PineBook.  I have not heard anything about XFCE one way or the other until your report just now (so, thanks for that).  However I think, more good things coming for desktop users, soon(TM).

Edited by TRS-80
add sentence about XFCE

12 hours ago, Salvador Liébana said:

armbian on RK3399 under xfce

You didn't specify the exact device model, the exact image name, what settings you changed, and so on. Without this, this is an empty conversation.

 

 

P.S. On the different RK3399 models I have, there are no problems (when used correctly), even at maximum load.


I've just done the test with the NanoPi M4 with this image https://redirect.armbian.com/region/EU/nanopim4/Focal_current_desktop
And I can confirm the crashes. Black screen, no more blinking light, only the dim LED still on.
I tried with the performance governor and it happened again.

Three times in less than half an hour.
So this doesn't seem to be the same issue as before. I do suspect it's a kernel issue.
I did have similar crashes before, and those were fixed by setting the governor to performance. I never had a crash again on that setup (M4V2 with Armbian Reforged V1).
I'll order a FriendlyElec TTL adapter. I blew mine up on another device. That's what happens when they all use a different pinout, g.d. :)

If anyone has some debug info, please share.
I'll try a Debian image and see if it's the same.


Already running 30 minutes with Buster 5.9.10 and haven't seen anything strange. I installed Firefox, browsed with both Chromium and Firefox, installed boinc-manager, and it's running fine.
I'll keep it running all night to see if something happens.
@Salvador Liébana Are you sure it's because of XFCE, or could it be a newer kernel? Now it's 5.9.10; it used to be 5.8 on Armbian Forged. Tomorrow I'll try both MATE and XFCE4 on 5.9.10, and an older kernel version on Focal XFCE4.
I've got no clue what could cause this. 


And 10 seconds later it crashed... :)
Rebooted, and it crashed again within 10 seconds. All cores were maxed out for no reason.
Third boot: the same again, after 30 seconds. All cores maxed out for no reason.
No time to change anything...
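Without a serial console, one low-tech way to see what the cores were doing right before a hard lock is to log per-core load to disk and fsync every sample, so the last lines survive the freeze. A sketch assuming the standard Linux `/proc/stat` layout; the function names and log path are hypothetical, and writing every second adds some wear on SD/eMMC storage.

```python
import os
import time

PROC_STAT = "/proc/stat"


def parse_proc_stat(text: str) -> dict:
    """Map each per-core line (cpu0, cpu1, ...) to (idle_ticks, total_ticks)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            # user nice system idle iowait irq softirq steal; guest time is
            # already included in user/nice, so it is not summed again.
            ticks = [int(x) for x in fields[1:9]]
            idle = ticks[3] + ticks[4]  # idle + iowait
            result[fields[0]] = (idle, sum(ticks))
    return result


def busy_percent(before: dict, after: dict) -> dict:
    """Per-core busy percentage between two parse_proc_stat() samples."""
    usage = {}
    for cpu, (idle1, total1) in after.items():
        idle0, total0 = before.get(cpu, (idle1, total1))
        elapsed = total1 - total0
        usage[cpu] = 0.0 if elapsed <= 0 else 100.0 * (1 - (idle1 - idle0) / elapsed)
    return usage


def read_stat() -> dict:
    with open(PROC_STAT) as f:
        return parse_proc_stat(f.read())


def log_forever(log_path: str, interval_s: float = 1.0) -> None:
    """Append one line per interval and fsync it so it survives a hard lock."""
    previous = read_stat()
    with open(log_path, "a") as log:
        while True:
            time.sleep(interval_s)
            current = read_stat()
            usage = busy_percent(previous, current)
            cores = " ".join(f"{cpu}={pct:.0f}%" for cpu, pct in sorted(usage.items()))
            log.write(f"{time.strftime('%H:%M:%S')} {cores}\n")
            log.flush()
            os.fsync(log.fileno())
            previous = current


# Example (hypothetical path): log_forever("/var/log/cpu-load.log")
```

After the next crash, the tail of the log shows whether the cores really were pinned in the seconds before the freeze.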


Just tried the Focal nightly: https://minio.k-space.ee/armbian/dl/nanopim4/nightly/Armbian_21.02.0-trunk.8_Nanopim4_focal_current_5.9.12.img.xz
After installing the desktop it also crashed. Always the same crash: black display, no flashing LED, dim LED still on.
I'll try an older image.

Update: Focal 5.8.6 is the same, even with the governor set to performance.


I have tested Focal Xfce desktop on a Rockpi-4b, with Firefox, glmark2, video playing, etc., for over half an hour, and didn't have any problem. I have not been able to reproduce the bug.

 

If someone can give me clear steps to reproduce it, I will try to look into it. Otherwise, I consider it solved.


3 hours ago, JMCC said:

If someone can give me clear steps to reproduce it, I will try to look into it. Otherwise, I consider it solved.

It's been fixed, to my knowledge. It was the Mesa update that was buggy.
A few other bugs still need to be worked out, but this problem seems to be gone.

I do think there's another issue. Panfrost is enabled by default in desktop and server images, and I don't know if it has a proper update path.
Certainly for server images I'd disable it since it can only make problems. For desktop I don't mind.

I haven't seen the other issue with the ondemand governor anymore, so it might also be fixed. Could anyone confirm?
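To confirm which cpufreq governor is actually active (for example ondemand vs. performance) before and after a test, the standard cpufreq sysfs files can be read directly. A minimal sketch; the helper names are hypothetical, and writing to sysfs needs root and does not persist across reboots (on Armbian images the persistent setting usually lives in /etc/default/cpufrequtils).

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def governor_files(root: Path = CPU_ROOT):
    """Per-core scaling_governor files (cpu0, cpu1, ...)."""
    return sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor"))


def read_governors(root: Path = CPU_ROOT) -> dict:
    """Map core name to its current cpufreq governor."""
    return {f.parent.parent.name: f.read_text().strip() for f in governor_files(root)}


def set_governor(name: str, root: Path = CPU_ROOT) -> None:
    """Write `name` to every core's governor file (root only, not persistent)."""
    for f in governor_files(root):
        available = f.with_name("scaling_available_governors")
        if available.exists() and name not in available.read_text().split():
            raise ValueError(f"{name!r} not offered by {f.parent}")
        f.write_text(name + "\n")


# Example: print(read_governors()), then set_governor("performance") as root.
```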


18 minutes ago, NicoD said:

it can only make problems

As I said, there is one kernel config per board family; regardless of which packages are installed in userspace, the kernel remains the same. Many people will want to download or build a console-only image and then install a desktop on it (I do it quite often), so it makes sense to keep it this way.

 

Also, even people running headless may want to use the OpenCL features offered by panfrost, which is an additional reason for keeping it enabled.

 

Besides, having the graphics driver enabled in the kernel should not cause any kind of problem when you are not using the module (that is, no gpu accelerated apps running).  Mali drivers have always been enabled on legacy kernels, without being used in most cases, and it has never been an issue. If some problem ever arose from panfrost on servers, then it would be time to think about the best solution for it. But, as of today, it is not a problem at all, unless the contrary is proven.
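Whether panfrost is actually loaded on a given headless box can be checked from /proc/modules (the same data `lsmod` prints). A sketch, assuming panfrost is built as a loadable module; if it is built into the kernel it will not appear in that file at all, so "not loaded" is not conclusive, and the refcount column is only a rough indicator of use. The helper name is hypothetical.

```python
def module_status(name: str, proc_modules_text: str) -> dict:
    """Look up a module in /proc/modules text: 'name size refcount deps state addr'."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == name:
            return {"loaded": True, "refcount": int(fields[2])}
    return {"loaded": False, "refcount": 0}


# Example on the board:
#     with open("/proc/modules") as f:
#         print(module_status("panfrost", f.read()))
```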
