
[Invalid] - (Partially) Fixed PCIe training problem


strongtz

Question

Hi. I've got a ROC-RK3399-PC SBC with a JMB585 PCIe to SATA extension card.

Unfortunately, PCIe never works at boot, in either U-Boot or the kernel. Both fail with a `PCIe link training gen1 timeout` error.

The following logs show the failure:

Quote

Trying to boot from SPI


U-Boot 2021.07-rc4-ga2c3fda4-dirty (Jun 21 2021 - 15:28:06 +0800)

SoC: Rockchip rk3399
Reset cause: POR
Model: Firefly ROC-RK3399-PC Mezzanine Board
DRAM:  3.9 GiB
PMIC:  RK808
MMC:   mmc@fe310000: 2, mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from SPIFlash... SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Model: Firefly ROC-RK3399-PC Mezzanine Board
Net:   eth0: ethernet@fe300000
Hit any key to stop autoboot:  0
Card did not respond to voltage select! : -110
Card did not respond to voltage select! : -110
rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout!

 

Quote

root@ubuntu-server:/# dmesg | grep pcie
[    3.756325] vcc3v3_pcie: supplied by sys_12v
[   22.150142] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
[   22.150185] OF: /pcie@f8000000: Missing device_type
[   22.150203] rockchip-pcie f8000000.pcie:      MEM 0x00fa000000..0x00fbdfffff -> 0x00fa000000
[   22.150217] rockchip-pcie f8000000.pcie:       IO 0x00fbe00000..0x00fbefffff -> 0x00fbe00000
[   22.150905] rockchip-pcie f8000000.pcie: no vpcie12v regulator found
[   22.676881] rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout!
[   22.678469] rockchip-pcie: probe of f8000000.pcie failed with error -110
[   33.774070] vcc3v3_pcie: disabling

 

However, after I manually run these commands, PCIe seems to work.

```
modprobe -r pcie_rockchip_host
modprobe pcie_rockchip_host
```

 

Kernel log:

Quote

[  338.208262] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
[  338.208315] rockchip-pcie f8000000.pcie:      MEM 0x00fa000000..0x00fbdfffff -> 0x00fa000000
[  338.208330] rockchip-pcie f8000000.pcie:       IO 0x00fbe00000..0x00fbefffff -> 0x00fbe00000
[  338.209131] rockchip-pcie f8000000.pcie: no vpcie12v regulator found
[  338.345743] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
[  338.345772] pci_bus 0000:00: root bus resource [bus 00-1f]
[  338.345788] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
[  338.345801] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff] (bus address [0xfbe00000-0xfbefffff])
[  338.345853] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
[  338.345975] pci 0000:00:00.0: supports D1
[  338.345983] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[  338.350714] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[  338.351105] pci 0000:01:00.0: [197b:0585] type 00 class 0x010601
[  338.351195] pci 0000:01:00.0: reg 0x10: initial BAR value 0x00000000 invalid
[  338.351206] pci 0000:01:00.0: reg 0x10: [io  size 0x0080]
[  338.351245] pci 0000:01:00.0: reg 0x14: initial BAR value 0x00000000 invalid
[  338.351253] pci 0000:01:00.0: reg 0x14: [io  size 0x0080]
[  338.351289] pci 0000:01:00.0: reg 0x18: initial BAR value 0x00000000 invalid
[  338.351297] pci 0000:01:00.0: reg 0x18: [io  size 0x0080]
[  338.351333] pci 0000:01:00.0: reg 0x1c: initial BAR value 0x00000000 invalid
[  338.351342] pci 0000:01:00.0: reg 0x1c: [io  size 0x0080]
[  338.351378] pci 0000:01:00.0: reg 0x20: initial BAR value 0x00000000 invalid
[  338.351386] pci 0000:01:00.0: reg 0x20: [io  size 0x0080]
[  338.351421] pci 0000:01:00.0: reg 0x24: [mem 0x00000000-0x00001fff]
[  338.351458] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[  338.351500] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 512)
[  338.351863] pci 0000:01:00.0: PME# supported from D3hot
[  338.352236] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:00.0 (capable of 15.752 Gb/s with 8.0 GT/s PCIe x2 link)
[  338.356709] pci_bus 0000:01: busn_res: [bus 01-1f] end is updated to 01
[  338.356831] pci 0000:00:00.0: BAR 14: assigned [mem 0xfa000000-0xfa0fffff]
[  338.356857] pci 0000:01:00.0: BAR 6: assigned [mem 0xfa000000-0xfa00ffff pref]
[  338.356870] pci 0000:01:00.0: BAR 5: assigned [mem 0xfa010000-0xfa011fff]
[  338.356893] pci 0000:01:00.0: BAR 0: no space for [io  size 0x0080]
[  338.356904] pci 0000:01:00.0: BAR 0: failed to assign [io  size 0x0080]
[  338.356913] pci 0000:01:00.0: BAR 1: no space for [io  size 0x0080]
[  338.356922] pci 0000:01:00.0: BAR 1: failed to assign [io  size 0x0080]
[  338.356931] pci 0000:01:00.0: BAR 2: no space for [io  size 0x0080]
[  338.356944] pci 0000:01:00.0: BAR 2: failed to assign [io  size 0x0080]
[  338.356956] pci 0000:01:00.0: BAR 3: no space for [io  size 0x0080]
[  338.356969] pci 0000:01:00.0: BAR 3: failed to assign [io  size 0x0080]
[  338.356981] pci 0000:01:00.0: BAR 4: no space for [io  size 0x0080]
[  338.356989] pci 0000:01:00.0: BAR 4: failed to assign [io  size 0x0080]
[  338.357002] pci 0000:00:00.0: PCI bridge to [bus 01]
[  338.357016] pci 0000:00:00.0:   bridge window [mem 0xfa000000-0xfa0fffff]
[  338.357335] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
[  338.357848] pcieport 0000:00:00.0: PME: Signaling with IRQ 95
[  338.358358] pcieport 0000:00:00.0: AER: enabled with IRQ 95
[  338.434167] ahci 0000:01:00.0: version 3.0
[  338.434209] ahci 0000:01:00.0: enabling device (0000 -> 0002)
[  338.434743] ahci 0000:01:00.0: SSS flag set, parallel bus scan disabled
[  338.434858] ahci 0000:01:00.0: AHCI 0001.0301 32 slots 5 ports 6 Gbps 0x1f impl SATA mode
[  338.434868] ahci 0000:01:00.0: flags: 64bit ncq sntf stag pm led clo pmp fbs pio slum part ccc apst boh
[  338.442224] scsi host1: ahci
[  338.443217] scsi host2: ahci
[  338.443893] scsi host3: ahci
[  338.445008] scsi host4: ahci
[  338.446838] scsi host5: ahci
[  338.447158] ata1: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010100 irq 96
[  338.447179] ata2: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010180 irq 97
[  338.447188] ata3: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010200 irq 98
[  338.447197] ata4: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010280 irq 99
[  338.447206] ata5: SATA max UDMA/133 abar m8192@0xfa010000 port 0xfa010300 irq 100
[  338.762463] ata1: SATA link down (SStatus 0 SControl 300)
[  339.073142] ata2: SATA link down (SStatus 0 SControl 300)
[  339.385252] ata3: SATA link down (SStatus 0 SControl 300)
[  339.700894] ata4: SATA link down (SStatus 0 SControl 300)
[  340.013306] ata5: SATA link down (SStatus 0 SControl 300)

 

Does anyone have any idea about what is going on? Thanks in advance.


7 answers to this question


  • 0

Is it the same as this issue? It sounded vaguely familiar, so I dug up the link:

 

https://wiki.pine64.org/wiki/ROCKPro64#PCIe_Controller_Hardware_Error_Handling_Bug

 

Note: Even though this is on the ROCKPro64 wiki, if you read it you will see it's about the PCIe controller in the RK3399 itself. Having said that, it's been a while since I read it, so I'm not sure whether it's the same thing.

 

EDIT: Or maybe it's a problem with the ROC-RK3399-PC only. There is a long thread here where people have been working on it for years already; if I recall correctly, the last time I read it, it still wasn't working.

 

But that link training error sounds familiar. If the answers are not in either of the places mentioned above, I will have to try harder to remember where I saw it.


  • 0

Your issue report is not a valid bug report per the Armbian bug reporting instructions (https://www.armbian.com/bugs).  With limited resources the Armbian project is only able to spend time on issues where all the requested information has been provided and for only the boards/images/software that are supported.  Your report is invalid for one or more of the following reasons (non-exhaustive list):

 

  • it is for an unsupported board or image (CSC/EOS/WIP/edge)
  • it is for software that is not supported (such as userspace modules installed on top of the core operating system)
  • it has been logged in the wrong forum (for example requests for help that are not actual bug reports)
  • it lacks requested data (armbianmonitor output)
  • it could have been easily solved by a quick search and/or reading documentation

 

Please review what you have submitted and the bug logging instructions (https://www.armbian.com/bugs) and either add the required information or open a new topic in the correct forum (such as Common issues / peer to peer technical support or General chit chat)


  • 0

Now I've got some ideas.

It seems that some PCIe cards can't deal with 3.3V power well.

From the log above, we know that the kernel disabled the fixed regulator vcc3v3_pcie because it was unused after the PCIe probe failure. When I manually reloaded the module, the regulator was enabled again immediately (so the PCIe device got its power), and PCIe link training went smoothly.

 

This leads me to think that some PCIe devices might require link training immediately after they receive 3.3V power; otherwise the probe fails.

 

AFAIK vcc3v3_pcie is enabled long before the PCIe controller driver is loaded, which might be the cause of the problem.

So, is there a way to enable that regulator only when needed (that is, when the PCIe driver is loaded)?
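
To see whether vcc3v3_pcie is actually on at a given moment (for example, before and after reloading the module), the regulator class in sysfs can be read. The following is only a sketch: it assumes the kernel exposes `/sys/class/regulator/regulator.N/name` and `state`, and the helper name `regulator_state` and its optional second argument (an alternative sysfs root, useful for dry runs) are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the state of a named regulator from the regulator sysfs class.
# Assumption: entries look like /sys/class/regulator/regulator.N/{name,state}.
# The second argument is a hypothetical override of the sysfs root.

regulator_state() {
    wanted=$1
    root=${2:-/sys/class/regulator}
    for dir in "$root"/regulator.*; do
        # Skip entries without a readable name (or an unmatched glob).
        [ -r "$dir/name" ] || continue
        if [ "$(cat "$dir/name")" = "$wanted" ]; then
            if [ -r "$dir/state" ]; then
                cat "$dir/state"
            else
                # Not every regulator exposes a state attribute.
                echo unknown
            fi
            return 0
        fi
    done
    # No regulator with that name was found.
    return 1
}
```

Calling `regulator_state vcc3v3_pcie` right after boot and again after the reload would show whether the regulator state changes in between.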


  • 0

Well, that no longer seems to be the case.

After a link training timeout, I just need to reload the PCIe kernel module, no matter when it is reloaded.

(vcc3v3_pcie was kept on throughout by modifying the device tree.)

 

Is there any way to work around the problem without reloading manually? :)
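
One possible stopgap is to script the same reload so that it runs once after boot. This is a sketch of automating the manual workaround, not a fix for the underlying training failure. It assumes the driver is built as the `pcie_rockchip_host` module (as in the commands earlier in this thread) and that the script is started as root by something like a systemd unit or rc.local. The function names and the `--run` switch are made up for illustration.

```shell
#!/bin/sh
# Sketch: reload the Rockchip PCIe host driver if its probe timed out.
# Assumptions: run as root once after boot; pcie_rockchip_host is a module.

# Return 0 if the most recent relevant kernel log line is the link training
# timeout, i.e. no successful host bridge registration was logged after it.
pcie_needs_reload() {
    log_text=$1
    last=$(printf '%s\n' "$log_text" \
        | grep -E 'PCIe link training gen1 timeout|PCI host bridge to bus' \
        | tail -n 1)
    case $last in
        *'PCIe link training gen1 timeout'*) return 0 ;;
        *) return 1 ;;
    esac
}

# The same two commands as the manual workaround.
reload_pcie() {
    modprobe -r pcie_rockchip_host && modprobe pcie_rockchip_host
}

# Only touch the system when explicitly asked to with --run.
if [ "${1:-}" = "--run" ]; then
    if pcie_needs_reload "$(dmesg)"; then
        reload_pcie
    fi
fi
```

Because the check looks at the last matching log line, running the script again after a successful reload does nothing.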
