Hi!
The root cause is not the controller itself but the PCIe PHY, which is not properly initialized on module init. When the device is cold-booted (after power on), the PHY stays in its "factory" state. After a warm reboot, however, it stays in the "power_on" state, and that prevents any link training.
The patch at https://github.com/armbian/build/pull/4308/files works because, after the first failed training attempt, the PCIe driver resets the PHY.
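The toy model below makes the two boot paths and their outcomes concrete. It is only an illustration built on assumptions. All class and function names are hypothetical and do not come from phy-rockchip-pcie. The model assumes that training succeeds only from the "factory" state, that a PHY reset always returns the PHY to that state, and that "reset on module init" captures the general idea of the fix. It does not describe the contents of the attached patch.

```python
from enum import Enum


class PhyState(Enum):
    FACTORY = "factory"    # state after a cold boot (power on)
    POWER_ON = "power_on"  # state that survives a warm reboot


class PhyModel:
    """Hypothetical toy model of the PCIe PHY, not the real driver."""

    def __init__(self, state: PhyState) -> None:
        self.state = state

    def reset(self) -> None:
        # Assumption: a PHY reset always brings it back to the factory state.
        self.state = PhyState.FACTORY

    def train_link(self) -> bool:
        # Assumption: link training only succeeds from the factory state
        # and leaves the PHY in the power_on state afterwards.
        if self.state is not PhyState.FACTORY:
            return False
        self.state = PhyState.POWER_ON
        return True


def probe_without_reset(phy: PhyModel) -> tuple[bool, int]:
    """Original behaviour: module init never puts the PHY in a known state."""
    return phy.train_link(), 1


def probe_retry_after_reset(phy: PhyModel) -> tuple[bool, int]:
    """Retry workaround: first attempt fails, the PCIe driver resets the PHY, retry."""
    if phy.train_link():
        return True, 1
    phy.reset()
    return phy.train_link(), 2


def probe_reset_on_init(phy: PhyModel) -> tuple[bool, int]:
    """Hypothetical fix: reset the PHY on module init, before any training."""
    phy.reset()
    return phy.train_link(), 1


if __name__ == "__main__":
    for boot, state in (("cold", PhyState.FACTORY), ("warm", PhyState.POWER_ON)):
        for probe in (probe_without_reset, probe_retry_after_reset, probe_reset_on_init):
            ok, attempts = probe(PhyModel(state))
            print(f"{boot} boot, {probe.__name__}: link_up={ok}, attempts={attempts}")
```

In this model only the warm-boot case without a reset fails. The retry workaround recovers on its second attempt, and resetting on init succeeds on the first attempt in both boot paths.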
https://github.com/andrewz1/rk3399-pcie-phy contains an updated PHY module and a DTB overlay to switch to it (I just changed the name). This is just a proof of concept.
Patch attached (tested on the edge kernel).
phy-rockchip-pcie.patch