usual user
Posts: 164
Everything posted by usual user
-
It was only a shot in the dark. I was inspired by this post. Attempting to trigger the next frequency change while a previous one is still in progress would have been a good explanation, as the crash appears to occur during dynamic frequency scaling. But having this configuration right is a good idea in any case. It's not always about it working or not; most of the time, it's about not working as well as possible, and all the small drawbacks add up in overall performance.
-
Out of curiosity, with the mainline kernel and ondemand governor in place, apply this:

echo 40000 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/sampling_rate
echo 465000 > /sys/devices/system/cpu/cpufreq/policy4/ondemand/sampling_rate

Does it still crash in your use case? If it no longer crashes, I will explain what is going on.
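If you'd rather script the two writes than type them, here is a minimal sketch. It assumes the sysfs paths quoted above (policy0/policy4 matching the two RK3399 clusters); on a machine without those nodes, or without root, the helper just prints the write it would have done.

```shell
#!/bin/sh
# Apply the ondemand sampling rates from the post (values in microseconds).
# Falls back to a dry-run message when the sysfs node is absent/read-only.
set_sampling_rate() {
  rate="$1" policy="$2"
  f="/sys/devices/system/cpu/cpufreq/policy${policy}/ondemand/sampling_rate"
  if [ -w "$f" ]; then
    echo "$rate" > "$f"
  else
    echo "would write $rate to $f"
  fi
}
set_sampling_rate 40000 0    # little cluster
set_sampling_rate 465000 4   # big cluster
```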
-
I doubt it collects enough fine-grained data for a good interpretation. Here is a tmon log visualisation of a glmark2-es2 run: look at the temperature differences between CPU and GPU, despite the small distance between them.
-
For a first test, replace your rk3399-rockpro64.dtb with the one provided in this post. If your original dtb carries a different configuration, you can use the fdtoverlay method to apply rk3399-rockpro64-tz.dtbo (built from rk3399-rockpro64-tz.dts) statically to your base dtb. This will work with mainline fdtoverlay without modification. Using it as a dynamic overlay should also work. The second rk3399-rockpro64-tz.dts post was only to showcase how to change the value of an existing binding.
-
I have only looked at the attached dtb, and there only the 105°C trip point is bound to a cooling device. But without a tmon log I can't tell how often it is firing. Frequency scaling is controlled by CPU load via the governor or user settings; it does not obey any thermal restrictions. Cooling devices obey thermal restrictions and constrain the scaling.
-
Don't confuse the junction temperature with the ambient temperature of the heatsink; the former is the one the CPU temperature sensor is reporting. How long it takes to propagate to the surface depends on the thermal resistance and can take quite some time. If I read the previously attached dtb correctly, the thermal system kicks in at 105°C. Relying on load-dependent DVFS does not take thermal constraints into account at all. How the thermal system really works can be deduced from a tmon log created during a stress test.
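A sketch of how such a tmon log can be captured during a stress test. The -d/-l/-t flags are from the kernel's tools/thermal/tmon (daemonize, log to /var/tmp/tmon.log, sampling interval) — verify with `tmon -h` on your build — and the `stress` invocation is only an example; glmark2-es2 from the earlier post works as well for a GPU-heavy run.

```shell
#!/bin/sh
# Record a tmon log while loading the SoC, so the thermal system's
# behaviour can be visualised afterwards. Assumes tmon and stress are
# installed; bails out with a message otherwise.
capture_tmon_log() {
  duration="${1:-300}"
  if ! command -v tmon >/dev/null 2>&1; then
    echo "tmon not installed" >&2
    return 1
  fi
  tmon -d -l -t 1                               # sample once per second, in background
  stress --cpu "$(nproc)" --timeout "$duration" # or run glmark2-es2 instead
  pkill tmon
  echo /var/tmp/tmon.log                        # log file to plot
}
capture_tmon_log 300 || true
```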
-
Single Armbian image for RK + AML + AW (aarch64 ARMv8)
usual user replied to balbes150's topic in TV Boxes's General Chat
-
English is probably not balbes150's native language and he is using a translator. "Hijacked" would be a better translation; your post is off topic in that thread.
-
For the details you have to read the TRM (e.g.) of the specific device. VOP is what Rockchip calls the display subsystem components. It is exposed as mem2mem and is usable e.g. for hardware-accelerated video scaling in video pipelines. Sorry, you are too late in the game. With my current setup, as soon as the memory pressure issue is resolved, I am feature complete.
-
That configuration looks sane to me. That is the problem: basic KMS/drm acceleration does not suffice; you need the full set to be efficient, e.g. to also have accelerated video output from the VPUs. Moving an entire frame buffer for only a line scroll somewhere on the screen is very inefficient. But every drm IP has different capabilities. Take i.MX6 for example: the drm IP has very little capability, but there is a separate 2D GPU, and the 3D GPU renders in a format that can't be scanned out by the drm IP; it has to be converted by the 2D GPU first. This is something modesetting can't cope with, and probably the reason why the armada driver exists. I don't know why no suitable driver is available for Xorg, because code already exists for Wayland drm backends. Perhaps the structure of Xorg is not suitable for implementing it in a similar way, or the relevant developers have already moved to Wayland. I had to switch to kernel 5.9.0-rc5 with some panfrost patches from linux-next on top. For Mesa I'm using the current master branch. With this in place everything works flawlessly. The only issue left is some sort of memory pressure under heavy GPU use. But development is a moving target, and I see this, which looks somehow related.
-
lima is not the culprit; it is modesetting, which uses it improperly. Disable it for modesetting with:

Section "Device"
    Identifier "KMS-1"
    Driver "modesetting"
    Option "AccelMethod" "none"
EndSection

and leave lima in place so that e.g. kodi-gbm and Wayland compositors can make use of it. You should also include a stanza like this:

Section "OutputClass"
    Identifier "dwhdmi-rockchip"
    MatchDriver "rockchip"
    Option "PrimaryGPU" "TRUE"
EndSection

Section "OutputClass"
    Identifier "Meson-IP"
    MatchDriver "meson"
    Option "PrimaryGPU" "TRUE"
EndSection

Section "OutputClass"
    Identifier "Exynos-IP"
    MatchDriver "exynos"
    Option "PrimaryGPU" "TRUE"
EndSection

so that modesetting immediately selects the correct /dev/dri/cardX node for the display subsystem, without autoprobing and guessing. Having all stanzas for the drivers you want to support in place simultaneously does no harm, because any given device is equipped with only one of these IPs and only that one will match.
-
fdtoverlay is a convenient way to apply an overlay statically to a base dtb. You spare yourself the DTC decompile - manually edit - DTC compile dance. Usually you write overlays with label references, but to be able to apply such an overlay, the base dtb has to be compiled with the @-option. This has a significant impact on the size, and distributions usually don't do this. When you write the overlay with full paths, it contains all the information needed to apply it to a base dtb that was not created with the @-option. The mainline fdtoverlay needs the patch to be able to apply it. Edit the pwms property to any value you like, as shown in the provided rk3399-rockpro64-tz.dts (50000 default changed to 10000).
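For comparison, the dance that fdtoverlay spares you looks roughly like this. A sketch only: file names are examples, and dtc's -@ switch is the "@-option" mentioned above (it emits the __symbols__ node that label-referencing overlays need, at the cost of a noticeably larger dtb).

```shell
#!/bin/sh
# The DTC decompile - manually edit - compile round trip.
round_trip_dtb() {
  dtb="$1"
  if ! command -v dtc >/dev/null 2>&1 || [ ! -f "$dtb" ]; then
    echo "dtc or $dtb not available"
    return 1
  fi
  dtc -I dtb -O dts -o base.dts "$dtb"   # 1. decompile to source
  "${EDITOR:-vi}" base.dts               # 2. edit, e.g. the pwms value
  dtc -I dts -O dtb -o "$dtb" base.dts   # 3. recompile
  # to also keep symbols for label-based overlays:
  # dtc -@ -I dts -O dtb -o "$dtb" base.dts
}
round_trip_dtb rk3399-rockpro64.dtb || true
```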
-
There is nothing hardcoded; there is only a default value. And as one value does not fit them all, you have to set it to the value that fits your need. See rk3399-rockpro64-tz.dts for reference. Unfortunately you cannot apply this with mainline fdtoverlay, because it does not support full-path notation for a reference to another node. But with fdt_overlay.patch.txt applied it works as expected.
-
The right way to use the fan is to have a proper thermal setup (rk3399-rockpro64-tz.dts) in the devicetree. With this, the kernel thermal system can handle the management. This is a visualisation of a tmon log documenting the workings of the thermal system: rk3399-rockpro64.dtb is a mainline dtb with rk3399-rockpro64-tz.dtbo applied via:

fdtoverlay --input rk3399-rockpro64.dtb --output rk3399-rockpro64.dtb rk3399-rockpro64-tz.dtbo

See if this works for you by replacing your dtb, and check with tmon.
-
I gave up on Xorg and switched to plasma-desktop. KWin supports a Wayland backend, and so I get a lightning-fast graphics desktop with all the bells and whistles. OK, the bugs in the panfrost stack still exist, but this environment makes efficient use of everything that is available. Thanks to the configurability of KWin, I can have the same look and feel as my previous desktop.
-
Inspired by you, I have also done some more tests on my side. For me it is also freezing. Since I am on panfrost, we can rule out lima and panfrost for this; the one we still have in common is rockchipdrm. i.MX6 uses imxdrm and does not suffer from this flaw, so IMHO the display subsystem is responsible for this error. I don't know how mature the lima GL support in Mesa already is, so IMHO Mesa is to blame here. But we are dealing with 2D acceleration functions of the display subsystem for X Window, so these errors are not relevant for our further investigations. The concept of a dedicated cursor plane is gone in atomic modesetting. The plane is handled like any other plane, but the constraints of the plane are still obeyed. The selection of a suitable cursor plane will most probably pick this one, but any other one can be chosen.
-
This is only the proprietary kernel part, which is already implemented in mainline via /dev/dri/renderD128. The missing functionality is what the binary blob does with it, which would have to be implemented via the as-yet non-existent armsoc submodule. glamor already has it, but using it via modesetting is suboptimal because of KMS/drm implementation design decisions there.
-
Exactly, they were done on the same device. The buffer pass-around forces the 3D GPU IP to slow down because the required buffers are not available in time. The performance hit for the display output isn't reflected in the log, but by visual inspection it makes a huge difference. In both cases, the 3D rendering power is sufficient to allow a fluid 60Hz display. The DRM scan-out buffer is handed to the Mali proprietary OpenGL ES libraries, and they do the buffer dance in the blob via the Mali proprietary kernel interface. When 3D rendering is done, the buffer is handed back to DRM and the scan-out takes place. This is what the submodule has to implement with the Mali render node (/dev/dri/renderD128). In the early days it was a security guard to protect stable installations. It made dma_buf support accessible and usable when drivers provide the support. I don't know whether it is still required or meanwhile obsolete; it is still dangling around in my configurations.
-
Single Armbian image for RK + AML + AW (aarch64 ARMv8)
usual user replied to balbes150's topic in TV Boxes's General Chat
To switch back to the old method, simply rename extlinux.conf, e.g.:

mv extlinux.conf extlinux.conf-disabled

But I don't know if the old files are still maintained. -
Reread my log analysis of Xorg.0_driver_as_rockchip.log. You set up two screens there. One is driven via modesetting with the 3D render node (card1) as the display subsystem; it has no scan-out hardware, so you can't see the result of any 3D hardware rendering on screen 0 on any monitor. The second screen is driven by fbdev with the display subsystem (card0), which does have scan-out hardware. Hence my proposal to use a Section "Device" with Driver "fbdev", to show that it delivers the same output results without setting up the unusable render-node screen. This only indicates that you get 2D hardware acceleration via fbdev emulation without 3D support. fbdev is like armsoc: it is also missing a submodule for 3D support. It used to do everything via the fbdev device and is hence deprecated. Same as for armsoc, you need a submodule, but armsoc is the better choice since you can make use of full KMS/drm acceleration. Alternatively, you could rewrite modesetting to not delegate everything to OpenGL and to use dma_buf for the buffer pass-around. This is not a problem for rk322x SoCs only; it applies to all devices that use render nodes. The less CPU power the device has, the more disadvantage the buffer pass causes.
-
This is all about where the memory is located on which the operations take place. In the PC world there is only one "GPU" IP; it implements everything: a display engine for scan-out and a GPU for OpenGL. Once the GPU has rendered directly into the scan-out memory, the hardware of the display subsystem outputs it to the monitor. So offloading anything onto OpenGL is a good idea; it is a device-independent standard, and doing composition for movie video is also a fast path. There is no need to support display subsystem acceleration in the CPU area.

But we are dealing with SOCs. They have several IPs whose memory is separated, i.e. they need to pass memory buffers around so that they can work on data they share. The buffer format has to be identical between the different IPs, otherwise you have to convert. A "dumb buffer" format is always possible, but you lose the acceleration features of special formats, and avoiding that requires device-dependent knowledge. E.g. a display subsystem may support the NV12 format for scan-out. Uploading NV12 data for compositing on the 3D GPU and then forwarding it via dumb buffer to the display subsystem will not improve the performance, but forwarding it via dma_buf to the display subsystem will.

The impact of improper buffer pass-around can be seen in the uploaded glmark2 logs: the 3D performance decreases because the GPU cannot be served fast enough. The armada driver implements buffer pass-around via etnaviv_gpu for i.MX6 in a device-dependent manner and uses dma_buf for zero copy. See buffer-flow.pdf for the ways the buffers have to travel. modesetting and armsoc are missing this support, hence the low performance. Maybe the armada source can serve as a template for what is required. As both use the same method for buffer pass-around, this is expected: modesetting is only optimized for PC-like scenarios, and armsoc only deals with the display subsystem.
It does not interact with lima or panfrost; it falls back to swrast because no armsoc_dri for mainline is available:

(EE) AIGLX error: dlopen of /usr/lib64/dri/armsoc_dri.so failed (/usr/lib64/dri/armsoc_dri.so: cannot open shared object file: No such file or directory)
(EE) AIGLX error: unable to load driver armsoc
(II) AIGLX: Loaded and initialized swrast
(II) GLX: Initialized DRISWRAST GL provider for screen 0

In the Mali proprietary case, that code took care of the proper buffer pass-around via the proprietary kernel interface. But that doesn't belong in the Mesa counterpart, as it only cares about OpenGL, and there it doesn't matter how IPs interact; it provides only buffer import and export. For mainline in Xorg, the submodule is the proper place. For Weston it is the drm-backend, which it already has.
