Posted

Hi all!

 

On a NanoPi Duo2 I'm trying to use the built-in hardware video processor.

ffmpeg already runs with -hwaccel v4l2request but throws errors:

 

Press [q] to stop, [?] for help
[h264 @ 0x11645f0] Using V4L2 media driver cedrus (6.12.35) for S264
[V4L2RequestContext @ 0xae5f0db0] Failed to create buffer of type 1: Cannot allocate memory (12)
[h264 @ 0x11645f0] Failed setup for format drm_prime: hwaccel initialisation returned error.

 

dmesg

[ 8906.864389] cma: __cma_alloc: reserved: alloc failed, req-size: 3038 pages, ret: -12
[ 8906.872255] cma: number of available pages: 42@86+128@384+34@3550+34@6622+34@9694+34@12766+34@15838+34@18910+34@21982+1570@25054=> 1978 free of 26624 total pages
[ 8906.886783] cedrus 1c0e000.video-codec: dma alloc of size 12443648 failed
root@nanopiduo2:~#
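
For readers following along, the dmesg numbers above can be cross-checked with a few lines of Python. This is only a sketch: it assumes a 4 KiB page size (usual on 32-bit ARM, but not stated in the log) and reads each `count@offset` entry as a run of free pages. The helper name `parse_cma_chunks` is made up for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the log

# The "number of available pages" line copied from the dmesg output above.
DMESG_LINE = (
    "cma: number of available pages: "
    "42@86+128@384+34@3550+34@6622+34@9694+34@12766+34@15838+"
    "34@18910+34@21982+1570@25054=> 1978 free of 26624 total pages"
)


def parse_cma_chunks(line):
    """Return (free_pages, start_offset) pairs for each free run in the line."""
    return [(int(count), int(offset))
            for count, offset in re.findall(r"(\d+)@(\d+)", line)]


chunks = parse_cma_chunks(DMESG_LINE)
requested_pages = 3038                       # req-size from the first dmesg line
total_free = sum(count for count, _ in chunks)
largest_run = max(count for count, _ in chunks)

print("requested bytes :", requested_pages * PAGE_SIZE)
print("total free pages:", total_free)
print("largest free run:", largest_run)
print("pool size (MiB) :", 26624 * PAGE_SIZE // (1024 * 1024))
print("request fits    :", requested_pages <= largest_run)
```

With the values from the log, the request of 3038 pages matches the 12443648 bytes cedrus reports, the free runs add up to the 1978 pages the kernel prints, and the largest contiguous run is only 1570 pages. Under the 4 KiB assumption the whole pool is 104 MiB, so the allocation fails because the pool is both too small and too fragmented for one contiguous buffer of that size.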

 

I already tried tweaking the cma value in extraargs in armbianEnv.txt, but without success.

 

I found a link talking about VPU device tree dma limitations https://git.sec.in.tum.de/croemheld/linux/-/blob/v5.1-rc5/Documentation/devicetree/bindings/media/cedrus.txt

Quote

Device-tree bindings for the VPU found in Allwinner SoCs, referred to as the Video Engine (VE) in Allwinner literature. The VPU can only access the first 256 MiB of DRAM, that are DMA-mapped starting from the DRAM base. This requires specific memory allocation and handling.

I already decompiled the DT and verified that there is no such "reserved-memory" section. Is this the root cause?

Maybe someone can provide some hints or ideas confirming that I'm on the right track? If yes I'd give it a try adjusting the DT.

 

 

     T.

 

Posted (edited)
3 hours ago, schunckt said:

v5.1-rc5

3 hours ago, schunckt said:
[h264 @ 0x11645f0] Using V4L2 media driver cedrus (6.12.35)

v6.12.35

Please use the current documentation for the CURRENT kernel.

sun4i-a10-video-engine.yaml

sun8i-h3-deinterlace.yaml

Documentation/arch/arm/sunxi.rst

arch/arm/boot/dts/allwinner/sun8i-h3-nanopi-duo2.dts

arch/arm/boot/dts/allwinner/sun8i-h3.dtsi

 

 

P.S.

Please read this.

repository-for-v4l2request-hardware-video-decoding-rockchip-allwinner

Edited by going
Add P.S.
Posted

Great, thanks for the links!

Meanwhile it works partially. No DT tweaking was needed; the mistake was in how I specified the Armbian extraargs. I had added a second extraargs line to armbianEnv.txt, but all arguments must be on a single line. I also had to increase the pool to cma=256M (yes, really; I tested all lower values). Then it works, BUT ...
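
A minimal sketch of how one could catch the duplicate-line mistake, assuming armbianEnv.txt is a plain list of `key=value` lines. The helper name `merge_extraargs` and the sample file contents (including the `console=ttyS0` argument) are hypothetical and only there for illustration; it folds every `extraargs=` line into a single one.

```python
def merge_extraargs(env_text):
    """Fold all 'extraargs=' lines into one, kept at the first one's position."""
    values = []   # argument strings collected from every extraargs line
    out = []      # rebuilt file, line by line
    slot = None   # index in 'out' reserved for the merged extraargs line
    for line in env_text.splitlines():
        if line.startswith("extraargs="):
            value = line[len("extraargs="):].strip()
            if value:
                values.append(value)
            if slot is None:
                slot = len(out)
                out.append(None)  # placeholder, filled in after the loop
            continue
        out.append(line)
    if slot is not None:
        out[slot] = "extraargs=" + " ".join(values)
    return "\n".join(out) + "\n"


# Hypothetical file contents with the two-line mistake.
sample = (
    "verbosity=1\n"
    "extraargs=console=ttyS0\n"
    "overlay_prefix=sun8i-h3\n"
    "extraargs=cma=256M\n"
)
print(merge_extraargs(sample), end="")
```

The sketch only rewrites text in memory; writing the result back to the real file and rebooting is left to the reader.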

 

Fun fact, will further investigate:

Decoding with "-hwaccel drm" results in lower fps (about 6..8), whereas the software decoder reaches 10 😀

I verified with htop which decoder is in use: CPU only => 4 cores at 100%; with hwaccel, one core at about 20%, which is likely the YUV-to-RGB conversion and scaling.

 

Maybe this is still an issue caused by DT, at least when reading https://gregdavill.com/posts/allwinner-s3-videoencoders/ 

It specifies 

memory-region = <&cma_pool>; 

which is missing in my decompiled DT, as is the reserved-memory section it references.

Maybe that's not needed if it's coded inside the driver or specified elsewhere.

 

 

  T.

Posted
On 8/10/2025 at 7:34 AM, going said:

v6.12.35

Please use the current documentation for the CURRENT kernel.

If it doesn't work, compile your own Armbian with EDGE Linux (that is what worked for me).

Stay away from Trixie at this time (its mpv doesn't work as well as Bookworm's)

Posted

Some fixes are kernel-specific. If I understand correctly, "memory-region" is only necessary when using the legacy cedar driver with a more recent kernel. It is supported up to kernel 6.1. You can confirm the CMA allocation by running "sudo dmesg | grep CMA" or "cat /proc/meminfo | grep Cma".
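
The /proc/meminfo check can also be scripted. This is a small sketch, assuming the kernel exposes the `CmaTotal` and `CmaFree` fields in kB (kernels built without CMA omit them); the function name `read_cma_kib` is made up for this example.

```python
import os


def read_cma_kib(meminfo_text):
    """Extract CmaTotal/CmaFree (in kB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("CmaTotal", "CmaFree") and rest.split():
            fields[key] = int(rest.split()[0])
    return fields


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        cma = read_cma_kib(fh.read())
    if cma:
        for name, kib in cma.items():
            print(f"{name}: {kib / 1024:.1f} MiB")
    else:
        print("No Cma* fields found (kernel may be built without CMA)")
```

After booting with cma=256M, CmaTotal should come out at 256 MiB if the kernel honoured the argument.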

 

That's interesting, although cedrus only acts as the video decoding engine while the display engine is responsible for the actual rendering.

Posted

@robertoj right, that worked for me as well

Quote

Then add extraargs=cma=256M to armbianEnv.txt

 

But the remaining issue is the slowness. Meanwhile I also tested ffmpeg unscaled, without RGB conversion, and with output to /dev/null, but it is still slow.

Maybe there is some pre/postprocessing done which could be tuned further. If I remember right, there are some v4l2* features which may impact the processing, but I'm not sure whether that was related to camera capture.
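
One way to make the comparison repeatable is to time a pure decode with and without hwaccel. The sketch below is an assumption about how such a test might look, not the exact commands used above: it decodes to ffmpeg's null muxer with audio disabled, and the hwaccel name is whatever the installed ffmpeg build accepts ("drm" is taken from the earlier post). The helper names are made up.

```python
import subprocess
import sys
import time


def build_decode_cmd(input_path, hwaccel=None):
    """Build an ffmpeg command that decodes the input and discards the frames."""
    cmd = ["ffmpeg", "-hide_banner", "-nostdin"]
    if hwaccel:
        cmd += ["-hwaccel", hwaccel]   # must come before -i to apply to the input
    cmd += ["-i", input_path, "-an", "-f", "null", "-"]
    return cmd


def time_decode(cmd):
    """Run the command and return the wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    clip = sys.argv[1]
    for accel in (None, "drm"):        # "drm" as used earlier in this thread
        seconds = time_decode(build_decode_cmd(clip, accel))
        print(f"hwaccel={accel}: {seconds:.1f} s")
```

Run it with the path to a test clip as the only argument. Wall-clock time for the same clip gives a rough fps comparison without any scaling or display in the way.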

 

Another idea: it seems the VPU clock source is configurable (inside the DT). Maybe that's not quite right?

 

    T.

Posted

No, I did not compile this time. I used the downloaded image (need to double check exactly which one).

 

Before trying this path (I have to update my build env first 😀) I'd like to get a better understanding of the root cause of the slowness.

I think next I'll play around with mpv instead of ffmpeg.

I'd prefer ffmpeg for other reasons, but testing mpv is worth spending some time on.

 

 

   T.

Posted

My main theory is that Linux 6.12 doesn't have the v4l2 improvements needed for hw acceleration, which you can only get with Linux 6.13.

 

The link I posted explains that.
