Mainline VPU



1 hour ago, usual user said:

With the gstreamer framework it is working, even for bbb_sunflower_2160p_30fps_normal.mp4, without any problem. It is mpv that seems unable to cope with the high data rate. Using --hwdec=drm-copy makes the CPU utilization ramp-up go away and the video plays smoothly, but too slowly, and the sound gets out of sync. I don't think I am missing anything kernel-wise.

Also, in my tests drm-copy consumes more CPU power. My guess is that this is because the decoded video buffer is copied for use as an EGL image/texture: the larger the video (and the higher the frame rate), the bigger the CPU usage.

Plus, mpv uses a single thread for decoding when hardware decoding is in use, so when you saturate a core the video plays slowly.


I didn't try gstreamer yet, but is it working smoothly for you in an X11 window?


On 9/15/2021 at 10:39 PM, jernej said:

You'll wait for more than a month.

Thank you for sharing the schedule. But no hurry, my daily use cases are already working anyway. I'm only doing this for my own education and as a preview of features that may land in mainline.


On 9/15/2021 at 10:50 PM, jock said:

Also, in my tests drm-copy consumes more CPU power.

For me, it's the other way around: hwdec=drm yields high CPU utilisation for bbb_sunflower_2160p_30fps_normal.mp4 and hwdec=drm-copy yields low CPU utilisation.
As the values fluctuate heavily, it is difficult to provide absolute numbers.

But anyway here are some rough values:

gst-play-1.0                             ~35%
mpv --hwdec=none --hwdec-codecs=all      ~40%
mpv --hwdec=drm --hwdec-codecs=all       ~38%
mpv --hwdec=drm-copy --hwdec-codecs=all  ~28%

gst-play-1.0                             ~35%
mpv --hwdec=none --hwdec-codecs=all      ~98% playing slow, sound out of sync
mpv --hwdec=drm --hwdec-codecs=all       ~98% playing jerky, sound out of sync
mpv --hwdec=drm-copy --hwdec-codecs=all  ~34% playing slow, sound out of sync

This is on the Plasma desktop with the Wayland backend. Desktop CPU usage fluctuates between about 8 and 15% without playing a video, so it's uncertain which CPU cycles are really associated with video playback. The utilization is always distributed almost evenly over all cores. Frequency scaling is not considered here.
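
Because the readings fluctuate so much, one way to get steadier numbers is to average over a longer window rather than reading a monitor by eye. The following is only a sketch and is not how the figures above were obtained. It assumes a Linux `/proc/stat`, and the function names are made up for illustration. It reports the busy fraction of all cores combined, so one saturated core on a four-core system shows up as roughly 25%.

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters (in jiffies) from the first line of /proc/stat."""
    with open(path) as f:
        fields = f.readline().split()
    if not fields or fields[0] != "cpu":
        raise ValueError("unexpected /proc/stat format")
    # user, nice, system, idle, iowait, irq, softirq, steal
    return tuple(int(x) for x in fields[1:9])


def busy_fraction(before, after):
    """Fraction of CPU time (all cores combined) spent non-idle between two samples."""
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta)
    idle = delta[3] + delta[4]  # idle + iowait
    return 0.0 if total == 0 else 1.0 - idle / total


def average_load(seconds=30):
    """Average busy fraction over a window of `seconds`."""
    start = read_cpu_times()
    time.sleep(seconds)
    return busy_fraction(start, read_cpu_times())
```

Running `average_load()` once with the desktop idle and once during playback, then subtracting, gives a rough estimate of what the player itself costs.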


On 9/15/2021 at 10:50 PM, jock said:

I didn't try gstreamer yet, but is it working smoothly for you in an X11 window?

For reference, here are the values for the LXQt desktop with the native Xorg backend:

gst-play-1.0                             ~25%
mpv --hwdec=none --hwdec-codecs=all      ~30%
mpv --hwdec=drm --hwdec-codecs=all       ~28%
mpv --hwdec=drm-copy --hwdec-codecs=all  ~18%

gst-play-1.0                             ~18% playing as a slideshow
mpv --hwdec=none --hwdec-codecs=all      ~98% playing slow, sound out of sync
mpv --hwdec=drm --hwdec-codecs=all       ~98% playing jerky, sound out of sync
mpv --hwdec=drm-copy --hwdec-codecs=all  ~22% playing slow, sound out of sync

The VPU decoder is not the bottleneck. The difficulty is setting up an efficient video pipeline in which the several hardware accelerators involved interact properly. For X Window with the modesetting driver this does not really seem possible.


Working for me on kernel 5.14.5 with ffmpeg: an h264 1600*900 15fps stream from my cams uses 17-22% of one CPU core, about 5% of which is pixel format conversion (usage drops by that amount without conversion). No further kernel patches were needed, and the ffmpeg dependencies were all installed from the default repos. This is awesome, I've been wanting this for a while. My use case is Frigate NVR, which is currently running on an old Intel system with an Nvidia GPU doing the decoding. I can now put my NanoPC-T4 back on the task and save some electricity.

My ffmpeg command that emulates approximately what Frigate does:


ffmpeg  -loglevel warning -hwaccel drm -i rtsp://  -pix_fmt yuv420p  -f rawvideo pipe:


Many thanks to @jernej, @Kwiboo and everyone else who made this possible.


Now I need to find out why I get block I/O errors and corruption on eMMC on all 5.13 and 5.14 kernels I have tried. I am testing from an SD card for now.


On 9/25/2021 at 4:46 PM, ATK said:

@ScottP... will definitely like to hear more from you on your NVR stuff... 

Here is a lengthy reply I put on the Frigate NVR GitHub for someone asking about hardware decoding for Frigate NVR. I would be interested to know if I am doing anything wrong here or have missed a step.


TL;DR: It does not work reliably for me at the moment, but this is the closest to working I have seen so far. Work is ongoing in the Linux kernel and FFmpeg, so it may work reliably sometime in the future. When the kernel drivers are moved out of staging and the interface to them is stable, I expect to see a pull request on the main FFmpeg git. This is a long reply with information for testing, because I am giving up at this point and moving to a different platform. I would be interested to hear if you find a solution, though, or if I have missed something, hence the detailed reply.

For testing you can try this fork of ffmpeg. It has v4l2-request and libdrm stateless VPU decoding built in, using hantro and rockchip_vdec/rkvdec.
Use kernel 5.14.9. Armbian is a convenient way to change kernels: sudo armbian-config -> system -> Other kernels. FFmpeg from the above GitHub has private headers for kernel interfaces, and they are updated about a month after each release. You must install the correct userspace kernel headers. I just get the kernel source and then do `make -j6 headers_install INSTALL_HDR_PATH=/usr`
Do not use armbian-config to install kernel headers - it installs the wrong version.
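
To confirm that the installed userspace headers really match the running kernel, something like the following could be used. It is a sketch under two assumptions: that `headers_install` with `INSTALL_HDR_PATH=/usr` leaves a `linux/version.h` under `/usr/include`, and that the `uname -r` string starts with `major.minor.patch`. The helper names are invented for this example.

```python
import platform
import re

# Assumed location after `make headers_install INSTALL_HDR_PATH=/usr`
HEADER = "/usr/include/linux/version.h"


def decode_version_code(code):
    """Invert KERNEL_VERSION(a, b, c) = (a << 16) + (b << 8) + c."""
    return (code >> 16, (code >> 8) & 0xFF, code & 0xFF)


def header_version(text):
    """Extract (major, minor, patch) from the contents of linux/version.h."""
    match = re.search(r"^#define\s+LINUX_VERSION_CODE\s+(\d+)", text, re.MULTILINE)
    if match is None:
        raise ValueError("LINUX_VERSION_CODE not found")
    return decode_version_code(int(match.group(1)))


def release_version(release):
    """Leading major.minor.patch of a `uname -r` string; any suffix is ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised release string: " + release)
    return tuple(int(g or 0) for g in match.groups())


if __name__ == "__main__":
    try:
        with open(HEADER) as f:
            installed = header_version(f.read())
    except OSError as exc:
        print("could not read", HEADER, "-", exc)
    else:
        running = release_version(platform.release())
        print("headers:", installed, "running kernel:", running)
        if installed != running:
            print("mismatch: the FFmpeg build may fail or misbehave")
```

The patch level occupies only 8 bits in the version code, so stable kernels with a very high patch number may compare unequal even when the headers are correct.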

Then install FFmpeg dependencies:
`sudo apt install libudev-dev libfreetype-dev libmp3lame-dev libvorbis-dev libwebp-dev libx264-dev libx265-dev libssl-dev libdrm2 libdrm-dev pkg-config libfdk-aac-dev libopenjp2-7-dev`
Run configure. This is a minimal set of options; Frigate includes many more, but I removed many of them to build faster and save memory. (I actually think there are a lot of redundant ffmpeg components in Frigate's default build files, such as some X11 frame grabber stuff and codecs nobody uses anymore, but that's for a separate discussion.)

./configure \
--enable-libdrm \
--enable-v4l2-request \
--enable-libudev \
--disable-debug \
--disable-doc \
--disable-ffplay \
--enable-shared \
--enable-libfreetype \
--enable-gpl \
--enable-libmp3lame \
--enable-libvorbis \
--enable-libwebp \
--enable-libx265 \
--enable-libx264 \
--enable-nonfree \
--enable-openssl \
--enable-libfdk_aac \
--enable-postproc \
--extra-libs=-ldl \
--prefix="${PREFIX}" \
--enable-libopenjpeg \
--extra-libs=-lpthread

Then run `make -j6`.
I don't know if this next bit is correct, but it works for me: I don't want to do `make install`, just run the ffmpeg tests from the build directory. To do that you must first run `sudo ldconfig $PWD $PWD/lib*`, otherwise the linker will not find the libraries.

If you want to try a different kernel version, run `make distclean` in FFmpeg and run ./configure again. If FFmpeg fails to build, it is most likely because the private headers do not match the kernel headers; you will see errors like V4L... undefined.

Then you can do some tests and see if you get valid output, for example, this decodes 15s from one of my cams:

`./ffmpeg -benchmark -loglevel debug -hwaccel drm  -i rtsp://  -t 15 -pix_fmt yuv420p -f rawvideo out.yuv`

Checks to make during and after decoding:
Observe CPU usage. On my RK3399 system, overclocked to 1.5GHz on the little cores and 2GHz on the big cores, I get between 17 and 25% CPU on one core; it varies depending on whether it runs on an A53 little core or an A72 big core. It should be better than that; I think it's the way the data is copied around in memory. Gstreamer and mpv attempt zero-copy decoding, so they are more efficient. With software decoding, CPU use is about 70% of one core. RK3328 does not have the two A72 cores and four A53 cores that RK3399 has, just four A53 cores, so RK3328 is about half as powerful as RK3399, since the A72 cores are about twice as powerful as the A53 cores.

You should see in the debug output for ffmpeg where it tries each of the /dev/video interfaces to find the correct codec for decoding. Be warned that ffmpeg will sometimes just fall back to software decode, if that happens you will see much higher CPU usage and often ffmpeg will spawn a number of threads to use all cores in your system. Your user should be a member of the "video" group in /etc/group to access without sudo. Log snippet of that section below:

```
[h264 @ 0xaaab06cd9070] Format drm_prime chosen by get_format().
[h264 @ 0xaaab06cd9070] Format drm_prime requires hwaccel initialisation.
[h264 @ 0xaaab06cd9070] ff_v4l2_request_init: avctx=0xaaab06cd9070 hw_device_ctx=0xaaab06c549a0 hw_frames_ctx=(nil)
[h264 @ 0xaaab06cd9070] v4l2_request_probe_media_device: avctx=0xaaab06cd9070 ctx=0xffff8804df20 path=/dev/media1 driver=hantro-vpu
[h264 @ 0xaaab06cd9070] v4l2_request_probe_video_device: avctx=0xaaab06cd9070 ctx=0xffff8804df20 path=/dev/video1 capabilities=69222400
[h264 @ 0xaaab06cd9070] v4l2_request_try_format: pixelformat 875967059 not supported for type 10
[h264 @ 0xaaab06cd9070] v4l2_request_probe_video_device: try output format failed
[h264 @ 0xaaab06cd9070] v4l2_request_probe_video_device: avctx=0xaaab06cd9070 ctx=0xffff8804df20 path=/dev/video2 capabilities=69222400
[h264 @ 0xaaab06cd9070] v4l2_request_try_format: pixelformat 875967059 not supported for type 10
[h264 @ 0xaaab06cd9070] v4l2_request_probe_video_device: try output format failed
[h264 @ 0xaaab06cd9070] v4l2_request_probe_media_device: avctx=0xaaab06cd9070 ctx=0xffff8804df20 path=/dev/media0 driver=rkvdec
[h264 @ 0xaaab06cd9070] v4l2_request_probe_video_device: avctx=0xaaab06cd9070 ctx=0xffff8804df20 path=/dev/video0 capabilities=69222400
[h264 @ 0xaaab06cd9070] v4l2_request_init_context: pixelformat=842094158 width=1600 height=912 bytesperline=1600 sizeimage=2918400 num_planes=1
[h264 @ 0xaaab06cd9070] ff_v4l2_request_frame_params: avctx=0xaaab06cd9070 ctx=0xffff8804df20 hw_frames_ctx=0xffff8804faa0 hwfc=0xffff8804e530 pool=0xffff8805e910 width=1600 height=912 initial_pool_size=3
```
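
The `pixelformat` numbers in this log are V4L2 fourcc codes: four ASCII characters packed into a 32-bit integer, lowest byte first. A small sketch for making the log readable is below. The function names are invented here, and the regular expressions assume the message wording shown in the snippet, which may differ in other versions of the fork.

```python
import re


def fourcc(value):
    """Unpack a V4L2 pixelformat integer into its four ASCII characters."""
    return value.to_bytes(4, "little").decode("ascii")


def summarise(log):
    """Return (drivers probed in order, rejected formats, accepted format or None)."""
    drivers = re.findall(r"probe_media_device:.*driver=(\S+)", log)
    rejected = [fourcc(int(n)) for n in re.findall(r"pixelformat (\d+) not supported", log)]
    accepted = re.search(r"init_context: pixelformat=(\d+)", log)
    return drivers, rejected, fourcc(int(accepted.group(1))) if accepted else None
```

For the snippet above this reports hantro-vpu followed by rkvdec, `S264` (the stateless H.264 slice format) rejected twice, and `NV12` accepted. In other words, the hantro nodes do not take H.264 slices here, so ffmpeg moves on to rkvdec. If no accepted format shows up at all, that is a hint that ffmpeg fell back to software decoding.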

Check that the output file contains valid video data, for example by playing it with vlc:
`vlc  --rawvid-fps 10 --rawvid-width 1600 --rawvid-height 900 --rawvid-chroma I420 out.yuv`
Adjust the command to the height/width/fps your cameras record in.
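
A quick sanity check that does not need a player: a raw yuv420p frame is a full-resolution luma plane plus two quarter-resolution chroma planes, so it takes width * height * 3 / 2 bytes, and the output file should be a whole number of frames. A minimal sketch follows, with hypothetical helper names.

```python
import os


def yuv420p_frame_bytes(width, height):
    """Bytes per raw yuv420p frame: Y plane plus quarter-size U and V planes."""
    if width % 2 or height % 2:
        raise ValueError("yuv420p needs even width and height")
    return width * height * 3 // 2


def count_frames(file_size, width, height):
    """Return (whole frames, leftover bytes) for a raw yuv420p file of this size."""
    return divmod(file_size, yuv420p_frame_bytes(width, height))


def check_file(path, width, height):
    """Print how many frames a raw file holds and whether it ends on a frame boundary."""
    frames, leftover = count_frames(os.path.getsize(path), width, height)
    print(f"{path}: {frames} frames, {leftover} leftover bytes")
    return leftover == 0
```

For 1600x900 a frame is 2160000 bytes. A non-zero leftover, or a frame count far from duration times fps, suggests the width and height passed in do not match what ffmpeg actually wrote.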

If all this is working, then try doing longer decodes in parallel, e.g. if you have 3 cams, run the ffmpeg command for each of them in a separate window and increase the time. What happens to me is that at some point ffmpeg will start reporting "resource not available/busy" or similar; rebooting will make it work for a while again.
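
Instead of separate windows, the parallel runs can also be scripted. This is a rough sketch only: it assumes it is started from the FFmpeg build directory, reuses the options from the commands above, discards the decoded frames, and writes each camera's ffmpeg messages to a hypothetical `camN.log`. The camera URLs are placeholders you have to supply.

```python
import subprocess


def decode_cmd(url, seconds, ffmpeg="./ffmpeg"):
    """Build the ffmpeg command line for one timed hardware decode to a pipe."""
    return [ffmpeg, "-loglevel", "warning", "-hwaccel", "drm", "-i", url,
            "-t", str(seconds), "-pix_fmt", "yuv420p", "-f", "rawvideo", "pipe:"]


def run_parallel(urls, seconds):
    """Start one decoder per URL, wait for all of them, return {url: exit code}."""
    running = []
    for i, url in enumerate(urls):
        log = open(f"cam{i}.log", "w")  # hypothetical per-camera log file
        proc = subprocess.Popen(decode_cmd(url, seconds),
                                stdout=subprocess.DEVNULL, stderr=log)
        running.append((url, log, proc))
    codes = {}
    for url, log, proc in running:
        codes[url] = proc.wait()
        log.close()
    return codes
```

A non-zero exit code, or "busy" style messages in one of the log files, marks the point where the decoder stopped being available.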

You can check what codecs are supported by each of the interfaces /dev/video[012] with `v4l2-ctl --all -d0`. Change d0 to d1, d2 etc. to view the other decoders/encoders.
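
To dump all of them in one go, a loop over the device nodes works too. This sketch assumes `v4l2-ctl` is installed and that the nodes are named `/dev/videoN`; the function names are made up for the example.

```python
import glob
import re
import subprocess


def sort_video_nodes(paths):
    """Keep only .../videoN paths and order them by N, so video10 follows video2."""
    nodes = [p for p in paths if re.search(r"video\d+$", p)]
    return sorted(nodes, key=lambda p: int(re.search(r"(\d+)$", p).group(1)))


def describe_all():
    """Print the `v4l2-ctl --all` report for every video node found."""
    for node in sort_video_nodes(glob.glob("/dev/video*")):
        print(f"==== {node} ====")
        try:
            result = subprocess.run(["v4l2-ctl", "--all", "-d", node],
                                    capture_output=True, text=True)
        except FileNotFoundError:
            print("v4l2-ctl not found")
            return
        print(result.stdout or result.stderr)


if __name__ == "__main__":
    describe_all()
```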

You can monitor the state of kernel development. Most of the work on this is being done by Andrzej Pietrasiewicz. My suggestion is to monitor both the ffmpeg GitHub and kernel commits/patches and find out when they rebase ffmpeg. Pull that version, install the current kernel for it plus headers, and retest.

I have all the frigate docker files already created. I basically created a new set of  dockerfiles with an arch of aarch64rockchip and added those to Makefile. I'll upload them to my github at some point, I see little point to a pull request since rockchip is a niche platform with not many users in home assistant or frigate, and it does not currently work for me reliably anyway.

I have been trying to get this working for some time now. At kernel 5.4.* there were a bunch of kernel patches you had to apply, and nothing worked for me then. Often FFmpeg complained about the pixel format. There were some people on the Armbian forums who claimed to have it working, but I had my doubts; maybe it was wishful thinking and ffmpeg was really using software decode. Most of the effort around this is for video playback, so people can play 1080p and 2/4k videos on desktop and Kodi. There is little information about straight decoding to a pipe as Frigate does, so when researching, ignore stuff to do with patched libva etc.
For now I am using an old ~2013 i5-4670 four core/thread Haswell with an Nvidia GT640 GPU for Frigate and Home Assistant. For three cams at 1600*900 10fps, Frigate uses 6% CPU as reported by the Home Assistant supervisor. It is very stable. With that in mind, and wanting to use a more power-efficient system, I caved and ordered an Nvidia Jetson 4GB developer kit yesterday. I am confident I can build Frigate docker containers for that system, and it has a hardware decoder similar to the one in their GPUs. I can also try out CUDA filters and scaling to reduce CPU load for the Frigate detector. A start would be to copy the amd64nvidia dockerfiles, create an aarch64nvidia arch and modify from there; it should be mostly the same.


