Understanding Hardware-Accelerated Video Decoding



I've taken peeks at threads related to hardware decoding (of H.264 and HEVC, mainly) on Allwinner and Rockchip platforms on and off, sometimes dabbled in trying and failing to implement solutions recommended there. Being a complete amateur, I find the topic very opaque and confusing, with various different components that need to interface with each other, be patched in sync, and sometimes change drastically between kernel versions, etc. Today I sat down and read up on these subjects, scouring wikis, documentation, this forum, and assorted other sources to try and understand how this works. In this post I will attempt to compile what I've learned on the different software components involved, their relationships, their current status, and solutions to the problem. I hope people more knowledgeable will correct me when I get something wrong or cite outdated information. Stuff which I am highly uncertain of I will print in italics.

(This post is going to focus on mainline implementations of Cedrus/Allwinner, I haven't looked into Hantro/Rkvdec/Rockchip specifics yet. I will speak only of H.264 and H.265/HEVC; I don't understand the high/low stuff and didn't pay attention to other codecs.)

 

Components:

 

Basics: Video codecs like H.264, H.265/HEVC, MPEG-2, etc. are standardised methods for encoding and decoding video more efficiently, reducing file size. Software en-/decoding is very CPU-intensive. Modern GPUs and ARM SoCs therefore contain specialised hardware (VPUs) to which these tasks can be delegated. Working hardware decoding is particularly important for underpowered ARM CPUs.

Drivers: Topmost in the stack are the VPU drivers. These are Sunxi-Cedrus/Cedrus V4L2 M2M and Cedar [Is this the legacy one?] on Allwinner; Hantro and Rkvdec on Rockchip. These are all still in development, but Cedrus already fully supports H.264, and partially supports HEVC, and is already usable in the mainline kernel.

APIs: In order for anything (userspace APIs, libraries) to make use of the VPU drivers, you need backends/APIs. For Cedrus, there is the unmaintained libva-v4l2-request backend which implements VA-API, the legacy VDPAU implementation libvdpau-sunxi, and as of kernel version 5.11, H.264 has been merged into the uAPI headers. Different applications may make recourse to one or another of these APIs.

Libraries: FFmpeg and GStreamer provide libraries and APIs of their own to other applications but can (importantly!) also output directly to the framebuffer. FFmpeg must be patched to access either libva-v4l2-request or the Cedrus driver headers. GStreamer has accessed the kernel headers directly since 1.18 (works on 5.9, not on 5.10; 1.20 will support 5.11).

Media players: mpv depends on FFmpeg for hardware acceleration (and must be patched together with it). VLC can be set to access libva-v4l2-request directly. Kodi 19.0+ supports hardware acceleration out of the box without any out-of-tree patches.

Display server: An additional complication is drawing the output of any of the above on screen. Most successful implementations I've seen bypass X11 and either output directly to the framebuffer or force a plane/display layer on top of any X windows. Wayland apparently makes this easier by allowing applications to use their own DRM planes, but this hasn't been explored much yet. Kodi 19.0 works with all three windowing systems (X, Wayland, and gbm).

 

Codec status:

6 hours ago, jernej said:

H264, MPEG2 and VP8 should be good in mainline, although api can still change until codec is promoted to uAPI. HEVC still needs out-of-tree patches for any serious work.

 

Taken from the LibreElec thread (which reflects LibreElec's status and is ahead of what works elsewhere, but outlines hardware limitations):

 

Quote

 

only MPEG2, H264 (AVC), H265 (HEVC) and VP8 codecs are supported in hardware, for now. Others are software decoded.

- R40 doesn't support H265 (HEVC) - hardware limitation

- 10-bit H265 videos are supported only on H6 (H3, H5 and A64 don't support 10-bit - hardware limitation)

- 10-bit H264 is not supported (hardware limitation)
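The limits quoted above can be written down as a small lookup, which is handy when scripting a "should this file decode in hardware?" check. This is only a sketch: the function name and the SoC/codec spellings are my own convention, and it encodes nothing beyond the four rules in the quote. Treating 10-bit MPEG2/VP8 as unsupported is an assumption, since the quote does not mention them.

```shell
#!/bin/sh
# Sketch only: encodes the hardware limits quoted from the LibreElec thread.
# Usage: cedrus_hw_supports SOC CODEC [BITDEPTH]
# Exit status 0 means hardware decoding is expected to be possible.
cedrus_hw_supports() {
    soc=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    codec=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    depth=${3:-8}

    # Only these four codecs are decoded in hardware at all.
    case "$codec" in
        mpeg2|h264|hevc|vp8) ;;
        *) return 1 ;;
    esac

    # R40 cannot decode HEVC (hardware limitation).
    if [ "$soc" = "R40" ] && [ "$codec" = "hevc" ]; then
        return 1
    fi

    # 10-bit: never for H.264, and for HEVC only on H6.
    # Other codecs at 10-bit are treated as unsupported (assumption).
    if [ "$depth" -ge 10 ]; then
        [ "$codec" = "hevc" ] && [ "$soc" = "H6" ]
        return $?
    fi
    return 0
}
```

For example, `cedrus_hw_supports H3 hevc 10 || echo "software decode"` would print the fallback message on an H3.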

 

 

Approaches:

 

Many people have managed to make it work on their machines using different approaches. Note that some of these solutions are one or two years old, and kernel developments since may have changed the situation. Ordered from newer to older:

 

LibreElec – kernel + ffmpeg + Kodi: LibreElec is a Just-Enough-OS with the sole purpose of running Kodi, a media player. It's at the bleeding edge and usually implements codecs and features well before mainline or other distros. It achieves this by heavily patching everything up and down the stack, from the Linux kernel over FFmpeg to Kodi itself. These patches could all be applied to an Armbian build, but there are a lot of them, they're poorly documented, and you'd need to dig into their github to understand what they all do. LibreElec runs Kodi directly without a desktop. kodi-gbm is a package that can be installed on Armbian and functions similarly.
Key contributors to the project are @jernej and @Kwiboo, who sometimes post about their work here (and have been very helpful with questions, thank you). @balbes150 includes some of LibreElec's patches in his Armbian-TV builds, but I don't know which.

 

Kodi 19.0: 

6 hours ago, jernej said:

Further clarification: Kodi 19.0 (released recently) is highly recommended for all this - it doesn't require any out of tree patch for video decoding (LE uses patch for HW deinterlacing). Additionally, with version 19.0, there is single binary for all 3 windowing systems (gbm, X11, wayland). Depends on build options. Not sure if this version is available on Armbian but PPA exists, so I guess it should not be hard to test.

 

LibreElec patches + mpv:

On 8/22/2019 at 8:14 PM, jernej said:

@Alerino Reis If you're using ffmpeg patches from LibreELEC, then you need only this additional patch to make it compatible with mpv. I tested yesterday and it works for me when running without any window manager running with either of these commands:






mpv --vo=gpu --gpu-context=drm --hwdec=auto video.mkv
mpv --vo=drm --hwdec=auto video.mkv

You can append "-v" parameter to check if mpv really uses HW decoding.
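Since the verbose log is long, it can help to filter it down to the decoder-related lines. A minimal sketch, assuming the relevant lines contain "hwdec" or "hardware decoding"; those patterns are my guess at mpv's wording and may need adjusting for your build. The function name is made up for this example.

```shell
#!/bin/sh
# Sketch: keep only the lines of an mpv log (read from stdin) that mention
# hardware decoding. The two patterns are assumptions about mpv's log text.
hwdec_lines() {
    grep -i -E 'hwdec|hardware decoding'
}

# Usage sketch (same command as in the quote, with -v added):
#   mpv -v --vo=gpu --gpu-context=drm --hwdec=auto video.mkv 2>&1 | hwdec_lines
```

If nothing is printed at all, the patterns probably don't match your mpv version, so check the unfiltered log before concluding anything.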

 

 

@megous Kernel 5.11 + GStreamer: This implementation, done here on a PinePhone (A64), patches the 5.9 kernel and uses a recent version of GStreamer (1.18 and up), whose output is rendered directly to a DRM plane via kmssink. (No X or Wayland.)

GStreamer 1.18 works with the 5.9 kernel. It does not work with 5.10, because of numerous changes to the kernel headers in this version. In 5.11 the H.264 headers were moved into the uAPI; the master branch of GStreamer reflects this, but there haven't been any releases with these patches yet. It'll probably be in repositories with GStreamer 1.20; until then you can build it from source.
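The kernel/GStreamer pairing described above can be turned into a quick lookup. This is a sketch that only restates what this post says (5.9, 5.10 and 5.11); the function name is hypothetical and any other kernel version is reported as not covered.

```shell
#!/bin/sh
# Sketch: map a kernel version string to the GStreamer situation described
# in this post. Defaults to the running kernel when no argument is given.
gst_hint_for_kernel() {
    ver=${1:-$(uname -r)}
    major=${ver%%.*}            # text before the first dot
    rest=${ver#*.}              # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    case "$major.$minor" in
        5.9)  echo "GStreamer 1.18" ;;
        5.10) echo "no matching GStreamer release" ;;
        5.11) echo "GStreamer master (expected in 1.20)" ;;
        *)    echo "not covered by this post" ;;
    esac
}
```

Calling `gst_hint_for_kernel` without arguments checks the running kernel via `uname -r`.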

 

@Sash0k – patched libva-v4l2-request + VLC: This updates bootlin's libva-v4l2-request and follows the Sunxi wiki's instructions for enabling VLC to make use of it. It works on the desktop. This only works for H.264 and breaks HEVC. When I tried to replicate this approach on a recent Armbian build, I discovered that the h264.c files in the kernel (that libva-v4l2-request draws on) have changed considerably between 5.8 and 5.10, and I lack the understanding to reconcile libva-v4l2-request with them.

On 7/20/2020 at 4:38 PM, Sash0k said:

Finally got it! No kernel modifications needed, only v4l2-request.


Key notes:

  • Use bootlin code, latest master (not release-2019.03 tag)
  • I merged just one small patch from https://github.com/bootlin/libva-v4l2-request/pull/30/files (it seems to be unnecessary)
  • Download the kernel sources for the corresponding version. For my Armbian that is: https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.4.45.tar.xz
  • Extract the 2 files mpeg2-ctrls.h and h264-ctrls.h from kernel/include/media and replace the ones in v4l2-request
  • Replace V4L2_PIX_FMT_H264_SLICE_RAW with V4L2_PIX_FMT_H264_SLICE in the v4l2-request source code
  • Compile and install (instruction is as 2 posts above)
  • Don't forget to set VLC as in https://linux-sunxi.org/Sunxi-Cedrus

    Tools > Preferences > Input / Codecs > Codecs > Hardware-accelerated decoding > VA-API video decoder

    Tools > Preferences > Video > Display > Output > X11 video output (XCB)
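The header and rename steps from the key notes above could be scripted roughly as follows. This is a sketch under assumptions: the libva-v4l2-request checkout is taken to live in ./libva-v4l2-request with its control headers under include/ (check your checkout), and both function names are made up. The kernel version is the one from the quoted post and should match your own kernel.

```shell
#!/bin/sh
# Sketch of the header-sync and rename steps; paths are assumptions.
KVER=5.4.45   # kernel version from the quoted post; use your own

fetch_ctrl_headers() {
    # Download the kernel tarball and extract only the two control headers.
    curl -fsSLO "https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-$KVER.tar.xz"
    tar -xf "linux-$KVER.tar.xz" \
        "linux-$KVER/include/media/mpeg2-ctrls.h" \
        "linux-$KVER/include/media/h264-ctrls.h"
    # Assumed destination: include/ inside the libva-v4l2-request checkout.
    cp "linux-$KVER/include/media/mpeg2-ctrls.h" \
       "linux-$KVER/include/media/h264-ctrls.h" \
       libva-v4l2-request/include/
}

rename_pixfmt() {
    # $1: source directory. Rewrites the old pixel-format name in every
    # file below it that mentions it (uses GNU sed's in-place editing).
    grep -rl 'V4L2_PIX_FMT_H264_SLICE_RAW' "$1" | while IFS= read -r f; do
        sed -i 's/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE/g' "$f"
    done
}
```

`rename_pixfmt libva-v4l2-request/src` would then cover the rename step; running it twice is harmless because the old name no longer occurs after the first pass.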

 

Tested with VLC, usable with issues:

  • Artifacts in some videos h264 720p and higher, for example: https://imgur.com/nYFArT4 (360/480 works fine)
  • Scaling (fullscreen, resizing) does not work; playback slows down with the message
    
    
    
    
    
    [a310cb88] main filter error: Failed to create video converter
  • Minor issues in console output on playback (see bold)

    $ vlc 3-big_buck_bunny_480p_H264_AAC_25fps_1800K.MP4
    VLC media player 3.0.9.2 Vetinari (revision 3.0.9.2-0-gd4c1aefe4d)
    [02287b98] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
    libEGL warning: DRI2: failed to authenticate
    libva info: VA-API version 1.7.0
    libva info: User environment variable requested driver 'v4l2_request'
    libva info: Trying to open /usr/lib/arm-linux-gnueabihf/dri/v4l2_request_drv_video.so
    libva info: Found init function __vaDriverInit_1_7
    libva info: va_openDriver() returns 0
    [a272d350] avcodec decoder: Using v4l2-request for hardware decoding
    [a3018b98] blend blend error: no matching alpha blending routine (chroma: YUVA -> VAOP)
    [a3018b98] main blend error: blending YUVA to VAOP failed
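The "User environment variable requested driver" line in the log above indicates that the driver was selected through libva's LIBVA_DRIVER_NAME variable. A small wrapper can make that explicit and check that the driver file exists before starting VLC. This is a sketch: the function names are hypothetical, and the driver directory is simply the one shown in the log (it differs between architectures and distributions).

```shell
#!/bin/sh
# Sketch: build the expected libva driver path and launch VLC with it.
va_driver_file() {
    # $1: libva driver directory, $2: driver name as used in LIBVA_DRIVER_NAME
    printf '%s/%s_drv_video.so\n' "$1" "$2"
}

run_vlc_with_v4l2_request() {
    # Directory taken from the log above; adjust for your system.
    drv=$(va_driver_file /usr/lib/arm-linux-gnueabihf/dri v4l2_request)
    if [ ! -e "$drv" ]; then
        echo "missing VA-API driver: $drv" >&2
        return 1
    fi
    LIBVA_DRIVER_NAME=v4l2_request vlc "$@"
}
```

Usage would be `run_vlc_with_v4l2_request video.mp4`; the VLC preferences listed earlier still have to be set by hand.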

 

Thanks to: @jernej for this post:

 

 

@ubobrov – old kernel + libcedrus + libvdpau-sunxi + ffmpeg + mpv: This approach, which supports decoding of H.264, uses the libvdpau-sunxi API and runs the legacy driver on a mainline kernel as a loadable kernel module. In the post quoted below the kernel is 4.20, but the same method has been successfully applied to 5.7.8 by another user. It requires that the board's device tree be patched, as documented in ubobrov's GitHub repository.

On 4/23/2020 at 4:41 PM, ubobrov said:

Decoding H264 and X11 rendering using vdpau_sunxi, libcedrus, kernel 4.20.17, mpv, vncserver and Armbian Bionic on Orange PI Zero

libvdpau: https://github.com/uboborov/libvdpau-sunxi-H3.git

libcedrus: https://github.com/uboborov/libcedrus-H3.git

cedar_dev: https://github.com/uboborov/sunxi-cedar-mainline.git

 

mpv, ffmpeg, x11 installed using apt

 

It works extremely slow but it's just a beginning  )

video: https://www.youtube.com/watch?v=9O7L_kaEDdk

UPD:

video on Orange PI One 1280x720 HDMI (works pretty fine)

https://www.youtube.com/watch?v=8qPyOG-yJIw

 

 

 

The summary seems to be that none of the current implementations on Allwinner boards really play nice with X or desktop sessions, and it's best to output directly to the framebuffer. Kwiboo has forked FFmpeg and mpv to make good use of new and unstable kernel features/hardware acceleration which will take a while to make their way upstream. The recent 5.11 move of stateless H.264 out of staging and into the uAPI should facilitate further developments.

I intend to try some of these things in the nearer future. Thanks to everyone who works on mainlining all of this VPU stuff, and to users here who contribute solutions and readily & patiently answer questions (Jernej especially). I hope I didn't post any falsehoods out of ignorance, and welcome any corrections.

 

Other related threads here:
https://forum.armbian.com/topic/11551-4kp30-video-on-orange-pi-lite-and-mainline-hardware-acceleration/

https://forum.armbian.com/topic/16804-orange-pi-pc-h3-armbian-focal-5104-sunxi-av-tv-out-cvbs-enable/page/2/

https://forum.armbian.com/topic/11184-hardware-graphicvideo-acceleration-in-h3-mainline/

https://forum.armbian.com/topic/13622-mainline-vpu/

Edited by P.P.A.
Edits, corrections and additions to reflect Jernej's and ubobrov's input below.

Good summary, let me clear some things.

18 hours ago, P.P.A. said:

as of kernel version 5.11, H.264 has been merged into the uAPI headers which, as I understand it, obsoletes libva-v4l2-request.

Having proper uAPI by no means makes libva-v4l2-request obsolete. If this lib is updated to latest uAPI, it still could serve as intermediate layer for all apps that don't support new interface but they support VA-API.

18 hours ago, P.P.A. said:

There are also VA-API and VDPAU, which the libraries below usually interface with. As far as I can tell, despite their frequent mention and similarly named things, these are irrelevant to present implementations of ARM hardware acceleration.

Before you talked about libva-v4l2-request, which implements VA-API, so I wouldn't say it's irrelevant to ARM HW accel. libvdpau-sunxi implements VDPAU, but that works only on BSP kernels and it is not relevant for mainline.

18 hours ago, P.P.A. said:

kodi-gbm is a package that can be installed on Armbian and functions similarly.

Further clarification: Kodi 19.0 (released recently) is highly recommended for all this - it doesn't require any out of tree patch for video decoding (LE uses patch for HW deinterlacing). Additionally, with version 19.0, there is single binary for all 3 windowing systems (gbm, X11, wayland). Depends on build options. Not sure if this version is available on Armbian but PPA exists, so I guess it should not be hard to test.

18 hours ago, P.P.A. said:

Recent mainlining efforts may mean that patching the kernel is no longer necessary and it works with 5.11 out of the box.

H264, MPEG2 and VP8 should be good in mainline, although api can still change until codec is promoted to uAPI. HEVC still needs out-of-tree patches for any serious work.

 

Maybe you can update your text, so we have current state overview in single post.

 


Thanks for your reply.

 

3 hours ago, jernej said:

Maybe you can update your text, so we have current state overview in single post.

 

That's the idea. I've edited the post, corrected erroneous statements and added your information. (And added some of the information on GStreamer from IRC.)


22 hours ago, P.P.A. said:

This approach, which supports encoding of H.264 uses the libvdpau-sunxi API

This is a bit incorrect. libvdpau-sunxi is used for decoding only.

22 hours ago, P.P.A. said:

if I understand it correctly, ubobrov ported a legacy feature to mainline

The legacy Cedrus driver has been ported to run on the mainline kernel as a loadable kernel module (LKM).


Thanks to the very patient support from jernej and ndufresne on the linux-sunxi IRC channel, I could confirm that GStreamer 1.19+ works out of the box on the 5.11 kernel (sunxi-dev), tested with Hirsute and Bullseye, at least on the H3/Orange Pi PC. (I haven't been able to build or run a 5.11 image on a H6 device, so I couldn't test it there yet.)

 

Kernel:

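Build 5.11.y with this patch (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7072db89572135f28cad65f15877bf7e67cf2ff8), which must be included as a userpatch. 5.11.6 includes it from the get-go, but for earlier versions you need to add it yourself: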

<jernej> PPA: kernel 5.11.6 is released with Cedrus patch

 

Requirements:

sudo apt update
sudo apt install meson ninja-build pkg-config libmount-dev libglib2.0-dev libgudev-1.0-dev libxml2-dev libasound2-dev

 

Building & installing GStreamer:

git clone https://gitlab.freedesktop.org/gstreamer/gst-build.git
cd gst-build
meson -Dgst-plugins-bad:v4l2codecs=enabled -Dgst-plugins-base:playback=enabled -Dgst-plugins-base:alsa=enabled build
ninja -C build
cd build
sudo meson install
sudo /sbin/ldconfig -v
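gst-build needs a reasonably recent Meson: an error reported further down in this thread shows 0.49.2 being rejected with ">= 0.54.0" required. A quick pre-flight check along these lines can save a failed configure run. It is a sketch; the function names are made up, the minimum version is taken from that error message, and installing via pip is just one option.

```shell
#!/bin/sh
# Sketch: verify that the installed Meson is new enough before configuring.
version_ge() {
    # True when $1 >= $2, using GNU sort's version ordering (sort -V).
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_meson() {
    have=$(meson --version) || return 1
    # 0.54.0 is the minimum named in the error message quoted in this thread.
    if version_ge "$have" 0.54.0; then
        echo "meson $have is new enough"
    else
        echo "meson $have is too old; one option: pip3 install --user meson" >&2
        return 1
    fi
}
```

Running `check_meson` before the `meson ... build` step above tells you up front whether the distribution package suffices.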

 

You can play videos from the command line to the framebuffer. At least on the OPiPC there is a problem where it doesn't automatically play on the correct DRM layer and the video is hidden. To fix this, it needs to be run once per boot with “plane-properties=s,zpos=3”:

 

gst-play-1.0 --use-playbin3 --videosink="kmssink plane-properties=s,zpos=3" video.file

 

Afterwards it should be fine with just gst-play-1.0 --use-playbin3 input.video (until the next reboot).

H.264 (except for 10-bit, which the hardware cannot handle) decodes smoothly with CPU load across all cores around 2%–5%.
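To avoid retyping the sink string, the playback command can be wrapped in a couple of helpers. A sketch with hypothetical function names; the zpos default of 3 is the value used above, and v4l2slh264dec is the stateless H.264 decoder element name as it appears in GStreamer's own error output later in this thread.

```shell
#!/bin/sh
# Sketch: helpers around the gst-play-1.0 invocation shown above.
kmssink_spec() {
    # $1: zpos to request for the video plane (defaults to 3, as above)
    printf 'kmssink plane-properties=s,zpos=%s\n' "${1:-3}"
}

have_stateless_h264() {
    # Succeeds if the v4l2codecs H.264 decoder element is installed.
    gst-inspect-1.0 v4l2slh264dec >/dev/null 2>&1
}

play_on_plane() {
    # $1: video file, $2: optional zpos
    gst-play-1.0 --use-playbin3 --videosink="$(kmssink_spec "$2")" "$1"
}
```

`have_stateless_h264 && play_on_plane video.file` would then only start playback when the decoder element is actually available.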

 

On 3/4/2021 at 9:42 PM, ubobrov said:

This is a bit incorrect. libvdpau-sunxi is used for decoding only.

The legacy Cedrus driver has been ported to run on the mainline kernel as a loadable kernel module (LKM).

 

Thanks; the OP was edited accordingly.

Edited by P.P.A.
Patch added to 5.11.6

Fantastic topic, thanks.

As I get

meson.build:1:0: ERROR:  Meson version is 0.49.2 but project requires >= 0.54.0.

I guess I should switch to 5.11 kernel (sunxi-dev). I tried other kernels, switched to nightly builds, but I'm still in 5.10.21-sunxi.
If only someone could do the same for video encoding. :unsure:


On 3/10/2021 at 10:36 AM, gounthar said:

Fantastic topic, thanks.

As I get


meson.build:1:0: ERROR:  Meson version is 0.49.2 but project requires >= 0.54.0.

I guess I should switch to 5.11 kernel (sunxi-dev). I tried other kernels, switched to nightly builds, but I'm still in 5.10.21-sunxi.
If only someone could do the same for video encoding. :unsure:

It must be 5.11, so you need to build from source.

Something I forgot to mention (going to edit the post now): the kernel must be built with this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7072db89572135f28cad65f15877bf7e67cf2ff8

It's been accepted for upstreaming and will be included in 5.11 later, but up until 5.11.3 at least, you need to include it as a userpatch.


On 3/9/2021 at 9:18 PM, P.P.A. said:

At least on the OPiPC there is a problem where it doesn't automatically play on the correct DRM layer and the video is hidden.

This happens on all SoCs with DE2 or newer (A83t or newer SoC). Most SoCs have only one capable plane which can display YUV formats and it's always below current framebuffer, so that "workaround" (which imo is not workaround, but just part of configuration) is always needed in your use case. Note that having video plane below UI plane is actually desired for video players - UI plane has alpha channel which makes window with video transparent.


On 3/11/2021 at 12:11 PM, P.P.A. said:

It must be 5.11, so you need to build from source.

Something I forgot to mention (going to edit the post now): the kernel must be built with this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7072db89572135f28cad65f15877bf7e67cf2ff8

It's been accepted for upstreaming and will be included in 5.11 later, but up until 5.11.3 at least, you need to include it as a userpatch.

I don't want to hijack this thread, but could someone point me to a thread or documentation on how to rebuild and install the kernel on the machine itself?

I tried linux-5.11.6 and it looks like the patch is already there:

 

cp /proc/config.gz linux-5.11.6/ && cd linux-5.11.6/ && gunzip config.gz
curl -sSL https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=7072db89572135f28cad65f15877bf7e67cf2ff8 >cedrus.patch
patch -p1 < cedrus.patch
patching file drivers/staging/media/sunxi/cedrus/cedrus.c
Reversed (or previously applied) patch detected!  Assume -R? [n] y
patching file drivers/staging/media/sunxi/cedrus/cedrus.h
Reversed (or previously applied) patch detected!  Assume -R? [n] y
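Answering "y" at that prompt actually reverts the patch, so it is safer to test non-interactively first. GNU patch can do this with --dry-run: if the patch reverses cleanly it is already in the tree, and if it applies cleanly it is still missing. A sketch (the function name is made up; -f suppresses the interactive questions, -s keeps it quiet):

```shell
#!/bin/sh
# Sketch: report whether a -p1 patch is already present in the current tree.
# Nothing is modified because every invocation uses --dry-run.
patch_state() {
    # $1: path to the patch file; run from the top of the source tree.
    if patch -p1 -R -f -s --dry-run < "$1" >/dev/null 2>&1; then
        echo applied      # reverses cleanly, so it is already in the tree
    elif patch -p1 -f -s --dry-run < "$1" >/dev/null 2>&1; then
        echo missing      # applies cleanly, so it still needs to be added
    else
        echo conflict     # neither direction works; inspect by hand
    fi
}
```

Inside linux-5.11.6/, `patch_state cedrus.patch` would be expected to print "applied", matching the output above.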

Thanks.


22 hours ago, gounthar said:

I don't want to hijack this thread, but could someone point me to a thread or documentation on how to rebuild and install the kernel on the machine itself?

I tried linux-5.11.6 and it looks like the patch is already there:

 

Thanks.

The patch was very recently added in 5.11.6.

I'm not sure how you install the kernel after the fact—I simply built a new bootable image with armbian-build (choosing sunxi-dev as the kernel with ./compile.sh EXPERT="yes") and flashed that to µSD.
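For kernels below 5.11.6, the Cedrus patch has to go in as a userpatch before building. A rough sketch of how that could look, assuming armbian-build picks up kernel patches from userpatches/kernel/<family>-<branch>/ (please verify against the Armbian build documentation); the function names and the cedrus.patch filename are made up, and the URL is the patch link posted earlier in this thread.

```shell
#!/bin/sh
# Sketch: drop the Cedrus patch into the armbian-build userpatches tree.
userpatch_dir() {
    # $1: kernel family (e.g. sunxi), $2: branch (e.g. dev)
    # Directory layout is an assumption about armbian-build conventions.
    printf 'userpatches/kernel/%s-%s\n' "$1" "$2"
}

add_cedrus_userpatch() {
    # Run from the top of an armbian-build checkout. Only needed for
    # kernels older than 5.11.6, which already contains the patch.
    dir=$(userpatch_dir sunxi dev)
    mkdir -p "$dir"
    curl -fsSL 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=7072db89572135f28cad65f15877bf7e67cf2ff8' \
        > "$dir/cedrus.patch"
}
```

After `add_cedrus_userpatch`, the image is built as described with `./compile.sh EXPERT="yes"` and sunxi-dev selected.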


Dear all, do we have Cedrus H.264 decoding in kernel 5.12.x? I also applied the recent patch.

I tested it on an Allwinner A20 but get the following error:

0:00:00.427616833   429   0x491400 ERROR     v4l2codecs-decoder gstv4l2decoder.c:697:gst_v4l2_decoder_get_controls:<v4l2decoder1> VIDIOC_G_EXT_CTRLS failed: Invalid argument
0:00:00.427836417   429   0x491400 WARN      v4l2codecs-h264dec gstv4l2codech264dec.c:134:gst_v4l2_codec_h264_dec_open:<v4l2slh264dec0> error: Driver did not report framing and start code method.
0:00:00.427968208   429   0x491400 WARN      v4l2codecs-h264dec gstv4l2codech264dec.c:134:gst_v4l2_codec_h264_dec_open:<v4l2slh264dec0> error: gst_v4l2_decoder_get_controls() failed: Invalid argument
0:00:00.428132625   429   0x491400 INFO        GST_ERROR_SYSTEM gstelement.c:2234:gst_element_message_full_with_details:<v4l2slh264dec0> posting message: Driver did not report framing and start code method.
ERROR: from element /GstPipeline:pipeline0/v4l2slh264dec:v4l2slh264dec0: Driver did not report framing and start code method.
Additional debug info:
../gst-plugins-bad-1.18.4/sys/v4l2codecs/gstv4l2codech264dec.c(134): gst_v4l2_codec_h264_dec_open (): /GstPipeline:pipeline0/v4l2slh264dec:v4l2slh264dec0:
gst_v4l2_decoder_get_controls() failed: Invalid argument
ERROR: pipeline doesn't want to preroll.
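One thing that can be read straight from this log is the GStreamer version in use: the debug path mentions gst-plugins-bad-1.18.4, and the opening post states that 1.18 only matches the 5.9 kernel headers. Whether that is the actual cause here is not confirmed in the thread, but checking the version is a cheap first step. A sketch that pulls the version out of a saved log (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch: extract the gst-plugins-bad version from a GStreamer debug log
# read on stdin, based on the source path that appears in error details.
gst_bad_version_from_log() {
    sed -n 's/.*gst-plugins-bad-\([0-9][0-9.]*\)\/.*/\1/p' | head -n 1
}

# Usage sketch:
#   gst-play-1.0 --use-playbin3 video.file 2>&1 | gst_bad_version_from_log
```

On a live system, `gst-inspect-1.0 --version` reports the installed version directly.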

 
