I've taken peeks at threads related to hardware decoding (of H.264 and HEVC, mainly) on Allwinner and Rockchip platforms on and off, sometimes dabbled in trying and failing to implement solutions recommended there. Being a complete amateur, I find the topic very opaque and confusing, with various different components that need to interface with each other, be patched in sync, and sometimes change drastically between kernel versions, etc. Today I sat down and read up on these subjects, scouring wikis, documentation, this forum, and assorted other sources to try and understand how this works. In this post I will attempt to compile what I've learned on the different software components involved, their relationships, their current status, and solutions to the problem. I hope people more knowledgeable will correct me when I get something wrong or cite outdated information. Stuff which I am highly uncertain of I will print in italics.
(This post is going to focus on mainline implementations of Cedrus/Allwinner, I haven't looked into Hantro/Rkvdec/Rockchip specifics yet. I will speak only of H.264 and H.265/HEVC; I don't understand the high/low stuff and didn't pay attention to other codecs.)
Components:
Basics: Video codecs like H.264, H.265/HEVC, MPEG-2, etc. are standardised methods which serve to more efficiently encode and decode videos, reducing their filesize. Software en-/decoding is very CPU-intensive. Modern GPUS and ARM SoCs therefore contain specialised hardware (VPUs) to delegate these tasks to. Working hardware decoding is particularly important for underpowered ARM CPUs.
Drivers: Topmost in the stack are the VPU drivers. These are Sunxi-Cedrus/Cedrus V4L2 M2M and Cedar [Is this the legacy one?] on Allwinner; Hantro and Rkvdec on Rockchip. These are all still in development, but Cedrus already fully supports H.264, and partially supports HEVC, and is already usable in the mainline kernel.
APIs: In order for anything (userspace APIs, libraries) to make use of the VPU drivers, you need backends/APIs. For Cedrus, there is the unmaintained libva-v4l2-request backend which implements VA-API, the legacy VDPAU implementation libvdpau-sunxi, and as of kernel version 5.11, H.264 has been merged into the uAPI headers. Different applications may make recourse to one or another of these APIs.
Libraries: FFmpeg and GStreamer. provide libraries and APIs of their own to other applications but can (importantly!) also output directly to the framebuffer. FFmpeg must be patched to access either libva-v4l2-request or the Cedrus driver headers. GStreamer directly accesses kernel headers since 1.18 (works on 5.9, not on 5.10; 1.20 will support 5.11.)
Media players: mpv and depends on FFmpeg for hardware acceleration (and must be patched together with it). VLC can be set to access libva-v4l2-request directly. Kodi 19.0+ supports hardware acceleration out of the box without any out-of-tree patches.
Display server: An additional complication is drawing the output of any of the above on screen. Most successful implementations I've seen bypass X11 and either output directly to the framebuffer or force a plane/display layer on top of any X windows. Wayland apparently makes this easier by allowing applications to use their own DRM planes, but this hasn't been explored much yet. Kodi 19.0 works with all three windowing systems (X, Wayland, and gbm).
Codec status:
Taken from the LibreElec thread (which reflects LibreElec's status and is ahead of what works elsewhere, but outlines hardware limitations):
Approaches:
Many people have managed to make it work on their machines using different approaches. Note that some of these solutions are one or two years old, and kernel developments since may have changed the situation. Ordered from newer to older:
LibreElec – kernel + ffmpeg + Kodi: LibreElec is a Just-Enough-OS with the sole purpose of running Kodi, a media player. It's at the bleeding edge and usually implements codecs and features well before mainline or other distros. It achieves this by heavily patching everything up and down the stack, from the Linux kernel over FFmpeg to Kodi itself. These patches could all be applied to an Armbian build, but there are a lot of them, they're poorly documented, and you'd need to dig into their github to understand what they all do. LibreElec runs Kodi directly without a desktop. kodi-gbm is a package that can be installed on Armbian and functions similarly.
Key contributors to the project are @jernej and @Kwiboo, who sometimes post about their work here (and have been very helpful with questions, thank you). @balbes150 includes some of LibreElec's patches in his Armbian-TV builds, but I don't know which.
Kodi 19.0:
LibreElec patches + mpv:
@megous – Kernel 5.11 + GStreamer: This implementation, done here on a PinePhone (A64), patches the 5.9 kernel and uses a recent version of GStreamer (1.18 and up), whose output is rendered directly to a DRM plane via kmssink. (No X or Wayland.)
GStreamer 1.18 works with the 5.9 kernel. It does not work with 5.10, because of numerous changes to the kernel headers in this version. In 5.11 the H.264 headers were moved into the uAPI; the master branch of GStreamer reflects this, but there haven't been any releases with these patches yet. It'll probably be in repositories with GStreamer 1.20; until then you can build it from source.
@Sash0k – patched libva-v4l2-request + VLC: This updates bootlin's libva-v4l2-request and follows the Sunxi wiki's instructions for enabling VLC to make use of it. It works on the desktop. This only works for H.264 and breaks HEVC. When I tried to replicate this approach on a recent Armbian build, I discovered that the h264.c files in the kernel (that libva-v4l2-request draws on) have changed considerably between 5.8 and 5.10, and I lack the understanding to reconcile libva-v4l2-request with them.
@ubobrov – old kernel + libcedrus + libvdpau-sunxi + ffmpeg + mpv: This approach, which supports encoding decoding of H.264 uses the libvdpau-sunxi API and ports the legacy driver to mainline as a loadable kernel module and if I understand it correctly, ubobrov ported a legacy feature to mainline. In the post quoted below the kernel is 4.20, but the same method has been successfully applied to 5.7.8 by another user. It requires that the board's device tree be patched, as documented in ubobrov's github repository.
The summary seems to be that none of the current implementations on Allwinner boards really play nice with X or desktop sessions, and it's best to output directly to the framebuffer. Kwiboo has forked FFmpeg and mpv to make good use of new and unstable kernel features/hardware acceleration which will take a while to make their way upstream. The recent 5.11 move of stateless H.264 out of staging and into the uAPI should facilitate further developments.
I intend to try some of these things in the nearer future. Thanks to everyone who works on mainlining all of this VPU stuff, and to users here who contribute solutions and readily & patiently answer questions (Jernej especially). I hope I didn't post any falsehoods out of ignorance, and welcome any corrections.
Other related threads here:
https://forum.armbian.com/topic/11551-4kp30-video-on-orange-pi-lite-and-mainline-hardware-acceleration/
https://forum.armbian.com/topic/16804-orange-pi-pc-h3-armbian-focal-5104-sunxi-av-tv-out-cvbs-enable/page/2/
https://forum.armbian.com/topic/11184-hardware-graphicvideo-acceleration-in-h3-mainline/
https://forum.armbian.com/topic/13622-mainline-vpu/