HW accelerated video decoding/encoding on BPI M?


divis1969

Actually I do not need a desktop.

It looks like this acceleration somehow depends on X11.

Does it mean it can only be used for HW-accelerated graphics (i.e. rendering on screen)?

I need this functionality for a surveillance system (e.g. kerberos.io) and do not need to draw anything on screen.

Is it possible to install just some X11 packages to use the acceleration?


Hardware acceleration works with the legacy kernel via libvdpau-sunxi, and this depends on X11. You do not need a complete desktop/window manager environment, just X.

There are other ways to use the cedrus code directly, without the VDPAU API, to decode or encode something. Code snippets should be lying around somewhere on the net, but I don't really know more about them.

Regards rellla


I've tried to install some of the packages (xorg, libvdpau-sunxi1, etc.) but was unable to make vdpauinfo work.

Here is the log:

$ vdpauinfo
debug1: client_input_channel_open: ctype x11 rchan 3 win 65536 max 16384
debug1: client_request_x11: request from ::1 42042
debug1: channel 1: new [x11]
debug1: confirm x11
display: localhost:10.0   screen: 0
debug1: client_input_channel_open: ctype x11 rchan 4 win 65536 max 16384
debug1: client_request_x11: request from ::1 42043
debug1: channel 2: new [x11]
debug1: confirm x11
debug1: channel 1: FORCE input drain
debug1: channel 2: FORCE input drain
Error creating VDPAU device: 25
debug1: channel 1: free: x11, nchannels 3
debug1: channel 2: free: x11, nchannels 2

Note that I was able to run xterm.

I'm running Ubuntu Server, kernel 3.4.113.

 

Did I miss something?

 


It seems I need to enable cedar_dev, at least.

I've found the following in the kernel log:

[    1.915496] [cedar dev]: not installed! ve_mem_reserve=0

My questions are:

1. What should I do to enable it?

2. How do I rebuild the kernel and replace it on the running machine or on the SD card? I'm using an image I built myself, configured as Ubuntu Server with kernel 3.4.


The kernel module is not loaded because no memory is assigned to the device: the boot parameter sunxi_ve_mem_reserve is set to 0 and the kernel does not support CMA.

 

1. Either build the kernel with CMA enabled and sized to e.g. 128 MB, or enable sunxi_ve_mem_reserve in boot.cmd: https://github.com/igorpecovnik/lib/blob/master/config/bootscripts/boot-sunxi.cmd#L40

I don't know how you can easily set this with the build tools.

2. If you build your image yourself, you can add userpatches and/or run the kernel configuration during the build process, where you can enable CMA.
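To confirm which value the running kernel actually received, one option is to read /proc/cmdline. The C sketch below (the helper names cmdline_param and read_cmdline are hypothetical, not from any sunxi tool) extracts a numeric boot parameter such as sunxi_ve_mem_reserve. It only inspects the boot arguments; it says nothing about whether the kernel was built with CMA.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the numeric value of "name=<value>" in a space-separated kernel
 * command line (the format of /proc/cmdline), or -1 if it is not present. */
static long cmdline_param(const char *cmdline, const char *name)
{
    size_t len = strlen(name);
    const char *p = cmdline;

    if (len == 0)
        return -1;
    while ((p = strstr(p, name)) != NULL) {
        /* Only accept a whole parameter name directly followed by '=' */
        int starts_word = (p == cmdline) || (p[-1] == ' ');
        if (starts_word && p[len] == '=')
            return strtol(p + len + 1, NULL, 10);
        p += len;
    }
    return -1;
}

/* Read the running kernel's command line into buf; returns 0 on success. */
static int read_cmdline(char *buf, size_t size)
{
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL)
        buf[0] = '\0';
    fclose(f);
    return 0;
}
```

On the board, calling read_cmdline() and then cmdline_param(buf, "sunxi_ve_mem_reserve") should return 0 in the situation shown by the "[cedar dev]: not installed! ve_mem_reserve=0" log line, and -1 if the parameter is not passed at all.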

 

Regards

rellla


Yeah, I'll try.

 

BTW, I've tried building the desktop image to figure out how this HW acceleration is enabled, but did not find it. At least the kernel config seems to have CMA disabled.

Is there any recipe for replacing the kernel on the system? I do not want to do a fresh install...


1 hour ago, divis1969 said:

Is there any recipe for replacing the kernel on the system? I do not want to do a fresh install...

Set the option at https://github.com/igorpecovnik/lib/blob/master/compile.sh#L16 and the one on the next line to yes, and your build system will give you deb packages that you can easily install on your system via dpkg -i *.deb

You will also be able to change the kernel config then.

Regards rellla


Thanks everyone!

I modified 'disp_mem_reserves' and now I can run vdpauinfo.

Unfortunately, there are some issues with vdpau usage.

 

1. vdpauinfo only works if I use 'ssh -X' to connect to the Banana Pi. Otherwise it fails because it cannot connect to the X server.

I suspect I will not be able to use vdpau from a process running as a system service.

What can I do to fix this?

 

2. A test with ffmpeg that overlays a timestamp string on the RTP stream from the camera and writes it to a file is failing.

ffmpeg starts writing to the file but gets stuck at some point. After terminating it with Ctrl-C (which is actually not so easy), the file does not look like a valid MP4 (and is too short, only a few KB).

Is there any test case (maybe with an MP4 file as input) which I can use to check vdpau functionality and performance?

The RTP stream is 25 FPS, so perhaps I need to reduce the frame rate along with adding the timestamp to make it work...

Is there any way to enable or collect some logging to debug the issue with ffmpeg?


12 hours ago, divis1969 said:

1. vdpauinfo only works if I use 'ssh -X' to connect to the Banana Pi. Otherwise it fails because it cannot connect to the X server.

I suspect I will not be able to use vdpau from a process running as a system service.

What can I do to fix this?

For ssh I am using export DISPLAY=:0.
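A process started as a system service does not inherit DISPLAY from an ssh session. A minimal C sketch of the same fallback as export DISPLAY=:0, done inside the program before any X or VDPAU call, is shown below. It assumes an X server is already running on :0 and that the service user is allowed to connect to it (e.g. via a suitable XAUTHORITY); the helper name ensure_display is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* If DISPLAY is unset or empty (as it usually is for a system service),
 * set it to `fallback`, mirroring `export DISPLAY=:0` in a shell.
 * Returns the DISPLAY value now in effect, or NULL if setenv() failed. */
static const char *ensure_display(const char *fallback)
{
    const char *cur = getenv("DISPLAY");

    if (cur == NULL || cur[0] == '\0') {
        /* overwrite = 1 so an empty DISPLAY is replaced as well */
        if (setenv("DISPLAY", fallback, 1) != 0)
            return NULL;
        cur = getenv("DISPLAY");
    }
    return cur;  /* an existing non-empty DISPLAY (e.g. from ssh -X) wins */
}
```

After ensure_display(":0"), XOpenDisplay(NULL) picks up DISPLAY, and the resulting Display * can be handed to libvdpau's vdp_device_create_x11(), the usual way a VDPAU client obtains its device.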

For ffmpeg, I think you have to compile it with vdpau support. Or check these threads:

FFmpeg with Cedrus H264 HW Encoder (H3 - CMOS Camera)

ffmpeg H264 encoding with cedrus


Thanks for the links.

I've built ffmpeg from https://github.com/stulluk/FFmpeg-Cedrus and performed a few tests re-encoding the MPEG file http://samplemedia.linaro.org/H264/big_buck_bunny_720p_H264_AAC_25fps_3400K.MP4 to reduce the frame rate to 5 FPS.

 

1. Encoding with cedrus works pretty well: ffmpeg consumes around 120% CPU (my estimate of the average; I just ran top at the same time).

The test took: real 1m10.127s, user 1m8.180s, sys 0m5.530s. FFmpeg showed ~5.7-6.2 FPS while encoding.

Command line:

./ffmpeg -i big_buck_bunny_720p_H264_AAC_25fps_3400K.MP4 -pix_fmt nv12 -r 5 -an -b:v 64k -c:v cedrus264 stream.mp4

2. Decoding with vdpau is horrible: CPU (average) 90%, time real 9m44.446s, user 9m12.470s, sys 0m14.860s.

FFmpeg shows 0.8 FPS while processing.

./ffmpeg -hwaccel vdpau -i big_buck_bunny_720p_H264_AAC_25fps_3400K.MP4 -r 5 -an stream.mp4

In both cases the video is viewable.

 

3. I've tried to use both vdpau and the cedrus codec. CPU is only 10%! BUT!! time real 7m6.911s, user 0m28.900s, sys 0m13.050s, and processing FPS is 0.9.

And video is completely unviewable...

FFmpeg logged at the end: Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used).

File length is 419 bytes.

./ffmpeg -hwaccel vdpau -i big_buck_bunny_720p_H264_AAC_25fps_3400K.MP4 -pix_fmt nv12 -r 5 -an -b:v 64k -c:v cedrus264 stream.mp4

What might be the issue with vdpau?

Is it possible to use both vdpau (to decode) and the cedrus encoder simultaneously?

 


21 hours ago, divis1969 said:

Thanks for the links.

I've built ffmpeg from https://github.com/stulluk/FFmpeg-Cedrus and performed a few tests re-encoding the MPEG file http://samplemedia.linaro.org/H264/big_buck_bunny_720p_H264_AAC_25fps_3400K.MP4 to reduce the frame rate to 5 FPS.

 

Is it possible to use both vdpau (to decode) and the cedrus encoder simultaneously?

 

So you were able to build this source; I remember I failed when I tried back then. Maybe it's been fixed in the meantime and I should try again.

I am not that qualified to answer your questions, but about vdpau: I don't think it's about encoding, but more about decoding. Cedrus is for encoding. I am not sure about the reason, but I don't think it will be possible to use sunxi-vdpau for decoding directly through ffmpeg. I say this because I remember I tried it and had 100% CPU usage, meaning no h/w acceleration. But you can use vdpau for decoding through some players like mpv.

As for using encoding and decoding simultaneously, this is something I would like to know too :).


@RagnerBG: Totally right. VDPAU is an API for decoding and presentation. It's used by several players. Encoding is not included in this API.

"Cedrus" is the project name of the reverse engineering effort in general. libvdpau-sunxi is based on cedrus code. ffmpeg-cedrus is based on cedrus code. As i can see in the readme, ffmpeg-cedrus only does hardware accelerated encoding.

I have no clue how the libvdpau-sunxi backend works together with ffmpeg when decoding with the -hwaccel vdpau option, but IMHO it should work with some adaptations. You may check https://github.com/linux-sunxi/libvdpau-sunxi/issues/55 and provide more logs to find your issue...

rellla

 


It looks like libvdpau-sunxi and ffmpeg-cedrus actually compete for the VE. The first one uses libcedrus to access the VE; the second uses code compiled in directly. Most likely each one interferes with the other.

So some rework is needed to allow these pieces of code to coexist.

I'm not sure whether the VE (cedrus) kernel driver allows two clients or not.

 

BTW, I'm still not sure whether vdpau actually improves the decoding (see test #2). I suppose encoding back to MPEG hides the effect. Is there a test I can use to verify it (e.g. one that just drops the decoded frames)?
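To separate decoding speed from encoding cost, decoded frames can simply be thrown away. From the command line, ffmpeg's null muxer does this (giving `-f null -` as the output instead of stream.mp4). Inside a C program the same idea is a timing loop around the decoder. In the sketch below, the decode hook and the fake stream are hypothetical stand-ins, not a libvdpau or FFmpeg API.

```c
#define _POSIX_C_SOURCE 200112L
#include <time.h>

/* Hypothetical decoder hook: returns 1 when a frame was decoded, 0 at end of
 * stream and a negative value on error. The decoded frame is simply dropped. */
typedef int (*decode_fn)(void *ctx);

/* Decode until end of stream, discard every frame and return frames/second
 * (or -1.0 if the decoder reported an error). */
static double measure_decode_fps(decode_fn decode, void *ctx, long *frames_out)
{
    struct timespec t0, t1;
    long frames = 0;
    int ret;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((ret = decode(ctx)) > 0)
        frames++;                       /* no colour conversion, no encoding */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (frames_out != NULL)
        *frames_out = frames;
    if (ret < 0)
        return -1.0;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)frames / secs : 0.0;
}

/* Stand-in decoder that "decodes" a fixed number of empty frames, so the
 * harness can be tried without the VE. */
struct fake_stream { long remaining; };

static int fake_decode(void *ctx)
{
    struct fake_stream *s = ctx;

    if (s->remaining <= 0)
        return 0;
    s->remaining--;
    return 1;
}
```

Running the same harness once with a software decoder and once with the VDPAU path, on the same input, would show whether the hardware path is actually faster.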


I have reworked ffmpeg's cedrus264 encoder to use libcedrus and modified libcedrus to allow several clients to use the VE (within the same process).

Now it is possible to use both the vdpau-sunxi decoder and the cedrus264 encoder to transcode video.
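One common way to let several clients in one process share a single device is reference counting: the first user opens it, the last one closes it. The sketch below assumes nothing about the actual libcedrus changes in the linked repository; ve_get, ve_put, struct ve_ops and the fake hooks are hypothetical. It only shares the open device; serializing the actual decode and encode jobs on the engine would presumably need further locking.

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>

/* Hypothetical hooks standing in for opening/closing the VE device. */
struct ve_ops {
    int  (*open_dev)(void);     /* returns 0 on success */
    void (*close_dev)(void);
};

static pthread_mutex_t ve_lock = PTHREAD_MUTEX_INITIALIZER;
static int ve_users;            /* clients currently holding the VE */

/* The first client opens the device; later clients only bump the count. */
static int ve_get(const struct ve_ops *ops)
{
    int ret = 0;

    pthread_mutex_lock(&ve_lock);
    if (ve_users == 0)
        ret = ops->open_dev();
    if (ret == 0)
        ve_users++;
    pthread_mutex_unlock(&ve_lock);
    return ret;
}

/* The last client to release the VE closes the device. */
static void ve_put(const struct ve_ops *ops)
{
    pthread_mutex_lock(&ve_lock);
    if (ve_users > 0 && --ve_users == 0)
        ops->close_dev();
    pthread_mutex_unlock(&ve_lock);
}

/* Test doubles that only count calls, e.g. for a decoder + encoder pair. */
static int fake_opens, fake_closes;
static int  fake_open(void)  { fake_opens++; return 0; }
static void fake_close(void) { fake_closes++; }
static const struct ve_ops fake_ops = { fake_open, fake_close };
```

A decoder and an encoder in the same ffmpeg process would each call ve_get() at initialisation and ve_put() at teardown, so the device is opened once and closed only after both are done.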

The results for

./ffmpeg -hwaccel vdpau -i big_buck_bunny_720p_H264_AAC_25fps_3400K.MP4 -pix_fmt nv12 -r 5 -an -b:v 64k -c:v cedrus264 stream.mp4

are the following:

Video is viewable, CPU usage is ~80-90%, FPS while encoding ~6.7, time real 0m56.523s, user 0m28.210s, sys 0m15.720s

It is not as good as I expected, though. Perhaps copying data in memory is a bottleneck.

I'm not sure whether it can be improved.

 

The code is located at https://github.com/divis1969/libcedrus (branch master) and https://github.com/divis1969/FFmpeg (branch 2.8-cedrus)


I've tried to do some profiling and debugging and found that ffmpeg performs conversions from the yuv420p pixel format to nv12. These conversions consume a lot of time because they are performed in software.

Note that the cedrus_vdpau decoder (vdpau-sunxi) does one more conversion, from the internal decoder format to yuv420p, on my A20, which uses VE engine version 1623. Newer chipsets with version 1680 seem to be able to decode directly into yuv420p.
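The yuv420p to nv12 conversion described above is cheap per pixel but touches every byte of every frame: the Y plane is copied unchanged and the separate U and V planes are interleaved into one UV plane. For a 1280×720 frame that is 1280 × 720 × 1.5 = 1,382,400 bytes per frame. A plain C sketch, assuming tightly packed planes (stride equal to width) and even dimensions; it is not FFmpeg's swscale code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one planar YUV 4:2:0 frame (yuv420p: Y plane, then U, then V)
 * into NV12 (Y plane, then a single plane of interleaved U/V pairs).
 * Assumes even width/height and tightly packed planes (stride == width). */
static void yuv420p_to_nv12(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                            uint8_t *dst_y, uint8_t *dst_uv,
                            size_t width, size_t height)
{
    size_t chroma = (width / 2) * (height / 2);  /* samples per chroma plane */

    memcpy(dst_y, y, width * height);           /* luma is identical */
    for (size_t i = 0; i < chroma; i++) {
        dst_uv[2 * i]     = u[i];                /* Cb */
        dst_uv[2 * i + 1] = v[i];                /* Cr */
    }
}
```

Since the work is pure memory traffic, letting the decoder hand over nv12 directly avoids this loop entirely instead of just making it faster.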

 

I've found that ffmpeg_vdpau.c always selects yuv420p as the decoder output format (see https://github.com/divis1969/FFmpeg/blob/2.8-cedrus/ffmpeg_vdpau.c#L191, https://github.com/divis1969/FFmpeg/blob/2.8-cedrus/ffmpeg_vdpau.c#L265 and the code of sunxi-vdpau).

I have not yet found a way to detect at this point that the user has specified a pixel format (-pix_fmt), which would make this code configurable, so I simply swapped lines 192 and 193 (to select nv12 first) and recompiled ffmpeg.
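Swapping lines 192 and 193 hard-codes nv12 as the first choice. A configurable variant could put the format the encoder asks for first and fall back to yuv420p. The sketch below uses a hypothetical enum and helper names rather than FFmpeg's AVPixelFormat or the actual ffmpeg_vdpau.c code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the pixel formats discussed in the thread. */
enum pix_fmt { FMT_NONE = -1, FMT_YUV420P, FMT_NV12 };

/* Return the first format in `prefs` that also appears in `supported`. */
static enum pix_fmt pick_format(const enum pix_fmt *prefs, size_t n_prefs,
                                const enum pix_fmt *supported, size_t n_sup)
{
    for (size_t i = 0; i < n_prefs; i++)
        for (size_t j = 0; j < n_sup; j++)
            if (prefs[i] == supported[j])
                return prefs[i];
    return FMT_NONE;
}

/* Try the format the encoder wants first (e.g. nv12 for cedrus264), so no
 * software conversion is needed; otherwise fall back to yuv420p. */
static enum pix_fmt choose_output_format(enum pix_fmt wanted,
                                         const enum pix_fmt *supported,
                                         size_t n_sup)
{
    enum pix_fmt prefs[2] = { wanted, FMT_YUV420P };

    return pick_format(prefs, 2, supported, n_sup);
}
```

With `wanted` derived from the user's -pix_fmt (or FMT_NONE when none was given), the default behaviour stays yuv420p and nv12 is only chosen when it was requested and the decoder offers it.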

This roughly doubled the encoding FPS (up to 14 FPS):

CPU usage is ~75-85%, FPS while encoding ~14, time real 0m28.137s, user 0m15.670s, sys 0m8.830s
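Compared with the previous result for the same command (real 0m56.523s, ~6.7 FPS, user 0m28.210s, sys 0m15.720s), the figures support the "about 2 times" estimate for wall-clock time and FPS, with a somewhat smaller gain in total CPU time (user + sys):

$$
\frac{56.523\,\mathrm{s}}{28.137\,\mathrm{s}} \approx 2.01,\qquad
\frac{14\ \mathrm{FPS}}{6.7\ \mathrm{FPS}} \approx 2.09,\qquad
\frac{(28.210 + 15.720)\,\mathrm{s}}{(15.670 + 8.830)\,\mathrm{s}} = \frac{43.93\,\mathrm{s}}{24.50\,\mathrm{s}} \approx 1.79
$$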


This topic is now closed to further replies.