UniformBuffer Posted August 25, 2020

Armbianmonitor: http://ix.io/2vcA

Hi, I use an AML-S905X-CC board, and since the last kernel update something has started moving on the hardware acceleration side: I got mpv hwdec working. I would like to run some tests and start using Kodi for media playback, but I'm worried that the default 256M of CMA memory is not enough.

I tried to set the CMA size through kernel parameters using armbian-config, adding "cma=350M" and "extraargs=cma=350M" to the boot environment. Neither works. The strange thing is that I'm sure "extraargs=cma=350M" passes the value to the kernel command line, because I have already used extraargs to disable a faulty composition output.

I checked the syntax of the cma kernel parameter in the Linux kernel documentation and it SHOULD be OK, but I'm not 100% sure:

cma=nn[MG]@[start[MG][-end[MG]]] [ARM,X86,KNL]
Sets the size of kernel global memory area for contiguous memory allocations and optionally the placement constraint by the physical address range of memory allocations. A value of 0 disables CMA altogether. For more information, see include/linux/dma-contiguous.h

I verified the CMA allocation from dmesg, which reports:

[ 0.000000] Reserved memory: created CMA memory pool at 0x000000006b000000, size 256 MiB

I also verified it with "cat /proc/meminfo", which ends with:

CmaTotal: 262144 kB
CmaFree: 12772 kB

So the question is: how can I change the CMA allocation size, if possible without recompiling the kernel? Is it possible that the CMA size has been hard-coded at compile time and runtime changes are ignored?

Thanks for the attention, have a good day
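The /proc/meminfo check described above can be scripted. The following is only a minimal sketch: the function name `cma_usage_mib` is made up for illustration, and the sample text simply reuses the two values quoted in the post.

```python
import re


def cma_usage_mib(meminfo_text: str) -> dict:
    """Return CmaTotal and CmaFree in MiB from /proc/meminfo-style text.

    Hypothetical helper for illustration; keys that are missing from the
    input are simply left out of the result.
    """
    values = {}
    for key in ("CmaTotal", "CmaFree"):
        match = re.search(rf"^{key}:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
        if match:
            values[key] = int(match.group(1)) / 1024  # kB -> MiB
    return values


# Sample built from the two values quoted in the post.
sample = "CmaTotal:         262144 kB\nCmaFree:           12772 kB\n"
usage = cma_usage_mib(sample)
print(usage)

# On a live system one would read the real file instead:
# with open("/proc/meminfo") as f:
#     print(cma_usage_mib(f.read()))
```

With the quoted values this reports a 256 MiB pool with roughly 12.5 MiB free, which matches the dmesg line.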
dante6913 Posted August 27, 2020 (edited)

Hi UniformBuffer, which mpv.conf arguments are you using? hwdec=auto? Thanks
UniformBuffer Posted August 27, 2020

Hi, I don't have any particular mpv conf. I have only added "ytdl-format=bestvideo[height<=?720][fps<=?60][vcodec!=?vp9]+bestaudio/best" to fetch YouTube videos directly at the resolution of my monitor, nothing more.

Your question prompted me to run some tests: unfortunately I don't get hardware acceleration with YouTube videos (maybe I have to change something), but I'm sure I get it with local video. Running a VP9 recording of my own desktop with "mpv --hwdec=yes ./myvid.mkv", I got:

```
(+) Video --vid=1 (*) (vp9 1280x800)
[vaapi] libva: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
Failed to open VDPAU backend libvdpau_nvidia.so: cannot open shared object file: No such file or directory
Using hardware decoding (v4l2m2m-copy).
VO: [gpu] 1280x800 nv12
V: 00:00:01 / 00:00:18 (10%) ....
```

YouTube videos do not use nv12, so maybe that is why mpv does not enable hardware acceleration for them (I do not get "Using hardware decoding (v4l2m2m-copy)."). "VO: [gpu]" should mean that the video output is GPU accelerated, but I don't think it also means hardware decoding; I have to check this. If I manage to get them accelerated I will send you a message.

PS: don't take my words as the truth, I'm a beginner on this topic and I'm experimenting to learn.

EDIT: got it. Using "mpv --hwdec=yes --hwdec-codecs=vp9 *youtube video address*" gives hardware decoding, but unfortunately the performance is worse than with software decoding. Anyway, I'm glad that something is starting to work. Good job Armbian team, I know this is not easy!
dante6913 Posted August 28, 2020

I tried with balbes' latest image (Ver 20200828, kernel 5.7.16), but I only got POLLERR output, so hardware decoding failed with all videos, even mp4.
UniformBuffer Posted August 29, 2020

That's strange, what is your mesa version? Mine is currently 20.1.5. My kernel is 5.7.15, which is what Armbian currently uses by default.
dante6913 Posted August 29, 2020

It's from ppa:oibaf/graphics-drivers
UniformBuffer Posted August 29, 2020

I personally don't use PPA repositories, so I don't know which version they provide. You can check yours by simply running

glxinfo | grep "OpenGL version"

Mine gives "OpenGL version string: 2.1 Mesa 20.1.5". My drivers are a bit newer than standard because I added Debian Sid as an additional repository, so I get very cutting-edge updates.

Also, just to be sure, do you have the "meson_vdec" module loaded? It is the kernel module that enables hardware decoding. By default meson_vdec should be loaded, but who knows? To check, run "lsmod" and look for meson_vdec in the list. If it is not loaded, you can load it with

sudo modprobe meson-vdec

(lsmod shows the name as "meson_vdec"; modprobe accepts either "meson-vdec" or "meson_vdec"). This loads it for the current session only, so it does not survive a reboot. To load it on every boot, add "meson-vdec" to /etc/modules.
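The "search lsmod for meson_vdec" step can also be done against /proc/modules, which is where lsmod gets its list. This is a rough sketch: `module_loaded` is a hypothetical helper, and the sample lines (sizes, reference counts, addresses) are invented for illustration, not taken from a real board.

```python
def module_loaded(proc_modules_text: str, name: str) -> bool:
    """Check whether a module appears in /proc/modules-style text.

    The kernel lists module names with underscores, so dashes in the
    requested name are normalised before comparing.
    """
    wanted = name.replace("-", "_")
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == wanted:
            return True
    return False


# Invented sample lines in /proc/modules layout (values are placeholders).
sample = (
    "meson_vdec 69632 0 - Live 0x0000000000000000\n"
    "videobuf2_dma_contig 24576 1 meson_vdec, Live 0x0000000000000000\n"
)
print(module_loaded(sample, "meson-vdec"))

# On a live system:
# with open("/proc/modules") as f:
#     print(module_loaded(f.read(), "meson-vdec"))
```

Matching on the whole first field avoids false positives from modules that merely list meson_vdec as a dependency.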
UniformBuffer Posted October 18, 2020

Hi, I'm running some tests to get meson_vdec working. Currently I get v4l2m2m hardware decoding on both Xorg and Wayland, but the video stutters a lot. While playing, I noticed errors like these in dmesg:

[ 75.219223] alloc_contig_range: [70c00, 70cf0) PFNs busy
[ 75.219421] alloc_contig_range: [70d00, 70df0) PFNs busy
[ 75.221725] alloc_contig_range: [70c00, 70cf0) PFNs busy
[ 75.221952] alloc_contig_range: [70d00, 70df0) PFNs busy
[ 75.223993] alloc_contig_range: [70c00, 70c78) PFNs busy
[ 75.224134] alloc_contig_range: [70c80, 70cf8) PFNs busy
[ 75.224270] alloc_contig_range: [70d00, 70d78) PFNs busy
[ 75.224490] alloc_contig_range: [70d80, 70df8) PFNs busy
[ 75.230638] alloc_contig_range: [70c00, 70cf0) PFNs busy
[ 75.230801] alloc_contig_range: [70d00, 70df0) PFNs busy
[ 168.034741] alloc_contig_range: 82 callbacks suppressed
[ 168.034748] alloc_contig_range: [6e900, 705a0) PFNs busy
[ 168.038374] alloc_contig_range: [6ea00, 706a0) PFNs busy
[ 168.041621] alloc_contig_range: [6ea00, 707a0) PFNs busy
[ 168.044211] alloc_contig_range: [6ec00, 708a0) PFNs busy
[ 168.046937] alloc_contig_range: [6ed00, 709a0) PFNs busy
[ 168.050334] alloc_contig_range: [6ee00, 70aa0) PFNs busy
[ 168.054546] alloc_contig_range: [6ee00, 70ba0) PFNs busy
[ 168.056783] alloc_contig_range: [6f000, 70ca0) PFNs busy
[ 168.057812] alloc_contig_range: [6f100, 70da0) PFNs busy
[ 168.059887] alloc_contig_range: [6f200, 70ea0) PFNs busy

From what I have read, this has something to do with CMA. Looking at "cat /proc/meminfo" I get:

MemTotal: 1939744 kB
MemFree: 467424 kB
MemAvailable: 1017360 kB
Buffers: 103756 kB
Cached: 640368 kB
SwapCached: 0 kB
Active: 949408 kB
Inactive: 287908 kB
Active(anon): 528780 kB
Inactive(anon): 61796 kB
Active(file): 420628 kB
Inactive(file): 226112 kB
Unevictable: 35076 kB
Mlocked: 1280 kB
SwapTotal: 969868 kB
SwapFree: 969868 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 523500 kB
Mapped: 318244 kB
Shmem: 96236 kB
KReclaimable: 61964 kB
Slab: 122056 kB
SReclaimable: 61964 kB
SUnreclaim: 60092 kB
KernelStack: 6076 kB
PageTables: 7564 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1939740 kB
Committed_AS: 1838164 kB
VmallocTotal: 135290159040 kB
VmallocUsed: 22676 kB
VmallocChunk: 0 kB
Percpu: 1616 kB
HardwareCorrupted: 0 kB
AnonHugePages: 67584 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
CmaTotal: 262144 kB
CmaFree: 2712 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB

As you can see, I have less than 3 MB of free CMA memory, so I suppose the bad playback performance COULD be caused by not having enough of it. So I would like to ask again whether someone knows a way to change the CMA size. As I said in the previous posts, setting it through kernel parameters in armbianEnv.txt does not work. Maybe the CMA size has been hard-coded in the kernel config?
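To see why these dmesg lines point at CMA, the page frame numbers (PFNs) can be converted to physical addresses and compared with the pool. The sketch below assumes 4 KiB pages and that the pool still sits at the address from the boot message in the first post (0x6b000000, 256 MiB); the October post does not restate either. The helper names are hypothetical.

```python
import re

PAGE_SIZE = 4096              # assumption: 4 KiB pages
CMA_BASE = 0x6B000000         # assumption: pool address from the first post's dmesg line
CMA_SIZE = 256 * 1024 * 1024  # 256 MiB, as reported by CmaTotal

BUSY_RE = re.compile(r"alloc_contig_range: \[([0-9a-f]+), ([0-9a-f]+)\) PFNs busy")


def parse_busy_ranges(dmesg_text: str) -> list:
    """Extract (start_pfn, end_pfn) pairs from 'PFNs busy' messages."""
    return [(int(a, 16), int(b, 16)) for a, b in BUSY_RE.findall(dmesg_text)]


def describe(start_pfn: int, end_pfn: int) -> tuple:
    """Return (size in KiB, whether the range lies inside the CMA pool)."""
    start_addr = start_pfn * PAGE_SIZE
    end_addr = end_pfn * PAGE_SIZE  # the end PFN is exclusive
    size_kib = (end_addr - start_addr) // 1024
    inside = CMA_BASE <= start_addr and end_addr <= CMA_BASE + CMA_SIZE
    return size_kib, inside


# Two lines copied from the dmesg output quoted above.
sample = (
    "[ 75.219223] alloc_contig_range: [70c00, 70cf0) PFNs busy\n"
    "[ 168.034748] alloc_contig_range: [6e900, 705a0) PFNs busy\n"
)
ranges = parse_busy_ranges(sample)
for start, end in ranges:
    print(hex(start), hex(end), describe(start, end))
```

Under those assumptions both sample ranges fall inside the pool, one of them close to 29 MiB of contiguous memory, which fits the picture of a nearly exhausted 256 MiB pool.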
TonyMac32 Posted October 21, 2020

Hello UniformBuffer,

The CMA memory is set by a device tree entry, so to my knowledge that is the only way to change it. The vdec/venc drivers are extremely unoptimized at present, and the recommended setting is over 800 MB. This is an obvious problem for a general-purpose distribution: it renders the La Frite 512 MB boards unbootable and leaves no real room for any working tasks on the 1 GB ones. It may be possible to implement this as an overlay, but in the meantime modifying the device tree is the only route:

https://github.com/armbian/build/blob/cc7ab6a6b1d91977bd9e154245307e85f7f76519/patch/kernel/meson64-current/0302-arm64-dts-meson-set-dma-pool-to-896MB.patch

This patch gives you the handle and value to get the required 896 MB CMA pool.
UniformBuffer Posted October 22, 2020

Thanks for the answer, this explains why hardware decoding stutters so much. Unfortunately I'm not skilled enough to recompile the kernel, and I'm scared of making the device unbootable (that has happened a lot of times). I also don't have an x86 machine to run the Armbian tools. Since, as you said, increasing the CMA that much makes the board useless for any other purpose, I will wait for some improvements. Thanks anyway for the answer!

PS: I would like to ask a question out of personal curiosity: do you know of any board that has hardware decoding enabled on the mainline kernel? As far as I know, even Allwinner devices, which have been completely reverse engineered, have trouble enabling it because applications need to support the v4l2_request API. Finding a unicorn seems much easier. This is absolutely not a criticism, I know it is very difficult to get working. I would simply like to know whether some other board supported by Armbian has it enabled, because I would like to use it as a general-purpose/media device, and I would like to avoid Android as much as possible.

Thanks for the attention, have a good day
TonyMac32 Posted October 22, 2020

Thankfully the kernel does not need to be recompiled for this, only the device tree. See the post below; I believe it is still accurate as far as the device tree decompile/recompile method goes, and it can be done on the device (use the correct device tree for your board).
TonyMac32 Posted October 22, 2020 (Solution)

I just did a quick test on Le Potato: increasing CMA to 512 MB was very easy following the above instructions with the /boot/dtb/amlogic/meson-gxl-s905x-libretech-cc.dtb device tree. The property you want to change is right at the top of the file, as a handle under "reserved-memory". Any value that is 0x400000 aligned should be valid, so you can experiment.
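The hex values for a given pool size follow from simple arithmetic, sketched below. The function names are made up, and the two-cell `<0x0 0x...>` layout printed by `as_dt_cells` is an assumption about how the size property is written in the decompiled device tree; check it against the actual file before editing.

```python
MIB = 1024 * 1024
CMA_ALIGNMENT = 0x400000  # 4 MiB, the alignment mentioned in the post


def cma_size_value(size_mib: int) -> str:
    """Return the pool size in bytes as a hex string, or raise if unaligned."""
    size_bytes = size_mib * MIB
    if size_bytes % CMA_ALIGNMENT != 0:
        raise ValueError(f"{size_mib} MiB is not a multiple of {hex(CMA_ALIGNMENT)}")
    return hex(size_bytes)


def as_dt_cells(size_mib: int) -> str:
    """Format the size as two 32-bit cells (assumed layout, high cell first)."""
    size_bytes = int(cma_size_value(size_mib), 16)
    return f"<{hex(size_bytes >> 32)} {hex(size_bytes & 0xFFFFFFFF)}>"


for mib in (256, 512, 896, 1024):
    print(f"{mib:5d} MiB -> {cma_size_value(mib)}  {as_dt_cells(mib)}")
```

So 512 MB corresponds to 0x20000000 and the 896 MB from the linked patch to 0x38000000. A value such as 350 MiB is rejected by this check because it is not a multiple of 4 MiB.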
UniformBuffer Posted October 23, 2020

Thanks for the info. As I said, I'm not very skilled with kernel things, but thanks to your guide I was able to change the CMA allocation from 256M to 512M. I also tried to set 1 GB with 0x40000000, which should be 0x400000 aligned, but after setting it I got no monitor output (from the LED pattern I can say the board was running, and the magic SysRq keys also worked, so the kernel was up). Anyway, that's not essential. I also tried the same value as the patch you linked (0x38000000) and it worked, making the CMA 896 MB.

I have tested some H.264 and VP9 videos and playback is still choppy (after increasing to 512 MB it became a little better). I tried mpv with `mpv --hwdec=yes video.mkv` and ffplay with `ffplay -vcodec h264_v4l2m2m video_h264.mkv` and `ffplay -vcodec vp9_v4l2m2m video_vp9.webm`. So far I get the best performance with VP9 using ffplay. A strange thing is that ffplay performs better than mpv, even though mpv uses ffmpeg just like ffplay, so they should perform more or less the same. Anyway, the low performance seems to be related to VERY single-threaded behavior.

After increasing the CMA to 512 MB I had some free CMA left (40-80 MB), and increasing it again to 896 MB increased the free CMA proportionally, so it seems that meson_vdec does not take advantage of memory beyond ~512 MB.

Anyway, even if there are some problems, I'm happy to see progress with my own eyes! Thanks again for the help and for your hard work