I've been doing a bit of work to get GStreamer 1.0 working with my NanoPi NEO Air (Allwinner H3). With the CAM500B camera (OV5640), I can capture and H.264-encode about 11 fps at 1920x1080 and 30 fps at 1280x720 using the following command:
GST_PLUGIN_PATH=/usr/local/lib/gstreamer-1.0 gst-launch-1.0 -vem v4l2src ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! cedar_h264enc ! h264parse ! fpsdisplaysink text-overlay=false video-sink=testsink
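To repeat the measurement at other resolutions, the pipeline above can be parameterized. The sketch below is a hypothetical helper (`build_pipeline` is not part of gst-plugin-cedar or GStreamer). It only prints the pipeline description and assumes the camera driver accepts the requested NV12 mode and frame rate.

```shell
#!/bin/sh
# Hypothetical helper: print the same v4l2src -> cedar_h264enc pipeline
# description for an arbitrary NV12 capture mode (width, height, fps).
build_pipeline() {
    width=$1
    height=$2
    fps=$3
    echo "v4l2src ! video/x-raw,format=NV12,width=${width},height=${height},framerate=${fps}/1" \
         "! cedar_h264enc ! h264parse" \
         "! fpsdisplaysink text-overlay=false video-sink=testsink"
}

# Example (run on the board). The unquoted $(...) is split on spaces, and
# gst-launch-1.0 joins its arguments back into one pipeline description:
#   GST_PLUGIN_PATH=/usr/local/lib/gstreamer-1.0 \
#       gst-launch-1.0 -vem $(build_pipeline 1280 720 30)
```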
My gst-plugin-cedar modifications are here: https://github.com/gtalusan/gst-plugin-cedar
The sun8i kernel modifications are here: https://github.com/gtalusan/sun8i-linux-kernel
I've also submitted a pull request to Armbian: https://github.com/igorpecovnik/lib/pull/655
As far as I can tell, the V4L2 capture buffers are currently backed by DMA buffers. Each frame is mmap'd into userspace and then memcpy'd into a second DMA buffer used for H.264 encoding (the VPU/VFE/ION path). The memcpy could probably be eliminated if the physical address of the capture buffer were passed along through the V4L2 API, but I haven't found a clean way to do that. FFMPEG-Cedrus has the same problem. Commenting out the memcpy in gst-plugin-cedar (so the encoder then produces garbage) drops my CPU load to about 5-10%, so there's clearly room for improvement if the DMA buffers can be shared.
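To put a rough number on the copy: NV12 uses 12 bits per pixel (a full-resolution Y plane plus an interleaved CbCr plane at quarter resolution), so each frame the memcpy moves is width * height * 3 / 2 bytes. The sketch below does that arithmetic for the two modes above, assuming tightly packed planes with no stride padding. The real driver buffers may be larger.

```shell
#!/bin/sh
# Bytes in one tightly packed NV12 frame:
# Y plane (w*h bytes) + interleaved CbCr plane (w*h/2 bytes).
nv12_frame_bytes() {
    echo $(( $1 * $2 * 3 / 2 ))
}

# Bytes the V4L2 -> encoder memcpy moves per second at a given frame rate.
copy_bytes_per_sec() {
    echo $(( $(nv12_frame_bytes "$1" "$2") * $3 ))
}

# The two modes measured above: width height fps.
for mode in "1920 1080 11" "1280 720 30"; do
    set -- $mode
    printf '%sx%s @ %s fps: %s bytes/frame, %s bytes/s copied\n' \
        "$1" "$2" "$3" \
        "$(nv12_frame_bytes "$1" "$2")" \
        "$(copy_bytes_per_sec "$1" "$2" "$3")"
done
```

That works out to roughly 34 MB/s at 1920x1080/11 fps and 41 MB/s at 1280x720/30 fps. It is only a byte count and says nothing about how the buffers are mapped or cached, which affects how expensive each copied byte is.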