Tutorial: Self-contained Tensorflow object detector on Orange Pi Lite + GC2035

atomic77 · November 21, 2020

I got my hands on a "Set 9" Orange Pi Lite + GC2035 camera a while back and I've finally been able to put together a self-contained object detection device using Tensorflow, without sending any image data outside for processing.

Basically, it's a Python Flask application that captures frames from the camera using a GStreamer pipeline. It runs them through a Tensorflow object detection model and spits out the same frame with extra metadata about the objects it found, rendering a box around each of them. Using all four cores of the H3 it can do about 2-3 fps. The app keeps a count of every object type it has seen and exposes the metrics in Prometheus format, so it is easy to build Grafana graphs of what it sees over time.
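To give a rough idea of the metrics side, here is a minimal sketch of counting detected labels and rendering them in the Prometheus text exposition format with only the standard library. The metric name `objects_detected_total` and the helper names are hypothetical and not taken from the app, which may well use a client library instead.

```python
from collections import Counter

# Hypothetical metric name, chosen for illustration only.
METRIC = "objects_detected_total"

def record(counts: Counter, labels) -> None:
    """Add one to the running total of every label seen in a frame."""
    counts.update(labels)

def render(counts: Counter) -> str:
    """Render the totals in the Prometheus text exposition format."""
    lines = [
        "# HELP {} Objects detected, by type.".format(METRIC),
        "# TYPE {} counter".format(METRIC),
    ]
    for label in sorted(counts):
        # Doubled braces are literal braces in str.format
        lines.append('{}{{object="{}"}} {}'.format(METRIC, label, counts[label]))
    return "\n".join(lines) + "\n"

counts = Counter()
record(counts, ["person", "cat"])  # labels found in one frame
record(counts, ["person"])         # labels found in the next frame
print(render(counts))
```

A Flask route that returns this string as plain text should be enough for Prometheus to scrape it. `str.format` is used instead of f-strings because the device runs Python 3.5.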

I'll explain some of the more interesting aspects of how I got this to work here in case anyone else wants to try to get some use out of this very inexpensive hardware, and I am grateful to the many posts on this forum that helped me along the way!

Use a 3.4 kernel with custom GC2035 driver

Don't bother with anything newer: the GC2035 was hopeless on every newer build of Armbian I tried. The driver available at https://github.com/avafinger/gc2035.git provided far better image quality. After installing the updated GC2035 driver, I run the following to get the camera up and running:

sudo sunxi-pio -m "PG11<1><0><1><1>"
sudo modprobe gc2035 hres=1
sudo modprobe vfe_v4l2

Install Tensorflow lite runtime

Google provides the Tensorflow Lite runtime as a binary wheel built for Python 3.5 on armv7. Expect the pip install to take 20 minutes or so, as it needs to compile numpy (the apt repo version isn't recent enough).

wget https://github.com/google-coral/pycoral/releases/download/release-frogfish/tflite_runtime-2.5.0-cp35-cp35m-linux_armv7l.whl
sudo -H pip3 install tflite_runtime-2.5.0-cp35-cp35m-linux_armv7l.whl

Build opencv for python 3.5 bindings

I tried everything I could to avoid this, but I just could not get the colour conversion from the GC2035's YUV format to an RGB image to work with anything else I found online, so I ended up depending on OpenCV for a single colour-conversion utility function.

To build OpenCV 3.4.12 for use with Python (grab lunch, it takes about 1.5 hours :-O ):

cmake -DCMAKE_INSTALL_PREFIX=/home/atomic/local -DSOFTFP=ON \
    -DBUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_opencv_python2=0 \
    -D BUILD_opencv_python3=1 -D WITH_GSTREAMER=ON \
    -D PYTHON3_INCLUDE_PATH=/usr/include/python3.5  ..
make -j 4
make install

# ~/local/lib/python3.5/dist-packages should now contain the cv2 shared library
export PYTHONPATH=/home/atomic/local/lib/python3.5/dist-packages

Build gstreamer plugin for Cedar H264 encoder

This is required to get a working gstreamer pipeline for the video feed:

git clone https://github.com/gtalusan/gst-plugin-cedar
cd gst-plugin-cedar
./autogen.sh
sudo make install
# When trying against a pipc I had to copy into .local to get gstreamer to recognise it
mkdir -p ~/.local/share/gstreamer-1.0/plugins/
cp /usr/local/lib/gstreamer-1.0/libgst* ~/.local/share/gstreamer-1.0/plugins/
# Confirm that plugin is installed:
gst-inspect-1.0 cedar_h264enc

Processing images

The full app source is on GitHub, but the more interesting parts, which took me some time to figure out, were about getting Python to cooperate with GStreamer.

Frames from the camera arrive in Python at the end of the pipeline through an appsink. The GStreamer pipeline I configured via Python was:

    # Assumes Gst is imported from gi.repository and Gst.init(None) has been called
    # Capture from the camera's V4L2 device
    src = Gst.ElementFactory.make("v4l2src")
    src.set_property("device", "/dev/video0")
    src.set_property("do-timestamp", True)
    # Request NV12 frames at 800x600, 12 fps
    filt = Gst.ElementFactory.make("capsfilter")
    filt.set_property("caps", Gst.caps_from_string("video/x-raw,format=NV12,width=800,height=600,framerate=12/1"))
    # Encode with the Cedar H264 plugin built above, then decode back to raw frames
    p1 = Gst.ElementFactory.make("cedar_h264enc")
    p2 = Gst.ElementFactory.make("h264parse")
    p3 = Gst.ElementFactory.make("rtph264pay")
    p3.set_property("config-interval", 1)
    p3.set_property("pt", 96)
    p4 = Gst.ElementFactory.make("rtph264depay")
    p5 = Gst.ElementFactory.make("avdec_h264")
    # The appsink hands each decoded frame to Python
    sink = Gst.ElementFactory.make("appsink", "sink")
    pipeline_elements = [src, filt, p1, p2, p3, p4, p5, sink]

    sink.set_property("max-buffers", 10)
    sink.set_property("emit-signals", True)
    sink.set_property("sync", False)
    sink.connect("new-sample", on_buffer, sink)
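For experimenting outside the app, the same element chain can be written as a gst-launch style description. The sketch below only builds that string, so it runs without GStreamer installed. The `ELEMENTS` table and `launch_description` helper are illustrative names, not part of the app.

```python
# Element chain mirroring the Python calls above. Each entry is
# (factory name or caps string, properties to set on it).
CAPS = "video/x-raw,format=NV12,width=800,height=600,framerate=12/1"
ELEMENTS = [
    ("v4l2src", {"device": "/dev/video0", "do-timestamp": "true"}),
    (CAPS, {}),  # a bare caps string acts as the capsfilter
    ("cedar_h264enc", {}),
    ("h264parse", {}),
    ("rtph264pay", {"config-interval": 1, "pt": 96}),
    ("rtph264depay", {}),
    ("avdec_h264", {}),
    ("appsink", {"name": "sink", "max-buffers": 10,
                 "emit-signals": "true", "sync": "false"}),
]

def launch_description(elements) -> str:
    """Join elements with ' ! ' and append key=value properties."""
    parts = []
    for name, props in elements:
        # Sort properties so the output is stable on every Python version
        words = [name] + ["{}={}".format(k, v) for k, v in sorted(props.items())]
        parts.append(" ".join(words))
    return " ! ".join(parts)

print(launch_description(ELEMENTS))
```

The printed string should be usable with `gst-launch-1.0` for a quick test (swapping `appsink` for `fakesink`), or with `Gst.parse_launch`, after which the sink can be looked up with `get_by_name("sink")`.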

This pipeline definition causes a callback on_buffer to be called every time a frame is emitted from the camera:

def on_buffer(sink: GstApp.AppSink, data: typing.Any) -> Gst.FlowReturn:
    # The sample is an 800x900 byte array in a very frustrating YUV420 format:
    # 600 rows of luma (Y) followed by 300 rows of subsampled chroma (U, V)
    sample = sink.emit("pull-sample")  # Gst.Sample
    # ... conversion of the sample to a numpy array, img_arr (elided)
    # These two calls are what you compiled opencv for 1.5 hours for :-D
    rgb = cv2.cvtColor(img_arr, cv2.COLOR_YUV2BGR_I420)
    rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
    # rgb is now in a format that Pillow can easily work with
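The reason an 800x600 frame shows up as 800x900 bytes is the I420 layout: a full-resolution Y plane followed by U and V planes at half resolution in each direction. The arithmetic can be sketched as below. `i420_layout` is a hypothetical helper, and it assumes even dimensions and tightly packed rows with no padding.

```python
def i420_layout(width: int, height: int) -> dict:
    """Return plane offsets and sizes for a tightly packed I420 frame.

    Assumes even dimensions and no row padding (stride == width).
    """
    y_size = width * height
    chroma_size = (width // 2) * (height // 2)  # U and V are subsampled 2x2
    return {
        "y_offset": 0,
        "u_offset": y_size,
        "v_offset": y_size + chroma_size,
        "total_bytes": y_size + 2 * chroma_size,
        # 2D shape to give the flat byte buffer before cv2.cvtColor
        "array_shape": (height * 3 // 2, width),
    }

layout = i420_layout(800, 600)
print(layout["total_bytes"], layout["array_shape"])
```

With numpy, reshaping the raw bytes with something like `np.frombuffer(data, dtype=np.uint8).reshape(layout["array_shape"])` should give the single-channel array that `cv2.COLOR_YUV2BGR_I420` expects. How `data` is pulled out of the `Gst.Sample` is elided in the excerpt above and not shown here.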

Once you have a nice Pillow RGB image, it's easy to pass it into a Tensorflow model, and there is tons of material on the web on how to do that. For fast but not so accurate detection, I used the ssdlite_mobilenet_v2_coco pretrained model, which can handle about 0.5 frames per second per core of the Allwinner H3 CPU.
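As a sketch of the step between model output and drawing boxes: SSD-style detection models typically return normalised boxes in `[ymin, xmin, ymax, xmax]` order along with class ids and scores. That layout is an assumption here and should be checked against the model actually used. The `to_pixel_boxes` helper and the 0.5 score threshold are illustrative, not taken from the app.

```python
def to_pixel_boxes(boxes, classes, scores, width, height, min_score=0.5):
    """Keep detections scoring at least min_score and scale boxes to pixels.

    Assumes each box is a normalised [ymin, xmin, ymax, xmax] row.
    """
    results = []
    for box, cls, score in zip(boxes, classes, scores):
        if score < min_score:
            continue
        # Clamp to the frame, since models can predict slightly outside it
        ymin, xmin, ymax, xmax = (min(max(v, 0.0), 1.0) for v in box)
        results.append({
            "class_id": int(cls),
            "score": float(score),
            # (left, top, right, bottom), the order Pillow's
            # ImageDraw.rectangle accepts
            "box": (int(xmin * width), int(ymin * height),
                    int(xmax * width), int(ymax * height)),
        })
    return results

detections = to_pixel_boxes(
    boxes=[[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.1, 0.1]],
    classes=[0, 16],
    scores=[0.9, 0.2],
    width=800, height=600,
)
print(detections)
```

Only the first detection survives the threshold in this example. Its box can be handed straight to Pillow to draw the rectangle on the frame.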

There are some problems I still have to work out. Occasionally the video stream stalls, and I haven't figured out how to recover from this without restarting the app completely. The way frame data is passed between the Tensorflow worker processes is probably not ideal and needs to be cleaned up, but it does allow me to get much better throughput using all four cores.
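One possible starting point for the stall problem is simply detecting it: note when the last frame arrived and flag a stall once too much time has passed. This is a minimal sketch of that idea and not something the app does. `StallWatchdog` and the 5 second timeout are hypothetical, and the clock is injectable so the logic can be exercised without waiting.

```python
import time

class StallWatchdog:
    """Track when the last frame arrived and report a stall after a timeout."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_frame = clock()

    def frame_arrived(self) -> None:
        # Call this from the appsink callback for every frame
        self.last_frame = self.clock()

    def stalled(self) -> bool:
        # Poll this periodically, e.g. from a timer thread
        return self.clock() - self.last_frame > self.timeout_s

# Demonstration with a fake clock instead of real time
now = [0.0]
dog = StallWatchdog(timeout_s=5.0, clock=lambda: now[0])
now[0] = 3.0
dog.frame_arrived()
now[0] = 7.0
print(dog.stalled())  # 4 s since the last frame
now[0] = 9.0
print(dog.stalled())  # 6 s since the last frame
```

What to do once a stall is detected, for example tearing the pipeline down and rebuilding it, is left open. Whether that recovers the camera on this hardware is untested.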

For more details, including a detailed build script, the full source is here:

https://github.com/atomic77/opilite-object-detect