A Robot for real-time object detection with accelerated TensorFlow models, tracking and WebRTC streaming

Overview

These are step-by-step instructions showing how to quickly configure UV4L to build a robot that does all of the following “out of the box” at the same time:

  • live, real-time (>= 30fps) detection of specified objects with your preferred TensorFlow SSD model
  • optional: real-time tracking of the detected objects (one or more) with pan/tilt servo motors
  • optional: real-time streaming over the web with WebRTC (e.g. to a Janus WebRTC Server)

How does this work? A new driver called uv4l-raspicam-ai has recently been introduced into the UV4L suite, where “ai” stands for Artificial Intelligence: in fact, this driver is fully equivalent to the traditional, alternative uv4l-raspicam driver, except that it adds support for a few new options which make it possible to run TensorFlow Lite models through the Google Edge TPU USB Accelerator. Furthermore, this driver has native support for the Pimoroni Pan-Tilt HAT, which can be controlled with an industrial-strength, built-in and highly tunable PID controller. It is safe to install this driver even if you do not have the Edge TPU and the Pan-Tilt HAT or do not plan to use them, as the corresponding options are disabled by default. At any time, you can switch back and forth to/from the classic uv4l-raspicam driver (while preserving the current configuration).
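
As a rough sketch of what switching looks like (assuming both drivers are published as mutually exclusive packages in the UV4L apt repository, and that the package manager keeps the shared /etc/uv4l/uv4l-raspicam.conf in place when one replaces the other):

### Switch to the AI-enabled driver (sketch only; package names as listed on this page):
sudo apt-get update
sudo apt-get install uv4l-raspicam-ai
### Switch back to the classic driver at any time:
# sudo apt-get install uv4l-raspicam
### Restart the service installed by uv4l-raspicam-extras so the chosen driver is loaded:
sudo service uv4l_raspicam restart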

Hardware requirements
  1. Raspberry Pi (any model, except the Pi Zero) to run the required software
  2. Official Raspberry Pi Camera Board (any model) for video capturing at any resolution up to Full HD
  3. Google Edge TPU USB Accelerator to accelerate the neural network inference of the SSD TensorFlow Lite models
  4. optional: if you want to track the detected objects, get a Pimoroni Pan-Tilt HAT (PIM183), on which the camera must be mounted
  5. optional: you can even attach a microphone, speakers and a monitor to the Raspberry Pi if you are interested in two-way audio/video streaming over the web (this allows cool things like remote assistance or videoconferencing with other participants)

Software requirements

In order:

  1. install the libedgetpu Debian package for Raspberry Pi from the Google Coral website (it comes in two flavors, standard and maximum clock frequency; follow the official installation instructions)
  2. get your preferred SSD TensorFlow Lite model compiled for the Edge TPU for the object detection: make sure the accepted input image width and height of the model are multiples of 32 and 16, respectively. Some models can be downloaded from here. For the purpose of this example, we will be detecting and tracking faces with this model: MobileNet SSD v2 (320×320 Faces) (which should also already implement non-maximum suppression).
  3. install the UV4L software modules: uv4l, the new uv4l-raspicam-ai (not uv4l-raspicam!), uv4l-raspicam-extras, and, if you want to stream with WebRTC (as in this example), also uv4l-server and uv4l-webrtc (read the installation instructions for more details; a sketch of steps 1 and 3 is given right after this list)
  4. make sure the camera board and, if you want to do object tracking, the I2C interface are enabled on the Raspberry Pi. Also find a good compromise for the GPU Mem vs RAM split (especially on the Raspberry Pi 2 or below) depending on the resolution you want to capture the video at. Use the raspi-config system command to enable or check these settings (see the second sketch after this list).
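
For steps 1 and 3, a minimal sketch on Raspberry Pi OS might look like the following; it assumes that the Coral and UV4L apt repositories have already been added as per their respective official installation instructions and that the package names match those listed above:

### Step 1: Edge TPU runtime from the Coral repository (pick ONE of the two flavors):
sudo apt-get update
sudo apt-get install libedgetpu1-std     # standard clock frequency
# sudo apt-get install libedgetpu1-max   # maximum clock frequency (runs hotter)

### Step 3: UV4L modules (note: uv4l-raspicam-ai, not uv4l-raspicam):
sudo apt-get install uv4l uv4l-raspicam-ai uv4l-raspicam-extras uv4l-server uv4l-webrtc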
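
For step 4, here is a sketch assuming the non-interactive raspi-config options available on recent Raspberry Pi OS releases (the same settings can be changed from the interactive menus of sudo raspi-config):

### Enable the camera and, for object tracking, the I2C interface:
sudo raspi-config nonint do_camera 0
sudo raspi-config nonint do_i2c 0

### Example GPU memory split (adjust to your Pi model and capture resolution),
### set by adding a line such as the following to /boot/config.txt:
# gpu_mem=256

sudo reboot
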
A first example: UV4L configuration for face detection and tracking

Once you have installed the software as described above, it’s time to enable (edit and uncomment by removing the #) the relevant options in the UV4L configuration file /etc/uv4l/uv4l-raspicam.conf. They are listed below, and the default values of the already uncommented options have been tested for face detection and tracking with the model mentioned in the previous paragraph. The options should be self-explanatory, but if you need more detailed information, please read the uv4l-raspicam system manual or refer to the online documentation (in general, the latter might be slightly out of date):

### TensorFlow model options
tflite-model-file = /path/to/ssd_mobilenet_v2_face_quant_postprocess_edgetpu.tflite
tflite-model-input-width = 320
tflite-model-input-height = 320
tflite-model-output-topk = 3
tflite-model-output-threshold = 0.25
### Draw bounding boxes, ids or labels, scores, etc... on the image.
### If enabled, this option might slow down the framerate at high resolutions:
tflite-overlay-model-output = no
### optionally, only consider the following object class ids among the top-k predictions:
### zero or more object class ids (one per line) can be specified, e.g.:
# tflite-detection-classids = 0
# tflite-detection-classids = 43
# tflite-detection-classids = 45
### optional path to a file containing classId Label pairs (one per line):
# tflite-labels-file = #path

### Options for pan/tilt object tracking with PID controller
tracking-pan-tilt = yes
### Tracking strategies:
### "maxarea" (object with largest size in pic.)
### "all" (every object - centroid)
tracking-strategy = maxarea
tracking-pan-pid-kp = 0.0055
tracking-pan-pid-ki = 0.00045
tracking-pan-pid-kd = 0.0
tracking-tilt-pid-kp = 0.003
tracking-tilt-pid-ki = 0.0005
tracking-tilt-pid-kd = 0.0
tracking-pid-p-on-m = yes
tracking-pan-home-position = 0 # from -90 to 90
tracking-tilt-home-position = 0 # from -90 to 90
tracking-home-init = yes
tracking-home-timeout = 15000 # in ms, 0 for no timeout
### if pan is servo channel 1 & tilt is servo channel 2, then specify "yes" below,
### otherwise "no" if channels are swapped
tracking-pan-servo-channel1 = yes
# tracking-hat-i2c-dev = "/dev/i2c-1"

Now reboot and plug the Edge TPU USB Accelerator into a USB port (USB 3.0 recommended). You are now ready to capture video while tracking the detected objects (faces in this example). If you installed the Server and WebRTC UV4L modules, you can quickly run and/or record a video session in the browser: by default, just open the page at http://<your_rpi_address>:8080/stream/webrtc .
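
For example, assuming the uv4l_raspicam service installed by uv4l-raspicam-extras and the default Streaming Server port (8080), a quick way to apply the configuration and verify that everything started correctly could be:

### Restart the driver after editing the configuration (or simply reboot, as suggested above):
sudo service uv4l_raspicam restart

### Check the system log for errors about the model file, the Edge TPU or the Pan-Tilt HAT:
grep uv4l /var/log/syslog | tail -n 30

### Then, from a browser on the same network, open:
### http://<your_rpi_address>:8080/stream/webrtc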

Below is the first DEMO of the Robot in action.

[Embedded video demo]

Another example: object detection only

If you are only interested in object detection with no tracking, for example to just test the accuracy of a particular model, you can disable the tracking option altogether and enable the option to draw bounding boxes around the detected objects and to overlay the class id or the label text on top of the boxes, together with the confidence level of the predictions.

Below is an example of a live video streaming session over WebRTC, recorded directly from within a PC browser on the same LAN as the Raspberry Pi 4. That is, the H.264, high-resolution (1536×768) video is streamed from a Raspberry Pi where both the new UV4L raspicam-ai driver (which runs the model) and the streaming server are running.

[Embedded video demo]

With regard to the TensorFlow Lite section in the UV4L configuration file, in this case the following options have been enabled in /etc/uv4l/uv4l-raspicam.conf:

### TensorFlow model options
tflite-model-file = /path/to/ssdlite_mobiledet_coco_qat_postprocess_edgetpu.tflite
tflite-labels-file = /path/to/coco_labels.txt
tflite-model-input-width = 320
tflite-model-input-height = 320
tflite-model-output-topk = 6
tflite-model-output-threshold = 0.5
tflite-overlay-model-output = yes

### Options for pan/tilt object tracking with PID controller
tracking-pan-tilt = no

Note that overlaying bounding boxes, labels and other info onto each video frame might slow down the maximum achievable framerate at the highest resolutions, due to some internal frame synchronization.

The MobileDet SSD (320×320) model and the labels file specified in this case are based on the COCO dataset and can be found here.

Do you have an idea or cool project to propose making use of AI? Contact us.

Please keep an eye on this page, as more details might be added in the future to make things clearer.