A Robot for real-time object detection with accelerated TensorFlow models, tracking and WebRTC streaming

Overview

These are step-by-step instructions showing how to quickly configure UV4L to build a robot that does all of the following “out of the box” at the same time:

  • live, real-time (>= 30fps) detection of specified objects with your preferred TensorFlow SSD model
  • optional: real-time tracking of the detected objects (one or more) with pan/tilt servo motors
  • optional: real-time streaming over the web with WebRTC (e.g. to a Janus WebRTC Server)

How does this work? A new driver called uv4l-raspicam-ai has recently been introduced into the UV4L suite, where “ai” stands for Artificial Intelligence: in fact, this driver is fully equivalent to the traditional, alternative uv4l-raspicam driver, except that it adds support for a few new options which make it possible to run TensorFlow Lite models through the Google Edge TPU USB Accelerator. Furthermore, this driver has native support for the Pimoroni Pan-Tilt HAT, which can be controlled with an industrial-strength, built-in and highly tunable PID controller. It is safe to install this driver even if you do not have the Edge TPU and the Pan-Tilt HAT or do not plan to use them, as the corresponding options are disabled by default. At any time, you can switch back and forth to/from the classic uv4l-raspicam driver (while preserving the current configuration).
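
As a rough sketch of what switching looks like (assuming both drivers are published as mutually exclusive packages in the UV4L apt repository, and that the package manager keeps the shared /etc/uv4l/uv4l-raspicam.conf in place when one replaces the other):

### Switch to the AI-enabled driver (sketch only; package names as listed on this page):
sudo apt-get update
sudo apt-get install uv4l-raspicam-ai
### Switch back to the classic driver at any time:
# sudo apt-get install uv4l-raspicam
### Restart the service installed by uv4l-raspicam-extras so the chosen driver is loaded:
sudo service uv4l_raspicam restart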

Hardware requirements
  1. Raspberry Pi (any model, except the Pi Zero) to run the required software
  2. Official Raspberry Pi Camera Board (any model) for video capturing at any resolution up to Full HD
  3. Google Edge TPU USB Accelerator to accelerate the neural network inference of the SSD TensorFlow Lite models
  4. optional: if you want to track the detected objects, get a Pimoroni Pan-Tilt HAT (PIM183), on which the camera must be mounted
  5. optional: you can even attach a microphone, speakers and a monitor to the Raspberry Pi if you are interested in two-way audio/video streaming over the web (this allows cool things like remote assistance or videoconferencing with other participants)

Software requirements

In order:

  1. install the libedgetpu Debian package for Raspberry Pi from the Google Coral website (it comes in two flavors, standard and maximum clock frequency; follow the official installation instructions)
  2. get your preferred SSD TensorFlow Lite model compiled for the Edge TPU for the object detection: make sure the accepted input image width and height of the model are multiples of 32 and 16, respectively. Some models can be downloaded from here. For the purpose of this example, we will be detecting and tracking faces with this model: MobileNet SSD v2 (320×320 Faces) (which should also already implement non-maximum suppression).
  3. install the UV4L software modules: uv4l, the new uv4l-raspicam-ai (not uv4l-raspicam!), uv4l-raspicam-extras, and, if you want to stream with WebRTC (as in this example), also uv4l-server and uv4l-webrtc (read the installation instructions for more details; a sketch of steps 1 and 3 is given right after this list)
  4. make sure the camera board and, if you want to do object tracking, the I2C interface are enabled on the Raspberry Pi. Also find a good compromise for the GPU Mem vs RAM split (especially on the Raspberry Pi 2 or below) depending on the resolution you want to capture the video at. Use the raspi-config system command to enable or check these settings (see the second sketch after this list).
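
For steps 1 and 3, a minimal sketch on Raspberry Pi OS might look like the following; it assumes that the Coral and UV4L apt repositories have already been added as per their respective official installation instructions and that the package names match those listed above:

### Step 1: Edge TPU runtime from the Coral repository (pick ONE of the two flavors):
sudo apt-get update
sudo apt-get install libedgetpu1-std     # standard clock frequency
# sudo apt-get install libedgetpu1-max   # maximum clock frequency (runs hotter)

### Step 3: UV4L modules (note: uv4l-raspicam-ai, not uv4l-raspicam):
sudo apt-get install uv4l uv4l-raspicam-ai uv4l-raspicam-extras uv4l-server uv4l-webrtc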
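
For step 4, here is a sketch assuming the non-interactive raspi-config options available on recent Raspberry Pi OS releases (the same settings can be changed from the interactive menus of sudo raspi-config):

### Enable the camera and, for object tracking, the I2C interface:
sudo raspi-config nonint do_camera 0
sudo raspi-config nonint do_i2c 0

### Example GPU memory split (adjust to your Pi model and capture resolution),
### set by adding a line such as the following to /boot/config.txt:
# gpu_mem=256

sudo reboot
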
A first example: UV4L configuration for face detection and tracking

Once you have installed the software as described above, it’s time to enable (edit and uncomment by removing the #) the relevant options in the UV4L configuration file /etc/uv4l/uv4l-raspicam.conf. They are listed below, and the default values of the already uncommented options have been tested for face detection and tracking with the model mentioned in the previous paragraph. The options should be self-explanatory, but if you need more detailed information, please read the uv4l-raspicam system manual or refer to the online documentation (in general, the latter might be slightly out of date):

### TensorFlow model options
tflite-model-file = /path/to/ssd_mobilenet_v2_face_quant_postprocess_edgetpu.tflite
tflite-model-input-width = 320
tflite-model-input-height = 320
tflite-model-output-topk = 3
tflite-model-output-threshold = 0.25
### Draw bounding boxes, ids or labels, scores, etc... on the image.
### If enabled, this option might slow down the framerate at high resolutions:
tflite-overlay-model-output = no
### optionally, only consider the following object class ids among the top-k predictions:
### zero or more object class ids (one per line) can be specified, e.g.:
# tflite-detection-classids = 0
# tflite-detection-classids = 43
# tflite-detection-classids = 45
### optional path to a file containing classId Label pairs (one per line):
# tflite-labels-file = #path

### Options for pan/tilt object tracking with PID controller
tracking-pan-tilt = yes
### Tracking strategies:
### "maxarea" (object with largest size in pic.)
### "all" (every object - centroid)
tracking-strategy = maxarea
tracking-pan-pid-kp = 0.0055
tracking-pan-pid-ki = 0.00045
tracking-pan-pid-kd = 0.0
tracking-tilt-pid-kp = 0.003
tracking-tilt-pid-ki = 0.0005
tracking-tilt-pid-kd = 0.0
tracking-pid-p-on-m = yes
tracking-pan-home-position = 0 # from -90 to 90
tracking-tilt-home-position = 0 # from -90 to 90
tracking-home-init = yes
tracking-home-timeout = 15000 # in ms, 0 for no timeout
### if pan is servo channel 1 & tilt is servo channel 2, then specify "yes" below,
### otherwise "no" if channels are swapped
tracking-pan-servo-channel1 = yes
# tracking-hat-i2c-dev = "/dev/i2c-1"

Now reboot and plug the Edge TPU USB Accelerator into a USB port (USB 3.0 recommended). You are now ready to capture video while tracking the detected objects (faces in this example). If you installed the Server and WebRTC UV4L modules, you can quickly run and/or record a video session in the browser: by default, just open the page at http://<your_rpi_address>:8080/stream/webrtc .
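
For example, assuming the uv4l_raspicam service installed by uv4l-raspicam-extras and the default Streaming Server port (8080), a quick way to apply the configuration and verify that everything started correctly could be:

### Restart the driver after editing the configuration (or simply reboot, as suggested above):
sudo service uv4l_raspicam restart

### Check the system log for errors about the model file, the Edge TPU or the Pan-Tilt HAT:
grep uv4l /var/log/syslog | tail -n 30

### Then, from a browser on the same network, open:
### http://<your_rpi_address>:8080/stream/webrtc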

Below is the first DEMO of the Robot in action.

[Embedded video demo]

Another example: object detection only

If you are only interested in object detection with no tracking, for example to just test the accuracy of a particular model, you can disable the tracking option altogether and enable the option to draw bounding boxes around the detected objects and to overlay the class id or the label text on top of the boxes, together with the confidence level of the predictions.

Below is an example of a live video streaming session over WebRTC, recorded directly from within a PC browser on the same LAN as the Raspberry Pi 4. That is, the H.264, high-resolution (1536×768) video is streamed from a Raspberry Pi where both the new UV4L raspicam-ai driver (which runs the model) and the streaming server are running.

[Embedded video demo]

With regard to the TensorFlow Lite section in the UV4L configuration file, in this case the following options have been enabled in /etc/uv4l/uv4l-raspicam.conf:

### TensorFlow model options
tflite-model-file = /path/to/ssdlite_mobiledet_coco_qat_postprocess_edgetpu.tflite
tflite-labels-file = /path/to/coco_labels.txt
tflite-model-input-width = 320
tflite-model-input-height = 320
tflite-model-output-topk = 6
tflite-model-output-threshold = 0.5
tflite-overlay-model-output = yes

### Options for pan/tilt object tracking with PID controller
tracking-pan-tilt = no

Note that overlaying bounding boxes, labels and other info onto each video frame might slow down the maximum achievable framerate at the highest resolutions, due to some internal frame synchronization.

The MobileDet SSD (320×320) model and the labels file specified in this case are based on the COCO dataset and can be found here.

Do you have an idea or cool project to propose making use of AI? Contact us.

Please keep an eye on this page, as more details might be added in the future to make things clearer.