Object Detection with OpenCV: A Step-by-Step Tutorial

Computer vision aims to let computers see and interpret the world much as we do through our eyes. It covers many tasks, such as object detection, object/image recognition, object segmentation, optical character recognition (OCR), pose estimation, object tracking, and facial recognition, each with its own use cases.

For example, object detection and object tracking are often used to monitor road traffic and detect suspicious behavior, while facial recognition identifies the people whose faces appear in an image.

Object detection is a computer vision task that involves identifying and localizing objects in an image or video frame. It uses bounding boxes to differentiate instances and is widely used in applications like self-driving cars, medical imaging, and traffic surveillance. OpenCV, a popular open-source computer vision library, can be used with pre-trained models like TensorFlow’s SSD to perform object detection by setting confidence thresholds and drawing bounding boxes around detected objects.

For context, refer to this article on image recognition with AI

This tutorial will teach us how to detect objects using the OpenCV library.

Introduction to Object Detection with OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source library designed to help developers build computer vision applications and integrate them with machine learning. It is written primarily in C++ and can also be used from languages like Python and Java. OpenCV is compatible with various operating systems, including Windows, Linux, and macOS.

By combining OpenCV with popular Python libraries like NumPy, we can build computer vision models for image processing tasks.

What Is Object Detection?

Object detection is a computer vision task that involves identifying and localizing objects in an image or a video frame. It draws a bounding box around each instance in the image, which separates that instance from the others and marks where it is located.
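To make the idea of a bounding box concrete, here is a minimal sketch of how a single detection could be represented. The Detection class and its field names are hypothetical and chosen only for illustration; the (x, y, width, height) layout follows the box format that OpenCV's detection model returns later in this tutorial.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: what it is, how sure the model is, and where it is."""
    label: str
    confidence: float
    x: int       # left edge of the box, in pixels
    y: int       # top edge of the box, in pixels
    width: int
    height: int

    def corners(self):
        """Return the box as (left, top, right, bottom) pixel coordinates."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


# Illustrative values, not output from a real model.
person = Detection("person", 0.87, x=40, y=30, width=100, height=220)
print(person.corners())
```

Localization is the box itself; classification is the label and confidence attached to it.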

Object detection is especially useful when multiple objects are in the same image or video frame.

Object detection is widely used in medical imaging, and in traffic surveillance cameras to monitor the traffic, count the number of vehicles in each frame of the live feed, and so on.
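Counting vehicles in a frame comes down to tallying the labels that the detector reports. The sketch below assumes the labels for one frame are already available as a list of strings; the count_labels helper and the sample labels are hypothetical.

```python
from collections import Counter


def count_labels(frame_labels, wanted):
    """Count how often each wanted label appears among one frame's detections."""
    counts = Counter(frame_labels)
    # Counter returns 0 for labels that never occurred.
    return {label: counts[label] for label in wanted}


# Illustrative labels for a single frame, not real detector output.
frame_labels = ["car", "car", "person", "truck", "car"]
vehicle_counts = count_labels(frame_labels, wanted=("car", "truck", "bus"))
print(vehicle_counts)
```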

One of its most prominent applications is in self-driving cars. The object detection model helps the self-driving car locate obstacles and detect people so it doesn’t hit anyone.

Object Detection vs. Object Recognition

Object Recognition and Detection are two computer vision tasks often confused with each other and used interchangeably. While both might sound similar, they have different end goals and pipelines.

Detection involves feature extraction, localization, and classification of each object in the image or video frame. Object recognition, in contrast, goes directly from feature extraction to classification and may or may not localize the object.

In short, the basic difference is that detection tells us both what each object is and where it is, while recognition mainly tells us what is in the image.

The Role of Neural Networks in Object Detection

Before neural networks were introduced for computer vision tasks, detection with OpenCV relied on traditional methods. In recent years, the use of convolutional neural networks for object recognition and detection has grown sharply, especially the YOLO (You Only Look Once) family of object detection models and region-based CNNs such as R-CNN.

This tutorial uses a pre-trained model through OpenCV; we will look at object detection using neural networks in more detail in the next one.

Step-by-Step Object Detection Using OpenCV

Let us see an example of object detection using OpenCV. We are going to use a pre-trained TensorFlow model. The pre-trained, frozen model can be downloaded from the official GitHub page linked in the references.

import cv2
import matplotlib.pyplot as plt

We import the OpenCV and matplotlib libraries in the first two lines. Next, we download the necessary files and set the paths to them.

config_file = '/content/drive/MyDrive/objdetec files/ssd_mobilenet_v3_large_coco_2020_01_14 (2).pbtxt'
frozen_model = '/content/drive/MyDrive/objdetec files/frozen_inference_graph.pb'

The first line sets the path to the configuration file, which describes the structure of the trained model. The second line sets the path to the frozen inference graph, which contains the model’s weights.

Next, we create a model from the frozen graph and the configuration file with the help of the DetectionModel class.

model = cv2.dnn_DetectionModel(frozen_model,config_file)
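A wrong path is a common reason for the model to fail to load, so it can help to check that both files exist before creating the model. The following is a minimal sketch using only the standard library; the missing_files helper and the file names are hypothetical and should be replaced with your own paths.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


# Hypothetical file names for illustration; substitute your own paths.
required = ["frozen_inference_graph.pb", "model_config.pbtxt"]
missing = missing_files(required)
if missing:
    print("Missing model files:", missing)
```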

Since the model is already pre-trained, we can look at the list of objects, or classes, that it was trained to detect. We will use this list of labels to name the objects detected in our image.

classlabels = []
filename = '/content/drive/MyDrive/objdetec files/labels.txt'
# Read the whole file and split it into one label per line
with open(filename, 'rt') as fpt:
  classlabels = fpt.read().rstrip('\n').split('\n')

First, we define a list called classlabels to hold the object labels from the labels file. The file is then read and split on newlines, and the resulting list of labels is assigned to classlabels.
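The same parsing step can be tried without the file on disk. This is a small sketch assuming the labels file holds one label per line; parse_labels is a hypothetical helper and the sample text is made up.

```python
def parse_labels(text):
    """Split the contents of a labels file into a list, one label per line."""
    # splitlines() copes with both "\n" and "\r\n" line endings;
    # blank lines and surrounding whitespace are dropped.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Made-up file contents for illustration.
sample = "person\nbicycle\ncar\n"
labels = parse_labels(sample)
print(len(labels), labels)
```

Dropping blank lines shifts the positions of later labels if a file uses empty lines as placeholders, so this variant only suits files without such gaps.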

len(classlabels)

This code snippet gives us the number of classes present in the list.

There are 80 classes available. We can print the list of the classes using the following line.

classlabels

Next, we set up the input size, scale, and mean and swap the colors of the image as part of processing. Here is the image we are going to use:

The image is resized to the model's input size, its pixel values are scaled and mean-shifted, and its red and blue channels are swapped using the snippet below.

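# Resize the input to 120x120; this model is commonly used with 320x320, which may be worth trying
model.setInputSize(120, 120)
# Multiply pixel values by this factor after the mean is subtracted
model.setInputScale(1.0 / 127.5)
# Subtract this mean from each color channel
model.setInputMean((127.5, 127.5, 127.5))
# Swap the red and blue channels, since OpenCV loads images in BGR order
model.setInputSwapRB(True)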

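img = cv2.imread('testimg.jpg')
# OpenCV loads images in BGR order, so convert to RGB before showing with matplotlib
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))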
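The scale and mean settings above work together to map pixel values from the range 0 to 255 into roughly -1 to 1. The sketch below reproduces that arithmetic for a single pixel value, assuming the mean is subtracted before the scale is applied; normalize is a hypothetical helper, not an OpenCV function.

```python
def normalize(pixel, mean=127.5, scale=1.0 / 127.5):
    """Subtract the mean first, then multiply by the scale factor."""
    return (pixel - mean) * scale


# Darkest, middle, and brightest possible values of an 8-bit channel.
for value in (0, 127.5, 255):
    print(value, "->", normalize(value))
```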

In the next few lines, we set a confidence threshold and run the detector on the image. The call returns the class IDs, confidence scores, and bounding boxes of the detected objects.

classindex,confidence,bbox = model.detect(img,confThreshold = 0.5)
print(classindex)

Only two classes are detected in the image: person (1) and bicycle (2). The dog is not detected even though there is a dog class in the labels list. This is likely because confThreshold is set to 0.5, which might be too high for the model's confidence in the dog. If we reduce the confidence threshold, the model can report other objects too. We can try lower thresholds (0.4, 0.3, 0.2, etc.) to get the expected results.
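The effect of the threshold can be illustrated without running the model. In this sketch the scores are hypothetical, and both the keep_confident helper and the strict greater-than comparison are assumptions made for illustration rather than a description of OpenCV's internals.

```python
def keep_confident(detections, threshold):
    """Keep only (label, score) pairs whose score exceeds the threshold."""
    return [(label, score) for label, score in detections if score > threshold]


# Hypothetical scores for illustration, not measured output from the model.
candidates = [("person", 0.82), ("bicycle", 0.67), ("dog", 0.41)]

for threshold in (0.5, 0.4, 0.3):
    kept = [label for label, _ in keep_confident(candidates, threshold)]
    print(threshold, kept)
```

A lower threshold admits more objects but also more false positives, so it is worth checking the results visually after each change.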

Now, we draw the bounding boxes around the objects detected.

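font_scale = 3
font = cv2.FONT_HERSHEY_PLAIN
# Each detection has a class ID, a confidence score, and a box (x, y, width, height)
for classind, conf, boxes in zip(classindex.flatten(), confidence.flatten(), bbox):
  # Draw the bounding box around the object
  cv2.rectangle(img, boxes, (255, 0, 255), 2)
  # Class IDs start at 1, so subtract 1 to index into the labels list
  cv2.putText(img, classlabels[classind - 1], (boxes[0] + 10, boxes[1] + 10), font, fontScale=font_scale, color=(255, 255, 0), thickness=3)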

We draw the bounding boxes around the objects with the help of the cv2.rectangle method and display the label of the corresponding object using the cv2.putText method.
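The label lookup in the loop relies on class IDs starting at 1 while list indices start at 0. With plain indexing, an ID of 0 would silently pick the last label, and an ID beyond the end of the list would raise an IndexError. A minimal sketch of a safer lookup follows; label_for is a hypothetical helper and the short labels list is only for illustration.

```python
def label_for(class_id, labels):
    """Map a 1-based class ID to its label, or 'unknown' if out of range."""
    index = class_id - 1  # class IDs start at 1, list indices at 0
    if 0 <= index < len(labels):
        return labels[index]
    return "unknown"


# Shortened labels list for illustration.
labels = ["person", "bicycle", "car"]
print(label_for(1, labels), label_for(2, labels), label_for(99, labels))
```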

We can use the following code to display the image with bounding boxes.

plt.imshow(cv2.cvtColor(img,cv2.COLOR_BGR2RGB))

Summary

In this introductory object detection tutorial, we went through the basics of OpenCV, defined object detection, and discussed the difference between object recognition and detection.

Next, we saw an example of object detection using the OpenCV library and TensorFlow’s pre-trained single-shot detector (SSD) model. We can achieve better results with this model by tuning the confidence threshold and choosing a clear input image.

References

OpenCV Documentation

TensorFlow GitHub