Day 12: Implementing YOLO for Object Detection

Day 12’s challenge is about implementing YOLO (You Only Look Once) for object detection. YOLO is one of the most well-known and widely used real-time object detection models due to its efficiency and speed.

Since this is your first time diving into YOLO, we’ll take a tutorial-based approach and focus on understanding the core concepts and getting a simplified implementation up and running.

Overview of YOLO (You Only Look Once)

  • YOLO is a popular object detection model that can identify and locate multiple objects in a single image with a single forward pass.
  • It works by dividing an image into a grid and predicting bounding boxes and class probabilities for each grid cell (see the quick sizing sketch after this list).
  • YOLOv3 and YOLOv4 are widely used versions; YOLOv5 is often the most approachable for beginners thanks to its simple, well-documented open-source implementation.
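To make the grid idea concrete, here is a quick back-of-the-envelope calculation of YOLOv3’s detection grids, following the standard YOLOv3 design (three detection scales with strides 32, 16, and 8; three anchor boxes per grid cell; 80 COCO classes):

# YOLOv3 detection-grid sizes for a 608x608 input
input_size = 608
strides = [32, 16, 8]          # one stride per detection scale
anchors_per_cell = 3
values_per_box = 4 + 1 + 80    # box coords + objectness + class scores = 85

for stride in strides:
    grid = input_size // stride
    boxes = grid * grid * anchors_per_cell
    print(f"stride {stride}: {grid}x{grid} grid -> {boxes} candidate boxes of {values_per_box} values")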

Objective

  1. Use a pre-trained YOLO model to perform object detection.
  2. Instead of training YOLO from scratch (which would require substantial computational power and data), run the pre-trained model on a sample image or video.

Steps to Implement YOLO Using a Tutorial Approach

We’ll use YOLOv3, because OpenCV’s dnn module can run the original Darknet weights directly, with no deep learning framework required. The easiest approach is to load the pre-trained model and perform inference on sample images.

Step 1: Set Up the Environment

First, let’s set up the environment. We will need:

  • OpenCV to run the network and to load, annotate, and display images.
  • The YOLOv3 weights and configuration files.
  • NumPy for array processing.

You can install the required packages using:

pip install opencv-python numpy

(Use the standard opencv-python build rather than opencv-python-headless: the script below calls “cv2.imshow()”, which headless builds omit.)

Step 2: Download YOLO Weights and Configuration

YOLOv3 uses two main files:

  • Weights file (“yolov3.weights”) contains the pre-trained parameters.
  • Configuration file (“yolov3.cfg”) describes the architecture of the YOLO model.

You can download them from the following sources:

  • Weights: https://pjreddie.com/media/files/yolov3.weights
  • Configuration: https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg

Additionally, you’ll need the COCO class names file (“coco.names”), which lists the 80 classes that YOLOv3 is trained to detect:

  • Class names: https://github.com/pjreddie/darknet/blob/master/data/coco.names
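If you prefer to fetch the files from a script, here is a minimal sketch using only Python’s standard library. The URLs above are the commonly used mirrors for the original Darknet release; verify them before relying on this, since hosting can change.

import urllib.request

files = {
    "yolov3.weights": "https://pjreddie.com/media/files/yolov3.weights",  # ~240 MB
    "yolov3.cfg": "https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg",
    "coco.names": "https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names",
}

for filename, url in files.items():
    print(f"Downloading {filename} ...")
    urllib.request.urlretrieve(url, filename)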

Step 3: Implementing YOLO for Object Detection

Using OpenCV and a pre-trained YOLOv3 model, we’ll perform object detection on an image.

Full Python Code

import cv2
import numpy as np

# Load YOLOv3 weights, configuration, and COCO class names
weights_path = "yolov3.weights"
config_path = "yolov3.cfg"
names_path = "coco.names"

# Load the COCO class names
with open(names_path, "r") as f:
    class_names = f.read().strip().split("\n")

# Load the YOLOv3 model
net = cv2.dnn.readNetFromDarknet(config_path, weights_path)

# Use OpenCV's default backend on the CPU (switch to the CUDA backend/target if you have a GPU build)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

# Load the input image
image = cv2.imread("sample.jpg")  # Replace 'sample.jpg' with your image file path
if image is None:
    raise FileNotFoundError("Could not read the input image -- check the file path")
(height, width) = image.shape[:2]

# Create a blob from the input image
blob = cv2.dnn.blobFromImage(image, scalefactor=1/255.0, size=(608, 608), swapRB=True, crop=False)
net.setInput(blob)

# Get the names of the YOLO output layers (works across OpenCV versions,
# unlike manual indexing into getUnconnectedOutLayers())
output_layer_names = net.getUnconnectedOutLayersNames()

# Perform forward pass to get output from the output layers
layer_outputs = net.forward(output_layer_names)

# Initialize lists to hold the bounding boxes, confidences, and class IDs
boxes = []
confidences = []
class_ids = []

# Loop over each output layer's detections
for output in layer_outputs:
    for detection in output:
        scores = detection[5:]  # Get the scores for all classes
        class_id = np.argmax(scores)  # Get the class ID with the highest score
        confidence = scores[class_id]  # Get the highest score (confidence)

        if confidence > 0.5:  # Filter out low confidence detections
            # Scale the bounding box back to the size of the image
            box = detection[0:4] * np.array([width, height, width, height])
            (centerX, centerY, box_width, box_height) = box.astype("int")

            # Get the top-left corner coordinates
            x = int(centerX - (box_width / 2))
            y = int(centerY - (box_height / 2))

            # Save the box, confidence, and class ID
            boxes.append([x, y, int(box_width), int(box_height)])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# Apply Non-Maximum Suppression to discard weak, overlapping bounding boxes
indices = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold=0.5, nms_threshold=0.4)

# Draw the bounding boxes and class labels on the image
# (np.array(...) handles the differing return shapes of NMSBoxes across
# OpenCV versions, including the empty case where nothing was detected)
for i in np.array(indices).flatten():
    (x, y) = (boxes[i][0], boxes[i][1])
    (w, h) = (boxes[i][2], boxes[i][3])

    color = (0, 255, 0)  # Green color for bounding box
    cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
    text = f"{class_names[class_ids[i]]}: {confidences[i]:.2f}"
    cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

# Show the output image
cv2.imshow("YOLO Object Detection", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Detailed Explanation of the Code

Load YOLO Configuration and Weights:

net = cv2.dnn.readNetFromDarknet(config_path, weights_path)
  • This loads the YOLO configuration and weights files, initializing the network and preparing it for inference.

Prepare the Input Image:

blob = cv2.dnn.blobFromImage(image, scalefactor=1/255.0, size=(608, 608), swapRB=True, crop=False)
net.setInput(blob)
  • “blobFromImage()” converts the image into a 4-D blob, the input format the network expects.
  • "scalefactor=1/255.0" normalizes pixel values to [0, 1], and "swapRB=True" converts OpenCV’s BGR channel order to the RGB order the model was trained on.
  • The image is resized to (608, 608), which matches the width and height in the standard yolov3.cfg. YOLOv3 accepts any input size that is a multiple of 32, so (416, 416) or (320, 320) trade some accuracy for speed.
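As a quick sanity check, you can print the blob’s shape before inference; OpenCV returns it in NCHW order (batch, channels, height, width):

print(blob.shape)  # expected: (1, 3, 608, 608)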

Perform Forward Pass:

layer_outputs = net.forward(output_layer_names)
  • “forward()” performs a forward pass through the network, generating detections that include class scores and bounding boxes.
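If you want to see what “forward()” actually returns, each output layer yields a 2-D array with one row per candidate box and 85 columns (for the 80-class COCO model). With a 608×608 input, the shapes should look roughly like this:

for name, output in zip(output_layer_names, layer_outputs):
    print(name, output.shape)  # e.g. (1083, 85), (4332, 85), (17328, 85)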

Extract Information from the Detections:

for output in layer_outputs:
    for detection in output:
        scores = detection[5:]  # Get the scores for all classes
        class_id = np.argmax(scores)  # Get the class ID with the highest score
        confidence = scores[class_id]  # Get the highest score (confidence)
  • This loops over every detection to extract a class ID, confidence, and bounding box. In each 85-value detection row, the first four values are the (normalized) box geometry, index 4 is an objectness score (ignored in this simplified script), and the remaining 80 values are per-class scores. A 0.5 confidence threshold filters out weak detections.
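To make the layout of a detection row concrete, here is a hypothetical example with invented values:

# Hypothetical detection row (85 values, numbers invented for illustration)
import numpy as np

detection = np.zeros(85, dtype=np.float32)
detection[0:4] = [0.5, 0.5, 0.2, 0.3]  # normalized center x, center y, width, height
detection[4] = 0.9                     # objectness score (not used by this script)
detection[5 + 2] = 0.8                 # class score for class ID 2 ("car" in coco.names)

scores = detection[5:]
print(np.argmax(scores), scores[np.argmax(scores)])  # -> 2 0.8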

Non-Maximum Suppression (NMS):

indices = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold=0.5, nms_threshold=0.4)
  • NMS removes overlapping bounding boxes for the same object, keeping only the box with the highest confidence.
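You can see NMS in action on toy data, independent of the model. In this sketch, the first two boxes overlap heavily, so only the higher-scoring one survives:

# Toy NMS example: boxes are [x, y, width, height]
toy_boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50]]
toy_scores = [0.9, 0.8, 0.7]
keep = cv2.dnn.NMSBoxes(toy_boxes, toy_scores, score_threshold=0.5, nms_threshold=0.4)
print(keep)  # indices of the surviving boxes: the duplicate at index 1 is suppressed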

Draw Bounding Boxes:

cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

Bounding boxes and class labels are drawn on the image.

Running the Code

To run the code successfully:

  1. Place the following files in the same directory as your script:
    • “yolov3.weights”
    • “yolov3.cfg”
    • “coco.names”
    • An image file (e.g., “sample.jpg”) for testing.

After running the script, you should see a window displaying the image with detected objects and bounding boxes.
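If you are on a machine without a display (for example, a remote server) or accidentally installed the headless OpenCV build, “cv2.imshow()” will fail; in that case, write the annotated image to disk instead:

cv2.imwrite("output.jpg", image)  # saves the result next to the script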

[Image: YOLO object detection result]