Autonomous Driving - Car Detection

Author

Swaraj Khan P

Published

May 12, 2024

In this project, I am diving into object detection using the powerful YOLO model. The concepts I’ll be exploring are based on the two seminal YOLO papers: Redmon et al., 2016, and Redmon and Farhadi, 2016.

By the end of this journey, I aim to:

  • Detect objects in a car detection dataset
  • Implement non-max suppression to enhance accuracy
  • Apply intersection over union (IoU) to compare bounding boxes (a quick sketch of the computation follows this list)
  • Manage bounding boxes, a common type of image annotation in deep learning
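
Since IoU shows up repeatedly in this project, here is a minimal sketch of how it can be computed for two boxes given as (x1, y1, x2, y2) corner coordinates. The function name iou matches the one listed in the roadmap below, but the body is only my illustrative sketch, not the notebook's reference implementation.

def iou(box1, box2):
    # Each box is (x1, y1, x2, y2): upper-left and lower-right corners.
    (box1_x1, box1_y1, box1_x2, box1_y2) = box1
    (box2_x1, box2_y1, box2_x2, box2_y2) = box2

    # Corners of the intersection rectangle.
    xi1 = max(box1_x1, box2_x1)
    yi1 = max(box1_y1, box2_y1)
    xi2 = min(box1_x2, box2_x2)
    yi2 = min(box1_y2, box2_y2)
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)

    # Union is the total covered area: sum of both areas minus the overlap.
    box1_area = (box1_x2 - box1_x1) * (box1_y2 - box1_y1)
    box2_area = (box2_x2 - box2_x1) * (box2_y2 - box2_y1)
    union_area = box1_area + box2_area - inter_area

    return inter_area / union_area

For example, iou((2, 1, 4, 3), (1, 2, 3, 4)) works out to 1/7 ≈ 0.143, since the boxes overlap in a 1x1 square and together cover an area of 7.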

Important Guidelines for Code Development

To ensure my code is clean and functional, I will:

  • Avoid adding extra print statements.
  • Refrain from inserting additional code cells unnecessarily.
  • Keep the function parameters unchanged.
  • Use local variables instead of global ones unless explicitly necessary.
  • Prevent unnecessary changes to the code, such as creating extra variables.

Adhering to these guidelines will help avoid common errors and ensure the project runs smoothly.

Roadmap

  • Packages
    • Begin by importing the necessary packages.
  • 1 - Problem Statement
    • Define the problem I’ll be addressing with YOLO.
  • 2 - YOLO
    • 2.1 - Model Details
      • Understand the specifics of the YOLO model.
    • 2.2 - Filtering with a Threshold on Class Scores
      • Implement a function to filter boxes based on class scores.
      • yolo_filter_boxes
    • 2.3 - Non-max Suppression
      • Learn how to perform non-max suppression to improve detection accuracy.
      • iou
    • 2.4 - YOLO Non-max Suppression
      • Apply non-max suppression within the YOLO framework.
      • yolo_non_max_suppression
    • 2.5 - Wrapping Up the Filtering
      • Finalize the filtering process.
  • 3 - Testing YOLO on Images
    • 3.1 - Defining Classes, Anchors, and Image Shape
      • Set up the classes, anchors, and image shape for testing.
    • 3.2 - Loading a Pre-trained Model
      • Load a pre-trained YOLO model.
    • 3.3 - Converting Model Output to Bounding Box Tensors
      • Transform the model’s output into usable bounding box tensors.
    • 3.4 - Filtering Boxes
      • Filter the boxes to retain the most relevant ones.
    • 3.5 - Running YOLO on an Image
      • Run the YOLO model on sample images to test its performance.
  • 4 - Summary for YOLO
    • Summarize the key takeaways and results from using YOLO.

1 - Problem Statement

You are working on a self-driving car. Go you! As a critical component of this project, you’d like to first build a car detection system. To collect data, you’ve mounted a camera to the hood (meaning the front) of the car, which takes pictures of the road ahead every few seconds as you drive around.

Pictures taken from a car-mounted camera while driving around Silicon Valley. Dataset provided by drive.ai.

You’ve gathered all these images into a folder and labeled them by drawing bounding boxes around every car you found. Here’s an example of what your bounding boxes look like:

Definition of a box

2 - YOLO

“You Only Look Once” (YOLO) is a popular algorithm because it achieves high accuracy while also being able to run in real-time. This algorithm “only looks once” at the image in the sense that it requires only one forward propagation pass through the network to make predictions. After non-max suppression, it then outputs recognized objects together with the bounding boxes.
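
As a preview of the non-max suppression step that appears later in the roadmap, TensorFlow ships a ready-made primitive for it, tf.image.non_max_suppression, which keeps the highest-scoring boxes and drops any box that overlaps an already-kept box beyond an IoU threshold. The snippet below is just a small illustration of that call on made-up boxes and scores, not the project's yolo_non_max_suppression function.

import tensorflow as tf

# Two heavily overlapping boxes plus one box far away, in (y1, x1, y2, x2) corner form.
boxes = tf.constant([[0.0, 0.0, 1.0, 1.0],
                     [0.1, 0.1, 1.0, 1.0],
                     [2.0, 2.0, 3.0, 3.0]])
scores = tf.constant([0.9, 0.8, 0.7])

# Keep at most 10 boxes; suppress any box whose IoU with a kept box exceeds 0.5.
selected = tf.image.non_max_suppression(boxes, scores, max_output_size=10, iou_threshold=0.5)
print(selected.numpy())  # [0 2]: the lower-scoring overlapping box is suppressed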

2.1 - Model Details

Inputs and Outputs

  • The input is a batch of images, with each image having the shape (608, 608, 3).
  • The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers (pc, bx, by, bh, bw, c). If c is expanded into an 80-dimensional vector of class probabilities, each bounding box is represented by 85 numbers.
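
To make that count concrete, here is a tiny sketch (shapes only; the variable names are mine) of how one 85-number prediction splits into its parts:

import numpy as np

# One box prediction: 1 confidence + 4 box coordinates + 80 class probabilities = 85 numbers.
prediction = np.random.rand(85)

pc = prediction[0]                 # probability that some object is present
bx, by, bh, bw = prediction[1:5]   # box midpoint (bx, by) and size (bh, bw)
class_probs = prediction[5:]       # one score per class
assert class_probs.shape == (80,)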

Anchor Boxes

Anchor boxes are chosen by exploring the training data to select reasonable height/width ratios that represent the different classes. For this project, 5 anchor boxes have been chosen (to cover the 80 classes) and are stored in the file ./model_data/yolo_anchors.txt.

In the encoding tensor, the anchor boxes occupy the second-to-last dimension: (m, nH, nW, anchors, classes).
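
As a sketch of how those anchors might be loaded, assuming yolo_anchors.txt holds the five width/height pairs as comma-separated values on a single line (the convention used by the YAD2K tooling this kind of project typically builds on; treat the format as an assumption):

import numpy as np

def read_anchors(anchors_path="./model_data/yolo_anchors.txt"):
    # Assumed format: one line of comma-separated floats, alternating width and height.
    with open(anchors_path) as f:
        values = [float(x) for x in f.readline().split(",")]
    return np.array(values).reshape(-1, 2)  # shape (5, 2): one row per anchor box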

YOLO Architecture

The YOLO architecture follows this structure:

IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85)

Architecture of YOLO
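
Since 5 anchors x 85 numbers = 425 channels, the CNN's raw output of shape (m, 19, 19, 425) can be viewed as the encoding (m, 19, 19, 5, 85). A minimal sketch of that reshape (the variable names are mine, not the notebook's):

import tensorflow as tf

# Pretend CNN output: each of the 19x19 grid cells predicts 5 boxes x 85 numbers = 425 values.
cnn_output = tf.random.normal((1, 19, 19, 425))

# View the same numbers as 5 anchors, each described by 85 numbers.
encoding = tf.reshape(cnn_output, (-1, 19, 19, 5, 85))
print(encoding.shape)  # (1, 19, 19, 5, 85)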

2.2 - Filtering with a Threshold on Class Scores

The first step is to apply a filter by thresholding, which means removing any box for which the class “score” is less than a chosen threshold.

The model outputs a total of 19x19x5x85 numbers, with each box described by 85 numbers. To make it more manageable, I will rearrange the (19, 19, 5, 85) or (19, 19, 425) dimensional tensor into the following variables:

  • box_confidence: Tensor of shape (19, 19, 5, 1) containing pc (confidence probability that there’s some object) for each of the 5 boxes predicted in each of the 19x19 cells.
  • boxes: Tensor of shape (19, 19, 5, 4) containing the midpoint and dimensions (bx, by, bh, bw) for each of the 5 boxes in each cell.
  • box_class_probs: Tensor of shape (19, 19, 5, 80) containing the “class probabilities” (c1, c2, …, c80) for each of the 80 classes for each of the 5 boxes per cell.
import tensorflow as tf

def yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold = .6):
    # Combine the object confidence with the per-class probabilities: shape (19, 19, 5, 80).
    box_scores = box_confidence * box_class_probs
    # For every box, keep the index and score of its best class.
    box_classes = tf.math.argmax(box_scores, axis=-1)
    box_class_scores = tf.math.reduce_max(box_scores, axis=-1)
    # Keep only the boxes whose best class score clears the threshold.
    filtering_mask = (box_class_scores >= threshold)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)

    return scores, boxes, classes
# Output
scores[2] = 9.270486
boxes[2] = [ 4.6399336  3.2303846  4.431282  -2.202031 ]
classes[2] = 8
scores.shape = (1789,)
boxes.shape = (1789, 4)
classes.shape = (1789,)
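
Output like the above comes from feeding randomly generated tensors of the right shapes into yolo_filter_boxes. The snippet below is a sketch of such a call; the exact numbers printed depend on the random seed and parameters the notebook uses, so treat the seed and threshold here as assumptions rather than a way to reproduce the values verbatim.

import tensorflow as tf

tf.random.set_seed(10)  # assumed seed; the notebook's own seeding may differ
box_confidence = tf.random.normal([19, 19, 5, 1], mean=1, stddev=4)
boxes = tf.random.normal([19, 19, 5, 4], mean=1, stddev=4)
box_class_probs = tf.random.normal([19, 19, 5, 80], mean=1, stddev=4)

scores, boxes, classes = yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold=0.5)
print("scores.shape =", scores.shape)
print("boxes.shape =", boxes.shape)
print("classes.shape =", classes.shape)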