Day 13: Explore Image Segmentation With U-Net on Carvana Dataset

Day 13 is a deep dive into image segmentation with U-Net, a convolutional neural network architecture designed for segmentation tasks. The Carvana dataset suits this well: every car photo comes with a matching mask, so the results are easy to inspect visually.

Let’s break down the task into easy-to-follow parts and help you implement it step by step!

Overview of Image Segmentation and U-Net

  • Image Segmentation: Image segmentation is the process of labeling every pixel in an image such that different parts of the image are identified. For instance, you might want to segment cars from the background so that each pixel belongs to either the car or the background.

  • U-Net: U-Net is a popular convolutional neural network (CNN) used for image segmentation. It is named “U-Net” due to its U-shaped architecture, consisting of:

    • Contracting Path: This is like an encoder, where features are extracted and the image resolution decreases.
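    • Expanding Path: This is like a decoder, where the spatial resolution is gradually restored to produce a segmentation map. At each step, skip connections concatenate the encoder feature map of the same resolution, which helps recover fine detail lost during downsampling.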

The Carvana dataset consists of car images paired with masks that mark which pixels belong to the car, allowing you to train a model to segment cars from their backgrounds.
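
To make the idea of a per-pixel mask concrete, here is a minimal sketch using a made-up 4x4 "image" and a hand-written mask (not real Carvana data). The names "toy_image", "toy_mask", and "car_only" are purely illustrative.

```python
import numpy as np

# Hypothetical 4x4 RGB "image" filled with arbitrary values (illustration only)
toy_image = np.arange(48).reshape(4, 4, 3)

# A segmentation mask has the same height and width as the image,
# with one label per pixel: 1 = car, 0 = background
toy_mask = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
], dtype=np.uint8)

# Multiplying by the mask (broadcast over the channel axis) zeroes out the background
car_only = toy_image * toy_mask[..., np.newaxis]

print(f"Car pixels: {int(toy_mask.sum())} of {toy_mask.size}")
print(car_only[0, 0], car_only[1, 1])  # background pixel vs. car pixel
```

The model built later in this post predicts exactly this kind of array, one value per pixel, from the input image.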

Steps to Implement U-Net for Car Segmentation

  1. Set Up Environment and Import Required Libraries
  2. Load and Preprocess the Dataset
  3. Build the U-Net Model
  4. Compile and Train the Model
  5. Evaluate the Model and Visualize Results

Let’s implement these steps.

Step 1: Set Up Environment and Import Libraries

First, make sure to install the required packages:

pip install tensorflow opencv-python matplotlib Pillow

Then, let’s import the required libraries.

import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D, concatenate, Input
from tensorflow.keras.models import Model

Explanation:

  • We use OpenCV for image processing, NumPy for numerical operations, Matplotlib for plotting, and TensorFlow/Keras for building the U-Net model.
  • PIL (Pillow) is used to handle image formats, especially for masks in GIF format.

Step 2: Load and Preprocess the Dataset

Here, we will load both the images and their corresponding masks from the dataset folders. We will resize them, normalize them, and make them ready for training.

# Set paths to the dataset folders
IMAGE_DIR = "dataset/carvana/train/"  # Path to the train folder
MASK_DIR = "dataset/carvana/train_masks/"  # Path to the train_masks folder

def load_data(image_dir, mask_dir, image_size=(128, 128)):
    images = []
    masks = []

    # Load images and masks
    for image_name in sorted(os.listdir(image_dir)):  # sorted for a reproducible order
        # Skip hidden files and anything that is not a JPEG image
        if image_name.startswith('.') or not image_name.lower().endswith('.jpg'):
            continue

        # Construct paths to image and corresponding mask
        img_path = os.path.join(image_dir, image_name)

        # Mask file has "_mask" appended before the extension
        base_name = os.path.splitext(image_name)[0]
        mask_name = f"{base_name}_mask.gif"
        mask_path = os.path.join(mask_dir, mask_name)

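        # Load the image using OpenCV (which returns channels in BGR order)
        img = cv2.imread(img_path)
        if img is None:
            print(f"Warning: Image {img_path} not found or couldn't be loaded.")
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert to RGB so Matplotlib shows true colors
        img = cv2.resize(img, image_size).astype(np.float32) / 255.0  # Resize and normalize image to [0, 1]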

        # Load the mask using PIL (Pillow)
        try:
            mask = Image.open(mask_path)
            mask = mask.convert('L')  # Convert to grayscale
            mask = np.array(mask)  # Convert to numpy array
            mask = cv2.resize(mask, image_size, interpolation=cv2.INTER_NEAREST)  # Nearest-neighbor resizing keeps the mask values binary
            mask = mask.astype(np.float32) / 255.0  # Normalize to range [0, 1]
            mask = np.expand_dims(mask, axis=-1)  # Add channel dimension
        except Exception as e:
            print(f"Warning: Mask {mask_path} not found or couldn't be loaded. Error: {e}")
            continue

        images.append(img)
        masks.append(mask)

    return np.array(images), np.array(masks)

# Load the data
X, y = load_data(IMAGE_DIR, MASK_DIR)
print("Dataset loaded successfully.")
print(f"Images shape: {X.shape}, Masks shape: {y.shape}")

# Plot a few images and their corresponding masks
plt.figure(figsize=(12, 6))
for i in range(3):
    plt.subplot(2, 3, i + 1)
    plt.imshow(X[i])
    plt.title("Car Image")
    plt.axis('off')

    plt.subplot(2, 3, i + 4)
    plt.imshow(y[i].squeeze(), cmap='gray')
    plt.title("Mask")
    plt.axis('off')
plt.tight_layout()
plt.show()

Figure: Carvana dataset visualization

  • Training images are in JPEG format, and masks are in GIF format.
  • Masks have filenames that append “_mask” before the file extension.

Explanation:

  • Loading Images: Uses OpenCV to read images, resizes them to 128x128, and normalizes them to [0, 1].
  • Loading Masks: Uses Pillow to load GIF masks, convert them to grayscale, resize, normalize, and add a channel dimension to make them compatible for training.
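
Before loading thousands of files, it can help to check that every image actually has a mask. The sketch below works on plain lists of file names, so it runs without the dataset. The helper names "mask_name_for" and "pair_images_with_masks" and the example file names are hypothetical, chosen only for illustration.

```python
import os

def mask_name_for(image_name):
    """Map an image file name to its expected mask file name."""
    base_name, _ext = os.path.splitext(image_name)  # drop the ".jpg" extension
    return f"{base_name}_mask.gif"

def pair_images_with_masks(image_names, mask_names):
    """Return (pairs, missing): matched (image, mask) names and images lacking a mask."""
    available = set(mask_names)
    pairs, missing = [], []
    for image_name in sorted(image_names):
        mask_name = mask_name_for(image_name)
        if mask_name in available:
            pairs.append((image_name, mask_name))
        else:
            missing.append(image_name)
    return pairs, missing

# Hypothetical file names, for illustration only
images = ["car_b_01.jpg", "car_a_01.jpg", "car_c_01.jpg"]
masks = ["car_a_01_mask.gif", "car_b_01_mask.gif"]

pairs, missing = pair_images_with_masks(images, masks)
print("Matched pairs:", pairs)
print("Images without a mask:", missing)
```

With the real dataset, you could pass os.listdir(IMAGE_DIR) and os.listdir(MASK_DIR) as the two lists.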

Step 3: Build the U-Net Model

The U-Net model has two main parts: a contracting path (encoder) and an expanding path (decoder).

def unet_model(input_size=(128, 128, 3)):
    inputs = Input(input_size)

    # Contracting Path (Encoder)
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(c1)
    p1 = MaxPooling2D((2, 2))(c1)

    c2 = Conv2D(128, (3, 3), activation='relu', padding='same')(p1)
    c2 = Conv2D(128, (3, 3), activation='relu', padding='same')(c2)
    p2 = MaxPooling2D((2, 2))(c2)

    c3 = Conv2D(256, (3, 3), activation='relu', padding='same')(p2)
    c3 = Conv2D(256, (3, 3), activation='relu', padding='same')(c3)
    p3 = MaxPooling2D((2, 2))(c3)

    # Bottleneck
    c4 = Conv2D(512, (3, 3), activation='relu', padding='same')(p3)
    c4 = Conv2D(512, (3, 3), activation='relu', padding='same')(c4)

    # Expanding Path (Decoder)
    u5 = UpSampling2D((2, 2))(c4)
    u5 = concatenate([u5, c3])
    c5 = Conv2D(256, (3, 3), activation='relu', padding='same')(u5)
    c5 = Conv2D(256, (3, 3), activation='relu', padding='same')(c5)

    u6 = UpSampling2D((2, 2))(c5)
    u6 = concatenate([u6, c2])
    c6 = Conv2D(128, (3, 3), activation='relu', padding='same')(u6)
    c6 = Conv2D(128, (3, 3), activation='relu', padding='same')(c6)

    u7 = UpSampling2D((2, 2))(c6)
    u7 = concatenate([u7, c1])
    c7 = Conv2D(64, (3, 3), activation='relu', padding='same')(u7)
    c7 = Conv2D(64, (3, 3), activation='relu', padding='same')(c7)

    outputs = Conv2D(1, (1, 1), activation='sigmoid')(c7)

    model = Model(inputs, outputs)
    return model

# Instantiate the model
model = unet_model()
model.summary()

Explanation:

  • Contracting Path: Uses "Conv2D" and "MaxPooling2D" layers to extract features and downsample the image.
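  • Expanding Path: Uses "UpSampling2D" to double the resolution and "concatenate" to merge each upsampled feature map with the encoder feature map of the same size (the skip connection).
  • outputs: Uses a "Conv2D" layer with a "1x1" kernel and a sigmoid activation to produce, for every pixel, the probability that it belongs to the car.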
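
To see how the feature map sizes change through the network, here is a small pure-Python sketch that traces the shapes without needing TensorFlow. The function "trace_unet_shapes" is a hypothetical helper written for this post. It assumes the same layout as the model above: 'same' padding, 2x2 pooling, 2x2 upsampling, and filter counts of 64, 128, 256, and 512.

```python
def trace_unet_shapes(input_size=(128, 128, 3), filters=(64, 128, 256), bottleneck=512):
    """Return a list of (stage name, (height, width, channels)) for the U-Net layout above."""
    h, w, c = input_size
    shapes = [("input", (h, w, c))]
    skips = []

    # Encoder: 'same' convolutions keep height/width; 2x2 pooling halves them
    for i, f in enumerate(filters, start=1):
        shapes.append((f"c{i}", (h, w, f)))
        skips.append((h, w, f))
        h, w = h // 2, w // 2
        shapes.append((f"p{i}", (h, w, f)))

    # Bottleneck
    c = bottleneck
    n = len(filters) + 1
    shapes.append((f"c{n}", (h, w, c)))

    # Decoder: 2x2 upsampling doubles height/width; concatenation adds channels
    for skip_h, skip_w, skip_c in reversed(skips):
        n += 1
        h, w = h * 2, w * 2
        assert (h, w) == (skip_h, skip_w), "skip connection sizes must match"
        shapes.append((f"u{n} (after concat)", (h, w, c + skip_c)))
        c = skip_c
        shapes.append((f"c{n}", (h, w, c)))

    shapes.append(("outputs", (h, w, 1)))
    return shapes

for name, shape in trace_unet_shapes():
    print(f"{name:20s} {shape}")
```

With three pooling steps, the input height and width need to be divisible by 8. Otherwise the upsampled maps no longer match the skip connections and the assertion in the sketch fails, which mirrors the shape mismatch the real "concatenate" call would hit.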

Step 4: Compile and Train the Model

We will use the Adam optimizer and binary cross-entropy loss since this is a binary segmentation task (car vs. background).

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X, y, validation_split=0.1, epochs=10, batch_size=8)

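Explanation:

  • "optimizer='adam'": Adam adapts the learning rate for each parameter and usually trains well with its default settings.
  • "loss='binary_crossentropy'": Suitable for binary segmentation because it compares each predicted pixel probability with the 0/1 value in the mask.
  • "validation_split=0.1": Holds out 10% of the data for validation during training. Keras takes these samples from the end of the arrays, before any shuffling.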
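
To show what binary cross-entropy measures, here is a minimal NumPy sketch of the per-pixel formula, averaged over a toy 2x2 mask. It follows the textbook definition rather than reproducing Keras's internal implementation. The function name, the "eps" clipping value, and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy between a 0/1 mask and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip probabilities away from 0 and 1 to avoid log(0)
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    per_pixel = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(per_pixel.mean())

true_mask = np.array([[1, 0], [1, 0]])                # toy 2x2 ground truth
confident_right = np.array([[0.9, 0.1], [0.9, 0.1]])  # probabilities close to the truth
confident_wrong = np.array([[0.1, 0.9], [0.1, 0.9]])  # probabilities far from the truth

good_loss = binary_crossentropy(true_mask, confident_right)
bad_loss = binary_crossentropy(true_mask, confident_wrong)
print(f"Loss for the good prediction: {good_loss:.3f}")
print(f"Loss for the bad prediction:  {bad_loss:.3f}")
```

The loss is small when the predicted probabilities agree with the mask and grows quickly when the model is confidently wrong, which is the signal the optimizer uses during training.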

Step 5: Evaluate the Model and Visualize Results

Finally, we’ll visualize how well the model has learned to segment the cars.

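# Select a sample from the validation data to predict.
# With validation_split=0.1, Keras holds out the last 10% of the samples,
# so the last sample was not used for training.
sample_image = X[-1]
sample_mask = y[-1]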

# Expand dimensions to make it compatible with model input
sample_image_expanded = np.expand_dims(sample_image, axis=0)

# Predict mask
predicted_mask = model.predict(sample_image_expanded)[0]

# Plot original image, true mask, and predicted mask
plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
plt.title("Original Image")
plt.imshow(sample_image)

plt.subplot(1, 3, 2)
plt.title("True Mask")
plt.imshow(sample_mask.squeeze(), cmap='gray')

plt.subplot(1, 3, 3)
plt.title("Predicted Mask")
plt.imshow(predicted_mask.squeeze(), cmap='gray')

plt.show()

Figure: Original image, true mask, and predicted mask visualization

Explanation:

  • Prediction: Uses the trained model to predict the mask for a given image.
  • Visualization: Compares the original image, true mask, and predicted mask side-by-side for easy evaluation.
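
Visual comparison can be complemented with a number. The predicted mask contains probabilities, so one common approach is to threshold it and measure the overlap with the true mask using intersection over union (IoU). The sketch below uses made-up arrays in place of real predictions. The helper names and the 0.5 threshold are assumptions, not part of the code above.

```python
import numpy as np

def binarize(probabilities, threshold=0.5):
    """Turn per-pixel probabilities into a 0/1 mask."""
    return (np.asarray(probabilities) >= threshold).astype(np.uint8)

def iou_score(true_mask, pred_mask):
    """Intersection over union of two binary masks."""
    true_mask = np.asarray(true_mask).astype(bool)
    pred_mask = np.asarray(pred_mask).astype(bool)
    union = np.logical_or(true_mask, pred_mask).sum()
    if union == 0:
        return 1.0  # both masks are empty: treat as perfect agreement
    return float(np.logical_and(true_mask, pred_mask).sum() / union)

# Made-up probabilities standing in for predicted_mask.squeeze()
toy_probabilities = np.array([
    [0.1, 0.2, 0.1],
    [0.7, 0.9, 0.4],
    [0.6, 0.8, 0.2],
])
# Made-up ground truth standing in for sample_mask.squeeze()
toy_true_mask = np.array([
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
])

toy_pred_mask = binarize(toy_probabilities)
toy_iou = iou_score(toy_true_mask, toy_pred_mask)
print(toy_pred_mask)
print(f"IoU: {toy_iou:.2f}")
```

To apply this to the model output, you could call binarize(predicted_mask.squeeze()) and compare it with sample_mask.squeeze().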
