Day 2: Classify Handwritten Digits Using a Simple NN on MNIST
The MNIST dataset is one of the most popular datasets for learning the basics of machine learning and neural networks. It contains handwritten digits (0-9), and our goal is to build a neural network that can classify these digits.
We’ll use a Simple Neural Network (i.e., a Feedforward Neural Network) for this task. Let’s break down the solution into easy-to-follow steps, and I’ll explain every part so you understand what’s happening. We will use Keras, a high-level API in TensorFlow, to make this as simple as possible.
Step-by-Step Solution Outline:
- Load the MNIST Dataset.
- Prepare and Preprocess the Data.
- Build a Simple Feedforward Neural Network.
- Compile and Train the Model.
- Evaluate the Model Performance.
- Make Predictions (optional step to visualize some predictions).
Here’s the full code along with detailed explanations:
Step-by-Step Implementation
Step 1: Import Libraries and Load Data
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt
- numpy: Used for numerical operations.
- pandas: Used in Step 8 to turn the training history into a DataFrame for plotting.
- tensorflow: We’ll use Keras, which is part of TensorFlow, to create our neural network.
- mnist: The MNIST dataset is built into Keras, which makes it easy to load.
- Sequential and Dense: These help in building the neural network. Sequential stacks layers one after another, and Dense creates fully connected layers.
- matplotlib: Used for visualizing the digits from the dataset.
Step 2: Load the MNIST Dataset
# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Check the shape of the data
print(f"Training data shape: {X_train.shape}, Training labels shape: {y_train.shape}")
print(f"Testing data shape: {X_test.shape}, Testing labels shape: {y_test.shape}")
- mnist.load_data() loads the MNIST data and splits it into training and testing sets.
- X_train: Images of handwritten digits used for training the model.
- y_train: Corresponding labels for the training images (digits 0-9).
- X_test and y_test are the images and labels used for testing.
- Shapes:
- X_train.shape: The shape is (60000, 28, 28), which means we have 60,000 images, each with a size of 28x28 pixels.
- y_train.shape: The shape is (60000,), one label for each of the 60,000 training images.
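Before preprocessing, it helps to look at one of the images. Here is a minimal sketch using the matplotlib import from Step 1 (index 0 is arbitrary; any index up to 59999 works):
# Display the first training image with its label
plt.imshow(X_train[0], cmap='gray')
plt.title(f"Label: {y_train[0]}")
plt.axis('off')
plt.show()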
Step 3: Preprocess the Data
# Normalize the data to range 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0
# Flatten the images from 28x28 to 784 (since a dense layer expects a vector input)
X_train = X_train.reshape(-1, 28 * 28)
X_test = X_test.reshape(-1, 28 * 28)
- Normalize the Data:
- X_train / 255.0: The original pixel values are between 0 and 255. Dividing by 255 scales these values to between 0 and 1, which helps the neural network learn faster.
- Flatten the Images:
- The MNIST images are 28x28 pixels, which we need to flatten into a vector of 784 pixels (28 * 28). This is because a fully connected (Dense) layer expects 1D vectors rather than 2D images.
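If you want to confirm the preprocessing did what we expect, here is a quick optional sanity check:
# Verify the new shape and value range after normalizing and flattening
print(f"Flattened training shape: {X_train.shape}")             # (60000, 784)
print(f"Pixel value range: {X_train.min()} to {X_train.max()}")  # 0.0 to 1.0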
Step 4: Build the Neural Network
# Build a simple feedforward neural network model
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,))) # First hidden layer with 128 neurons
model.add(Dense(64, activation='relu')) # Second hidden layer with 64 neurons
model.add(Dense(10, activation='softmax')) # Output layer with 10 neurons (for digits 0-9)
- Sequential(): We create a Sequential model to stack layers one after another.
- Dense(128, activation='relu', input_shape=(784,)): The first hidden layer has 128 neurons with the ReLU activation function.
- input_shape=(784,) tells the model that the input will be a vector of length 784 (a flattened image).
- Dense(64, activation='relu'): Adds a second hidden layer with 64 neurons and ReLU activation.
- Dense(10, activation='softmax'): The output layer has 10 neurons, each representing one of the digits 0-9. The softmax activation turns the outputs into probabilities that sum to 1.
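You can inspect the architecture with model.summary(). Each Dense layer’s parameter count is inputs × neurons weights plus one bias per neuron, so the counts follow directly from the layer sizes:
# Print a layer-by-layer overview of the model
model.summary()
# Dense(128): 784 * 128 + 128 = 100,480 parameters
# Dense(64):  128 * 64  + 64  =   8,256 parameters
# Dense(10):   64 * 10  + 10  =     650 parameters
# Total:                        109,386 trainable parameters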
Step 5: Compile the Model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
- optimizer='adam': The Adam optimizer is a popular default choice for training deep learning models.
- loss='sparse_categorical_crossentropy': We use categorical cross-entropy since we have multiple classes (0-9) to predict. The "sparse" variant is used because our labels are plain integers (0-9) rather than one-hot encoded vectors.
- metrics=['accuracy']: We track accuracy to follow the model’s performance during training and testing.
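To make the "sparse" distinction concrete: if you one-hot encoded the labels yourself, you would pair them with the non-sparse loss instead. A short sketch of that equivalent setup (not needed for this tutorial, since our labels are already integers):
from tensorflow.keras.utils import to_categorical

# One-hot encode labels, e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train_onehot = to_categorical(y_train, num_classes=10)
y_test_onehot = to_categorical(y_test, num_classes=10)

# With one-hot labels you would compile with the non-sparse loss:
# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])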
Step 6: Train the Model
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=32, verbose=1)
- model.fit(): Train the model using the training data.
- validation_data=(X_test, y_test): Evaluate performance on the testing data during training.
- epochs=10: Train for 10 complete passes through the training dataset.
- batch_size=32: Update weights after every 32 samples.
- verbose=1: Print detailed information during training.
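With 60,000 training samples and batch_size=32, each epoch performs 60000 / 32 = 1875 weight updates. The returned history object records one value per metric per epoch, which is what we plot in Step 8:
# The history object stores per-epoch values for each tracked metric
print(history.history.keys())
# dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])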
Step 7: Evaluate the Model
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.2f}")
- model.evaluate(X_test, y_test): Evaluates the model’s performance on the test data.
- test_accuracy: This gives us an idea of how well the model can classify unseen handwritten digits.
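To go one step beyond a single accuracy number, here is a small optional sketch (an addition, not part of the original steps) that counts misclassified test digits with plain NumPy:
# Predict classes for the whole test set and count the mistakes
probs = model.predict(X_test)
pred_labels = np.argmax(probs, axis=1)  # highest-probability class per image
n_wrong = np.sum(pred_labels != y_test)
print(f"Misclassified: {n_wrong} out of {len(y_test)} test images")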
Step 8: Visualize Training and Validation Performance (Optional)
# Convert the training history to a DataFrame and plot accuracy
history_df = pd.DataFrame(history.history)
history_df[['accuracy', 'val_accuracy']].plot()
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training and Validation Accuracy')
plt.show()
- pd.DataFrame(history.history): Converts the training history to a DataFrame for easy visualization.
- Plotting:
- Training and validation accuracy are plotted to see if the model’s performance improves over epochs and whether it overfits or underfits.
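The same DataFrame also holds the loss values, so you can plot those as well; validation loss rising while training loss keeps falling is the classic sign of overfitting:
# Plot training and validation loss from the same history DataFrame
history_df[['loss', 'val_loss']].plot()
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training and Validation Loss')
plt.show()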
Step 9 (Optional): Make Some Predictions
# Make predictions on the first 5 test images
predictions = model.predict(X_test[:5])
# Display the first 5 images with predicted and true labels
for i in range(5):
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')
    plt.title(f"Predicted: {np.argmax(predictions[i])}, True: {y_test[i]}")
    plt.axis('off')
    plt.show()
- model.predict(X_test[:5]): Predicts the labels for the first 5 test images.
- plt.imshow(): Displays each of the test images.
- np.argmax(predictions[i]): Retrieves the predicted label for each image.
- y_test[i]: Shows the true label (y_test is a NumPy array, so it is indexed directly).
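As a final optional exercise, you can locate and display the first digit the model gets wrong. This sketch reuses the pred_labels array from the extra snippet in Step 7, so run that first:
# Find and display the first misclassified test image, if any
wrong_idx = np.where(pred_labels != y_test)[0]
if len(wrong_idx) > 0:
    i = wrong_idx[0]
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')
    plt.title(f"Predicted: {pred_labels[i]}, True: {y_test[i]}")
    plt.axis('off')
    plt.show()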