Day 15: Preparing Time Series Data for Temperature Forecasting

Welcome to Day 15 of our deep learning challenge! Today, we are preparing a time series dataset for training an RNN model for temperature forecasting. Below, I provide the complete code along with detailed explanations.

Full Code for Day 15: Preparing Time Series Data for Temperature Forecasting

The script below loads the data, builds sliding windows, splits and scales them, and wraps each split in a tf.data pipeline for training the RNN.

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

# Load the Daily Minimum Temperatures dataset (Melbourne, 1981-1990; columns: Date, Temp)
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv'
data = pd.read_csv(url)

# Convert the Date column to datetime and set it as the index
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace=True)

# Plot the temperature over time
plt.figure(figsize=(10, 6))
plt.plot(data, label='Daily Min Temperature')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.title('Daily Minimum Temperature Over Time')
plt.legend()
plt.show()

# Create time series windows
def create_time_series_data(data, window_size, target_size=1):
    X, y = [], []
    for i in range(len(data) - window_size - target_size + 1):
        X.append(data[i: i + window_size])
        y.append(data[i + window_size: i + window_size + target_size])
    return np.array(X), np.array(y)

# Set window size for the time series
WINDOW_SIZE = 30  # The past 30 days as input

# Convert the temperature column into a numpy array
temp_data = data['Temp'].values

# Create time series windows
X, y = create_time_series_data(temp_data, WINDOW_SIZE)

# Split chronologically (no shuffling) into training, validation, and test sets (70%, 20%, 10%)
train_size = int(len(X) * 0.7)
val_size = int(len(X) * 0.2)

X_train, y_train = X[:train_size], y[:train_size]
X_val, y_val = X[train_size:train_size + val_size], y[train_size:train_size + val_size]
X_test, y_test = X[train_size + val_size:], y[train_size + val_size:]

# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))

# Fit the scaler on the training data and transform the training, validation, and test data
X_train = scaler.fit_transform(X_train.reshape(-1, 1)).reshape(X_train.shape)
X_val = scaler.transform(X_val.reshape(-1, 1)).reshape(X_val.shape)
X_test = scaler.transform(X_test.reshape(-1, 1)).reshape(X_test.shape)

# Scale the targets with the same scaler (fit on training data only);
# validation and test values can fall slightly outside [0, 1]
y_train = scaler.transform(y_train)
y_val = scaler.transform(y_val)
y_test = scaler.transform(y_test)

# Create TensorFlow datasets
BATCH_SIZE = 32
BUFFER_SIZE = len(X_train)  # a buffer covering the whole training set gives a uniform shuffle

# Create training dataset object
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_dataset = train_dataset.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

# Create validation dataset object
val_dataset = tf.data.Dataset.from_tensor_slices((X_val, y_val))
val_dataset = val_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

# Create test dataset object
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test))
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
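
As a quick sanity check on the windowing function above, here is a minimal sketch that runs it on a toy series. The names toy_series, X_toy, y_toy, and X_toy_3d are hypothetical and exist only for this illustration; the function body is copied from the script.

```python
import numpy as np

def create_time_series_data(data, window_size, target_size=1):
    """Same windowing logic as in the script above."""
    X, y = [], []
    for i in range(len(data) - window_size - target_size + 1):
        X.append(data[i: i + window_size])
        y.append(data[i + window_size: i + window_size + target_size])
    return np.array(X), np.array(y)

# Hypothetical stand-in for the temperature column: the values 0..9
toy_series = np.arange(10)
X_toy, y_toy = create_time_series_data(toy_series, window_size=3)

print(X_toy.shape, y_toy.shape)  # (7, 3) (7, 1)
print(X_toy[0], y_toy[0])        # first window 0,1,2 -> target 3
print(X_toy[-1], y_toy[-1])      # last window 6,7,8 -> target 9

# Assumption: the model will use Keras recurrent layers, which expect
# input shaped (batch, timesteps, features), so add a trailing feature axis.
X_toy_3d = X_toy[..., np.newaxis]
print(X_toy_3d.shape)            # (7, 3, 1)
```

Ten values with a window of 3 give 7 input/target pairs, and every target is the value immediately after its window. The same trailing-axis step would apply to X_train, X_val, and X_test if the model does not reshape its input itself.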

Explanation of Dataset Preparation Steps

  • Loading the Data: We load the Daily Minimum Temperatures dataset (Melbourne, 1981 to 1990) from a CSV file, convert the Date column to datetime, and set it as the index, which makes the series easier to plot and slice by date.
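  • Create Sliding Windows: We slide a window of length WINDOW_SIZE (here, the past 30 days) over the series. Each window is one input, and the target (y) is the temperature on the day right after it. With target_size=1, X has shape (num_windows, WINDOW_SIZE) and y has shape (num_windows, 1).
  • Splitting the Data: The windows are split in chronological order into training (70%), validation (20%), and test (10%) sets, so the model is validated and tested on dates later than anything it trained on.
  • Normalization: We use MinMaxScaler to map the values to the 0 to 1 range, which generally helps an RNN train more stably. The scaler is fit on the training set only, so no information from the validation or test periods leaks into training; as a result, validation and test values can fall slightly outside [0, 1].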
  • Batching and Prefetching: We wrap each split in a tf.data.Dataset. The training pipeline caches, shuffles, batches, and prefetches; the validation and test pipelines only batch and prefetch, since their order does not affect evaluation.
    • Cache: .cache() keeps the elements in memory after the first pass over the dataset. Here the arrays are already in memory, so the gain is small; caching matters more when elements are read from disk or pass through expensive preprocessing.
    • Shuffle: .shuffle(BUFFER_SIZE) draws each element at random from a buffer of BUFFER_SIZE items, so consecutive windows do not always land in the same batch. The shuffle is uniform only when the buffer is at least as large as the dataset; a smaller buffer shuffles locally. Only the training set is shuffled, and only after the chronological split.
    • Batch: .batch(BATCH_SIZE) groups elements into batches, so each gradient update is computed from BATCH_SIZE windows at once rather than one at a time.
    • Prefetch: .prefetch(tf.data.AUTOTUNE) prepares the next batch while the current one is being processed, with the prefetch buffer size tuned automatically at runtime.
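
The splitting and normalization steps above can be sketched on a toy series. This is a NumPy-only illustration, not the script's code: it assumes the min-max formula that MinMaxScaler applies with feature_range=(0, 1), and the names series, held_out, scale, and unscale are hypothetical.

```python
import numpy as np

# Hypothetical toy series standing in for the temperature column
series = np.array([10.0, 12.0, 14.0, 20.0, 18.0, 16.0, 22.0, 8.0])

# Chronological split: the first 75% is training, the rest is held out
split = int(len(series) * 0.75)
train, held_out = series[:split], series[split:]

# The scaling statistics come from the training portion only
t_min, t_max = train.min(), train.max()  # 10.0 and 20.0

def scale(x):
    """Min-max scale using training statistics."""
    return (x - t_min) / (t_max - t_min)

def unscale(x_scaled):
    """Invert the scaling to get back to the original units."""
    return x_scaled * (t_max - t_min) + t_min

train_scaled = scale(train)        # all values lie in [0, 1]
held_out_scaled = scale(held_out)  # 1.2 and -0.2: outside [0, 1]

print(train_scaled.min(), train_scaled.max())
print(held_out_scaled)
print(unscale(held_out_scaled))    # back to 22 and 8
```

The held-out values land outside [0, 1] because they are hotter and colder than anything in the training portion, which is expected when the scaler never sees them. In the full script, scaler.inverse_transform plays the role of unscale: applying it to the model's scaled predictions returns them to °C.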

This completes our Day 15 project of preparing the time series dataset for temperature forecasting. The prepared data is now ready for training an RNN model, which we will work on on Day 16.
