Time Series & Recurrent Neural Networks

ACTL3143 & ACTL5111 Deep Learning for Actuaries

Patrick Laub

Time Series

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

Tabular data vs time series data

Tabular data

We have a dataset \{ \boldsymbol{x}_i, y_i \}_{i=1}^n which we assume are i.i.d. observations.

Brand Mileage # Claims
BMW 101 km 1
Audi 432 km 0
Volvo 3 km 5
\vdots \vdots \vdots

The goal is to predict the y for some covariates \boldsymbol{x}.

Time series data

We have a sequence \{ \boldsymbol{x}_t, y_t \}_{t=1}^T of observations taken at regular time intervals.

Date Humidity Temp.
Jan 1 60% 20 °C
Jan 2 65% 22 °C
Jan 3 70% 21 °C
\vdots \vdots \vdots

The task is to forecast future values based on the past.

Attributes of time series data

  • Temporal ordering: The order of the observations matters.
  • Trend: The general direction of the data.
  • Noise: Random fluctuations in the data.
  • Seasonality: Patterns that repeat at regular intervals.
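
A synthetic illustration of these attributes (all numbers here are made up): a series built from a linear trend, a repeating seasonal pattern, and random noise.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(365)
trend = 0.05 * t                             # general upward direction
seasonality = 5 * np.sin(2 * np.pi * t / 7)  # pattern repeating every 7 steps
noise = rng.normal(0, 1, size=len(t))        # random fluctuations
series = pd.Series(20 + trend + seasonality + noise)
series.plot();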

Note

Question: What will be the temperature in Berlin tomorrow? What information would you use to make a prediction?

Australian financial stocks

stocks = pd.read_csv("aus_fin_stocks.csv")
stocks
Date ANZ ASX200 BOQ CBA NAB QBE SUN WBC
0 1981-01-02 1.588896 NaN NaN NaN 1.791642 NaN NaN 2.199454
1 1981-01-05 1.548452 NaN NaN NaN 1.791642 NaN NaN 2.163397
2 1981-01-06 1.600452 NaN NaN NaN 1.791642 NaN NaN 2.199454
... ... ... ... ... ... ... ... ... ...
10327 2021-10-28 28.600000 7430.4 8.97 106.86 29.450000 12.10 12.02 26.230000
10328 2021-10-29 28.140000 7323.7 8.80 104.68 28.710000 11.83 11.72 25.670000
10329 2021-11-01 27.900000 7357.4 8.79 105.71 28.565000 12.03 11.83 24.050000

10330 rows × 9 columns

Plot

stocks.plot()

Data types and NA values

stocks.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10330 entries, 0 to 10329
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    10330 non-null  object 
 1   ANZ     10319 non-null  float64
 2   ASX200  7452 non-null   float64
 3   BOQ     8970 non-null   float64
 4   CBA     7624 non-null   float64
 5   NAB     10316 non-null  float64
 6   QBE     9441 non-null   float64
 7   SUN     8424 non-null   float64
 8   WBC     10323 non-null  float64
dtypes: float64(8), object(1)
memory usage: 726.5+ KB
for col in stocks.columns:
    print(f"{col}: {stocks[col].isna().sum()}")
Date: 0
ANZ: 11
ASX200: 2878
BOQ: 1360
CBA: 2706
NAB: 14
QBE: 889
SUN: 1906
WBC: 7
asx200 = stocks.pop("ASX200")

Set the index to the date

stocks["Date"] = pd.to_datetime(stocks["Date"])
stocks = stocks.set_index("Date") # or `stocks.set_index("Date", inplace=True)`
stocks
ANZ BOQ CBA NAB QBE SUN WBC
Date
1981-01-02 1.588896 NaN NaN 1.791642 NaN NaN 2.199454
1981-01-05 1.548452 NaN NaN 1.791642 NaN NaN 2.163397
1981-01-06 1.600452 NaN NaN 1.791642 NaN NaN 2.199454
... ... ... ... ... ... ... ...
2021-10-28 28.600000 8.97 106.86 29.450000 12.10 12.02 26.230000
2021-10-29 28.140000 8.80 104.68 28.710000 11.83 11.72 25.670000
2021-11-01 27.900000 8.79 105.71 28.565000 12.03 11.83 24.050000

10330 rows × 7 columns

Plot II

stocks.plot()
plt.legend(loc="upper center", bbox_to_anchor=(0.5, -0.5), ncol=4);

Can index using dates I

stocks.loc["2010-1-4":"2010-01-8"]
ANZ BOQ CBA NAB QBE SUN WBC
Date
2010-01-04 22.89 10.772147 54.573702 26.046571 25.21 8.142453 25.012620
2010-01-05 23.00 10.910369 55.399220 26.379283 25.34 8.264684 25.220235
2010-01-06 22.66 10.855080 55.677708 25.865956 24.95 8.086039 25.101598
2010-01-07 22.12 10.523346 55.140624 25.656823 24.50 8.198867 24.765460
2010-01-08 22.25 10.781361 55.856736 25.571269 24.77 8.245879 24.864324

Note that these date ranges are inclusive at both ends, unlike Python’s normal slicing.
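
A quick contrast (illustrative only): slicing a plain Python list excludes the stop index, whereas label-based date slicing keeps both endpoints.

days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
days[0:2]                                   # ['Mon', 'Tue']: the stop index is excluded
len(stocks.loc["2010-01-04":"2010-01-08"])  # 5: both endpoint dates are included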

Can index using dates II

So to get 2019’s December and all of 2020 for CBA:

stocks.loc["2019-12":"2020", ["CBA"]]
CBA
Date
2019-12-02 81.43
2019-12-03 79.34
2019-12-04 77.81
... ...
2020-12-29 84.01
2020-12-30 83.59
2020-12-31 82.11

275 rows × 1 columns

Can look at the first differences

stocks.diff().plot()
plt.legend(loc="upper center", bbox_to_anchor=(0.5, -0.5), ncol=4);

Can look at the percentage changes

stocks.pct_change().plot()
plt.legend(loc="upper center", bbox_to_anchor=(0.5, -0.5), ncol=4);

Focus on one stock

stock = stocks[["CBA"]]
stock
CBA
Date
1981-01-02 NaN
1981-01-05 NaN
1981-01-06 NaN
... ...
2021-10-28 106.86
2021-10-29 104.68
2021-11-01 105.71

10330 rows × 1 columns

stock.plot()

Find first non-missing value

first_day = stock.dropna().index[0]
first_day
Timestamp('1991-09-12 00:00:00')
stock = stock.loc[first_day:]
stock.isna().sum()
CBA    8
dtype: int64

Fill in the missing values

missing_day = stock[stock["CBA"].isna()].index[0]
prev_day = missing_day - pd.Timedelta(days=1)
after = missing_day + pd.Timedelta(days=3)
stock.loc[prev_day:after]
CBA
Date
2000-03-07 24.56662
2000-03-08 NaN
2000-03-09 NaN
2000-03-10 22.87580
stock = stock.ffill()
stock.loc[prev_day:after]
CBA
Date
2000-03-07 24.56662
2000-03-08 24.56662
2000-03-09 24.56662
2000-03-10 22.87580
stock.isna().sum()
CBA    0
dtype: int64

Baseline forecasts

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

Persistence forecast

The simplest model is to predict the next value to be the same as the current value. Here, every day in 2019 is forecast to equal the last observed price from 2018.

stock.loc["2019":, "Persistence"] = stock.loc["2018"].iloc[-1].values[0]
stock.loc["2018-12":"2019"].plot()
plt.axvline("2019", color="black", linestyle="--")

Trend

We can extrapolate from recent trend:

past_date = stock.loc["2018"].index[-30]
past = stock.loc[past_date, "CBA"]
latest_date = stock.loc["2018", "CBA"].index[-1]
latest = stock.loc[latest_date, "CBA"]

trend = (latest - past) / (latest_date - past_date).days
print(trend)

tdays_since_cutoff = np.arange(1, len(stock.loc["2019":]) + 1)
stock.loc["2019":, "Trend"] = latest + trend * tdays_since_cutoff
0.07755555555555545

Trend forecasts

stock.loc["2018-12":"2019"].plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(ncol=3, loc="upper center", bbox_to_anchor=(0.5, 1.3))

Which is better?

If we look at the mean squared error (MSE) of the two models:

persistence_mse = mean_squared_error(stock.loc["2019", "CBA"], stock.loc["2019", "Persistence"])
trend_mse = mean_squared_error(stock.loc["2019", "CBA"], stock.loc["2019", "Trend"])
persistence_mse, trend_mse
(39.54629367588932, 37.87104674064297)

Use the history

cba_shifted = stock["CBA"].head().shift(1)
both = pd.concat([stock["CBA"].head(), cba_shifted], axis=1, keys=["Today", "Yesterday"])
both
Today Yesterday
Date
1991-09-12 6.425116 NaN
1991-09-13 6.365440 6.425116
1991-09-16 6.305764 6.365440
1991-09-17 6.285872 6.305764
1991-09-18 6.325656 6.285872
def lagged_timeseries(df, target, window=30):
    """Make columns of lagged values: T-window, ..., T-2, T-1, plus the current value T."""
    lagged = pd.DataFrame()
    for i in range(window, 0, -1):
        lagged[f"T-{i}"] = df[target].shift(i)  # the value i trading days earlier
    lagged["T"] = df[target].values  # today's value, which becomes the prediction target
    return lagged

Lagged time series

df_lags = lagged_timeseries(stock, "CBA", 40)
df_lags
T-40 T-39 T-38 T-37 T-36 T-35 T-34 T-33 T-32 T-31 ... T-9 T-8 T-7 T-6 T-5 T-4 T-3 T-2 T-1 T
Date
1991-09-12 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 6.425116
1991-09-13 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 6.425116 6.365440
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-10-29 101.84 102.16 102.14 102.92 100.55 101.09 101.30 101.58 101.41 102.85 ... 103.94 103.89 105.03 104.95 104.88 105.46 105.1 106.10 106.860000 104.680000
2021-11-01 102.16 102.14 102.92 100.55 101.09 101.30 101.58 101.41 102.85 102.88 ... 103.89 105.03 104.95 104.88 105.46 105.10 106.1 106.86 104.680000 105.710000

7632 rows × 41 columns

Split into training and testing

# Split the data in time
X_train = df_lags.loc[:"2018"]
X_val = df_lags.loc["2019"]
X_test = df_lags.loc["2020":]

# Remove any with NAs and split into X and y
X_train = X_train.dropna()
X_val = X_val.dropna()
X_test = X_test.dropna()

y_train = X_train.pop("T")
y_val = X_val.pop("T")
y_test = X_test.pop("T")
X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape
((6872, 40), (6872,), (253, 40), (253,), (467, 40), (467,))

Inspect the split data

X_train
T-40 T-39 T-38 T-37 T-36 T-35 T-34 T-33 T-32 T-31 ... T-10 T-9 T-8 T-7 T-6 T-5 T-4 T-3 T-2 T-1
Date
1991-11-07 6.425116 6.365440 6.305764 6.285872 6.325656 6.385332 6.445008 6.445008 6.504684 6.564360 ... 7.280472 7.260580 7.190958 7.240688 7.379932 7.459500 7.320256 7.360040 7.459500 7.379932
1991-11-08 6.365440 6.305764 6.285872 6.325656 6.385332 6.445008 6.445008 6.504684 6.564360 6.624036 ... 7.260580 7.190958 7.240688 7.379932 7.459500 7.320256 7.360040 7.459500 7.379932 7.379932
1991-11-11 6.305764 6.285872 6.325656 6.385332 6.445008 6.445008 6.504684 6.564360 6.624036 6.663820 ... 7.190958 7.240688 7.379932 7.459500 7.320256 7.360040 7.459500 7.379932 7.379932 7.449554
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-12-27 68.160000 69.230000 68.940000 68.350000 67.980000 68.950000 69.350000 70.620000 70.950000 71.770000 ... 68.430000 70.080000 70.180000 68.810000 69.260000 68.600000 69.400000 68.920000 68.480000 68.650000
2018-12-28 69.230000 68.940000 68.350000 67.980000 68.950000 69.350000 70.620000 70.950000 71.770000 70.900000 ... 70.080000 70.180000 68.810000 69.260000 68.600000 69.400000 68.920000 68.480000 68.650000 70.330000
2018-12-31 68.940000 68.350000 67.980000 68.950000 69.350000 70.620000 70.950000 71.770000 70.900000 69.210000 ... 70.180000 68.810000 69.260000 68.600000 69.400000 68.920000 68.480000 68.650000 70.330000 71.910000

6872 rows × 40 columns

Plot the split

Code
y_train.plot()
y_val.plot()
y_test.plot()
plt.legend(["Train", "Validation", "Test"]);

Train on more recent data

X_train = X_train.loc["2012":]
y_train = y_train.loc["2012":]
Code
y_train.plot()
y_val.plot()
y_test.plot()
plt.legend(["Train", "Validation", "Test"], loc="center left", bbox_to_anchor=(1, 0.5));

Rescale by eyeballing it

X_train = X_train / 100
X_val = X_val / 100
X_test = X_test / 100
y_train = y_train / 100
y_val = y_val / 100
y_test = y_test / 100
Code
y_train.plot()
y_val.plot()
y_test.plot()
plt.legend(["Train", "Validation", "Test"], loc="center left", bbox_to_anchor=(1, 0.5));

Fit a linear model

lr = LinearRegression()
lr.fit(X_train, y_train);

Make a forecast for the validation data:

y_pred = lr.predict(X_val)
stock.loc[X_val.index, "Linear"] = y_pred
Code
stock.loc["2018-12":"2019"].plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Inverse-transform the forecasts

stock.loc[X_val.index, "Linear"] = 100 * y_pred
Code
stock.loc["2018-12":"2019"].plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Careful with the metrics

Since the MSE is computed in squared units, rescaling the series by a factor of 100 scales the MSE by 100^2:

mean_squared_error(y_val, y_pred)
6.329105517812197e-05
mean_squared_error(100 * y_val, 100 * y_pred)
0.6329105517812198
100**2 * mean_squared_error(y_val, y_pred)
0.6329105517812197
linear_mse = 100**2 * mean_squared_error(y_val, y_pred)
persistence_mse, trend_mse, linear_mse
(39.54629367588932, 37.87104674064297, 0.6329105517812197)

Multi-step forecasts

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

Comparing apples to apples

The linear model is only producing one-step-ahead forecasts.

The other models are producing multi-step-ahead forecasts.

stock.loc["2019":, "Shifted"] = stock["CBA"].shift(1).loc["2019":]
Code
stock.loc["2018-12":"2019"].plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));
shifted_mse = mean_squared_error(stock.loc["2019", "CBA"], stock.loc["2019", "Shifted"])
persistence_mse, trend_mse, linear_mse, shifted_mse
(39.54629367588932, 37.87104674064297, 0.6329105517812197, 0.6367221343873524)

Autoregressive forecasts

The linear model needs the last 40 days to make a forecast.

Idea: Make the first forecast, then use that to make the next forecast, and so on.

\begin{aligned} \hat{y}_t &= \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \ldots + \beta_n y_{t-n} \\ \hat{y}_{t+1} &= \beta_0 + \beta_1 \hat{y}_t + \beta_2 y_{t-1} + \ldots + \beta_n y_{t-n+1} \\ \hat{y}_{t+2} &= \beta_0 + \beta_1 \hat{y}_{t+1} + \beta_2 \hat{y}_t + \ldots + \beta_n y_{t-n+2} \\ &\;\;\vdots \\ \hat{y}_{t+k} &= \beta_0 + \beta_1 \hat{y}_{t+k-1} + \beta_2 \hat{y}_{t+k-2} + \ldots + \beta_n \hat{y}_{t+k-n} \end{aligned}

Autoregressive forecasting function

def autoregressive_forecast(model, X_val, suppress=False):
    """
    Generate a multi-step forecast using the given model.
    Each prediction is appended to the inputs used for the next prediction.
    Set suppress=True for Keras models to silence the progress bar in `predict`.
    """
    multi_step = pd.Series(index=X_val.index, name="Multi Step")

    # Initialize the input data for forecasting
    input_data = X_val.iloc[0].values.reshape(1, -1)

    for i in range(len(multi_step)):
        # Ensure input_data has the correct feature names
        input_df = pd.DataFrame(input_data, columns=X_val.columns)
        if suppress:
            next_value = model.predict(input_df, verbose=0)
        else:
            next_value = model.predict(input_df) 

        multi_step.iloc[i] = next_value

        # Append that prediction to the input for the next forecast
        if i + 1 < len(multi_step):
            input_data = np.append(input_data[:, 1:], next_value).reshape(1, -1)

    return multi_step

Look at the autoregressive linear forecasts

lr_forecast = autoregressive_forecast(lr, X_val)
stock.loc[lr_forecast.index, "MS Linear"] = 100 * lr_forecast
stock.loc["2018-12":"2019"].drop(["Linear", "Shifted"], axis=1).plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Metrics

One-step-ahead forecasts:

linear_mse, shifted_mse
(0.6329105517812197, 0.6367221343873524)

Multi-step-ahead forecasts:

multi_step_linear_mse = 100**2 * mean_squared_error(y_val, lr_forecast)
persistence_mse, trend_mse, multi_step_linear_mse
(39.54629367588932, 37.87104674064297, 23.847003791127374)

Prefer only short forecast horizons

stock.loc["2019":"2019-1"].drop(["Linear", "Shifted"], axis=1).plot();
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

“It’s tough to make predictions, especially about the future.”

Neural network forecasts

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

Simple feedforward neural network

model = Sequential([
        Dense(64, activation="leaky_relu"),
        Dense(1, "softplus")])

model.compile(optimizer="adam", loss="mean_squared_error")
if Path("aus_fin_fnn_model.h5").exists():
    model = keras.models.load_model("aus_fin_fnn_model.h5")
else:
    es = EarlyStopping(patience=15, restore_best_weights=True)
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=500,
        callbacks=[es], verbose=0)
    model.save("aus_fin_fnn_model.h5")

model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (32, 64)               │         2,624 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (32, 1)                │            65 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 2,691 (10.51 KB)
 Trainable params: 2,689 (10.50 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 2 (8.00 B)

Forecast and plot

y_pred = model.predict(X_val, verbose=0)
stock.loc[X_val.index, "FNN"] = 100 * y_pred
stock.loc["2018-12":"2019"].drop(["Persistence", "Trend", "MS Linear"], axis=1).plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Autoregressive forecasts

nn_forecast = autoregressive_forecast(model, X_val, True)
stock.loc[nn_forecast.index, "MS FNN"] = 100 * nn_forecast
stock.loc["2018-12":"2019"].drop(["Linear", "Shifted", "FNN"], axis=1).plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Metrics

One-step-ahead forecasts:

nn_mse = 100**2 * mean_squared_error(y_val, y_pred)
linear_mse, shifted_mse, nn_mse
(0.6329105517812197, 0.6367221343873524, 1.0445115378023873)

Multi-step-ahead forecasts:

multi_step_fnn_mse = 100**2 * mean_squared_error(y_val, nn_forecast)
persistence_mse, trend_mse, multi_step_linear_mse, multi_step_fnn_mse
(39.54629367588932, 37.87104674064297, 23.847003791127374, 10.150573162371526)

Recurrent Neural Networks

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

Basic facts of RNNs

  • A recurrent neural network is a type of neural network that is designed to process sequences of data (e.g. time series, sentences).
  • A recurrent neural network is any network that contains a recurrent layer.
  • A recurrent layer is a layer that processes a sequence one time step at a time, passing information forward through an internal state.
  • An RNN can have one or more recurrent layers.
  • Weights are shared over time; this allows the model to be used on arbitrary-length sequences (see the sketch below).
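
A minimal sketch of that last point (the layer size and random inputs here are arbitrary): the number of weights depends only on the number of input series and units, not on the number of time steps, so the same layer handles sequences of any length.

import numpy as np
from keras.models import Sequential
from keras.layers import Input, SimpleRNN

demo = Sequential([Input((None, 2)), SimpleRNN(8)])  # `None` means any number of time steps
print(demo.count_params())                           # 2*8 + 8*8 + 8 = 88, whatever the length
print(demo.predict(np.random.rand(1, 5, 2), verbose=0).shape)   # (1, 8)
print(demo.predict(np.random.rand(1, 50, 2), verbose=0).shape)  # (1, 8)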

Applications

  • Forecasting: revenue forecast, weather forecast, predict disease rate from medical history, etc.
  • Classification: given a time series of the activities of a visitor on a website, classify whether the visitor is a bot or a human.
  • Event detection: given a continuous data stream, identify the occurrence of a specific event. Example: Detect utterances like “Hey Alexa” from an audio stream.
  • Anomaly detection: given a continuous data stream, detect anything unusual happening. Example: Detect unusual activity on the corporate network.

Origin of the name of RNNs

A recurrence relation is an equation that expresses each element of a sequence as a function of the preceding ones. More precisely, in the case where only the immediately preceding element is involved, a recurrence relation has the form

u_n = \psi(n, u_{n-1}) \quad \text{ for } \quad n > 0.

Example: Factorial n! = n (n-1)! for n > 0 given 0! = 1.
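
As a small illustration, the factorial recurrence can be computed with a loop that carries the previous value forward, the same pattern a recurrent layer applies to its hidden state:

def factorial(n):
    u = 1  # 0! = 1
    for k in range(1, n + 1):
        u = k * u  # u_k = psi(k, u_{k-1})
    return u

[factorial(n) for n in range(6)]  # [1, 1, 2, 6, 24, 120]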

Diagram of an RNN cell

The RNN processes the elements of the sequence one by one, while keeping a memory of what came before.

Schematic of a recurrent neural network. E.g. SimpleRNN, LSTM, or GRU.

A SimpleRNN cell

Diagram of a SimpleRNN cell.

All the outputs before the final one are often discarded.
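
A small sketch of this behaviour (the layer size and random inputs are arbitrary): Keras recurrent layers return only the final output unless return_sequences=True is set.

import numpy as np
from keras.layers import SimpleRNN

x = np.random.rand(1, 5, 2).astype("float32")        # 1 sequence, 5 time steps, 2 series
print(SimpleRNN(4)(x).shape)                         # (1, 4): only the final output
print(SimpleRNN(4, return_sequences=True)(x).shape)  # (1, 5, 4): an output at every step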

LSTM internals

Diagram of an LSTM cell. Notation for the diagram.

GRU internals

Diagram of a GRU cell.

Stock prediction with recurrent networks

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

SimpleRNN

from keras.layers import SimpleRNN, Reshape
model = Sequential([
        Reshape((-1, 1)),
        SimpleRNN(64, activation="tanh"),
        Dense(1, "softplus")])
model.compile(optimizer="adam", loss="mean_squared_error")
es = EarlyStopping(patience=15, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
    epochs=500, callbacks=[es], verbose=0)
model.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ reshape (Reshape)               │ (32, 40, 1)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ simple_rnn (SimpleRNN)          │ (32, 64)               │         4,224 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (32, 1)                │            65 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 4,291 (16.76 KB)
 Trainable params: 4,289 (16.75 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 2 (8.00 B)

Forecast and plot

y_pred = model.predict(X_val.to_numpy(), verbose=0)
stock.loc[X_val.index, "SimpleRNN"] = 100 * y_pred
Code
stock.loc["2018-12":"2019"].drop(["Persistence", "Trend", "MS Linear", "MS FNN"], axis=1).plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Multi-step forecasts

rnn_forecast = autoregressive_forecast(model, X_val, True)
stock.loc[rnn_forecast.index, "MS RNN"] = 100 * rnn_forecast
Code
stock.loc["2018-12":"2019"].drop(["Linear", "Shifted", "FNN", "SimpleRNN"], axis=1).plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Metrics

One-step-ahead forecasts:

rnn_mse = 100**2 * mean_squared_error(y_val, y_pred)
linear_mse, shifted_mse, nn_mse, rnn_mse
(0.6329105517812197,
 0.6367221343873524,
 1.0445115378023873,
 0.6444506647025611)

Multi-step-ahead forecasts:

multi_step_rnn_mse = 100**2 * mean_squared_error(y_val, rnn_forecast)
persistence_mse, trend_mse, multi_step_linear_mse, multi_step_fnn_mse, multi_step_rnn_mse
(39.54629367588932,
 37.87104674064297,
 23.847003791127374,
 10.150573162371526,
 10.58367263283111)

GRU

from keras.layers import GRU

model = Sequential([Reshape((-1, 1)),
        GRU(16, activation="tanh"),
        Dense(1, "softplus")])
model.compile(optimizer="adam", loss="mean_squared_error")
es = EarlyStopping(patience=15, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
    epochs=500, callbacks=[es], verbose=0)
model.summary()
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ reshape_1 (Reshape)             │ (32, 40, 1)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ gru (GRU)                       │ (32, 16)               │           912 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (32, 1)                │            17 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 931 (3.64 KB)
 Trainable params: 929 (3.63 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 2 (8.00 B)

Forecast and plot

y_pred = model.predict(X_val, verbose=0)
stock.loc[X_val.index, "GRU"] = 100 * y_pred
Code
stock.loc["2018-12":"2019"].drop(["Persistence", "Trend", "MS Linear", "MS FNN", "MS RNN"], axis=1).plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Multi-step forecasts

gru_forecast = autoregressive_forecast(model, X_val, True)
stock.loc[gru_forecast.index, "MS GRU"] = 100 * gru_forecast
Code
stock.loc["2018-12":"2019"].drop(["Linear", "Shifted", "FNN", "SimpleRNN", "GRU"], axis=1).plot()
plt.axvline("2019", color="black", linestyle="--")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Metrics

One-step-ahead forecasts:

gru_mse = 100**2 * mean_squared_error(y_val, y_pred)
linear_mse, shifted_mse, nn_mse, rnn_mse, gru_mse
(0.6329105517812197,
 0.6367221343873524,
 1.0445115378023873,
 0.6444506647025611,
 0.6390276531968386)

Multi-step-ahead forecasts:

multi_step_gru_mse = 100**2 * mean_squared_error(y_val, gru_forecast)
persistence_mse, trend_mse, multi_step_linear_mse, multi_step_fnn_mse, multi_step_rnn_mse, multi_step_gru_mse
(39.54629367588932,
 37.87104674064297,
 23.847003791127374,
 10.150573162371526,
 10.58367263283111,
 8.111302768865865)

Internals of the SimpleRNN

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

The rank of a time series

Say we had n observations of a time series x_1, x_2, \dots, x_n.

This \boldsymbol{x} = (x_1, \dots, x_n) would have shape (n,) & rank 1.

If instead we had a batch of b time series’

\boldsymbol{X} = \begin{pmatrix} x_7 & x_8 & \dots & x_{7+n-1} \\ x_2 & x_3 & \dots & x_{2+n-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_3 & x_4 & \dots & x_{3+n-1} \\ \end{pmatrix} \,,

the batch \boldsymbol{X} would have shape (b, n) & rank 2.

Multivariate time series

t x y
0 x_0 y_0
1 x_1 y_1
2 x_2 y_2
3 x_3 y_3

With n observations of the m time series, we would have a matrix of shape (n, m) and rank 2.

In Keras, a batch of b of these time series has shape (b, n, m) and has rank 3.

Note

Use \boldsymbol{x}_t \in \mathbb{R}^{1 \times m} to denote the vector of all time series at time t. Here, \boldsymbol{x}_t = (x_t, y_t).
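
A quick shape check with arbitrary values: n observations of m series form a rank-2 array, and a batch of b such windows is the rank-3 array that Keras expects.

import numpy as np

batch_size, num_steps, num_series = 4, 6, 2
one_window = np.random.rand(num_steps, num_series)         # shape (6, 2), rank 2
batch = np.random.rand(batch_size, num_steps, num_series)  # shape (4, 6, 2), rank 3
one_window.shape, batch.shape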

SimpleRNN

Say each prediction is a vector of size d, so \boldsymbol{y}_t \in \mathbb{R}^{1 \times d}.

Then the main equation of a SimpleRNN, given \boldsymbol{y}_0 = \boldsymbol{0}, is

\boldsymbol{y}_t = \psi\bigl( \boldsymbol{x}_t \boldsymbol{W}_x + \boldsymbol{y}_{t-1} \boldsymbol{W}_y + \boldsymbol{b} \bigr) .

Here, \begin{aligned} &\boldsymbol{x}_t \in \mathbb{R}^{1 \times m}, \boldsymbol{W}_x \in \mathbb{R}^{m \times d}, \\ &\boldsymbol{y}_{t-1} \in \mathbb{R}^{1 \times d}, \boldsymbol{W}_y \in \mathbb{R}^{d \times d}, \text{ and } \boldsymbol{b} \in \mathbb{R}^{d}. \end{aligned}

SimpleRNN (in batches)

Say we operate on batches of size b, then \boldsymbol{Y}_t \in \mathbb{R}^{b \times d}.

The main equation of a SimpleRNN, given \boldsymbol{Y}_0 = \boldsymbol{0}, is \boldsymbol{Y}_t = \psi\bigl( \boldsymbol{X}_t \boldsymbol{W}_x + \boldsymbol{Y}_{t-1} \boldsymbol{W}_y + \boldsymbol{b} \bigr) . Here, \begin{aligned} &\boldsymbol{X}_t \in \mathbb{R}^{b \times m}, \boldsymbol{W}_x \in \mathbb{R}^{m \times d}, \\ &\boldsymbol{Y}_{t-1} \in \mathbb{R}^{b \times d}, \boldsymbol{W}_y \in \mathbb{R}^{d \times d}, \text{ and } \boldsymbol{b} \in \mathbb{R}^{d}. \end{aligned}

Simple Keras demo

num_obs = 4
num_time_steps = 3
num_time_series = 2

X = (
    np.arange(num_obs * num_time_steps * num_time_series)
    .astype(np.float32)
    .reshape([num_obs, num_time_steps, num_time_series])
)

output_size = 1
y = np.array([0, 0, 1, 1])
X[:2]
array([[[ 0.,  1.],
        [ 2.,  3.],
        [ 4.,  5.]],

       [[ 6.,  7.],
        [ 8.,  9.],
        [10., 11.]]], dtype=float32)
X[2:]
array([[[12., 13.],
        [14., 15.],
        [16., 17.]],

       [[18., 19.],
        [20., 21.],
        [22., 23.]]], dtype=float32)

Keras’ SimpleRNN

As usual, the SimpleRNN is just a layer in Keras.

from keras.layers import SimpleRNN

random.seed(1234)
model = Sequential([SimpleRNN(output_size, activation="sigmoid")])
model.compile(loss="binary_crossentropy", metrics=["accuracy"])

hist = model.fit(X, y, epochs=500, verbose=False)
model.evaluate(X, y, verbose=False)
[8.059103012084961, 0.5]

The predicted probabilities on the training set are:

model.predict(X, verbose=0)
array([[2.19e-04],
       [2.79e-09],
       [3.52e-14],
       [4.45e-19]], dtype=float32)

SimpleRNN weights

model.get_weights()
[array([[-1.31],
        [-0.57]], dtype=float32),
 array([[-1.03]], dtype=float32),
 array([-0.32], dtype=float32)]
def sigmoid(x):
    return 1 / (1 + np.exp(-x))


W_x, W_y, b = model.get_weights()

# Manually unroll the SimpleRNN: start from a zero state and
# apply Y_t = sigmoid(X_t W_x + Y_{t-1} W_y + b) at each step.
Y = np.zeros((num_obs, output_size), dtype=np.float32)
for t in range(num_time_steps):
    X_t = X[:, t, :]  # every observation's value at time step t
    z = X_t @ W_x + Y @ W_y + b
    Y = sigmoid(z)

Y
array([[2.19e-04],
       [2.79e-09],
       [3.52e-14],
       [4.45e-19]], dtype=float32)

Other recurrent network variants

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

Input and output sequences

Categories of recurrent neural networks: sequence to sequence, sequence to vector, vector to sequence, encoder-decoder network.

Input and output sequences

  • Sequence to sequence: useful for predicting time series, such as using prices over the last N days to output the prices shifted one day into the future (i.e. from N-1 days ago to tomorrow).
  • Sequence to vector: ignore all outputs at the previous time steps except for the last one. Example: give a sentiment score to a sequence of words corresponding to a movie review.

Input and output sequences

  • Vector to sequence: feed the network the same input vector over and over at each time step and let it output a sequence. Example: given an image as input, produce a caption for it. The image is treated as a single input vector (its pixels do not form a sequence), while the caption is a sequence of words describing the image. The network is trained on a dataset of images paired with their captions.
  • The Encoder-Decoder: The encoder is a sequence-to-vector network. The decoder is a vector-to-sequence network. Example: Feed the network a sequence in one language. Use the encoder to convert the sentence into a single vector representation. The decoder decodes this vector into the translation of the sentence in another language.

Recurrent layers can be stacked.

Deep RNN unrolled through time.
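
A minimal sketch of such a stacked ("deep") RNN in Keras (layer sizes are arbitrary): every recurrent layer except the last must return the full sequence so that the next recurrent layer still receives a rank-3 (batch, time, features) input.

from keras.models import Sequential
from keras.layers import Input, SimpleRNN, Dense

stacked = Sequential([
    Input((None, 1)),                      # any number of time steps, one series
    SimpleRNN(16, return_sequences=True),  # sequence to sequence
    SimpleRNN(16),                         # sequence to vector
    Dense(1),                              # final prediction
])
stacked.summary()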

CoreLogic Hedonic Home Value Index

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

Australian House Price Indices

Note

I apologise in advance for not being able to share this dataset with anyone (it is not mine to share).

Percentage changes

changes = house_prices.pct_change().dropna()
changes.round(2)
Brisbane East_Bris North_Bris West_Bris Melbourne North_Syd Sydney
Date
1990-02-28 0.03 -0.01 0.01 0.01 0.00 -0.00 -0.02
1990-03-31 0.01 0.03 0.01 0.01 0.02 -0.00 0.03
1990-04-30 0.02 0.02 0.01 -0.00 0.01 0.03 0.04
... ... ... ... ... ... ... ...
2021-03-31 0.04 0.04 0.03 0.04 0.02 0.05 0.05
2021-04-30 0.03 0.01 0.01 -0.00 0.01 0.02 0.02
2021-05-31 0.03 0.03 0.03 0.03 0.03 0.02 0.04

376 rows × 7 columns

Percentage changes

changes.plot();

The size of the changes

changes.mean()
Brisbane      0.005496
East_Bris     0.005416
North_Bris    0.005024
West_Bris     0.004842
Melbourne     0.005677
North_Syd     0.004819
Sydney        0.005526
dtype: float64
changes *= 100
changes.mean()
Brisbane      0.549605
East_Bris     0.541562
North_Bris    0.502390
West_Bris     0.484204
Melbourne     0.567700
North_Syd     0.481863
Sydney        0.552641
dtype: float64
changes.plot(legend=False);

Split without shuffling

Because the observations are ordered in time, we split the data chronologically rather than randomly, so the validation and test sets come strictly after the training data.

num_train = int(0.6 * len(changes))
num_val = int(0.2 * len(changes))
num_test = len(changes) - num_train - num_val
print(f"# Train: {num_train}, # Val: {num_val}, # Test: {num_test}")
# Train: 225, # Val: 75, # Test: 76

Subsequences of a time series

Keras has a built-in method for converting a time series into subsequences/chunks. In the toy example below, each window of three consecutive inputs is paired with the value that immediately follows the window.

from keras.utils import timeseries_dataset_from_array

integers = range(10)
dummy_dataset = timeseries_dataset_from_array(
    data=integers[:-3],
    targets=integers[3:],
    sequence_length=3,
    batch_size=2,
)

for inputs, targets in dummy_dataset:
    for i in range(inputs.shape[0]):
        print([int(x) for x in inputs[i]], int(targets[i]))
[0, 1, 2] 3
[1, 2, 3] 4
[2, 3, 4] 5
[3, 4, 5] 6
[4, 5, 6] 7

Predicting Sydney House Prices

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

Creating dataset objects

# Num. of input time series.
num_ts = changes.shape[1]

# How many prev. months to use.
seq_length = 6

# Predict the next month ahead.
ahead = 1

# The index of the first target: the first input window covers
# rows 0 to seq_length - 1, so its target (ahead steps after the
# window's last row) sits at row seq_length + ahead - 1.
delay = seq_length + ahead - 1
# Which suburb to predict.
target_suburb = changes["Sydney"]

train_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=target_suburb[delay:],
    sequence_length=seq_length,
    end_index=num_train,
)
val_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=target_suburb[delay:],
    sequence_length=seq_length,
    start_index=num_train,
    end_index=num_train + num_val,
)
test_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=target_suburb[delay:],
    sequence_length=seq_length,
    start_index=num_train + num_val,
)
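
As a sanity check (a sketch, assuming the objects defined above), the first window in train_ds should cover the first seq_length rows of changes, and its target should be the Sydney change at row delay.

first_X, first_y = next(iter(train_ds))
print(np.allclose(first_X[0], changes.iloc[:seq_length]))            # True: rows 0 to seq_length - 1
print(np.isclose(float(first_y[0]), changes["Sydney"].iloc[delay]))  # True: the target sits `ahead` steps later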

Converting Dataset to numpy

The Dataset object can be handed to Keras directly, but if we really need a numpy array, we can run:

X_train = np.concatenate(list(train_ds.map(lambda x, y: x)))
y_train = np.concatenate(list(train_ds.map(lambda x, y: y)))

The shape of our training set is now:

X_train.shape
(220, 6, 7)
y_train.shape
(220,)

Converting the rest to numpy arrays:

X_val = np.concatenate(list(val_ds.map(lambda x, y: x)))
y_val = np.concatenate(list(val_ds.map(lambda x, y: y)))
X_test = np.concatenate(list(test_ds.map(lambda x, y: x)))
y_test = np.concatenate(list(test_ds.map(lambda x, y: y)))

A dense network

from keras.layers import Input, Flatten
random.seed(1)
model_dense = Sequential([
    Input((seq_length, num_ts)),
    Flatten(),
    Dense(50, activation="leaky_relu"),
    Dense(20, activation="leaky_relu"),
    Dense(1, activation="linear")
])
model_dense.compile(loss="mse", optimizer="adam")
print(f"This model has {model_dense.count_params()} parameters.")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_dense.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3191 parameters.
Epoch 52: early stopping
Restoring model weights from the end of the best epoch: 2.
CPU times: user 908 ms, sys: 13 ms, total: 921 ms
Wall time: 950 ms

Plot the model

from keras.utils import plot_model

plot_model(model_dense, show_shapes=True)

Assess the fits

model_dense.evaluate(X_val, y_val, verbose=0)
1.043065071105957
Code
y_pred = model_dense.predict(X_val, verbose=0)
plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="Dense")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A SimpleRNN layer

random.seed(1)

model_simple = Sequential([
    Input((seq_length, num_ts)),
    SimpleRNN(50),
    Dense(1, activation="linear")
])
model_simple.compile(loss="mse", optimizer="adam")
print(f"This model has {model_simple.count_params()} parameters.")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_simple.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 2951 parameters.
Epoch 54: early stopping
Restoring model weights from the end of the best epoch: 4.
CPU times: user 1.75 s, sys: 7.11 ms, total: 1.76 s
Wall time: 1.8 s

Plot the model

plot_model(model_simple, show_shapes=True)

Assess the fits

model_simple.evaluate(X_val, y_val, verbose=0)
0.9619883894920349
Code
y_pred = model_simple.predict(X_val, verbose=0)

plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="SimpleRNN")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A LSTM layer

from keras.layers import LSTM

random.seed(1)

model_lstm = Sequential([
    Input((seq_length, num_ts)),
    LSTM(50),
    Dense(1, activation="linear")
])

model_lstm.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_lstm.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
Epoch 62: early stopping
Restoring model weights from the end of the best epoch: 12.
CPU times: user 2.59 s, sys: 22.9 ms, total: 2.61 s
Wall time: 2.66 s

Assess the fits

model_lstm.evaluate(X_val, y_val, verbose=0)
0.8037604093551636
Code
y_pred = model_lstm.predict(X_val, verbose=0)
plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A GRU layer

from keras.layers import GRU

random.seed(1)

model_gru = Sequential([
    Input((seq_length, num_ts)),
    GRU(50),
    Dense(1, activation="linear")
])

model_gru.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_gru.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 61: early stopping
Restoring model weights from the end of the best epoch: 11.
CPU times: user 3 s, sys: 15.7 ms, total: 3.02 s
Wall time: 3.07 s

Assess the fits

model_gru.evaluate(X_val, y_val, verbose=0)
0.7643826007843018
Code
y_pred = model_gru.predict(X_val, verbose=0)
plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="GRU")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Two GRU layers

random.seed(1)

model_two_grus = Sequential([
    Input((seq_length, num_ts)),
    GRU(50, return_sequences=True),
    GRU(50),
    Dense(1, activation="linear")
])

model_two_grus.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_two_grus.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 55: early stopping
Restoring model weights from the end of the best epoch: 5.
CPU times: user 4.55 s, sys: 13.1 ms, total: 4.56 s
Wall time: 4.6 s

Assess the fits

model_two_grus.evaluate(X_val, y_val, verbose=0)
0.7825747728347778
Code
y_pred = model_two_grus.predict(X_val, verbose=0)
plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="2 GRUs")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Compare the models

Model MSE
0 Dense 1.043065
1 SimpleRNN 0.961988
2 LSTM 0.803760
4 2 GRUs 0.782575
3 GRU 0.764383

On validation MSE, the single-GRU network is narrowly the best, with the two-GRU network close behind. Evaluating the two-GRU network on the test set:

model_two_grus.evaluate(test_ds, verbose=0)
2.023635149002075

Test set

Code
y_pred = model_two_grus.predict(test_ds, verbose=0)
plt.plot(y_test, label="Sydney")
plt.plot(y_pred, label="2 GRU")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Predicting Multiple Time Series

Lecture Outline

  • Time Series

  • Baseline forecasts

  • Multi-step forecasts

  • Neural network forecasts

  • Recurrent Neural Networks

  • Stock prediction with recurrent networks

  • Internals of the SimpleRNN

  • Other recurrent network variants

  • CoreLogic Hedonic Home Value Index

  • Predicting Sydney House Prices

  • Predicting Multiple Time Series

Creating dataset objects

Change the targets argument to include all the suburbs; the final Dense layer in each model then needs num_ts outputs rather than one.

train_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=changes[delay:],
    sequence_length=seq_length,
    end_index=num_train,
)
val_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=changes[delay:],
    sequence_length=seq_length,
    start_index=num_train,
    end_index=num_train + num_val,
)
test_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=changes[delay:],
    sequence_length=seq_length,
    start_index=num_train + num_val,
)

Converting Dataset to numpy

The shape of our training set is now:

X_train = np.concatenate(list(train_ds.map(lambda x, y: x)))
X_train.shape
(220, 6, 7)
y_train = np.concatenate(list(train_ds.map(lambda x, y: y)))
y_train.shape
(220, 7)

Converting the rest to numpy arrays:

X_val = np.concatenate(list(val_ds.map(lambda x, y: x)))
y_val = np.concatenate(list(val_ds.map(lambda x, y: y)))
X_test = np.concatenate(list(test_ds.map(lambda x, y: x)))
y_test = np.concatenate(list(test_ds.map(lambda x, y: y)))

A dense network

random.seed(1)
model_dense = Sequential([
    Input((seq_length, num_ts)),
    Flatten(),
    Dense(50, activation="leaky_relu"),
    Dense(20, activation="leaky_relu"),
    Dense(num_ts, activation="linear")
])
model_dense.compile(loss="mse", optimizer="adam")
print(f"This model has {model_dense.count_params()} parameters.")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_dense.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3317 parameters.
Epoch 69: early stopping
Restoring model weights from the end of the best epoch: 19.
CPU times: user 1.2 s, sys: 5.49 ms, total: 1.21 s
Wall time: 1.25 s

Plot the model

plot_model(model_dense, show_shapes=True)

Assess the fits

model_dense.evaluate(X_val, y_val, verbose=0)
1.5469738245010376
Code
Y_pred = model_dense.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="Dense")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="Dense")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="Dense")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A SimpleRNN layer

random.seed(1)

model_simple = Sequential([
    Input((seq_length, num_ts)),
    SimpleRNN(50),
    Dense(num_ts, activation="linear")
])
model_simple.compile(loss="mse", optimizer="adam")
print(f"This model has {model_simple.count_params()} parameters.")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_simple.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3257 parameters.
Epoch 62: early stopping
Restoring model weights from the end of the best epoch: 12.
CPU times: user 2.03 s, sys: 8.96 ms, total: 2.04 s
Wall time: 2.08 s

Plot the model

plot_model(model_simple, show_shapes=True)

Assess the fits

model_simple.evaluate(X_val, y_val, verbose=0)
1.473482370376587
Code
Y_pred = model_simple.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="SimpleRNN")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="SimpleRNN")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="SimpleRNN")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A LSTM layer

random.seed(1)

model_lstm = Sequential([
    Input((seq_length, num_ts)),
    LSTM(50),
    Dense(num_ts, activation="linear")
])

model_lstm.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_lstm.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
Epoch 74: early stopping
Restoring model weights from the end of the best epoch: 24.
CPU times: user 3.13 s, sys: 14.9 ms, total: 3.14 s
Wall time: 3.2 s

Assess the fits

model_lstm.evaluate(X_val, y_val, verbose=0)
1.360884428024292
Code
Y_pred = model_lstm.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A GRU layer

random.seed(1)

model_gru = Sequential([
    Input((seq_length, num_ts)),
    GRU(50),
    Dense(num_ts, activation="linear")
])

model_gru.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_gru.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 77: early stopping
Restoring model weights from the end of the best epoch: 27.
CPU times: user 3.77 s, sys: 19.3 ms, total: 3.79 s
Wall time: 3.85 s

Assess the fits

model_gru.evaluate(X_val, y_val, verbose=0)
1.3418978452682495
Code
Y_pred = model_gru.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="GRU")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="GRU")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="GRU")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Two GRU layers

random.seed(1)

model_two_grus = Sequential([
    Input((seq_length, num_ts)),
    GRU(50, return_sequences=True),
    GRU(50),
    Dense(num_ts, activation="linear")
])

model_two_grus.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_two_grus.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 65: early stopping
Restoring model weights from the end of the best epoch: 15.
CPU times: user 5.39 s, sys: 20 ms, total: 5.41 s
Wall time: 5.46 s

Assess the fits

model_two_grus.evaluate(X_val, y_val, verbose=0)
1.378563404083252
Code
Y_pred = model_two_grus.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="2 GRUs")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="2 GRUs")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="2 GRUs")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Compare the models

Code
models = [model_dense, model_simple, model_lstm, model_gru, model_two_grus]
model_names = ["Dense", "SimpleRNN", "LSTM", "GRU", "2 GRUs"]
mse_val = {
    name: model.evaluate(X_val, y_val, verbose=0)
    for name, model in zip(model_names, models)
}
val_results = pd.DataFrame({"Model": mse_val.keys(), "MSE": mse_val.values()})
val_results.sort_values("MSE", ascending=False)
Model MSE
0 Dense 1.546974
1 SimpleRNN 1.473482
4 2 GRUs 1.378563
2 LSTM 1.360884
3 GRU 1.341898

On validation MSE, the GRU and LSTM networks are the strongest performers. Evaluating the LSTM network on the test set:

model_lstm.evaluate(test_ds, verbose=0)
1.9254661798477173

Test set

Code
Y_pred = model_lstm.predict(test_ds, verbose=0)
plt.scatter(y_test, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_test[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_test[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_test[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Package Versions

from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch,tensorflow,tf_keras"))
Python implementation: CPython
Python version       : 3.11.9
IPython version      : 8.24.0

keras     : 3.3.3
matplotlib: 3.9.0
numpy     : 1.26.4
pandas    : 2.2.2
seaborn   : 0.13.2
scipy     : 1.11.0
torch     : 2.3.1
tensorflow: 2.16.1
tf_keras  : 2.16.0

Glossary

  • autoregressive forecasting
  • forecasting
  • GRU
  • LSTM
  • one-step/multi-step ahead forecasting
  • persistence forecast
  • recurrent neural networks
  • SimpleRNN