Time Series & Recurrent Neural Networks

ACTL3143 & ACTL5111 Deep Learning for Actuaries

Author

Patrick Laub

Show the package imports
import random
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

import keras
from keras.models import Sequential
from keras.layers import Dense, Input, Rescaling
from keras.callbacks import EarlyStopping

Time Series

Tabular data vs time series data

Tabular data

We have a dataset \{ \boldsymbol{x}_i, y_i \}_{i=1}^n which we assume are i.i.d. observations.

Brand Mileage # Claims
BMW 101 km 1
Audi 432 km 0
Volvo 3 km 5
\vdots \vdots \vdots

The goal is to predict the y for some covariates \boldsymbol{x}.

Time series data

We have a sequence \{ \boldsymbol{x}_t, y_t \}_{t=1}^T of observations taken at regular time intervals.

Date Humidity Temp.
Jan 1 60% 20 °C
Jan 2 65% 22 °C
Jan 3 70% 21 °C
\vdots \vdots \vdots

The task is to forecast future values based on the past.

Attributes of time series data

  • Temporal ordering: The order of the observations matters.
  • Trend: The general direction of the data.
  • Noise: Random fluctuations in the data.
  • Seasonality: Patterns that repeat at regular intervals.
Note

Question: What will be the temperature in Berlin tomorrow? What information would you use to make a prediction?

Australian financial stocks

# First, install yfinance if you haven't already:
# pip install yfinance
import yfinance as yf
import pandas as pd
from datetime import date

# 1. Define the ASX tickers (Yahoo uses “.AX” for Australian stocks
#    and “^AXJO” for the S&P/ASX 200 index)
tickers = {
    "ANZ":    "ANZ.AX",
    "BOQ":    "BOQ.AX",
    "CBA":    "CBA.AX",
    "NAB":    "NAB.AX",
    "QBE":    "QBE.AX",
    "SUN":    "SUN.AX",
    "WBC":    "WBC.AX",
    "ASX200": "^AXJO"
}

# 2. Choose your date range
start = "1999-01-01"
end   = date.today().isoformat()  # e.g. "2025-06-29"

# 3. Download the data
#    This returns a DataFrame whose columns are a (price field, ticker)
#    MultiIndex and whose rows are trading dates.
raw_data = yf.download(
    tickers=list(tickers.values()),
    start=start,
    end=end,
    progress=False
)

raw_data.to_csv("asx_raw_data.csv")  # Optional: Save raw data for inspection

data = raw_data["Close"].copy()

# 4. Rename columns to the simple names
data = data.rename(columns={v: k for k, v in tickers.items()})

cols = ["ANZ", "ASX200", "BOQ", "CBA", "NAB", "QBE", "SUN", "WBC"]
data = data[cols]

# 5. (Optional) Inspect the first few rows
print(data.head())

# 6. Save to CSV
output_path = "asx_daily_close_prices.csv"
data.to_csv(output_path)
print(f"Saved daily close prices to {output_path}")

Australian financial stocks

stocks = pd.read_csv("aus_fin_stocks.csv")
stocks
Date ANZ ASX200 BOQ CBA NAB QBE SUN WBC
0 1999-01-01 8.413491 NaN NaN 14.560169 NaN NaN 7.965791 NaN
1 1999-01-04 8.476515 2732.199951 NaN 14.402927 12.995510 2.648447 7.965791 6.370629
2 1999-01-05 8.452881 2716.600098 NaN 14.409224 13.169948 2.564247 7.965791 6.485921
... ... ... ... ... ... ... ... ... ...
6742 2025-06-25 29.100000 8559.200195 7.86 191.399994 40.049999 23.480000 21.750000 34.540001
6743 2025-06-26 29.740000 8550.799805 7.86 190.710007 39.889999 23.330000 21.459999 34.570000
6744 2025-06-27 29.200001 8514.200195 7.77 185.360001 39.259998 23.219999 21.330000 33.900002

6745 rows × 9 columns

Plot

stocks.plot()

Data types and NA values

stocks.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6745 entries, 0 to 6744
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    6745 non-null   object 
 1   ANZ     6745 non-null   float64
 2   ASX200  6691 non-null   float64
 3   BOQ     6599 non-null   float64
 4   CBA     6745 non-null   float64
 5   NAB     6744 non-null   float64
 6   QBE     6744 non-null   float64
 7   SUN     6745 non-null   float64
 8   WBC     6744 non-null   float64
dtypes: float64(8), object(1)
memory usage: 474.4+ KB
for col in stocks.columns:
    print(f"{col}: {stocks[col].isna().sum()}")
Date: 0
ANZ: 0
ASX200: 54
BOQ: 146
CBA: 0
NAB: 1
QBE: 1
SUN: 0
WBC: 1
asx200 = stocks.pop("ASX200")

Set the index to the date

stocks["Date"] = pd.to_datetime(stocks["Date"])
stocks = stocks.set_index("Date") # or `stocks.set_index("Date", inplace=True)`
stocks
ANZ BOQ CBA NAB QBE SUN WBC
Date
1999-01-01 8.413491 NaN 14.560169 NaN NaN 7.965791 NaN
1999-01-04 8.476515 NaN 14.402927 12.995510 2.648447 7.965791 6.370629
1999-01-05 8.452881 NaN 14.409224 13.169948 2.564247 7.965791 6.485921
... ... ... ... ... ... ... ...
2025-06-25 29.100000 7.86 191.399994 40.049999 23.480000 21.750000 34.540001
2025-06-26 29.740000 7.86 190.710007 39.889999 23.330000 21.459999 34.570000
2025-06-27 29.200001 7.77 185.360001 39.259998 23.219999 21.330000 33.900002

6745 rows × 7 columns

Plot II

stocks.plot()
plt.ylabel("Stock Price ($)")
plt.legend(loc="upper center", bbox_to_anchor=(0.5, -0.5), ncol=4);

Can index using dates I

stocks.loc["2010-1-4":"2010-01-8"]
ANZ BOQ CBA NAB QBE SUN WBC
Date
2010-01-04 18.874071 8.412694 34.510429 14.440043 13.831507 10.478957 14.958415
2010-01-05 18.964771 8.520640 35.032455 14.624498 13.902833 10.636262 15.082577
2010-01-06 18.684425 8.477461 35.208561 14.339908 13.688861 10.406354 15.011630
2010-01-07 18.239164 8.218388 34.868923 14.223969 13.441965 10.551559 14.810603
2010-01-08 18.346357 8.419890 35.321777 14.176538 13.590102 10.612061 14.869730

Note that these date ranges are inclusive at both ends, unlike Python's usual slicing.

Can index using dates II

So to get 2019’s December and all of 2020 for CBA:

stocks.loc["2019-12":"2020", ["CBA"]]
CBA
Date
2019-12-02 66.193298
2019-12-03 64.494362
2019-12-04 63.250660
... ...
2020-12-29 70.822792
2020-12-30 70.468712
2020-12-31 69.221031

275 rows × 1 columns

Can look at the first differences

stocks.diff().plot()
plt.ylabel("Daily Price Changes ($)")
plt.legend(loc="upper center", bbox_to_anchor=(0.5, -0.5), ncol=4);

Can look at the percentage changes

stocks.pct_change().plot()
plt.ylabel("Daily Returns (%)")
plt.legend(loc="upper center", bbox_to_anchor=(0.5, -0.5), ncol=4);

Focus on one stock

stock = stocks[["CBA"]].copy()
stock
CBA
Date
1999-01-01 14.560169
1999-01-04 14.402927
1999-01-05 14.409224
... ...
2025-06-25 191.399994
2025-06-26 190.710007
2025-06-27 185.360001

6745 rows × 1 columns

stock.plot()
plt.ylabel("Stock Price ($)");

stock.isna().sum()
CBA    0
dtype: int64

Convert to log returns

Instead of working with raw prices, we’ll work with daily log returns:

# Calculate log returns
stock_log = np.log(stock / stock.shift(1)).dropna()
stock_log.head()
CBA
Date
1999-01-04 -0.010858
1999-01-05 0.000437
1999-01-06 0.008042
1999-01-07 0.025859
1999-01-08 -0.021323
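Writing P_t for the closing price on day t, the daily log return is

r_t = \ln \left( \frac{P_t}{P_{t-1}} \right) ,

and log returns add up over time, so a price path can be recovered from an initial price by exponentiating a cumulative sum:

P_t = P_0 \exp \left( \sum_{s=1}^{t} r_s \right) .

This additivity is exactly what the log_to_price helper below relies on.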
Code
stock_log.plot()
plt.ylabel("Daily Log Return")
plt.title("CBA Daily Log Returns");

Code
# Distribution of log returns
stock_log.hist(bins=50, alpha=0.7)
plt.xlabel("Daily Log Return")
plt.ylabel("Frequency")
plt.title("Distribution of CBA Daily Log Returns");

Fill in the missing values

asx200 = pd.DataFrame(asx200).set_index(stocks.index)
missing_day = asx200.index[asx200["ASX200"].isna()][1]
prev_day = missing_day - pd.Timedelta(days=1)
after = missing_day + pd.Timedelta(days=3)
asx200.loc[prev_day:after]
ASX200
Date
1999-01-25 2713.100098
1999-01-26 NaN
1999-01-27 2738.800049
1999-01-28 2765.100098
1999-01-29 2781.699951
asx200 = asx200.ffill()
asx200.loc[prev_day:after]
ASX200
Date
1999-01-25 2713.100098
1999-01-26 2713.100098
1999-01-27 2738.800049
1999-01-28 2765.100098
1999-01-29 2781.699951

Baseline forecasts

Helper functions

def log_to_price(log_returns, initial_price):
    """Convert log returns to raw prices given an initial price."""
    # Use cumulative sum of log returns for numerical stability
    # P_t = P_0 * exp(sum of log returns from 1 to t)
    cumulative_log_returns = log_returns.cumsum()
    prices = initial_price * np.exp(cumulative_log_returns)
    
    return prices

def get_last_price(stock_df, cutoff_date):
    """Get the last known price before the forecast period starts."""
    last_known_date = stock_df.loc[:cutoff_date].index[-1]
    return stock_df.loc[last_known_date, "CBA"]

Persistence forecast

Predict the next value to be the same as the current value.
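In price terms, the forecast for every future day is just the last observed price, \hat{P}_{t+h} = P_t for all h \geq 1. Equivalently, every forecast log return is zero, which is why the code below fills the 2025 dates with 0.0.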

last_price = get_last_price(stock, cutoff_date="2024-12")
log_persistence = pd.Series(0.0, index=stock_log.loc["2025":].index)
persistence_prices = log_to_price(log_persistence, last_price)
Code
start = "2024-12"
end = "2025"
stock.loc[start:end, ["CBA"]].plot(label="CBA")
persistence_prices.plot(label="Persistence")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend()

Extrapolate the trend

recent_log_returns = stock_log.loc["2022-01":"2024-12"]
trend_log = recent_log_returns.mean().values[0]
print(f"Average daily log return: {trend_log:.6f}")
Average daily log return: 0.000709
log_trend = pd.Series(trend_log, index=stock_log.loc["2025":].index)
trend_prices = log_to_price(log_trend, last_price)
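Here the forecast log return is constant at the sample mean \bar{r} \approx 0.000709, so the implied price forecast compounds it over the horizon:

\hat{P}_{t+h} = P_t \, e^{h \bar{r}} ,

which is what log_to_price produces when it cumulates a constant series of log returns.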
Code
# Plot the trend line over the top of the 'recent_log_returns' data
recent_log_returns.plot()
plt.axhline(trend_log, color="red", linestyle="--", label="Trend (mean log return)")
plt.ylabel("Daily Log Return")
plt.legend();

Trend fitted

# Create trend forecast for the recent period to show fitted trend
recent_trend_log = pd.Series(trend_log, index=recent_log_returns.index)
trend_start_price = get_last_price(stock, cutoff_date=recent_log_returns.index[0].strftime('%Y-%m-%d'))
recent_trend_prices = log_to_price(recent_trend_log, trend_start_price)
Code
stock.loc[recent_log_returns.index, ["CBA"]].plot(label="CBA")
recent_trend_prices.plot(label="Trend")
plt.axvline(recent_log_returns.index[0], color="gray", linestyle=":", linewidth=1)
plt.axvline(recent_log_returns.index[-1], color="gray", linestyle=":", linewidth=1)
plt.ylabel("Stock Price ($)")
plt.legend();

Trend forecasts

Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
persistence_prices.loc[start:end].plot(label="Persistence")
trend_prices.loc[start:end].plot(label="Trend")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(ncol=3, loc="upper center", bbox_to_anchor=(0.5, 1.3));

Which is better?

If we look at the mean squared error (MSE) of the two models:

# Calculate MSE using the actual forecasts we computed
actual_prices = stock.loc["2025":, "CBA"]
persistence_mse = mean_squared_error(actual_prices, persistence_prices)
trend_mse = mean_squared_error(actual_prices, trend_prices)
persistence_mse, trend_mse
(254.04075100411256, 99.28717915684173)

Use the history

Now let’s work with log returns instead of raw prices to create lagged features:

cba_log_shifted = stock_log["CBA"].head().shift(1)
both = pd.concat([stock_log["CBA"].head(), cba_log_shifted], axis=1, keys=["Today", "Yesterday"])
both
Today Yesterday
Date
1999-01-04 -0.010858 NaN
1999-01-05 0.000437 -0.010858
1999-01-06 0.008042 0.000437
1999-01-07 0.025859 0.008042
1999-01-08 -0.021323 0.025859
def lagged_timeseries(df, target, window=40):
    lagged = pd.DataFrame()
    for i in range(window, 0, -1):
        lagged[f"T-{i}"] = df[target].shift(i)
    lagged["T"] = df[target].values
    return lagged

Lagged time series

df_lags = lagged_timeseries(stock_log, "CBA", 40)
df_lags
T-40 T-39 T-38 T-37 T-36 T-35 T-34 T-33 T-32 T-31 ... T-9 T-8 T-7 T-6 T-5 T-4 T-3 T-2 T-1 T
Date
1999-01-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN -0.010858
1999-01-05 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN -0.010858 0.000437
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2025-06-26 0.021968 0.003894 0.014307 -0.016222 -0.001139 -0.004689 -0.002715 0.009202 0.000838 -0.006240 ... -0.006558 0.000334 -0.001506 0.005789 0.014710 -0.001752 0.009922 0.020297 0.017232 -0.003611
2025-06-27 0.003894 0.014307 -0.016222 -0.001139 -0.004689 -0.002715 0.009202 0.000838 -0.006240 0.008153 ... 0.000334 -0.001506 0.005789 0.014710 -0.001752 0.009922 0.020297 0.017232 -0.003611 -0.028454

6744 rows × 41 columns

Split into training and testing

# Split the data in time
X_train = df_lags.loc[:"2021"]
X_val = df_lags.loc["2022-01":"2024-12"]  # 2022-2024
X_test = df_lags.loc["2025":]  # 2025

# Remove any with NAs and split into X and y
X_train = X_train.dropna()
X_val = X_val.dropna()
X_test = X_test.dropna()

y_train = X_train.pop("T")
y_val = X_val.pop("T")
y_test = X_test.pop("T")
X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape
((5825, 40), (5825,), (757, 40), (757,), (122, 40), (122,))

Inspect the split data

X_train
T-40 T-39 T-38 T-37 T-36 T-35 T-34 T-33 T-32 T-31 ... T-10 T-9 T-8 T-7 T-6 T-5 T-4 T-3 T-2 T-1
Date
1999-03-01 -0.010858 0.000437 0.008042 0.025859 -0.021323 -0.010573 -0.011214 -0.014823 -0.009704 0.018119 ... 0.007118 -0.026875 0.000121 -0.008127 0.000000 -0.009016 0.012275 0.010156 -0.014231 -0.011912
1999-03-02 0.000437 0.008042 0.025859 -0.021323 -0.010573 -0.011214 -0.014823 -0.009704 0.018119 0.012994 ... -0.026875 0.000121 -0.008127 0.000000 -0.009016 0.012275 0.010156 -0.014231 -0.011912 0.010277
1999-03-03 0.008042 0.025859 -0.021323 -0.010573 -0.011214 -0.014823 -0.009704 0.018119 0.012994 0.010881 ... 0.000121 -0.008127 0.000000 -0.009016 0.012275 0.010156 -0.014231 -0.011912 0.010277 0.004082
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-12-29 0.015263 -0.004999 0.011656 0.013921 0.011090 0.003821 -0.012058 0.000919 -0.016106 0.008825 ... 0.001327 -0.005011 -0.006273 -0.001445 0.023788 0.000807 0.003924 0.001906 0.003302 0.005181
2021-12-30 -0.004999 0.011656 0.013921 0.011090 0.003821 -0.012058 0.000919 -0.016106 0.008825 -0.001018 ... -0.005011 -0.006273 -0.001445 0.023788 0.000807 0.003924 0.001906 0.003302 0.005181 0.012541
2021-12-31 0.011656 0.013921 0.011090 0.003821 -0.012058 0.000919 -0.016106 0.008825 -0.001018 -0.003060 ... -0.006273 -0.001445 0.023788 0.000807 0.003924 0.001906 0.003302 0.005181 0.012541 0.003624

5825 rows × 40 columns

Plot the split

Code
y_train.plot()
y_val.plot()
y_test.plot()
plt.ylabel("Daily Log Return")
plt.legend(["Train", "Validation", "Test"], loc="center left", bbox_to_anchor=(1, 0.5));

Fit a linear model

lr = LinearRegression()
lr.fit(X_train, y_train);

Make a forecast for the test data:

y_pred = lr.predict(X_test)

Plot predictions in log return space

Code
log_forecasts_df = pd.DataFrame({
    "Actual": y_test,
    "Linear": y_pred
}, index=y_test.index)

log_forecasts_df.plot()
plt.ylabel("Daily Log Return")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5))

Convert to raw prices and plot

linear_log_forecasts = pd.Series(y_pred, index=y_test.index)
prev_price = stock.shift(1).loc["2025", "CBA"]
linear_price_forecasts = prev_price * np.exp(linear_log_forecasts)
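Because prev_price here is the actual price from the previous trading day, each forecast is a one-step-ahead price forecast:

\hat{P}_t = P_{t-1} \, e^{\hat{r}_t} ,

where \hat{r}_t is the predicted log return for day t. (This point matters when comparing models later.)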
Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
persistence_prices.reindex(stock.loc[start:end].index).plot(label="Persistence")
trend_prices.reindex(stock.loc[start:end].index).plot(label="Trend")
linear_price_forecasts.reindex(stock.loc[start:end].index).plot(label="Linear")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Distribution of log return predictions

Code
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))

# Actual log returns
y_val.hist(bins=30, alpha=0.7, ax=ax1, label="Actual")
ax1.set_xlabel("Daily Log Return")
ax1.set_ylabel("Frequency")
ax1.set_title("Actual Log Returns")

# Predicted log returns
pd.Series(y_pred).hist(bins=30, alpha=0.7, ax=ax2, label="Predicted", color="orange")
ax2.set_xlabel("Daily Log Return")
ax2.set_ylabel("Frequency")
ax2.set_title("Predicted Log Returns")

plt.tight_layout()

# Calculate MSE on raw prices using explicit forecast calculation
test_actual_prices = stock.loc["2025":, "CBA"]
test_linear_prices = linear_price_forecasts.loc["2025":]
linear_mse = mean_squared_error(test_actual_prices, test_linear_prices)
linear_mse
5.136664038344952
persistence_mse, trend_mse, linear_mse
(254.04075100411256, 99.28717915684173, 5.136664038344952)

Multi-step forecasts

Comparing apples to apples

The linear model is only producing one-step-ahead forecasts.

The other models are producing multi-step-ahead forecasts.

shifted_forecast = stock["CBA"].shift(1).loc["2025":]
Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
linear_price_forecasts.reindex(stock.loc[start:end].index).plot(label="Linear")
shifted_forecast.reindex(stock.loc[start:end].index).plot(label="Shifted")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

shifted_mse = mean_squared_error(stock.loc["2025":, "CBA"], shifted_forecast)
persistence_mse, trend_mse, linear_mse, shifted_mse
(254.04075100411256, 99.28717915684173, 5.136664038344952, 5.06190042634739)

Autoregressive forecasts

The linear model needs the last 40 days of returns to make a forecast.

Idea: Make the first forecast, then use that to make the next forecast, and so on.

\begin{aligned}
\hat{y}_t &= \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \ldots + \beta_n y_{t-n} \\
\hat{y}_{t+1} &= \beta_0 + \beta_1 \hat{y}_t + \beta_2 y_{t-1} + \ldots + \beta_n y_{t-n+1} \\
\hat{y}_{t+2} &= \beta_0 + \beta_1 \hat{y}_{t+1} + \beta_2 \hat{y}_t + \ldots + \beta_n y_{t-n+2} \\
&\;\;\vdots \\
\hat{y}_{t+k} &= \beta_0 + \beta_1 \hat{y}_{t+k-1} + \beta_2 \hat{y}_{t+k-2} + \ldots + \beta_n \hat{y}_{t+k-n}
\end{aligned}

Autoregressive forecasting function

def autoregressive_forecast(model, X_val, suppress=False):
    """
    Generate a multi-step forecast using the given model.
    """
    multi_step = pd.Series(index=X_val.index, name="Multi Step")

    # Initialize the input data for forecasting
    input_data = X_val.iloc[0].values.reshape(1, -1)

    for i in range(len(multi_step)):
        # Ensure input_data has the correct feature names
        input_df = pd.DataFrame(input_data, columns=X_val.columns)
        if suppress:
            next_value = model.predict(input_df, verbose=0)
        else:
            next_value = model.predict(input_df)

        # Store the scalar prediction (predictions come back with shape (1,) or (1, 1))
        multi_step.iloc[i] = float(np.ravel(next_value)[0])

        # Append that prediction to the input for the next forecast
        if i + 1 < len(multi_step):
            input_data = np.append(input_data[:, 1:], next_value).reshape(1, -1)

    return multi_step

Look at the autoregressive linear forecasts

lr_forecast = autoregressive_forecast(lr, X_test)
lr_multi_step_prices = log_to_price(lr_forecast, last_price)
Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
persistence_prices.reindex(stock.loc[start:end].index).plot(label="Persistence")
trend_prices.reindex(stock.loc[start:end].index).plot(label="Trend")
lr_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS Linear")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Compare log return forecasts

Code
log_forecasts_multistep = pd.DataFrame({
    "Actual": y_test,
    "One-step Linear": y_pred,
    "Multi-step Linear": lr_forecast
}, index=y_test.index)

log_forecasts_multistep.plot()
plt.ylabel("Daily Log Return")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5))

Metrics

One-step-ahead forecasts:

linear_mse, shifted_mse
(5.136664038344952, 5.06190042634739)

Multi-step-ahead forecasts:

multi_step_linear_mse = mean_squared_error(stock.loc["2025":, "CBA"], lr_multi_step_prices.loc["2025":])
persistence_mse, trend_mse, multi_step_linear_mse
(254.04075100411256, 99.28717915684173, 181.11397713761005)

Prefer only short windows

Code
stock.loc["2025-01":"2025-01", ["CBA"]].plot(label="CBA")
persistence_prices.loc["2025-01":"2025-01"].plot(label="Persistence")
trend_prices.loc["2025-01":"2025-01"].plot(label="Trend")
lr_multi_step_prices.loc["2025-01":"2025-01"].plot(label="MS Linear")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

“It’s tough to make predictions, especially about the future.”

Neural network forecasts

Simple feedforward neural network

model = Sequential([
        Rescaling(1/0.02),
        Dense(32, activation="leaky_relu"),
        Dense(1)])  # Linear activation for log returns
model.compile(optimizer="adam", loss="mean_absolute_error")
if Path("aus_fin_fnn_model.keras").exists():
    model = keras.models.load_model("aus_fin_fnn_model.keras")
else:
    es = EarlyStopping(patience=15, restore_best_weights=True)
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=500,
        callbacks=[es], verbose=0)
    model.save("aus_fin_fnn_model.keras")

model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ rescaling (Rescaling)           │ (None, 40)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 32)             │         1,312 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 1)              │            33 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 4,037 (15.77 KB)
 Trainable params: 1,345 (5.25 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 2,692 (10.52 KB)

Forecast and plot

y_pred = model.predict(X_test, verbose=0)
Code
# Show log return predictions
log_forecasts_nn = pd.DataFrame({
    "Actual": y_test,
    "Linear": lr.predict(X_test),
    "FNN": y_pred.flatten()
}, index=y_test.index)

log_forecasts_nn.plot()
plt.ylabel("Daily Log Return")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5))

Plot forecasts in raw price space

fnn_log_forecasts = pd.Series(y_pred.flatten(), index=y_test.index)
fnn_price_forecasts = prev_price * np.exp(fnn_log_forecasts)
Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
linear_price_forecasts.reindex(stock.loc[start:end].index).plot(label="Linear")
fnn_price_forecasts.reindex(stock.loc[start:end].index).plot(label="FNN")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Distribution of log return predictions

Code
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))

# Actual log returns
y_test.hist(bins=30, alpha=0.7, ax=ax1, label="Actual")
ax1.set_xlabel("Daily Log Return")
ax1.set_ylabel("Frequency")
ax1.set_title("Actual Log Returns")

# Predicted log returns
pd.Series(y_pred.flatten()).hist(bins=30, alpha=0.7, ax=ax2, label="Predicted", color="orange")
ax2.set_xlabel("Daily Log Return")
ax2.set_ylabel("Frequency")
ax2.set_title("Predicted Log Returns")

plt.tight_layout()

Autoregressive forecasts

fnn_forecast = autoregressive_forecast(model, X_test, True)
fnn_multi_step_prices = log_to_price(fnn_forecast, last_price)
Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
persistence_prices.reindex(stock.loc[start:end].index).plot(label="Persistence")
trend_prices.reindex(stock.loc[start:end].index).plot(label="Trend")
lr_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS Linear")
fnn_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS FNN")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Compare multi-step log return forecasts

Code
log_forecasts_multistep_nn = pd.DataFrame({
    "Actual": y_test,
    "Multi-step Linear": lr_forecast,
    "Multi-step FNN": fnn_forecast
}, index=y_test.index)

log_forecasts_multistep_nn.plot()
plt.ylabel("Daily Log Return")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Metrics

One-step-ahead forecasts (MSE on raw prices):

nn_mse = mean_squared_error(stock.loc["2025":, "CBA"], fnn_price_forecasts.loc["2025":])
linear_mse, nn_mse
(5.136664038344952, 5.499976874450013)

Multi-step-ahead forecasts (MSE on raw prices):

multi_step_fnn_mse = mean_squared_error(stock.loc["2025":, "CBA"], fnn_multi_step_prices.loc["2025":])
persistence_mse, trend_mse, multi_step_linear_mse, multi_step_fnn_mse
(254.04075100411256, 99.28717915684173, 181.11397713761005, 216.57446624903855)

Recurrent Neural Networks

Basic facts of RNNs

  • A recurrent neural network is a type of neural network that is designed to process sequences of data (e.g. time series, sentences).
  • A recurrent neural network is any network that contains a recurrent layer.
  • A recurrent layer is a layer that processes data in a sequence.
  • An RNN can have one or more recurrent layers.
  • Weights are shared over time; this allows the model to be used on arbitrary-length sequences (see the sketch below).
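To see what weight sharing buys us, here is a minimal sketch (not part of the stock example; the sizes are arbitrary): the number of time steps can be left unspecified, yet the layer's parameter count is fixed.

from keras import Model
from keras.layers import Input, SimpleRNN

inputs = Input(shape=(None, 2))   # any number of time steps, 2 series
outputs = SimpleRNN(4)(inputs)    # 4 hidden units
model = Model(inputs, outputs)
model.count_params()              # 4 * (2 + 4 + 1) = 28, regardless of sequence length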

Applications

  • Forecasting: revenue forecast, weather forecast, predict disease rate from medical history, etc.
  • Classification: given a time series of the activities of a visitor on a website, classify whether the visitor is a bot or a human.
  • Event detection: given a continuous data stream, identify the occurrence of a specific event. Example: Detect utterances like “Hey Alexa” from an audio stream.
  • Anomaly detection: given a continuous data stream, detect anything unusual happening. Example: Detect unusual activity on the corporate network.

Origin of the name of RNNs

A recurrence relation is an equation that expresses each element of a sequence as a function of the preceding ones. More precisely, in the case where only the immediately preceding element is involved, a recurrence relation has the form

u_n = \psi(n, u_{n-1}) \quad \text{ for } \quad n > 0.

Example: Factorial n! = n (n-1)! for n > 0 given 0! = 1.
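As a minimal sketch of this recurrence in code (just an illustration, not used later):

def factorial(n):
    u = 1                  # u_0 = 0! = 1
    for i in range(1, n + 1):
        u = i * u          # u_n = psi(n, u_{n-1}) = n * u_{n-1}
    return u

factorial(5)               # 120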

Diagram of an RNN cell

The RNN processes each element of the sequence one at a time, while keeping a memory of what came before.

The following figure shows how the recurrent neural network combines an input X_l with the state carried over from the previous step A_l to produce the output O_l. RNNs have a cyclic information-processing structure that lets them pass information forward from earlier inputs. This allows them to capture dependencies and patterns in sequential data, making them useful for analysing time series.

Schematic of a recurrent neural network. E.g. SimpleRNN, LSTM, or GRU.

A SimpleRNN cell

Diagram of a SimpleRNN cell.

All the outputs before the final one are often discarded.

LSTM internals

Simple RNN structures suffer from the vanishing gradient problem and hence struggle to learn long-term dependencies. LSTMs are designed to overcome this: they have a more complex structure (a memory cell plus gating mechanisms) and can better regulate the flow of information.

Diagram of an LSTM cell. Notation for the diagram.
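For reference, a standard way to write the LSTM updates, using the same row-vector convention as the SimpleRNN equations later in these slides (the diagram's notation may differ slightly): the forget, input and output gates \boldsymbol{f}_t, \boldsymbol{i}_t, \boldsymbol{o}_t control a cell state \boldsymbol{c}_t alongside the hidden state \boldsymbol{h}_t,

\begin{aligned}
\boldsymbol{f}_t &= \sigma(\boldsymbol{x}_t \boldsymbol{W}_f + \boldsymbol{h}_{t-1} \boldsymbol{U}_f + \boldsymbol{b}_f), \\
\boldsymbol{i}_t &= \sigma(\boldsymbol{x}_t \boldsymbol{W}_i + \boldsymbol{h}_{t-1} \boldsymbol{U}_i + \boldsymbol{b}_i), \\
\boldsymbol{o}_t &= \sigma(\boldsymbol{x}_t \boldsymbol{W}_o + \boldsymbol{h}_{t-1} \boldsymbol{U}_o + \boldsymbol{b}_o), \\
\tilde{\boldsymbol{c}}_t &= \tanh(\boldsymbol{x}_t \boldsymbol{W}_c + \boldsymbol{h}_{t-1} \boldsymbol{U}_c + \boldsymbol{b}_c), \\
\boldsymbol{c}_t &= \boldsymbol{f}_t \odot \boldsymbol{c}_{t-1} + \boldsymbol{i}_t \odot \tilde{\boldsymbol{c}}_t, \\
\boldsymbol{h}_t &= \boldsymbol{o}_t \odot \tanh(\boldsymbol{c}_t),
\end{aligned}

where \sigma is the sigmoid function and \odot is element-wise multiplication.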

GRU internals

GRUs are simpler than LSTMs, and hence computationally more efficient.

Diagram of a GRU cell.

Stock prediction with recurrent networks

SimpleRNN

from keras.layers import SimpleRNN, Reshape
model = Sequential([
        Rescaling(1/0.02),
        Reshape((-1, 1)),
        SimpleRNN(64, activation="tanh"),
        Dense(1)])  # Linear activation for log returns
model.compile(optimizer="adam", loss="mean_absolute_error")
es = EarlyStopping(patience=15, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
    epochs=500, callbacks=[es], verbose=0)
model.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ rescaling_1 (Rescaling)         │ (None, 40)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ reshape (Reshape)               │ (None, 40, 1)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ simple_rnn (SimpleRNN)          │ (None, 64)             │         4,224 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            65 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 12,869 (50.27 KB)
 Trainable params: 4,289 (16.75 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 8,580 (33.52 KB)

Forecast and plot

y_pred = model.predict(X_test.to_numpy(), verbose=0)
Code
# Show log return predictions
log_forecasts_rnn = pd.DataFrame({
    "Actual": y_test,
    "FNN": fnn_forecast.values[:len(y_test)],  # Use the FNN forecast from previous section
    "SimpleRNN": y_pred.flatten()
}, index=y_test.index)

log_forecasts_rnn.plot()
plt.ylabel("Daily Log Return")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5))

Plot forecasts in raw price space

rnn_log_forecasts = pd.Series(y_pred.flatten(), index=y_test.index)
rnn_price_forecasts = prev_price * np.exp(rnn_log_forecasts)
Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
linear_price_forecasts.reindex(stock.loc[start:end].index).plot(label="Linear")
fnn_price_forecasts.reindex(stock.loc[start:end].index).plot(label="FNN")
rnn_price_forecasts.reindex(stock.loc[start:end].index).plot(label="SimpleRNN")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Multi-step forecasts

rnn_forecast = autoregressive_forecast(model, X_test, True)
rnn_multi_step_prices = log_to_price(rnn_forecast, last_price)
Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
persistence_prices.reindex(stock.loc[start:end].index).plot(label="Persistence")
trend_prices.reindex(stock.loc[start:end].index).plot(label="Trend")
lr_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS Linear")
fnn_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS FNN")
rnn_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS RNN")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Metrics

One-step-ahead forecasts (MSE on raw prices):

rnn_mse = mean_squared_error(stock.loc["2025":, "CBA"], rnn_price_forecasts.loc["2025":])
linear_mse, nn_mse, rnn_mse
(5.136664038344952, 5.499976874450013, 4.9401398570164075)

Multi-step-ahead forecasts (MSE on raw prices):

multi_step_rnn_mse = mean_squared_error(stock.loc["2025":, "CBA"], rnn_multi_step_prices.loc["2025":])
persistence_mse, trend_mse, multi_step_linear_mse, multi_step_fnn_mse, multi_step_rnn_mse
(254.04075100411256,
 99.28717915684173,
 181.11397713761005,
 216.57446624903855,
 112.51089934559508)

GRU

from keras.layers import GRU

model = Sequential([
        Rescaling(1/0.02),
        Reshape((-1, 1)),
        GRU(16, activation="tanh"),
        Dense(1)])  # Linear activation for log returns
model.compile(optimizer="adam", loss="mean_absolute_error")
es = EarlyStopping(patience=15, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
    epochs=500, callbacks=[es], verbose=0)
model.summary()
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ rescaling_2 (Rescaling)         │ (None, 40)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ reshape_1 (Reshape)             │ (None, 40, 1)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ gru (GRU)                       │ (None, 16)             │           912 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 1)              │            17 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 2,789 (10.90 KB)
 Trainable params: 929 (3.63 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 1,860 (7.27 KB)

Forecast and plot

y_pred = model.predict(X_test, verbose=0)
Code
# Show log return predictions
log_forecasts_gru = pd.DataFrame({
    "Actual": y_test,
    "SimpleRNN": rnn_forecast.values[:len(y_test)],  # Use the RNN forecast from previous section
    "GRU": y_pred.flatten()
}, index=y_test.index)

log_forecasts_gru.plot()
plt.ylabel("Daily Log Return")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5))

Plot forecasts in raw price space

gru_log_forecasts = pd.Series(y_pred.flatten(), index=y_test.index)
gru_price_forecasts = prev_price * np.exp(gru_log_forecasts)
Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
linear_price_forecasts.reindex(stock.loc[start:end].index).plot(label="Linear")
fnn_price_forecasts.reindex(stock.loc[start:end].index).plot(label="FNN")
rnn_price_forecasts.reindex(stock.loc[start:end].index).plot(label="SimpleRNN")
gru_price_forecasts.reindex(stock.loc[start:end].index).plot(label="GRU")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

Multi-step forecasts

gru_forecast = autoregressive_forecast(model, X_test, True)
gru_multi_step_prices = log_to_price(gru_forecast, last_price)
Code
stock.loc[start:end, ["CBA"]].plot(label="CBA")
persistence_prices.reindex(stock.loc[start:end].index).plot(label="Persistence")
trend_prices.reindex(stock.loc[start:end].index).plot(label="Trend")
lr_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS Linear")
fnn_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS FNN")
rnn_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS RNN")
gru_multi_step_prices.reindex(stock.loc[start:end].index).plot(label="MS GRU")
plt.axvline("2025-01", color="black", linestyle="--")
plt.ylabel("Stock Price ($)")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5));

All multi-step log return forecasts

Code
log_forecasts_all = pd.DataFrame({
    "Actual": y_test,
    "Linear": lr_forecast,
    "FNN": fnn_forecast,
    "SimpleRNN": rnn_forecast,
    "GRU": gru_forecast
}, index=y_test.index)

log_forecasts_all.plot()
plt.ylabel("Daily Log Return")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5))

Metrics

One-step-ahead forecasts (MSE on raw prices):

gru_mse = mean_squared_error(stock.loc["2025":, "CBA"], gru_price_forecasts.loc["2025":])
linear_mse, nn_mse, rnn_mse, gru_mse
(5.136664038344952, 5.499976874450013, 4.9401398570164075, 5.187925181803358)

Multi-step-ahead forecasts (MSE on raw prices):

multi_step_gru_mse = mean_squared_error(stock.loc["2025":, "CBA"], gru_multi_step_prices.loc["2025":])
persistence_mse, trend_mse, multi_step_linear_mse, multi_step_fnn_mse, multi_step_rnn_mse, multi_step_gru_mse
(254.04075100411256,
 99.28717915684173,
 181.11397713761005,
 216.57446624903855,
 112.51089934559508,
 97.04131912129455)

Summary of all model performance

One-step forecasts

MSE
Shifted 5.061900
Linear 5.136664
FNN 5.499977
SimpleRNN 4.940140
GRU 5.187925

Multi-step forecasts

MSE
Persistence 254.040751
Trend 99.287179
Linear 181.113977
FNN 216.574466
SimpleRNN 112.510899
GRU 97.041319

Internals of the SimpleRNN

The rank of a time series

Say we had n observations of a time series x_1, x_2, \dots, x_n.

This \boldsymbol{x} = (x_1, \dots, x_n) would have shape (n,) & rank 1.

If instead we had a batch of b time series’

\boldsymbol{X} = \begin{pmatrix} x_7 & x_8 & \dots & x_{7+n-1} \\ x_2 & x_3 & \dots & x_{2+n-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_3 & x_4 & \dots & x_{3+n-1} \\ \end{pmatrix} \,,

the batch \boldsymbol{X} would have shape (b, n) & rank 2.

Multivariate time series

A multivariate time series has more than one variable observed at each time point. The following example has two variables, x and y.

t x y
0 x_0 y_0
1 x_1 y_1
2 x_2 y_2
3 x_3 y_3

Now n observations of the m time series form a matrix of shape (n, m), i.e. rank 2.

In Keras, a batch of b of these time series has shape (b, n, m) and has rank 3.

Note

Use \boldsymbol{x}_t \in \mathbb{R}^{1 \times m} to denote the vector of all time series at time t. Here, \boldsymbol{x}_t = (x_t, y_t).

SimpleRNN

Say each prediction is a vector of size d, so \boldsymbol{y}_t \in \mathbb{R}^{1 \times d}.

Then the main equation of a SimpleRNN, given \boldsymbol{y}_0 = \boldsymbol{0}, is

\boldsymbol{y}_t = \psi\bigl( \boldsymbol{x}_t \boldsymbol{W}_x + \boldsymbol{y}_{t-1} \boldsymbol{W}_y + \boldsymbol{b} \bigr) .

Here, \begin{aligned} &\boldsymbol{x}_t \in \mathbb{R}^{1 \times m}, \boldsymbol{W}_x \in \mathbb{R}^{m \times d}, \\ &\boldsymbol{y}_{t-1} \in \mathbb{R}^{1 \times d}, \boldsymbol{W}_y \in \mathbb{R}^{d \times d}, \text{ and } \boldsymbol{b} \in \mathbb{R}^{d}. \end{aligned}

At each time step, a simple recurrent neural network (RNN) takes an input vector \boldsymbol{x}_t, combines it with the information in the previous hidden state \boldsymbol{y}_{t-1}, and produces an output vector \boldsymbol{y}_t. The hidden state helps the network remember the context of earlier inputs, enabling it to make informed predictions about what comes next in the sequence. In a simple RNN, the output at time t-1 is reused as the hidden state at time t.

SimpleRNN (in batches)

The difference between an RNN and an RNN with batch processing lies in how the network handles sequences of input data. With batch processing, the model processes multiple (b) input sequences simultaneously. The training data is grouped into batches, and the weights are updated based on the average error across the entire batch. Batch processing often gives more stable weight updates, since the model learns from a diverse set of examples in each batch, reducing the impact of noise in individual sequences.

Say we operate on batches of size b, then \boldsymbol{Y}_t \in \mathbb{R}^{b \times d}.

The main equation of a SimpleRNN, given \boldsymbol{Y}_0 = \boldsymbol{0}, is \boldsymbol{Y}_t = \psi\bigl( \boldsymbol{X}_t \boldsymbol{W}_x + \boldsymbol{Y}_{t-1} \boldsymbol{W}_y + \boldsymbol{b} \bigr) . Here, \begin{aligned} &\boldsymbol{X}_t \in \mathbb{R}^{b \times m}, \boldsymbol{W}_x \in \mathbb{R}^{m \times d}, \\ &\boldsymbol{Y}_{t-1} \in \mathbb{R}^{b \times d}, \boldsymbol{W}_y \in \mathbb{R}^{d \times d}, \text{ and } \boldsymbol{b} \in \mathbb{R}^{d}. \end{aligned}

Simple Keras demo

1num_obs = 4
2num_time_steps = 3
3num_time_series = 2

X = (
    np.arange(num_obs * num_time_steps * num_time_series)
    .astype(np.float32)
    .reshape([num_obs, num_time_steps, num_time_series])
4)

output_size = 1
y = np.array([0, 0, 1, 1])
1
Defines the number of observations
2
Defines the number of time steps
3
Defines the number of time series
4
Reshapes the array to a rank-3 tensor of shape (4, 3, 2)
1X[:2]
1
Selects the first two slices along the first dimension. Since the tensor has shape (4, 3, 2), X[:2] selects the first two slices (0 and 1) along the first dimension and returns a sub-tensor of shape (2, 3, 2).
array([[[ 0.,  1.],
        [ 2.,  3.],
        [ 4.,  5.]],

       [[ 6.,  7.],
        [ 8.,  9.],
        [10., 11.]]], dtype=float32)
1X[2:]
1
Selects the last two slices along the first dimension. The first dimension (axis=0) has size 4. Therefore, X[2:] selects the last two slices (2 and 3) along the first dimension, and returns a sub-tensor of shape (2,3,2).
array([[[12., 13.],
        [14., 15.],
        [16., 17.]],

       [[18., 19.],
        [20., 21.],
        [22., 23.]]], dtype=float32)

Keras’ SimpleRNN

As usual, the SimpleRNN is just a layer in Keras.

1from keras.layers import SimpleRNN

2random.seed(1234)
3model = Sequential([SimpleRNN(output_size, activation="sigmoid")])
4model.compile(loss="binary_crossentropy", metrics=["accuracy"])

5hist = model.fit(X, y, epochs=500, verbose=False)
6model.evaluate(X, y, verbose=False)
1
Imports the SimpleRNN layer from the Keras library
2
Sets the seed for the random number generator to ensure reproducibility
3
Defines a simple RNN with one output node and sigmoid activation function
4
Specifies binary crossentropy as the loss function (usually used in classification problems), and specifies “accuracy” as the metric to be monitored during training
5
Trains the model for 500 epochs and saves output as hist
6
Evaluates the model to obtain a value for the loss and accuracy
[3.1487133502960205, 0.5]

The predicted probabilities on the training set are:

model.predict(X, verbose=0)
array([[0.97],
       [1.  ],
       [1.  ],
       [1.  ]], dtype=float32)

SimpleRNN weights

To verify the predicted probabilities, we can obtain the weights of the fitted model and compute the outputs manually as follows.

model.get_weights()
[array([[0.68],
        [0.2 ]], dtype=float32),
 array([[0.49]], dtype=float32),
 array([-0.51], dtype=float32)]
def sigmoid(x):
    return 1 / (1 + np.exp(-x))


W_x, W_y, b = model.get_weights()

Y = np.zeros((num_obs, output_size), dtype=np.float32)
for t in range(num_time_steps):
    X_t = X[:, t, :]
    z = X_t @ W_x + Y @ W_y + b
    Y = sigmoid(z)

Y
array([[0.97],
       [1.  ],
       [1.  ],
       [1.  ]], dtype=float32)

Other recurrent network variants

Input and output sequences

Categories of recurrent neural networks: sequence to sequence, sequence to vector, vector to sequence, encoder-decoder network.

Input and output sequences

  • Sequence to sequence: Useful for forecasting time series, e.g. feeding in the prices over the last N days and outputting the prices shifted one day into the future (i.e. from N-1 days ago to tomorrow).
  • Sequence to vector: Ignore all outputs except the one at the final time step. Example: give a sentiment score to a sequence of words corresponding to a movie review.

Input and output sequences

  • Vector to sequence: Feed the network the same input vector over and over at each time step and let it output a sequence. Example: given an image as input, produce a caption for it. The image is treated as a single input vector (its pixels do not form a sequence), while the caption is a sequence of words describing the image. A dataset of images and their descriptions is used to train the RNN.
  • The Encoder-Decoder: The encoder is a sequence-to-vector network. The decoder is a vector-to-sequence network. Example: Feed the network a sequence in one language. Use the encoder to convert the sentence into a single vector representation. The decoder decodes this vector into the translation of the sentence in another language.

Recurrent layers can be stacked.

Deep RNN unrolled through time.
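A rough Keras illustration of these output modes (a sketch with arbitrary sizes, not part of the house-price example): return_sequences=True makes a recurrent layer emit its output at every time step, which is what allows recurrent layers to be stacked, while the default keeps only the final output.

from keras.layers import Input, SimpleRNN

inputs = Input(shape=(10, 3))                            # 10 time steps, 3 series
seq_out = SimpleRNN(8, return_sequences=True)(inputs)    # output at every step
vec_out = SimpleRNN(8)(inputs)                           # only the final output
seq_out.shape, vec_out.shape                             # (None, 10, 8) and (None, 8)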

CoreLogic Hedonic Home Value Index

Australian House Price Indices

Note

I apologise in advance for not being able to share this dataset with anyone (it is not mine to share).

Percentage changes

changes = house_prices.pct_change().dropna()
changes.round(2)
Brisbane East_Bris North_Bris West_Bris Melbourne North_Syd Sydney
Date
1990-02-28 0.03 -0.01 0.01 0.01 0.00 -0.00 -0.02
1990-03-31 0.01 0.03 0.01 0.01 0.02 -0.00 0.03
1990-04-30 0.02 0.02 0.01 -0.00 0.01 0.03 0.04
... ... ... ... ... ... ... ...
2021-03-31 0.04 0.04 0.03 0.04 0.02 0.05 0.05
2021-04-30 0.03 0.01 0.01 -0.00 0.01 0.02 0.02
2021-05-31 0.03 0.03 0.03 0.03 0.03 0.02 0.04

376 rows × 7 columns

Percentage changes

changes.plot();

The size of the changes

changes.mean()
Brisbane      0.005496
East_Bris     0.005416
North_Bris    0.005024
West_Bris     0.004842
Melbourne     0.005677
North_Syd     0.004819
Sydney        0.005526
dtype: float64
changes *= 100
changes.mean()
Brisbane      0.549605
East_Bris     0.541562
North_Bris    0.502390
West_Bris     0.484204
Melbourne     0.567700
North_Syd     0.481863
Sydney        0.552641
dtype: float64
changes.plot(legend=False);

Split without shuffling

num_train = int(0.6 * len(changes))
num_val = int(0.2 * len(changes))
num_test = len(changes) - num_train - num_val
print(f"# Train: {num_train}, # Val: {num_val}, # Test: {num_test}")
# Train: 225, # Val: 75, # Test: 76

Subsequences of a time series

Keras has a built-in method for converting a time series into subsequences/chunks.

from keras.utils import timeseries_dataset_from_array

integers = range(10)
dummy_dataset = timeseries_dataset_from_array(
    data=integers[:-3],
    targets=integers[3:],
    sequence_length=3,
    batch_size=2,
)

for inputs, targets in dummy_dataset:
    for i in range(inputs.shape[0]):
        print([int(x) for x in inputs[i]], int(targets[i]))
[0, 1, 2] 3
[1, 2, 3] 4
[2, 3, 4] 5
[3, 4, 5] 6
[4, 5, 6] 7

Predicting Sydney House Prices

Creating dataset objects

# Num. of input time series.
num_ts = changes.shape[1]

# How many prev. months to use.
seq_length = 6

# Predict the next month ahead.
ahead = 1

# The index of the first target.
delay = seq_length + ahead - 1
# Which suburb to predict.
target_suburb = changes["Sydney"]

train_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=target_suburb[delay:],
    sequence_length=seq_length,
    end_index=num_train,
)
val_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=target_suburb[delay:],
    sequence_length=seq_length,
    start_index=num_train,
    end_index=num_train + num_val,
)
test_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=target_suburb[delay:],
    sequence_length=seq_length,
    start_index=num_train + num_val,
)
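As a quick sanity check (a sketch, assuming train_ds is as defined above): the first window should contain rows 0 to seq_length - 1 of changes, and its target should be the Sydney change ahead months later, i.e. row delay.

first_X, first_y = next(iter(train_ds))
assert np.allclose(first_X[0], changes.iloc[:seq_length])
assert np.isclose(first_y[0], target_suburb.iloc[delay])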

Converting Dataset to numpy

The Dataset object can be handed to Keras directly, but if we really need a numpy array, we can run:

X_train = np.concatenate(list(train_ds.map(lambda x, y: x)))
y_train = np.concatenate(list(train_ds.map(lambda x, y: y)))

The shape of our training set is now:

X_train.shape
(220, 6, 7)
y_train.shape
(220,)

Converting the rest to numpy arrays:

X_val = np.concatenate(list(val_ds.map(lambda x, y: x)))
y_val = np.concatenate(list(val_ds.map(lambda x, y: y)))
X_test = np.concatenate(list(test_ds.map(lambda x, y: x)))
y_test = np.concatenate(list(test_ds.map(lambda x, y: y)))

A dense network

from keras.layers import Input, Flatten
random.seed(1)
model_dense = Sequential([
    Input((seq_length, num_ts)),
    Flatten(),
    Dense(50, activation="leaky_relu"),
    Dense(20, activation="leaky_relu"),
    Dense(1, activation="linear")
])
model_dense.compile(loss="mse", optimizer="adam")
print(f"This model has {model_dense.count_params()} parameters.")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_dense.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3191 parameters.
Epoch 57: early stopping
Restoring model weights from the end of the best epoch: 7.
CPU times: user 2.71 s, sys: 293 ms, total: 3 s
Wall time: 2.68 s

Plot the model

from keras.utils import plot_model

plot_model(model_dense, show_shapes=True)

Assess the fits

model_dense.evaluate(X_val, y_val, verbose=0)
1.1644607782363892
Code
y_pred = model_dense.predict(X_val, verbose=0)
plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="Dense")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A SimpleRNN layer

random.seed(1)

model_simple = Sequential([
    Input((seq_length, num_ts)),
    SimpleRNN(50),
    Dense(1, activation="linear")
])
model_simple.compile(loss="mse", optimizer="adam")
print(f"This model has {model_simple.count_params()} parameters.")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_simple.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 2951 parameters.
Epoch 62: early stopping
Restoring model weights from the end of the best epoch: 12.
CPU times: user 3.59 s, sys: 1.45 s, total: 5.04 s
Wall time: 3.31 s

Plot the model

plot_model(model_simple, show_shapes=True)

Assess the fits

model_simple.evaluate(X_val, y_val, verbose=0)
1.2507916688919067
Code
y_pred = model_simple.predict(X_val, verbose=0)

plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="SimpleRNN")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A LSTM layer

from keras.layers import LSTM

random.seed(1)

model_lstm = Sequential([
    Input((seq_length, num_ts)),
    LSTM(50),
    Dense(1, activation="linear")
])

model_lstm.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_lstm.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
Epoch 59: early stopping
Restoring model weights from the end of the best epoch: 9.
CPU times: user 4.35 s, sys: 958 ms, total: 5.3 s
Wall time: 3.6 s

Assess the fits

model_lstm.evaluate(X_val, y_val, verbose=0)
0.8353261947631836
Code
y_pred = model_lstm.predict(X_val, verbose=0)
plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A GRU layer

from keras.layers import GRU

random.seed(1)

model_gru = Sequential([
    Input((seq_length, num_ts)),
    GRU(50),
    Dense(1, activation="linear")
])

model_gru.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_gru.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 57: early stopping
Restoring model weights from the end of the best epoch: 7.
CPU times: user 3.8 s, sys: 377 ms, total: 4.17 s
Wall time: 3.54 s

Assess the fits

model_gru.evaluate(X_val, y_val, verbose=0)
0.7435100078582764
Code
y_pred = model_gru.predict(X_val, verbose=0)
plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="GRU")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Two GRU layers

random.seed(1)

model_two_grus = Sequential([
    Input((seq_length, num_ts)),
    GRU(50, return_sequences=True),
    GRU(50),
    Dense(1, activation="linear")
])

model_two_grus.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_two_grus.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 56: early stopping
Restoring model weights from the end of the best epoch: 6.
CPU times: user 5.24 s, sys: 487 ms, total: 5.72 s
Wall time: 4.64 s

Assess the fits

model_two_grus.evaluate(X_val, y_val, verbose=0)
0.7989509105682373
Code
y_pred = model_two_grus.predict(X_val, verbose=0)
plt.plot(y_val, label="Sydney")
plt.plot(y_pred, label="2 GRUs")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Compare the models

Model MSE
1 SimpleRNN 1.250792
0 Dense 1.164461
2 LSTM 0.835326
4 2 GRUs 0.798951
3 GRU 0.743510

The single-GRU network has the lowest validation MSE, with the two-GRU network a close second. Evaluating the two-GRU network on the test set:

model_two_grus.evaluate(X_test, y_test, verbose=0)
1.855254888534546

Test set

Code
y_pred = model_two_grus.predict(X_test, verbose=0)
plt.plot(y_test, label="Sydney")
plt.plot(y_pred, label="2 GRUs")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Predicting Multiple Time Series

Creating dataset objects

Change the targets argument to include all the suburbs.

train_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=changes[delay:],
    sequence_length=seq_length,
    end_index=num_train,
)
val_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=changes[delay:],
    sequence_length=seq_length,
    start_index=num_train,
    end_index=num_train + num_val,
)
test_ds = timeseries_dataset_from_array(
    changes[:-delay],
    targets=changes[delay:],
    sequence_length=seq_length,
    start_index=num_train + num_val,
)

Converting Dataset to numpy

The shape of our training set is now:

X_train = np.concatenate(list(train_ds.map(lambda x, y: x)))
X_train.shape
(220, 6, 7)
y_train = np.concatenate(list(train_ds.map(lambda x, y: y)))
y_train.shape
(220, 7)

Converting the rest to numpy arrays:

X_val = np.concatenate(list(val_ds.map(lambda x, y: x)))
y_val = np.concatenate(list(val_ds.map(lambda x, y: y)))
X_test = np.concatenate(list(test_ds.map(lambda x, y: x)))
y_test = np.concatenate(list(test_ds.map(lambda x, y: y)))

A dense network

random.seed(1)
model_dense = Sequential([
    Input((seq_length, num_ts)),
    Flatten(),
    Dense(50, activation="leaky_relu"),
    Dense(20, activation="leaky_relu"),
    Dense(num_ts, activation="linear")
])
model_dense.compile(loss="mse", optimizer="adam")
print(f"This model has {model_dense.count_params()} parameters.")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_dense.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3317 parameters.
Epoch 75: early stopping
Restoring model weights from the end of the best epoch: 25.
CPU times: user 3.53 s, sys: 387 ms, total: 3.91 s
Wall time: 3.5 s

Plot the model

plot_model(model_dense, show_shapes=True)

Assess the fits

model_dense.evaluate(X_val, y_val, verbose=0)
1.4294650554656982
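The scatter plots below call add_diagonal_line(), a helper defined outside this excerpt; a minimal stand-in could be:

def add_diagonal_line():
    # Hypothetical stand-in: draw a y = x reference line over the current axes.
    lims = [min(plt.xlim()[0], plt.ylim()[0]), max(plt.xlim()[1], plt.ylim()[1])]
    plt.plot(lims, lims, color="black", linestyle="--", linewidth=1)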
Code
Y_pred = model_dense.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="Dense")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="Dense")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="Dense")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A SimpleRNN layer

random.seed(1)

model_simple = Sequential([
    Input((seq_length, num_ts)),
    SimpleRNN(50),
    Dense(num_ts, activation="linear")
])
model_simple.compile(loss="mse", optimizer="adam")
print(f"This model has {model_simple.count_params()} parameters.")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)
%time hist = model_simple.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
This model has 3257 parameters.
Epoch 70: early stopping
Restoring model weights from the end of the best epoch: 20.
CPU times: user 4.17 s, sys: 409 ms, total: 4.58 s
Wall time: 3.7 s

Plot the model

plot_model(model_simple, show_shapes=True)

Assess the fits

model_simple.evaluate(X_val, y_val, verbose=0)
1.4916820526123047
Code
Y_pred = model_simple.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="SimpleRNN")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="SimpleRNN")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="SimpleRNN")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A LSTM layer

random.seed(1)

model_lstm = Sequential([
    Input((seq_length, num_ts)),
    LSTM(50),
    Dense(num_ts, activation="linear")
])

model_lstm.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_lstm.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0);
Epoch 74: early stopping
Restoring model weights from the end of the best epoch: 24.
CPU times: user 4.91 s, sys: 455 ms, total: 5.37 s
Wall time: 4.39 s

Assess the fits

model_lstm.evaluate(X_val, y_val, verbose=0)
1.3311247825622559
Code
Y_pred = model_lstm.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

A GRU layer

random.seed(1)

model_gru = Sequential([
    Input((seq_length, num_ts)),
    GRU(50),
    Dense(num_ts, activation="linear")
])

model_gru.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_gru.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 70: early stopping
Restoring model weights from the end of the best epoch: 20.
CPU times: user 4.69 s, sys: 435 ms, total: 5.12 s
Wall time: 4.09 s

Assess the fits

model_gru.evaluate(X_val, y_val, verbose=0)
1.344503402709961
Code
Y_pred = model_gru.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="GRU")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="GRU")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="GRU")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Two GRU layers

random.seed(1)

model_two_grus = Sequential([
    Input((seq_length, num_ts)),
    GRU(50, return_sequences=True),
    GRU(50),
    Dense(num_ts, activation="linear")
])

model_two_grus.compile(loss="mse", optimizer="adam")

es = EarlyStopping(patience=50, restore_best_weights=True, verbose=1)

%time hist = model_two_grus.fit(X_train, y_train, epochs=1_000, \
  validation_data=(X_val, y_val), callbacks=[es], verbose=0)
Epoch 67: early stopping
Restoring model weights from the end of the best epoch: 17.
CPU times: user 5.81 s, sys: 547 ms, total: 6.36 s
Wall time: 5.22 s

Assess the fits

model_two_grus.evaluate(X_val, y_val, verbose=0)
1.358651041984558
Code
Y_pred = model_two_grus.predict(X_val, verbose=0)
plt.scatter(y_val, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_val[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="2 GRUs")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_val[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="2 GRUs")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_val[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="2 GRUs")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Compare the models

Code
models = [model_dense, model_simple, model_lstm, model_gru, model_two_grus]
model_names = ["Dense", "SimpleRNN", "LSTM", "GRU", "2 GRUs"]
mse_val = {
    name: model.evaluate(X_val, y_val, verbose=0)
    for name, model in zip(model_names, models)
}
val_results = pd.DataFrame({"Model": mse_val.keys(), "MSE": mse_val.values()})
val_results.sort_values("MSE", ascending=False)
Model MSE
1 SimpleRNN 1.491682
0 Dense 1.429465
4 2 GRUs 1.358651
3 GRU 1.344503
2 LSTM 1.331125

The network with an LSTM layer is the best.

model_lstm.evaluate(X_test, y_test, verbose=0)
1.932026982307434

Test set

Code
Y_pred = model_lstm.predict(X_test, verbose=0)
plt.scatter(y_test, Y_pred)
add_diagonal_line()
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()

plt.plot(y_test[:, 4], label="Melbourne")
plt.plot(Y_pred[:, 4], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Code
plt.plot(y_test[:, 0], label="Brisbane")
plt.plot(Y_pred[:, 0], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False)
plt.show()

plt.plot(y_test[:, 6], label="Sydney")
plt.plot(Y_pred[:, 6], label="LSTM")
plt.xlabel("Time")
plt.ylabel("Change in HPI (%)")
plt.legend(frameon=False);

Package Versions

from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch,tensorflow,tf_keras"))
Python implementation: CPython
Python version       : 3.11.12
IPython version      : 9.3.0

keras     : 3.8.0
matplotlib: 3.10.0
numpy     : 1.26.4
pandas    : 2.2.2
seaborn   : 0.13.2
scipy     : 1.13.1
torch     : 2.6.0
tensorflow: 2.18.0
tf_keras  : 2.18.0

Glossary

  • autoregressive forecasting
  • forecasting
  • GRU
  • LSTM
  • one-step/multi-step ahead forecasting
  • persistence forecast
  • recurrent neural networks
  • SimpleRNN