
ACTL3143 & ACTL5111 Deep Learning for Actuaries
Lecture Outline
Artificial Intelligence
Deep Learning Successes
Types of Machine Learning Tasks
Neural Networks
California House Price Prediction
Our First Neural Network
Force Positive Predictions
Preprocessing
Early Stopping
Artificial intelligence describes an agent which is capable of:
| Thinking humanly | Thinking rationally |
| Acting humanly | Acting rationally |
Put another way, what fields are most important to AI?
| Cognitive science, psychology | Mathematics, logic and inference |
| HCI, linguistics and robotics | Computer science, statistics |
“For these reasons, the rational-agent approach to AI has prevailed throughout most of the field’s history. In the early decades, rational agents were built on logical foundations and formed definite plans to achieve specific goals. Later, methods based on probability theory and machine learning allowed the creation of agents that could make decisions under uncertainty to attain the best expected outcome. In a nutshell, AI has focused on the study and construction of agents that do the right thing.”
— Russell & Norvig (2021, p. 22)

Question: When do you think the term “machine learning” was first coined?


Source: Wikipedia page for the Shakey Project and for the A* search algorithm.


Source: codeRtime, Programming a simple minimax chess engine in R, and Sebastian Lague (2018), Algorithms Explained – minimax and alpha-beta pruning.
Deep Blue (1997)


Sources: Mark Robert Anderson (2017), Twenty years on from Deep Blue vs Kasparov, The Conversation article, and Computer History Museum.
Tried making a computer smart, too hard!
Make a computer that can learn to be smart.
“[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed”
— Samuel (1959)
AI eventually became dominated by one approach, called machine learning, which itself is now dominated by deep learning (neural networks).
You can study a 12-week course on AI and never touch on machine learning…

Deep Learning Successes
What is this? 
Options:
Source: Wikipedia
What is this?

Options:
Source: Wikipedia
What is this?

Options:
Source: Wikipedia
ImageNet and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC); originally 1,000 synsets.
AlexNet — a neural network developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton — won the ILSVRC 2012 challenge convincingly.
Source: James Briggs & Laura Carnevali, AlexNet and ImageNet: The Birth of Deep Learning, Embedding Methods for Image Search, Pinecone Blog
A graphics processing unit (GPU)

“4.2. Training on multiple GPUs A single GTX 580 GPU has only 3GB of memory, which limits the maximum size of the networks that can be trained on it. It turns out that 1.2 million training examples are enough to train networks which are too big to fit on one GPU. Therefore we spread the net across two GPUs.”
— Krizhevsky et al. (2012)
Deep Blue was a win for AI, AlphaGo a win for DL.
Lee Sedol playing AlphaGo AI
I highly recommend this documentary about the event.
Source: Patrick House (2016), AlphaGo, Lee Sedol, and the Reassuring Future of Humans and Machines, New Yorker article.
https://thispersondoesnotexist.com/


Source: Goodfellow et al. (2014).


Source: Ho et al. (2020). Images generated with Dall-E 2, prompts by ACTL3143 students in 2022.

Test ChatGPT’s ability to:
Compare to Copilot.
You can get extra usage for free with GitHub Education for Students
Source: GitHub Blog
Reasoning is basically automating the old trick of adding “do this step-by-step” to your prompts.



Let the LLM take control of your terminal/computer.
To get an introduction to using the terminal, check this recording.
Types of Machine Learning Tasks
We focus on predictive not generative AI in this course.
Our school has two new courses starting in 2026:
This course focuses on neural networks for supervised learning, and these techniques are the fundamental building blocks for all the other parts of modern AI.

New ones:
Source: Kaggle, Getting Started.
Regression:
Classification:
A recipe for supervised learning.
Source: Matthew Gormley (2021), Introduction to Machine Learning Lecture Slides, Slide 67.
Data which ‘labels itself’. Example: language model.

Source: Amit Chaudhary (2020), Self Supervised Representation Learning in NLP.


Other examples: image inpainting, denoising images.
Test image: Eileen Collins (NASA, public domain).
A deoldified version of the famous “Migrant Mother” photograph.
Source: Deoldify package.
Neural Networks

A neuron in a neural network with a ReLU activation.
Source: Marcus Lautier (2022).
\begin{aligned} z~=~&x_1 \times w_1 + \\ &x_2 \times w_2 + \\ &x_3 \times w_3 . \end{aligned}
a = \begin{cases} z & \text{if } z > 0 \\ 0 & \text{if } z \leq 0 \end{cases}
Here, x_1, x_2, x_3 are just some fixed data.
The weights w_1, w_2, w_3 should be ‘learned’.
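The neuron above can be sketched in a few lines of NumPy. The input and weight values below are made up for illustration; in a real network the weights would be learned during training rather than typed in by hand.

```python
import numpy as np

def relu(z):
    """ReLU activation: keep positive values, replace the rest with zero."""
    return np.maximum(z, 0.0)

def neuron(x, w):
    """Weighted sum of the inputs followed by a ReLU (no bias, as in the formula above)."""
    z = np.dot(x, w)
    return relu(z)

# Illustrative (made-up) inputs and two candidate weight vectors.
x = np.array([1.0, 2.0, 3.0])
w_pos = np.array([0.5, -1.0, 1.0])   # z = 0.5 - 2.0 + 3.0 = 1.5, so a = 1.5
w_neg = np.array([-1.0, -1.0, 0.5])  # z = -1.0 - 2.0 + 1.5 = -1.5, so a = 0

a_pos = neuron(x, w_pos)
a_neg = neuron(x, w_neg)
print(a_pos, a_neg)
```

The same data x gives a different activation for each weight vector, which is why training only needs to adjust the weights.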

Source: Marcus Lautier (2022).
A basic fully-connected/dense network.
Source: Marcus Lautier (2022).
Brains and computers are binary, so make a perceptron with binary data. Seemed reasonable, impossible to train.
Replace binary state with continuous state. Still rather slow to train.
Note
It’s a neural network made of neurons, not a “neuron network”.

One can show that an MLP is a universal approximator, meaning it can model any suitably smooth function, given enough hidden units, to any desired level of accuracy (Hornik 1991). One can either make the model be “wide” or “deep”; the latter has some advantages…
— Murphy (2012, p. 566)


Sources: Marcus Lautier (2022) & Fenjiro (2019), Face Id: Deep Learning for Face Recognition, Medium.
In this ANN, how many of the following are there:
What is the depth?

Source: Dertat (2017), Applied Deep Learning - Part 1: Artificial Neural Networks, Medium.
California House Price Prediction
import random
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint
The target variable is the median house value for California districts, expressed in hundreds of thousands of dollars ($100,000). This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

Source: Scikit-learn documentation.
MedInc: median income in block group
HouseAge: median house age in block group
AveRooms: average number of rooms per household
AveBedrms: average number of bedrooms per household
Population: block group population
AveOccup: average number of household members
Latitude: block group latitude
Longitude: block group longitude
MedHouseVal: median house value (target)
Source: Scikit-learn documentation.
| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.023810 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.971880 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.802260 | 37.85 | -122.24 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20637 | 1.7000 | 17.0 | 5.205543 | 1.120092 | 1007.0 | 2.325635 | 39.43 | -121.22 |
| 20638 | 1.8672 | 18.0 | 5.329513 | 1.171920 | 741.0 | 2.123209 | 39.43 | -121.32 |
| 20639 | 2.3886 | 16.0 | 5.254717 | 1.162264 | 1387.0 | 2.616981 | 39.37 | -121.24 |
20640 rows × 8 columns
(12384, 8) (4128, 8) (4128, 8)
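The three shapes above are what a 60/20/20 train/validation/test split of the 20,640 rows produces. The splitting code itself is not shown here, so the following is only a minimal sketch of one way to get those sizes: it shuffles row indices with NumPy on placeholder data, and the seed and variable names are assumptions (the lecture imports `train_test_split`, which would be the more usual tool).

```python
import numpy as np

n_rows, n_features = 20_640, 8
rng = np.random.default_rng(0)              # assumed seed, for reproducibility only
X = rng.normal(size=(n_rows, n_features))   # placeholder features, not the real data
y = rng.normal(size=n_rows)                 # placeholder targets

# Shuffle the row indices, then cut them into 60% / 20% / 20%.
idx = rng.permutation(n_rows)
n_test = n_rows // 5                  # 20% held out for the final test
n_val = n_rows // 5                   # 20% for validation
n_train = n_rows - n_val - n_test     # remaining 60% for training

train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_val.shape, X_test.shape)
```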
Refit the linear regression from earlier; we’ll compare neural networks against this baseline.
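As a rough illustration of what such a baseline involves, here is a least-squares fit scored by mean squared error on training and validation sets. This is a sketch on synthetic data using plain NumPy, not the lecture's own code (which imports scikit-learn's `LinearRegression`); the coefficients, sample sizes and helper names are all made up.

```python
import numpy as np

rng = np.random.default_rng(1)                  # assumed seed
true_coefs = np.array([2.0, -1.0, 0.5])         # made-up "true" relationship

# Synthetic training and validation sets with a little noise.
X_train = rng.normal(size=(200, 3))
X_val = rng.normal(size=(50, 3))
y_train = 1.0 + X_train @ true_coefs + rng.normal(scale=0.1, size=200)
y_val = 1.0 + X_val @ true_coefs + rng.normal(scale=0.1, size=50)

def add_intercept(X):
    """Prepend a column of ones so the first coefficient acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

# Fit on the training set only.
beta, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

mse_train = mse(y_train, add_intercept(X_train) @ beta)
mse_val = mse(y_val, add_intercept(X_val) @ beta)
print(mse_train, mse_val)
```

Any neural network fitted later should be judged against this kind of number on the same validation set.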
Our First Neural Network
Keras is a common way of specifying, training, and using neural networks. It gives a simple interface to various backend libraries, including PyTorch.
The Keras application programming interface (API)
Source: Melissa Renard (2025)
Decide on the architecture: a simple fully-connected network with one hidden layer with 30 neurons.
Create the model:
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ dense (Dense) │ (None, 30) │ 270 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_1 (Dense) │ (None, 1) │ 31 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 301 (1.18 KB)
Trainable params: 301 (1.18 KB)
Non-trainable params: 0 (0.00 B)
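The parameter counts in the summary can be checked by hand: each dense layer has one weight per input per neuron, plus one bias per neuron. A small sketch, where the helper name `dense_params` is made up for illustration:

```python
def dense_params(n_inputs, n_units):
    """Parameters in a dense layer: n_inputs weights per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 8                             # the eight input columns of the dataset
hidden = dense_params(n_features, 30)      # 8 * 30 + 30
output = dense_params(30, 1)               # 30 * 1 + 1
total = hidden + output
print(hidden, output, total)               # 270 31 301
```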
array([[-139.05],
[ -84.57],
[ -5.82]], dtype=float32)
random.seed(2026)
model = Sequential([Dense(30, activation="leaky_relu"), Dense(1, activation="leaky_relu")])
display(model.predict(X_val.head(3), verbose=0))
random.seed(2026)
model = Sequential([Dense(30, activation="leaky_relu"), Dense(1, activation="leaky_relu")])
display(model.predict(X_val.head(3), verbose=0))
array([[467.02],
[289.13],
[ 20.93]], dtype=float32)
array([[467.02],
[289.13],
[ 20.93]], dtype=float32)
CPU times: user 2.76 s, sys: 201 ms, total: 2.96 s
Wall time: 2.82 s
[1095.7493896484375,
6.3134589195251465,
4.662435531616211,
3.2442092895507812,
1.773606300354004]
array([[ 1.74],
[-0.83],
[ 1.77]], dtype=float32)
Note
The .predict method gives us a ‘matrix’ (a 2D array with one column), not a ‘vector’. Calling .flatten() on the result converts it to a ‘vector’ (a 1D array).
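The shape matters as soon as predictions are combined with targets. The sketch below uses the three predictions shown above as a stand-in for the output of `.predict`; the target values are made up.

```python
import numpy as np

# Stand-in for the output of model.predict on three rows: shape (3, 1).
y_pred = np.array([[1.74], [-0.83], [1.77]], dtype=np.float32)
y_flat = y_pred.flatten()                   # shape (3,)
print(y_pred.shape, y_flat.shape)

y_true = np.array([2.0, 1.0, 1.5], dtype=np.float32)   # made-up targets

# Subtracting a (3, 1) array from a (3,) array broadcasts to (3, 3),
# which silently gives every pairwise difference instead of three residuals.
wrong = y_true - y_pred
right = y_true - y_flat
print(wrong.shape, right.shape)
```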


Some predictions are negative:
(np.float32(-7.2446113), np.float32(4.038248))
Force Positive Predictions
Min prediction: -6.00
Max prediction: 8.25

CPU times: user 27.5 s, sys: 1.85 s, total: 29.4 s
Wall time: 28 s
[973.53 5.64 5.64 5.64 5.64] ... [5.64 5.64 5.64 5.64 5.64]


CPU times: user 2.78 s, sys: 187 ms, total: 2.97 s
Wall time: 2.82 s
[nan, nan, nan, nan, nan]
The polynomial regression used by researchers who first studied this dataset.
Note
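Fitting \ln(\text{Median Value}) with a linear output gives the same model form as using an exponential activation function in the final layer: in both cases the prediction is the exponential of a linear combination. The loss and the metrics, however, are measured in different units (log scale versus original scale).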
Source: Pace and Barry (1997), Sparse Spatial Autoregressions, Statistics & Probability Letters.
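The link between a log-transformed target and an exponential output activation can be checked numerically. This is a sketch with made-up numbers: `z` stands in for the value the final layer computes before any activation is applied.

```python
import numpy as np

rng = np.random.default_rng(2)                   # assumed seed
z = rng.normal(size=5)                           # made-up final-layer pre-activations
y = np.exp(z + rng.normal(scale=0.1, size=5))    # made-up positive targets

# Option A: exponential activation, so predictions are on the original scale.
y_hat = np.exp(z)
# Option B: linear output, interpreted as a prediction of ln(y).
log_y_hat = z

# The two predictions are the same model viewed on different scales,
# and the exponential guarantees that every prediction is positive.
same_model = np.allclose(np.log(y_hat), log_y_hat)
all_positive = bool(np.all(y_hat > 0))

# The errors, however, live in different units.
mse_original = float(np.mean((y - y_hat) ** 2))            # original scale
mse_log = float(np.mean((np.log(y) - log_y_hat) ** 2))     # log scale
print(same_model, all_positive, mse_original, mse_log)
```

Because the two MSE values are on different scales, they should not be compared directly when ranking models.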
That basic model gets an R^2 of 0.61, but their fancy model gets 0.86.
Source: Pace and Barry (1997), Sparse Spatial Autoregressions, Statistics & Probability Letters.

I’d previously given it the CSV of the data.

Preprocessing
CPU times: user 26.9 s, sys: 1.78 s, total: 28.7 s
Wall time: 27.3 s
Note the use of X_train_sc instead of X_train.
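The `_sc` suffix refers to scaled inputs. A minimal NumPy sketch of standardisation follows, assuming that is the scaling used here (the lecture imports both `StandardScaler` and `MinMaxScaler`). The data are synthetic; the key point is that the mean and standard deviation come from the training set only and are then reused for the validation set.

```python
import numpy as np

rng = np.random.default_rng(3)    # assumed seed
# Two made-up features on very different scales.
X_train = rng.normal(loc=[5.0, 1000.0], scale=[2.0, 300.0], size=(500, 2))
X_val = rng.normal(loc=[5.0, 1000.0], scale=[2.0, 300.0], size=(100, 2))

# "Fit" the scaler on the training data only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)       # population standard deviation (ddof=0)

# Apply the same transformation to every split.
X_train_sc = (X_train - mu) / sigma
X_val_sc = (X_val - mu) / sigma

print(X_train_sc.mean(axis=0), X_train_sc.std(axis=0))
print(X_val_sc.mean(axis=0), X_val_sc.std(axis=0))
```

The scaled training columns have mean 0 and standard deviation 1 exactly (up to rounding), while the validation columns are only approximately so, because they were not used to compute `mu` and `sigma`.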
Min prediction: 0.00
Max prediction: 16.96
On training data:
{'Linear Regression': 0.5291948207479792,
'Basic ANN': 1.3506622360084013,
'Long run ANN': 0.6312395755307988,
'Exp ANN': 0.34680377119156724}
On validation data (expect worse, i.e. bigger):
Early Stopping
Illustrative loss curves over time.
Source: Heaton (2022), Applications of Deep Learning, Part 3.4: Early Stopping.
Hinton calls it a “beautiful free lunch”
random.seed(2026)
model = Sequential([
Dense(30, activation="leaky_relu"),
Dense(1, activation="exponential")
])
model.compile("adam", "mse")
es = EarlyStopping(restore_best_weights=True, patience=15)
%time hist = model.fit(X_train_sc, y_train, epochs=1_000, \
callbacks=[es], validation_data=(X_val_sc, y_val), verbose=0)
print(f"Keeping model at epoch #{len(hist.history['loss'])-15}.")
CPU times: user 49 s, sys: 3.6 s, total: 52.6 s
Wall time: 49.9 s
Keeping model at epoch #57.
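The patience rule, and the reason the kept epoch is the number of epochs run minus the patience, can be sketched in plain Python. This is an illustration of the idea rather than Keras's implementation; the function name and the validation losses are made up.

```python
def early_stopping(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed, for a patience-based rule.

    Training stops once `patience` consecutive epochs have passed without
    the validation loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Ran out of epochs before the patience was exhausted.
    return best_epoch, len(val_losses)

# Made-up validation losses: the best value (0.5) occurs at epoch 4.
val_losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.52, 0.51, 0.6, 0.7]
best_epoch, stop_epoch = early_stopping(val_losses, patience=3)
print(best_epoch, stop_epoch)   # 4 7
```

Here training stops at epoch 7 and the weights from epoch 4 = 7 - 3 would be restored. The subtraction only identifies the best epoch when early stopping actually triggered; if training runs through every requested epoch, the best epoch has to be found from the loss history instead.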


| Model | MSE | |
|---|---|---|
| 1 | Basic ANN | 1.322504 |
| 2 | Long run ANN | 0.629846 |
| 0 | Linear Regression | 0.505942 |
| 3 | Exp ANN | 0.380793 |
| 4 | Early stop ANN | 0.323157 |
Evaluate only the final/selected model on the test set.
compile: specify the loss function and optimiser
fit: learn the parameters of the model
predict: apply the model
evaluate: apply the model and calculate a metric
Python implementation: CPython
Python version : 3.14.5
IPython version : 9.13.0
keras : 3.14.1
matplotlib: 3.10.9
numpy : 2.4.4
pandas : 3.0.2
seaborn : 0.13.2
scipy : 1.17.1
torch : 2.11.0
