AI & Deep Learning

ACTL3143 & ACTL5111 Deep Learning for Actuaries

Patrick Laub

Artificial Intelligence

Lecture Outline

  • Artificial Intelligence

  • Deep Learning Successes

  • Types of Machine Learning Tasks

  • Neural Networks

  • California House Price Prediction

  • Our First Neural Network

  • Force Positive Predictions

  • Preprocessing

  • Early Stopping

Different goals of AI

Artificial intelligence describes an agent which is capable of:

Thinking humanly | Thinking rationally
Acting humanly   | Acting rationally


Put another way, what fields are most important to AI?

Cognitive science, psychology | Mathematics, logic and inference
HCI, linguistics and robotics | Computer science, statistics


  • Actions are simpler to work with than thought: How do humans even think?
  • Acting humanly can be done without intelligence, see ChatGPT
  • The focus on actions: delivers results, but in a black-box manner

The rational behaviour paradigm won out

“For these reasons, the rational-agent approach to AI has prevailed throughout most of the field’s history. In the early decades, rational agents were built on logical foundations and formed definite plans to achieve specific goals. Later, methods based on probability theory and machine learning allowed the creation of agents that could make decisions under uncertainty to attain the best expected outcome. In a nutshell, AI has focused on the study and construction of agents that do the right thing.”

Russell & Norvig (2021, p. 22)

The traditional AI text.

Question: When do you think the term “machine learning” was first coined?

Shakey the Robot (~1966 – 1972)

Shakey the Robot

The minimax algorithm

The minimax algorithm for chess.

Pseudocode for the minimax algorithm.
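
As a rough illustration of the idea behind the pseudocode, here is a minimal Python sketch of minimax on a toy game tree. The nested-list tree representation and the function name `minimax` are assumptions made for this example; they are not taken from the lecture's figure.

```python
def minimax(node, maximising=True):
    """Return the minimax value of a game tree.

    A leaf is a number (the payoff to the maximising player);
    an internal node is a list of child nodes.
    """
    if not isinstance(node, list):
        return node  # leaf: the payoff is known
    # The players alternate, so the children are scored for the other player.
    values = [minimax(child, not maximising) for child in node]
    return max(values) if maximising else min(values)


# Two moves for us, then three possible replies by the opponent.
tree = [[3, 12, 8], [2, 4, 6]]
print(minimax(tree))  # 3
```

The opponent would answer the first move with 3 and the second with 2, so the maximising player picks the first move. A real chess engine cannot search the full tree and has to cut the search off at some depth.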

Chess

Deep Blue (1997)

Garry Kasparov playing Deep Blue.

Cartoon of the match.

Machine Learning

Tried making a computer smart, too hard!

Make a computer that can learn to be smart.

“[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed”

Samuel (1959)

AI eventually became dominated by one approach, called machine learning, which itself is now dominated by deep learning (neural networks).

You can study a 12 week course on AI and never touch on machine learning…

Artificial Intelligence, Machine Learning, and Deep Learning.

Deep Learning Successes

Image Classification I

What is this?

Options:

  1. punching bag
  2. goblet
  3. red wine
  4. hourglass
  5. balloon

Image Classification II

What is this?

Options:

  1. sea urchin
  2. porcupine
  3. echidna
  4. platypus
  5. quill

Image Classification III

What is this?

Options:

  1. dingo
  2. malinois
  3. German shepherd
  4. muzzle
  5. kelpie

ImageNet Challenge (2012)

ImageNet and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC); originally 1,000 synsets.

AlexNet — a neural network developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton — won the ILSVRC 2012 challenge convincingly.

Needed a graphics card

A graphics processing unit (GPU)

A PC with the GPU and CPU marked in red and blue.

“4.2. Training on multiple GPUs. A single GTX 580 GPU has only 3GB of memory, which limits the maximum size of the networks that can be trained on it. It turns out that 1.2 million training examples are enough to train networks which are too big to fit on one GPU. Therefore we spread the net across two GPUs.”

Krizhevsky et al. (2012)

Lee Sedol plays AlphaGo (2016)

Deep Blue was a win for AI, AlphaGo a win for DL.

Lee Sedol playing AlphaGo AI

I highly recommend this documentary about the event.

Generative Adversarial Networks (2014)

https://thispersondoesnotexist.com/

A GAN-generated face

A GAN-generated face

Diffusion models (2020)

Painting of avocado skating while wearing a hoodie

A surrealist painting of an alpaca studying for an exam

ChatGPT (2022)

AI predictions in the ImageNet demo were from ChatGPT code.

Test ChatGPT’s ability to:

  • generate images
  • translate code
  • explain code
  • run code
  • analyse a dataset
  • critique code
  • critique writing
  • voice chat with you

Compare to Copilot.

GitHub Copilot (2022)

You can get extra usage for free with GitHub Education for Students

Reasoning models (2024)

Reasoning is basically automating the old trick of adding “do this step-by-step” to your prompts.

Choosing the Pro model.

Setting the thinking effort to “Extended”.

Using the larger language models

  • Zoom out as far as possible
  • Give it as much relevant context as possible
  • Better if it’s easy to ingest (LaTeX or Python/R code), otherwise it has to convert to text
  • A simple instruction in the prompt is enough
  • Context sizes for the top models have become quite long
  • I’ve had the best results when it reasons for 20–30 minutes; then I review each potential issue it finds.

A typical prompt I use.

Claude Code (2025)

Let the LLM take control of your terminal/computer.

To get an introduction to using the terminal, check this recording.

Types of Machine Learning Tasks

Predictive versus generative

We focus on predictive not generative AI in this course.

Our school has two new courses starting in 2026:

  • ACTL4307 “Generative AI for Actuaries”
  • ACTL4306 “Quantitative Ethical AI for Risk & Actuarial Applications”


This course focuses on neural networks for supervised learning, and these techniques are the fundamental building blocks for all the other parts of modern AI.

A taxonomy of problems

Machine learning categories in ACTL3142.

New ones:

  • Reinforcement learning
  • Semi-supervised learning
  • Active learning

Examples of supervised learning

Regression:

  • Given a policy \hookrightarrow predict the rate of claims.
  • Given a policy \hookrightarrow predict claim severity.
  • Given a reserving triangle \hookrightarrow predict future claims.

Classification:

  • Given a claim \hookrightarrow classify as fraudulent or not.
  • Given a customer \hookrightarrow predict customer retention patterns.

Supervised learning: mathematically

A recipe for supervised learning.

Self-supervised learning

Data which ‘labels itself’. Example: language model.

‘Autoregressive’ (e.g. GPT) versus ‘masked’ model (e.g. BERT).
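
To make 'data which labels itself' concrete, the sketch below builds training pairs from a single sentence in both styles. The helper names, the whitespace-level tokens, and the `[MASK]` placeholder string are illustrative assumptions, not the actual preprocessing used by GPT or BERT.

```python
def autoregressive_pairs(tokens):
    """Each prefix is an input; the token that follows it is the target."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_pair(tokens, position, mask="[MASK]"):
    """Hide one token; the hidden token is the target."""
    masked = tokens[:position] + [mask] + tokens[position + 1:]
    return masked, tokens[position]


sentence = ["the", "claim", "was", "paid"]
print(autoregressive_pairs(sentence))
print(masked_pair(sentence, 1))
```

In both cases the targets come from the text itself, so no human labelling is required.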

Example: image super-resolution

Original image: it is now the target

Downscaled to a lower resolution: it is now the input

Other examples: image inpainting, denoising images.
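
A minimal sketch of how a super-resolution training pair could be built from one image, assuming a greyscale image stored as a list of rows whose dimensions are divisible by the downscaling factor. Block averaging is just one possible way to downscale; the function name `downscale` is hypothetical.

```python
def downscale(image, factor=2):
    """Average each factor-by-factor block of a greyscale image (list of rows)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            / factor**2
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


target = [[0, 2, 4, 4],
          [2, 4, 4, 4],
          [8, 8, 1, 3],
          [8, 8, 5, 7]]          # the original image is the target
model_input = downscale(target)  # the low-resolution version is the input
print(model_input)  # [[2.0, 4.0], [8.0, 4.0]]
```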

Example: Deoldify images

A deoldified version of the famous “Migrant Mother” photograph.

Neural Networks

How do real neurons work?

A neuron ‘firing’

An artificial neuron

A neuron in a neural network with a ReLU activation.

One neuron

\begin{aligned} z~=~&x_1 \times w_1 + \\ &x_2 \times w_2 + \\ &x_3 \times w_3 . \end{aligned}

a = \begin{cases} z & \text{if } z > 0 \\ 0 & \text{if } z \leq 0 \end{cases}

Here, x_1, x_2, x_3 are just some fixed data.

The weights w_1, w_2, w_3 should be ‘learned’.

A neuron in a neural network with a ReLU activation.

One neuron with bias

\begin{aligned} z~=~&x_1 \times w_1 + \\ &x_2 \times w_2 + \\ &x_3 \times w_3 + b . \end{aligned}

a = \begin{cases} z & \text{if } z > 0 \\ 0 & \text{if } z \leq 0 \end{cases}

The weights w_1, w_2, w_3 and bias b should be ‘learned’.
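
The two formulas above translate directly into a few lines of Python. This is only a sketch: the function name `neuron` and the example weights are made up for illustration.

```python
def neuron(x, w, b=0.0):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    z = sum(x_i * w_i for x_i, w_i in zip(x, w)) + b
    return max(z, 0.0)


x = [1.0, 2.0, 3.0]    # fixed data
w = [0.5, -1.0, 0.25]  # made-up weights
print(neuron(x, w))         # z = 0.5 - 2.0 + 0.75 = -0.75, so a = 0.0
print(neuron(x, w, b=1.0))  # z = 0.25, so a = 0.25
```

With these weights the neuron outputs zero until the bias pushes z above zero.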

A basic neural network

A basic fully-connected/dense network.

Step-function activation

Perceptrons

Brains and computers are binary, so make a perceptron with binary data. Seemed reasonable, impossible to train.

Modern neural network

Replace binary state with continuous state. Still rather slow to train.

Note

It’s a neural network made of neurons, not a “neuron network”.

Try different activation functions
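
For reference, here is a sketch of a few common activation functions written in plain Python. The negative slope of 0.2 for the leaky ReLU is an assumption for this example; libraries let you choose this value and their defaults may differ.

```python
import math


def relu(z):
    return max(z, 0.0)


def leaky_relu(z, negative_slope=0.2):
    # The slope for negative inputs is a tunable choice; 0.2 is assumed here.
    return z if z > 0 else negative_slope * z


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z):
    # A smooth, strictly positive relative of ReLU: ln(1 + e^z).
    return math.log1p(math.exp(z))


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), leaky_relu(z), round(sigmoid(z), 3), round(softplus(z), 3))
```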

Flexible

One can show that an MLP is a universal approximator, meaning it can model any suitably smooth function, given enough hidden units, to any desired level of accuracy (Hornik 1991). One can either make the model be “wide” or “deep”; the latter has some advantages…

Murphy (2012, p. 566)

Feature engineering

Neural networks can learn how to manipulate the inputs (with enough data). That doesn’t mean deep learning is always the best option!

Quiz

In this ANN, how many of the following are there:

  • features,
  • targets,
  • weights,
  • biases, and
  • parameters?

What is the depth?

An artificial neural network.

California House Price Prediction

Imports needed for this demo

import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler, MinMaxScaler

from keras.models import Sequential
from keras.layers import Dense, Input
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint

Data science always starts with the data!

The target variable is the median house value for California districts, expressed in $100,000’s. This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

Dall-E’s rendition of this dataset.

Columns

  • MedInc median income in block group
  • HouseAge median house age in block group
  • AveRooms average number of rooms per household
  • AveBedrms average # of bedrooms per household
  • Population block group population
  • AveOccup average number of household members
  • Latitude block group latitude
  • Longitude block group longitude
  • MedHouseVal median house value (target)

Import the data

features, target = fetch_california_housing(as_frame=True, return_X_y=True)
features
MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude
0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88 -122.23
1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86 -122.22
2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85 -122.24
... ... ... ... ... ... ... ... ...
20637 1.7000 17.0 5.205543 1.120092 1007.0 2.325635 39.43 -121.22
20638 1.8672 18.0 5.329513 1.171920 741.0 2.123209 39.43 -121.32
20639 2.3886 16.0 5.254717 1.162264 1387.0 2.616981 39.37 -121.24

20640 rows × 8 columns

Train/validation/test split

X_main, X_test, y_main, y_test = train_test_split(
    features, target, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_main, y_main, test_size=0.25, random_state=1)

num_features = features.shape[1]
print(X_train.shape, X_val.shape, X_test.shape)
(12384, 8) (4128, 8) (4128, 8)
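
The two successive splits give a 60/20/20 split overall, because 25% of the remaining 80% is 20% of the data. The arithmetic can be checked with the sketch below; the helper `split_sizes` is hypothetical, and it assumes the held-out count is rounded up, which is our understanding of how `train_test_split` treats a fractional `test_size`.

```python
import math


def split_sizes(n_rows, test_size, val_size):
    """Row counts (train, validation, test) after two successive splits."""
    n_test = math.ceil(n_rows * test_size)  # assumed rounding rule
    n_main = n_rows - n_test
    n_val = math.ceil(n_main * val_size)
    return n_main - n_val, n_val, n_test


print(split_sizes(20640, 0.2, 0.25))  # (12384, 4128, 4128)
```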

Linear regression baseline

Refit the linear regression from earlier; we’ll compare neural networks against this baseline.

lr = LinearRegression()
lr.fit(X_train, y_train)

mse_lr_train = mean_squared_error(y_train, lr.predict(X_train))
mse_lr_val = mean_squared_error(y_val, lr.predict(X_val))

mse_train = {"Linear Regression": mse_lr_train}
mse_val = {"Linear Regression": mse_lr_val}
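
For clarity, the mean squared error is just the average of the squared differences between the true and predicted values. A plain-Python sketch with made-up numbers (the function name `mse` is ours, not scikit-learn's):

```python
def mse(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Squared errors are 0, 0.25 and 1, so the result is 1.25 / 3.
print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```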

Our First Neural Network

What are Keras and PyTorch?

Keras is a common way of specifying, training, and using neural networks. It gives a simple interface to various backend libraries, including PyTorch.

The Keras application programming interface (API)

Create a Keras ANN model

Decide on the architecture: a simple fully-connected network with one hidden layer with 30 neurons.

Create the model:

model = Sequential(
    [Input((num_features,)),
     Dense(30, activation="leaky_relu"), 
     Dense(1, activation="leaky_relu")]
)

Inspect the model

model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 30)             │           270 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 1)              │            31 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 301 (1.18 KB)
 Trainable params: 301 (1.18 KB)
 Non-trainable params: 0 (0.00 B)
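
The Param # column in the summary above can be reproduced by hand: each neuron in a dense layer has one weight per input plus one bias. A small sketch, where the helper name `dense_params` is made up for this example:

```python
def dense_params(n_inputs, n_units):
    """Each unit has one weight per input plus one bias."""
    return (n_inputs + 1) * n_units


layer_sizes = [8, 30, 1]  # input features, hidden neurons, outputs
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [270, 31] 301
```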

The model is initialised randomly

model = Sequential([Dense(30, activation="leaky_relu"), Dense(1, activation="leaky_relu")])
model.predict(X_val.head(3), verbose=0)
array([[-139.05],
       [ -84.57],
       [  -5.82]], dtype=float32)
model = Sequential([Dense(30, activation="leaky_relu"), Dense(1, activation="leaky_relu")])
model.predict(X_val.head(3), verbose=0)
array([[-108.21],
       [ -64.74],
       [  -7.1 ]], dtype=float32)

Controlling the randomness

random.seed(2026)

model = Sequential([Dense(30, activation="leaky_relu"), Dense(1, activation="leaky_relu")])

display(model.predict(X_val.head(3), verbose=0))

random.seed(2026)
model = Sequential([Dense(30, activation="leaky_relu"), Dense(1, activation="leaky_relu")])

display(model.predict(X_val.head(3), verbose=0))
array([[467.02],
       [289.13],
       [ 20.93]], dtype=float32)
array([[467.02],
       [289.13],
       [ 20.93]], dtype=float32)

Fit the model

random.seed(2026)

model = Sequential([
    Dense(30, activation="leaky_relu"),
    Dense(1, activation="leaky_relu")
])

model.compile("adam", "mse")
%time hist = model.fit(X_train, y_train, epochs=5, verbose=0)
hist.history["loss"]
CPU times: user 2.76 s, sys: 201 ms, total: 2.96 s
Wall time: 2.82 s
[1095.7493896484375,
 6.3134589195251465,
 4.662435531616211,
 3.2442092895507812,
 1.773606300354004]


Make predictions

y_pred = model.predict(X_train[:3], verbose=0)
y_pred
array([[ 1.74],
       [-0.83],
       [ 1.77]], dtype=float32)

Note

The .predict gives us a ‘matrix’ not a ‘vector’. Calling .flatten() will convert it to a ‘vector’.

print(f"Original shape: {y_pred.shape}")
y_pred = y_pred.flatten()
print(f"Flattened shape: {y_pred.shape}")
y_pred
Original shape: (3, 1)
Flattened shape: (3,)
array([ 1.74, -0.83,  1.77], dtype=float32)

Plot the predictions

Assess the model

y_pred = model.predict(X_val, verbose=0)
mean_squared_error(y_val, y_pred)
1.322503585825187
mse_train["Basic ANN"] = mean_squared_error(
    y_train, model.predict(X_train, verbose=0)
)
mse_val["Basic ANN"] = mean_squared_error(y_val, model.predict(X_val, verbose=0))

Some predictions are negative:

y_pred = model.predict(X_val, verbose=0)
y_pred.min(), y_pred.max()
(np.float32(-7.2446113), np.float32(4.038248))
y_val.min(), y_val.max()
(np.float64(0.225), np.float64(5.00001))

Force Positive Predictions

Try running for longer

random.seed(2026)

model = Sequential([
    Dense(30, activation="leaky_relu"),
    Dense(1, activation="leaky_relu")
])

model.compile("adam", "mse")

%time hist = model.fit(X_train, y_train, epochs=50, verbose=0)
CPU times: user 27.7 s, sys: 1.87 s, total: 29.6 s
Wall time: 28.2 s

Loss curve

plt.plot(range(1, 51), hist.history["loss"])
plt.xlabel("Epoch")
plt.ylabel("MSE");

Loss curve

plt.plot(range(2, 51), hist.history["loss"][1:])
plt.xlabel("Epoch")
plt.ylabel("MSE");

Predictions

y_pred = model.predict(X_val, verbose=0)
print(f"Min prediction: {y_pred.min():.2f}")
print(f"Max prediction: {y_pred.max():.2f}")
Min prediction: -6.00
Max prediction: 8.25
plt.scatter(y_pred, y_val)
plt.xlabel("Predictions")
plt.ylabel("True values")
add_diagonal_line()  # plotting helper whose definition is not shown here
mse_train["Long run ANN"] = mean_squared_error(
    y_train, model.predict(X_train, verbose=0)
)
mse_val["Long run ANN"] = mean_squared_error(y_val, model.predict(X_val, verbose=0))

Try different activation functions

Enforce positive outputs (softplus)

random.seed(2026)

model = Sequential([
    Dense(30, activation="leaky_relu"),
    Dense(1, activation="softplus")
])

model.compile("adam", "mse")

%time hist = model.fit(X_train, y_train, epochs=50, verbose=0)

losses = np.round(hist.history["loss"], 2)
print(losses[:5], "...", losses[-5:])
CPU times: user 27.5 s, sys: 1.85 s, total: 29.4 s
Wall time: 28 s
[973.53   5.64   5.64   5.64   5.64] ... [5.64 5.64 5.64 5.64 5.64]

Plot the predictions

Enforce positive outputs (\mathrm{e}^{\,x})

random.seed(2026)

model = Sequential([
    Dense(30, activation="leaky_relu"),
    Dense(1, activation="exponential")
])

model.compile("adam", "mse")

%time hist = model.fit(X_train, y_train, epochs=5, verbose=0)

losses = hist.history["loss"]
print(losses)
CPU times: user 2.78 s, sys: 187 ms, total: 2.97 s
Wall time: 2.82 s
[nan, nan, nan, nan, nan]
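
One plausible explanation for the `nan` losses (this is our reading; the slides do not state the cause) is numerical overflow. The raw inputs include values in the hundreds or thousands, such as Population, so the value fed into the exponential can be large, and e^z stops being representable as a 32-bit float once z exceeds roughly 88.7. The sketch below only checks that threshold; the helper name is hypothetical.

```python
import math

FLOAT32_MAX = 3.4028235e38  # largest finite float32 value


def exp_fits_in_float32(z):
    """True if e^z is small enough to be a finite float32."""
    return z < math.log(FLOAT32_MAX)


print(round(math.log(FLOAT32_MAX), 2))  # about 88.72
print(exp_fits_in_float32(5.0))         # True
print(exp_fits_in_float32(322.0))       # False
```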

Same as transforming the target

The polynomial regression used by researchers who first studied this dataset.

Note

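Fitting \ln(\text{Median Value}) with a linear output layer gives the same model form as using an exponential activation function in the final layer. The loss is then measured on the log scale, so the metrics are in different units and the fitted weights need not coincide.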
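
The relationship between a log-transformed target and an exponential output activation can be written out as follows. Here z(\boldsymbol{x}) is our notation (not the slides') for the final layer's output before any activation, and the mean squared error is assumed as the loss in both cases.

```latex
% Same prediction function, written two ways:
\ln \hat{y}(\boldsymbol{x}) = z(\boldsymbol{x})
\quad \Longleftrightarrow \quad
\hat{y}(\boldsymbol{x}) = \mathrm{e}^{\,z(\boldsymbol{x})} > 0 .

% Different losses, so the metrics are in different units:
\text{Log target: } \frac{1}{n} \sum_{i=1}^{n} \bigl( \ln y_i - z(\boldsymbol{x}_i) \bigr)^2 ,
\qquad
\text{Exponential activation: } \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \mathrm{e}^{\,z(\boldsymbol{x}_i)} \bigr)^2 .
```

Either way the predictions are guaranteed to be positive.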

Good to know others’ results

That basic model gets R^2 of 0.61, but their fancy model gets 0.86.

GPT can double-check these results

Asking GPT to check it.

I’d previously given it the CSV of the data.

The code it wrote & ran.

Preprocessing

Re-scaling the inputs

scaler = StandardScaler()
scaler.fit(X_train)

X_train_sc = scaler.transform(X_train)
X_val_sc = scaler.transform(X_val)
X_test_sc = scaler.transform(X_test)
plt.hist(X_train.iloc[:, 0])
plt.hist(X_train_sc[:, 0])
plt.legend(["Original", "Scaled"]);
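
As a sketch of what the scaling step does to a single column: the mean and standard deviation are learned from the training data only, and then reused unchanged for the validation and test data. The function names and numbers below are illustrative, and the population standard deviation is used on the assumption that this matches `StandardScaler`.

```python
from statistics import fmean, pstdev


def fit_scaler(train_column):
    """Learn the mean and (population) standard deviation from training data."""
    return fmean(train_column), pstdev(train_column)


def transform(column, mean, std):
    return [(x - mean) / std for x in column]


train = [2.0, 4.0, 6.0, 8.0]  # made-up training values
val = [5.0, 10.0]             # made-up validation values

mean, std = fit_scaler(train)  # statistics come from the training set only
print(transform(train, mean, std))
print(transform(val, mean, std))  # reuses the training mean and std
```

Fitting the scaler on the training set alone avoids leaking information from the validation and test sets into the model.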

Same model with scaled inputs

random.seed(2026)

model = Sequential([
    Dense(30, activation="leaky_relu"),
    Dense(1, activation="exponential")
])

model.compile("adam", "mse")

%time hist = model.fit(X_train_sc, y_train, epochs=50, verbose=0)
CPU times: user 26.9 s, sys: 1.78 s, total: 28.7 s
Wall time: 27.3 s

Note the use of X_train_sc instead of X_train.

Loss curve

plt.plot(range(1, 51), hist.history["loss"])
plt.xlabel("Epoch")
plt.ylabel("MSE");

Loss curve

plt.plot(range(10, 51), hist.history["loss"][9:])
plt.xlabel("Epoch")
plt.ylabel("MSE");

Predictions

y_pred = model.predict(X_val_sc, verbose=0)
print(f"Min prediction: {y_pred.min():.2f}")
print(f"Max prediction: {y_pred.max():.2f}")
Min prediction: 0.00
Max prediction: 16.96
plt.scatter(y_pred, y_val)
plt.xlabel("Predictions")
plt.ylabel("True values")
add_diagonal_line()  # plotting helper whose definition is not shown here
mse_train["Exp ANN"] = mean_squared_error(
    y_train, model.predict(X_train_sc, verbose=0)
)
mse_val["Exp ANN"] = mean_squared_error(y_val, model.predict(X_val_sc, verbose=0))

Comparing MSE (smaller is better)

On training data:

mse_train
{'Linear Regression': 0.5291948207479792,
 'Basic ANN': 1.3506622360084013,
 'Long run ANN': 0.6312395755307988,
 'Exp ANN': 0.34680377119156724}

On validation data (expect worse, i.e. bigger):

mse_val
{'Linear Regression': 0.5059420205381369,
 'Basic ANN': 1.322503585825187,
 'Long run ANN': 0.629845824224333,
 'Exp ANN': 0.38079265031009163}

Comparing models (train)

train_results = pd.DataFrame({"Model": mse_train.keys(), "MSE": mse_train.values()})
train_results.sort_values("MSE", ascending=False)
Model MSE
1 Basic ANN 1.350662
2 Long run ANN 0.631240
0 Linear Regression 0.529195
3 Exp ANN 0.346804

Comparing models (validation)

val_results = pd.DataFrame({"Model": mse_val.keys(), "MSE": mse_val.values()})
val_results.sort_values("MSE", ascending=False)
Model MSE
1 Basic ANN 1.322504
2 Long run ANN 0.629846
0 Linear Regression 0.505942
3 Exp ANN 0.380793

Early Stopping

Choosing when to stop training

Illustrative loss curves over time.

Try early stopping

Hinton calls it a “beautiful free lunch”

random.seed(2026)
model = Sequential([
    Dense(30, activation="leaky_relu"),
    Dense(1, activation="exponential")
])
model.compile("adam", "mse")

es = EarlyStopping(restore_best_weights=True, patience=15)

%time hist = model.fit(X_train_sc, y_train, epochs=1_000, \
    callbacks=[es], validation_data=(X_val_sc, y_val), verbose=0)
print(f"Keeping model at epoch #{len(hist.history['loss']) - es.patience}.")
CPU times: user 49 s, sys: 3.6 s, total: 52.6 s
Wall time: 49.9 s
Keeping model at epoch #57.
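
The logic behind early stopping with a patience setting can be sketched in plain Python. This is a simplified illustration of the idea, not Keras's implementation; the function name and the validation losses are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training halts once `patience` epochs pass without a new best
    validation loss; the weights from `best_epoch` would be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses)  # ran out of epochs


val_losses = [0.9, 0.6, 0.5, 0.55, 0.52, 0.51, 0.58]  # made-up values
print(early_stopping_epoch(val_losses, patience=3))  # (3, 6)
```

When training does stop early, the best epoch equals the stopping epoch minus the patience, which is the calculation used in the print statement above.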

Loss curve

plt.plot(hist.history["loss"])
plt.plot(hist.history["val_loss"])
plt.legend(["Training", "Validation"]);

Loss curve II

plt.plot(hist.history["loss"])
plt.plot(hist.history["val_loss"])
plt.ylim([0, 0.75])
plt.legend(["Training", "Validation"]);

Predictions

Comparing models (validation)

Model MSE
1 Basic ANN 1.322504
2 Long run ANN 0.629846
0 Linear Regression 0.505942
3 Exp ANN 0.380793
4 Early stop ANN 0.323157

The test set

Evaluate only the final/selected model on the test set.

mean_squared_error(y_test, model.predict(X_test_sc, verbose=0))
0.33164115325909604
model.evaluate(X_test_sc, y_test, verbose=0)
0.3316410779953003

Keras model methods

  • compile: specify the loss function and optimiser
  • fit: learn the parameters of the model
  • predict: apply the model
  • evaluate: apply the model and calculate a metric


random.seed(12)
model = Sequential()
model.add(Dense(1, activation="relu"))
model.compile("adam", "poisson")
model.fit(X_train, y_train, verbose=0)
y_pred = model.predict(X_val, verbose=0)
print(model.evaluate(X_val, y_val, verbose=0))
4.4610700607299805

Package Versions

from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch"))
Python implementation: CPython
Python version       : 3.14.5
IPython version      : 9.13.0

keras     : 3.14.1
matplotlib: 3.10.9
numpy     : 2.4.4
pandas    : 3.0.2
seaborn   : 0.13.2
scipy     : 1.17.1
torch     : 2.11.0

Glossary

  • activations, activation function
  • artificial neural network
  • biases (in neurons)
  • callbacks
  • classification problem
  • cost/loss function
  • deep network, network depth
  • dense or fully-connected layer
  • early stopping
  • epoch
  • feed-forward neural network
  • hidden layer
  • Keras, TensorFlow, PyTorch
  • labelled/unlabelled data
  • machine learning
  • minimax algorithm
  • neural network architecture
  • perceptron
  • ReLU
  • representation learning
  • sigmoid activation function
  • targets
  • training/validation/test split
  • weights (in a neuron)

References

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press.
Russell, S., & Norvig, P. (2021). Artificial intelligence: A modern approach (4th ed.). Pearson.
Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229.