Entity Embedding

ACTL3143 & ACTL5111 Deep Learning for Actuaries

Author

Patrick Laub

Show the package imports
import random
import keras
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler, OrdinalEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn import set_config

set_config(transform_output="pandas")

Entity Embedding

Entity embedding is an alternative way of handling categorical variables: instead of one-hot encoding each category, we map it to a learned, low-dimensional numeric vector.

Revisit the French motor dataset

Code
from pathlib import Path
from sklearn.datasets import fetch_openml

if not Path("french-motor.csv").exists():
    freq = fetch_openml(data_id=41214, as_frame=True).frame
    freq.to_csv("french-motor.csv", index=False)
else:
    freq = pd.read_csv("french-motor.csv")

freq
IDpol ClaimNb Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas Density Region
0 1.0 1 0.10000 D 5 0 55 50 B12 'Regular' 1217 R82
1 3.0 1 0.77000 D 5 0 55 50 B12 'Regular' 1217 R82
... ... ... ... ... ... ... ... ... ... ... ... ...
678011 6114329.0 0 0.00274 B 4 0 60 50 B12 'Regular' 95 R26
678012 6114330.0 0 0.00274 B 7 6 29 54 B12 'Diesel' 65 R72

678013 rows × 12 columns

Data dictionary

Variable Description Preprocessing
IDpol Policy number (unique identifier) Dropped
ClaimNb Number of claims on the given policy Target
Exposure* Total exposure in yearly units Normalised
Area Area code (ordinal) Ordinal Encode
VehPower Power of the car (ordinal encoded) Normalised
VehAge Age of the car in years Normalised
DrivAge Age of the (most common) driver in years Normalised
BonusMalus Bonus–malus level between 50 and 230 (with reference level 100) Normalised
VehBrand* Car brand (nominal) One-hot
VehGas Diesel or regular fuel car (binary) One-hot
Density Density of inhabitants per km2 in the city of the living place of the driver Normalised
Region* Regions in France (prior to 2016) One-hot

The model

Have \{ (\mathbf{x}_i, y_i) \}_{i=1, \dots, n} for \mathbf{x}_i \in \mathbb{R}^{47} and y_i \in \mathbb{N}_0.

Assume the distribution Y_i \sim \mathsf{Poisson}(\lambda(\mathbf{x}_i))

We have \mathbb{E} Y_i = \lambda(\mathbf{x}_i). The NN takes \mathbf{x}_i & predicts \mathbb{E} Y_i.

Note

For insurance, this is a bit weird. The exposures are different for each policy.

\lambda(\mathbf{x}_i) is the expected number of claims for the duration of policy i’s contract.

Normally, \text{Exposure}_i \not\in \mathbf{x}_i, \lambda(\mathbf{x}_i) is the expected rate per year, and Y_i \sim \mathsf{Poisson}(\text{Exposure}_i \times \lambda(\mathbf{x}_i)).
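Equivalently (a standard GLM-style offset identity, not something specific to this dataset), taking logs of the mean gives

\log \mathbb{E} Y_i = \log \text{Exposure}_i + \log \lambda(\mathbf{x}_i),

so a network that outputs \lambda(\mathbf{x}_i) just needs its prediction multiplied by \text{Exposure}_i before the Poisson loss is applied; this is what the “Scale By Exposure” section at the end of these notes does.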

What values do we see in the data?

Code
freq = freq.drop("IDpol", axis=1).head(25_000)

X_train, X_test, y_train, y_test = train_test_split(
  freq.drop("ClaimNb", axis=1), freq["ClaimNb"], random_state=36861)

# Reset each index to start at 0 again.
X_train_raw = X_train.reset_index(drop=True)
X_test_raw = X_test.reset_index(drop=True)
X_train_raw["Area"].value_counts()
X_train_raw["VehBrand"].value_counts()
X_train_raw["VehGas"].value_counts()
X_train_raw["Region"].value_counts()
Area
C    5514
D    4116
     ... 
B    2387
F     444
Name: count, Length: 6, dtype: int64
VehBrand
B1     4998
B2     4906
       ... 
B11     283
B14     140
Name: count, Length: 11, dtype: int64
VehGas
'Regular'    10658
'Diesel'      8092
Name: count, dtype: int64
Region
R24    6493
R82    2112
       ... 
R42      48
R43      26
Name: count, Length: 22, dtype: int64

How we preprocessed last time

As a reminder, last time we preprocessed the categorical data using one-hot (dummy) encoding, with Area ordinally encoded.

from sklearn.compose import make_column_transformer

ct = make_column_transformer(
  (OneHotEncoder(sparse_output=False, drop="first"), ["VehGas", "VehBrand", "Region"]),
  (OrdinalEncoder(), ["Area"]),
  remainder=StandardScaler(),
  verbose_feature_names_out=False
)
X_train = ct.fit_transform(X_train_raw)
X_train_raw.head(3)
Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas Density Region
0 1.00 A 7 8 50 52 B2 'Diesel' 13 R24
1 0.79 B 7 7 28 80 B12 'Diesel' 65 R21
2 1.00 C 6 13 30 50 B1 'Regular' 133 R53
X_train.head(3)
VehGas_'Regular' VehBrand_B10 VehBrand_B11 VehBrand_B12 VehBrand_B13 VehBrand_B14 VehBrand_B2 VehBrand_B3 VehBrand_B4 VehBrand_B5 ... Region_R91 Region_R93 Region_R94 Area Exposure VehPower VehAge DrivAge BonusMalus Density
0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 1.129272 0.366510 0.223226 0.374405 -0.524020 -0.394690
1 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 1.0 0.566087 0.366510 0.046100 -1.131699 1.122382 -0.381092
2 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 2.0 1.129272 -0.167408 1.108854 -0.994781 -0.641620 -0.363309

3 rows × 39 columns

Categorical Variables & Entity Embeddings

Region column

French Administrative Regions

One-hot encoding

One-hot encoding is a way to assign numerical values to nominal variables, and it differs from ordinal encoding in how it transforms the data. Ordinal encoding assigns an integer to each unique category and returns a single integer column. In contrast, one-hot encoding returns a binary indicator vector for each category, so the result is not a single column but a matrix with one column per unique category of the nominal variable.

oh = OneHotEncoder(sparse_output=False)
X_train_oh = oh.fit_transform(X_train_raw[["Region"]])
X_test_oh = oh.transform(X_test_raw[["Region"]])
print(list(X_train_raw["Region"][:5]))
X_train_oh.head()
['R24', 'R21', 'R53', 'R24', 'R82']
Region_R11 Region_R21 Region_R22 Region_R23 Region_R24 Region_R25 Region_R26 Region_R31 Region_R41 Region_R42 ... Region_R53 Region_R54 Region_R72 Region_R73 Region_R74 Region_R82 Region_R83 Region_R91 Region_R93 Region_R94
0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

5 rows × 22 columns

Train on one-hot inputs

For the sake of explaining entity embeddings, we will train a neural network on just one categorical variable which is one-hot encoded.

1num_regions = len(oh.categories_[0])

random.seed(12)
2model = Sequential([
  Dense(2, input_dim=num_regions),
  Dense(1, activation="exponential")
])

3model.compile(optimizer="adam", loss="poisson")

es = EarlyStopping(verbose=True)
hist = model.fit(X_train_oh, y_train, epochs=100, verbose=0,
    validation_split=0.2, callbacks=[es])                       
hist.history["val_loss"][-1]
1
Computes the number of unique categories in the encoded column and stores it in num_regions
2
Constructs the neural network: one hidden layer with 2 neurons followed by one output layer. Dense(2, input_dim=num_regions) takes an input with num_regions columns and maps it down to 2 neurons
3
The remaining steps (compiling, early stopping, and fitting) are the same as what we saw when training with ordinal-encoded variables
/Users/z3535837/miniforge3/envs/ai/lib/python3.11/site-packages/keras/src/layers/core/dense.py:87: UserWarning:

Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
Epoch 7: early stopping
0.7678562998771667

Make a fake batch of data

Make a fake batch of data where one observation is from each region (essentially an identity matrix). We use this fake batch of data to see what the hidden layer’s activation looks like under each of the 22 categories.

X = np.eye(num_regions)
pd.DataFrame(X, columns=oh.categories_[0])
R11 R21 R22 R23 R24 R25 R26 R31 R41 R42 ... R53 R54 R72 R73 R74 R82 R83 R91 R93 R94
0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
20 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
21 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

22 rows × 22 columns

model.layers[0](X)
<tf.Tensor: shape=(22, 2), dtype=float32, numpy=
array([[-0.2 , -0.12],
       [ 0.18, -0.19],
       [-0.21,  0.11],
       [-0.82,  0.11],
       [-0.03, -0.68],
       [-0.69, -0.17],
       [-0.32, -0.37],
       [ 0.24, -0.  ],
       [-0.9 , -0.54],
       [ 0.25, -0.36],
       [-0.26, -0.06],
       [-1.11, -0.32],
       [ 0.17, -0.68],
       [-0.94, -0.58],
       [-0.14,  0.05],
       [ 0.1 ,  0.  ],
       [-0.46, -0.37],
       [-0.59, -0.34],
       [-0.4 , -0.46],
       [-0.19,  0.18],
       [ 0.32, -0.14],
       [-0.3 ,  0.35]], dtype=float32)>

Above we see the values of the 2 hidden-layer neurons produced when each region is fed in as input.

The first layer

We can also extract the layer, get its weights, and compute the output manually.

1layer = model.layers[0]
2W, b = layer.get_weights()
3X.shape, W.shape, b.shape
1
Extracts the layer
2
Gets the weights and biases and stores the weights in W and biases in b
3
Returns the shapes of the matrices
((22, 22), (22, 2), (2,))
X @ W + b
array([[-0.2 , -0.12],
       [ 0.18, -0.19],
       [-0.21,  0.11],
       [-0.82,  0.11],
       [-0.03, -0.68],
       [-0.69, -0.17],
       [-0.32, -0.37],
       [ 0.24, -0.  ],
       [-0.9 , -0.54],
       [ 0.25, -0.36],
       [-0.26, -0.06],
       [-1.11, -0.32],
       [ 0.17, -0.68],
       [-0.94, -0.58],
       [-0.14,  0.05],
       [ 0.1 ,  0.  ],
       [-0.46, -0.37],
       [-0.59, -0.34],
       [-0.4 , -0.46],
       [-0.19,  0.18],
       [ 0.32, -0.14],
       [-0.3 ,  0.35]])
W + b
array([[-0.2 , -0.12],
       [ 0.18, -0.19],
       [-0.21,  0.11],
       [-0.82,  0.11],
       [-0.03, -0.68],
       [-0.69, -0.17],
       [-0.32, -0.37],
       [ 0.24, -0.  ],
       [-0.9 , -0.54],
       [ 0.25, -0.36],
       [-0.26, -0.06],
       [-1.11, -0.32],
       [ 0.17, -0.68],
       [-0.94, -0.58],
       [-0.14,  0.05],
       [ 0.1 ,  0.  ],
       [-0.46, -0.37],
       [-0.59, -0.34],
       [-0.4 , -0.46],
       [-0.19,  0.18],
       [ 0.32, -0.14],
       [-0.3 ,  0.35]], dtype=float32)

The code above manually computes and returns the same answers as before. Remember that our \mathbf X is the identity matrix, so the matrix multiplication by \mathbf X is redundant (X @ W + b == W + b). Skipping it is extremely valuable: if your categorical variable had, say, 10,000 possible categories, the one-hot matrix multiplication would become expensive.

Just a look-up operation

We can consider this as just a look-up operation, where each row of the matrix W + b is a vector representation of one category. For example, if we want to know how region R11 is represented, we just look at the first row of W + b (a small sketch of this look-up follows the output below).

display(list(oh.categories_[0]))
['R11',
 'R21',
 'R22',
 'R23',
 'R24',
 'R25',
 'R26',
 'R31',
 'R41',
 'R42',
 'R43',
 'R52',
 'R53',
 'R54',
 'R72',
 'R73',
 'R74',
 'R82',
 'R83',
 'R91',
 'R93',
 'R94']
W + b
array([[-0.2 , -0.12],
       [ 0.18, -0.19],
       [-0.21,  0.11],
       [-0.82,  0.11],
       [-0.03, -0.68],
       [-0.69, -0.17],
       [-0.32, -0.37],
       [ 0.24, -0.  ],
       [-0.9 , -0.54],
       [ 0.25, -0.36],
       [-0.26, -0.06],
       [-1.11, -0.32],
       [ 0.17, -0.68],
       [-0.94, -0.58],
       [-0.14,  0.05],
       [ 0.1 ,  0.  ],
       [-0.46, -0.37],
       [-0.59, -0.34],
       [-0.4 , -0.46],
       [-0.19,  0.18],
       [ 0.32, -0.14],
       [-0.3 ,  0.35]], dtype=float32)
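Concretely, here is a minimal sketch of that look-up, reusing the W, b and oh objects from above:

embedding_matrix = W + b
# Find which row corresponds to R11 (it is the first category, so index 0).
r11_index = list(oh.categories_[0]).index("R11")
# Grab that row directly; no one-hot vector or matrix multiplication needed.
embedding_matrix[r11_index]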

This is what entity embedding does.

Turn the region into an index

To use the entity embedding functionality, we need to turn the categories into indices. We can use the ordinal encoder to do so.

oe = OrdinalEncoder()
X_train_reg = oe.fit_transform(X_train_raw[["Region"]])
X_test_reg = oe.transform(X_test_raw[["Region"]])

for i, reg in enumerate(oe.categories_[0][:3]):
  print(f"The Region value {reg} gets turned into {i}.")
The Region value R11 gets turned into 0.
The Region value R21 gets turned into 1.
The Region value R22 gets turned into 2.

Use an Embedding layer

Feed this new version of the data into the NN, using an embedding layer.

from keras.layers import Embedding
num_regions = X_train_raw["Region"].nunique()

random.seed(12)
model = Sequential([
  Embedding(input_dim=num_regions, output_dim=2),
  Dense(1, activation="exponential")
])

model.compile(optimizer="adam", loss="poisson")
es = EarlyStopping(verbose=True)
hist = model.fit(X_train_reg, y_train, epochs=100, verbose=0,
    validation_split=0.2, callbacks=[es])
hist.history["val_loss"][-1]
Epoch 7: early stopping
0.7678869366645813
model.layers
[<Embedding name=embedding, built=True>, <Dense name=dense_2, built=True>]

An Embedding layer learns a representation for each category of a categorical variable during training. In the example above, encoding the variable Region with an ordinal encoder and passing the resulting indices through an embedding layer learns a vector representation for each region as the model trains. Ordinal encoding followed by an embedding layer is an attractive alternative to one-hot encoding: it is computationally cheaper (no large one-hot matrices are generated), particularly when the number of categories is high.

Keras’ Embedding Layer

model.layers[0].get_weights()[0]
array([[-0.11, -0.1 ],
       [ 0.04,  0.  ],
       [-0.01,  0.02],
       [-0.24, -0.13],
       [-0.31, -0.35],
       [-0.33, -0.25],
       [-0.28, -0.25],
       [ 0.12,  0.08],
       [-0.6 , -0.5 ],
       [-0.01, -0.06],
       [-0.09, -0.06],
       [-0.58, -0.44],
       [-0.24, -0.29],
       [-0.67, -0.56],
       [-0.01,  0.01],
       [ 0.07,  0.05],
       [-0.35, -0.31],
       [-0.38, -0.32],
       [-0.3 , -0.28],
       [ 0.04,  0.07],
       [ 0.09,  0.03],
       [ 0.07,  0.13]], dtype=float32)
X_train_raw["Region"].head(4)
0    R24
1    R21
2    R53
3    R24
Name: Region, dtype: object
X_sample = X_train_reg[:4].to_numpy()
X_sample
array([[ 4.],
       [ 1.],
       [12.],
       [ 4.]])
enc_tensor = model.layers[0](X_sample)
keras.ops.convert_to_numpy(enc_tensor).squeeze()
array([[-0.31, -0.35],
       [ 0.04,  0.  ],
       [-0.24, -0.29],
       [-0.31, -0.35]], dtype=float32)

For each observation, the region gets converted from text to an index, and then to its vector representation by the embedding layer.
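As a quick check of the look-up interpretation, here is a minimal sketch (reusing the objects above): indexing the embedding weight matrix by the integer codes reproduces the layer's output.

W_emb = model.layers[0].get_weights()[0]   # the (22, 2) embedding matrix
idx = X_sample.astype(int).squeeze()       # integer region indices: 4, 1, 12, 4
W_emb[idx]                                 # the same rows as the layer output above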

Entity embedding is especially useful when there is a very large number of categories (such as 1,000 or 10,000) and you want to reduce the number of columns. If your categorical variable has n categories, a simple rule of thumb is to use n^{1/4} for the dimension of the entity embedding.

In this case, the dimension chosen was 2. The intuition behind this is that regions can be represented easily over a 2-dimensional plane (by their geographical coordinates).
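As a rough illustration of that n^{1/4} heuristic (a rule of thumb, not a hard rule):

n = X_train_raw["Region"].nunique()   # 22 regions
round(n ** 0.25)                      # suggests roughly 2 embedding dimensions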

The learned embeddings

If we only have two-dimensional embeddings we can plot them.

points = model.layers[0].get_weights()[0]
plt.scatter(points[:,0], points[:,1])
for i in range(num_regions):
  plt.text(points[i,0]+0.01, points[i,1] , s=oe.categories_[0][i])

While it is not always the case, entity embeddings can at times be interpretable rather than just useful representations. The figure above shows how plotting the learned embeddings can help reveal regions which might be similar (e.g. coastal areas, hilly areas, etc.).

Entity embeddings

Embeddings will gradually improve during training.

Each category is initially assigned a random vector, but the vectors move during training. These movements follow some sort of logic; for example, in this graph, countries in Europe cluster together and countries in Asia cluster together.

Embeddings & other inputs

Oftentimes we deal with both categorical and numerical variables together. The following diagram shows a recommended way of feeding numerical and categorical data into the neural network. Numerical variables are already numeric and do not require entity embedding, whereas categorical variables pass through an entity embedding to be converted into numeric vectors.

Illustration of a neural network with both continuous and categorical inputs.

We can’t do this with Sequential models…

Since we want the numerical variables to follow one path and the categorical variables another, they need to be processed separately and independently. The outputs of these independent (non-sequential) branches then become inputs to the next hidden layer (after which the network is sequential again).

Keras’ Functional API

Sequential models are easy to use and require few specifications; however, they cannot express more complex neural network architectures. The Keras functional API, on the other hand, allows users to build such architectures.

Converting Sequential models

from keras.models import Model
from keras.layers import Input
random.seed(12)

model = Sequential([
  Dense(30, "leaky_relu"),
  Dense(1, "exponential")
])

model.compile(
  optimizer="adam",
  loss="poisson")

hist = model.fit(
  X_train_oh, y_train,
  epochs=1, verbose=0,
  validation_split=0.2)
hist.history["val_loss"][-1]
0.7700941562652588
random.seed(12)

inputs = Input(shape=(X_train_oh.shape[1],))
x = Dense(30, "leaky_relu")(inputs)
out = Dense(1, "exponential")(x)
model = Model(inputs, out)

model.compile(
  optimizer="adam",
  loss="poisson")

hist = model.fit(
  X_train_oh, y_train,
  epochs=1, verbose=0,
  validation_split=0.2)
hist.history["val_loss"][-1]
0.7700941562652588

See one-length tuples.

The above code shows how to construct the same neural network using sequential models and Keras functional API. Every sequential model can be converted into the functional style, but not every functional model can be converted to the sequential style.

In the functional API approach, we must specify the shape of the input layer and explicitly connect each layer to its inputs before building the model with model = Model(inputs, out). Specifying the inputs and outputs in this way lets the user combine several inputs, each preprocessed in a different way, into a single model; one example would be combining entity-embedded categorical variables with scaled numerical variables (a minimal sketch follows).
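Here is a minimal sketch of that idea, with hypothetical input shapes (a 2-dimensional embedded categorical input and 5 scaled numerical columns); this is something a Sequential model cannot express.

from keras.layers import Concatenate

cat_in = Input(shape=(2,), name="embedded_categories")  # hypothetical embedded input
num_in = Input(shape=(5,), name="scaled_numericals")    # hypothetical numerical input
merged = Concatenate()([cat_in, num_in])                # join the two branches
sketch = Model([cat_in, num_in], Dense(1, "exponential")(merged))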

Wide & Deep network

An illustration of the wide & deep network architecture.

Add a skip connection from input to output layers.

from keras.layers \
    import Concatenate

inp = Input(shape=X_train.shape[1:])
hidden1 = Dense(30, "leaky_relu")(inp)
hidden2 = Dense(30, "leaky_relu")(hidden1)
concat = Concatenate()(
  [inp, hidden2])
output = Dense(1)(concat)
model = Model(
    inputs=[inp],
    outputs=[output])

The functional API method can unlock some new non-sequential NN architectures, such as the Wide & Deep network. One version of the inputs is processed through dense hidden layers, and the other version skips the processing. The two versions are then concatenated into a new layer which is used to create the output layer.

Naming the layers

For complex networks, it is often useful to give meaningful names to the layers.

input_ = Input(shape=X_train.shape[1:], name="input")
hidden1 = Dense(30, activation="leaky_relu", name="hidden1")(input_)
hidden2 = Dense(30, activation="leaky_relu", name="hidden2")(hidden1)
concat = Concatenate(name="combined")([input_, hidden2])
output = Dense(1, name="output")(concat)
model = Model(inputs=[input_], outputs=[output])

Inspecting a complex model

from keras.utils import plot_model
plot_model(model, show_layer_names=True)

model.summary(line_length=75)
Model: "functional_5"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)         Output Shape         Param #  Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input (InputLayer)  │ (None, 39)        │         0 │ -                 │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ hidden1 (Dense)     │ (None, 30)        │     1,200 │ input[0][0]       │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ hidden2 (Dense)     │ (None, 30)        │       930 │ hidden1[0][0]     │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ combined            │ (None, 69)        │         0 │ input[0][0],      │
│ (Concatenate)       │                   │           │ hidden2[0][0]     │
├─────────────────────┼───────────────────┼───────────┼───────────────────┤
│ output (Dense)      │ (None, 1)         │        70 │ combined[0][0]    │
└─────────────────────┴───────────────────┴───────────┴───────────────────┘
 Total params: 2,200 (8.59 KB)
 Trainable params: 2,200 (8.59 KB)
 Non-trainable params: 0 (0.00 B)

The plot of the model becomes much easier to understand and interpret.

French Motor Dataset with Embeddings

The desired architecture

Illustration of a neural network with both continuous and categorical inputs.

Preprocess all French motor inputs

Transform the categorical variables to integers:

1num_brands, num_regions = X_train_raw[["VehBrand", "Region"]].nunique()

ct = make_column_transformer(
2  (OrdinalEncoder(), ["VehBrand", "Region", "Area", "VehGas"]),
3  remainder=StandardScaler(),
4  verbose_feature_names_out=False
)
5X_train = ct.fit_transform(X_train_raw)
6X_test = ct.transform(X_test_raw)
1
Stores the number of unique categories in each nominal variable, as we will need these values later for the entity embeddings
2
Constructs the column transformer, first ordinally encoding all categorical variables (ordinal and nominal). The nominal variables are ordinal encoded here just as an intermediate step, since integer indices are the required input format for entity embedding layers
3
Applies standard scaling to all other numerical variables
4
Chooses the simpler style of column names for the transformed dataframes
5
Fits the column transformer to the train set and transforms it
6
Transforms the test set using the column transformer fitted using the train set

Split the brand and region data apart from the rest:

X_train_brand = X_train["VehBrand"]
X_train_region = X_train["Region"]
X_train_rest = X_train.drop(["VehBrand", "Region"], axis=1)

X_test_brand = X_test["VehBrand"]
X_test_region = X_test["Region"]
X_test_rest = X_test.drop(["VehBrand", "Region"], axis=1)

Organise the inputs

Make a Keras Input for: vehicle brand, region, & others.

veh_brand = Input(shape=(1,), name="veh_brand")
region = Input(shape=(1,), name="region")
other_inputs = Input(shape=X_train_rest.shape[1:], name="other_inputs")

Create embeddings and join them with the other inputs.

1from keras.layers import Reshape

random.seed(1337)
2veh_brand_ee = Embedding(input_dim=num_brands, output_dim=2,
    name="veh_brand_ee")(veh_brand)                                
3veh_brand_ee = Reshape(target_shape=(2,))(veh_brand_ee)

4region_ee = Embedding(input_dim=num_regions, output_dim=2,
    name="region_ee")(region)
5region_ee = Reshape(target_shape=(2,))(region_ee)

6x = Concatenate(name="combined")([veh_brand_ee, region_ee, other_inputs])
1
Imports the Reshape class from keras.layers
2
Constructs the embedding layer for vehicle brand by specifying the input dimension (the number of unique categories) and the output dimension (the number of dimensions we want each category to be summarised into)
3
Reshapes the output to match the format required at the model building step
4
Constructs the embedding layer by specifying the input dimension (the number of unique categories) and output dimension
5
Reshapes the output to match the format required at the model building step
6
Combines the entity embedded matrices and other inputs together

Complete the model and fit it

Feed the combined embeddings & continuous inputs to some normal dense layers.

x = Dense(30, "relu", name="hidden")(x)
out = Dense(1, "exponential", name="out")(x)

1model = Model([veh_brand, region, other_inputs], out)
model.compile(optimizer="adam", loss="poisson")

2hist = model.fit((X_train_brand, X_train_region, X_train_rest),
    y_train, epochs=100, verbose=0,
    callbacks=[EarlyStopping(patience=5)], validation_split=0.2)
np.min(hist.history["val_loss"])
1
The model-building stage requires all the inputs to be passed in together
2
Passes in the three sets of data, matching the three inputs defined at the model-building stage
np.float64(0.6846033930778503)

Plotting this model

plot_model(model, show_layer_names=True)

Why we need to reshape

plot_model(model, show_layer_names=True, show_shapes=True)

The plotted model shows how, for example, region starts off with shape (None, 1), i.e. a single column with some number of rows. Entity embedding the region variable produces a 3D array of shape (None, 1, 2), which is not the required format for concatenating. Therefore, we reshape it with a Reshape layer, giving an array of shape (None, 2), which is exactly what we need for concatenating.
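A minimal sketch of those shapes, reusing num_regions from above:

demo_in = Input(shape=(1,))
demo_emb = Embedding(input_dim=num_regions, output_dim=2)(demo_in)
print(demo_emb.shape)                               # (None, 1, 2): an extra middle axis
print(Reshape(target_shape=(2,))(demo_emb).shape)   # (None, 2): ready to concatenate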

Scale By Exposure

Two different models

Have \{ (\mathbf{x}_i, y_i) \}_{i=1, \dots, n} for \mathbf{x}_i \in \mathbb{R}^{47} and y_i \in \mathbb{N}_0.

Model 1: Say Y_i \sim \mathsf{Poisson}(\lambda(\mathbf{x}_i)).

But, the exposures are different for each policy. \lambda(\mathbf{x}_i) is the expected number of claims for the duration of policy i’s contract.

Model 2: Say Y_i \sim \mathsf{Poisson}(\text{Exposure}_i \times \lambda(\mathbf{x}_i)).

Now, \text{Exposure}_i \not\in \mathbf{x}_i, and \lambda(\mathbf{x}_i) is the rate per year.

Just take continuous variables

For convenience, the following implementation only considers the numerical variables.

1ct = make_column_transformer(
2  ("passthrough", ["Exposure"]),
3  ("drop", ["VehBrand", "Region", "Area", "VehGas"]),
4  remainder=StandardScaler(),
5  verbose_feature_names_out=False
)
6X_train = ct.fit_transform(X_train_raw)
7X_test = ct.transform(X_test_raw)
1
Starts defining the column transformer
2
Lets Exposure pass through as it is, without any preprocessing
3
Drops the categorical variables (for ease of implementation)
4
Scales the remaining variables
5
Chooses the simpler style of column names for the transformed dataframes
6
Fits and transforms the train set
7
Only transforms the test set

Split exposure apart from the rest:

X_train_exp = X_train["Exposure"]
X_test_exp = X_test["Exposure"]
X_train_rest = X_train.drop("Exposure", axis=1)
X_test_rest = X_test.drop("Exposure", axis=1)

Organise the inputs:

exposure = Input(shape=(1,), name="exposure")
other_inputs = Input(shape=X_train_rest.shape[1:], name="other_inputs")

Make & fit the model

Feed the continuous inputs to some normal dense layers.

random.seed(1337)
x = Dense(30, "relu", name="hidden1")(other_inputs)
x = Dense(30, "relu", name="hidden2")(x)
lambda_ = Dense(1, "exponential", name="lambda")(x)
out = lambda_ * exposure  # Previously this needed keras.layers.Multiply()([lambda_, exposure])
model = Model([exposure, other_inputs], out)
model.compile(optimizer="adam", loss="poisson")

es = EarlyStopping(patience=10, restore_best_weights=True, verbose=1)
hist = model.fit((X_train_exp, X_train_rest),
    y_train, epochs=100, verbose=0,
    callbacks=[es], validation_split=0.2)
np.min(hist.history["val_loss"])
Epoch 52: early stopping
Restoring model weights from the end of the best epoch: 42.
np.float64(0.913363516330719)

Plot the model

plot_model(model, show_layer_names=True)

Package Versions

from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch,tensorflow"))
Python implementation: CPython
Python version       : 3.11.12
IPython version      : 9.3.0

keras     : 3.8.0
matplotlib: 3.10.0
numpy     : 2.0.2
pandas    : 2.2.2
seaborn   : 0.13.2
scipy     : 1.13.1
torch     : 2.6.0
tensorflow: 2.18.0

Glossary

  • entity embeddings
  • Input layer
  • Keras functional API
  • Reshape layer
  • skip connection
  • wide & deep network