Interpretability

ACTL3143 & ACTL5111 Deep Learning for Actuaries

Patrick Laub

Introduction

Lecture Outline

  • Introduction

  • Illustrative Example

  • Inherently Interpretable Models

  • Post-hoc Explanations

  • Belgian Motor Dataset

  • Interpreting Inherently Interpretable Models

  • Explaining a Neural Network with Partial Dependence Plots

  • Explaining Specific Models

Definitions

“Surprisingly enough, although the concept of interpretability is increasingly widespread, there is no general consensus on both the definition and the measurement of the interpretability of a model.” (Delcaillau et al. 2022)

Definition: “Interpret means to explain or to present in understandable terms. In the context of ML systems, we define interpretability as the ability to explain or to present in understandable terms to a human.” (Doshi-Velez and Kim 2017)

A distinction between interpretability and explainability

“Interpretability is about transparency, about understanding exactly why and how the model is generating predictions, and therefore, it is important to observe the inner mechanics of the algorithm considered. This leads to interpreting the model’s parameters and features used to determine the given output. Explainability is about explaining the behavior of the model in human terms.” (Charpentier 2024)

Aspects of Interpretability

Inherent Interpretability

The model is interpretable by design.

Post-hoc Explanations

The model is not interpretable by design, but we can use other methods to explain the model.


Global Interpretability

The ability to understand how the model works.

Local Interpretability

The ability to interpret/understand each prediction.

Husky vs. Wolf

A well-known anecdote in the explainability literature (Ribeiro, Singh, and Guestrin 2016).

Adversarial examples

A demonstration of fast adversarial example generation applied to GoogLeNet on ImageNet. By adding an imperceptibly small vector whose elements are equal to the sign of the elements of the gradient of the cost function with respect to the input, we can change GoogLeNet’s classification of the image (Goodfellow, Shlens, and Szegedy 2014)
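
A minimal sketch of this fast-gradient-sign idea, assuming a Keras image classifier model, an input batch x and integer labels y (all hypothetical names here):

import tensorflow as tf

def fgsm_perturb(model, x, y, epsilon=0.01):
    # Fast gradient sign method: move each pixel by epsilon in the
    # direction that increases the classification loss.
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y, model(x))
    gradient = tape.gradient(loss, x)
    return x + epsilon * tf.sign(gradient)

The perturbed batch fgsm_perturb(model, x, y) typically looks identical to the original images yet can flip the model’s predicted classes.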

Adversarial stickers

Adversarial stickers.

Adversarial text

“TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP”

Demo

LLMs are even more opaque


“Figure 1: An overview of our research from generating to evaluating EmotionPrompt.” (Li et al. 2023)

A popular science article about Ben-Zion et al. (2025)

Illustrative Example

Lecture Outline

  • Introduction

  • Illustrative Example

  • Inherently Interpretable Models

  • Post-hoc Explanations

  • Belgian Motor Dataset

  • Interpreting Inherently Interpretable Models

  • Explaining a Neural Network with Partial Dependence Plots

  • Explaining Specific Models

First attempt at NLP task

Code
df_raw = pd.read_parquet("../Natural-Language-Processing/NHTSA_NMVCCS_extract.parquet.gzip")

df_raw["NUM_VEHICLES"] = df_raw["NUMTOTV"].map(lambda x: str(x) if x <= 2 else "3+")

weather_cols = [f"WEATHER{i}" for i in range(1, 9)]
features = df_raw[["SUMMARY_EN"] + weather_cols]

target_labels = df_raw["NUM_VEHICLES"]
target = LabelEncoder().fit_transform(target_labels)

X_main, X_test, y_main, y_test = train_test_split(features, target, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_main, y_main, test_size=0.25, random_state=1)
df_raw["SUMMARY_EN"]
0       V1, a 2000 Pontiac Montana minivan, made a lef...
1       The crash occurred in the eastbound lane of a ...
2       This crash occurred just after the noon time h...
                              ...                        
6946    The crash occurred in the eastbound lanes of a...
6947    This single-vehicle crash occurred in a rural ...
6948    This two vehicle daytime collision occurred mi...
Name: SUMMARY_EN, Length: 6949, dtype: object
df_raw["NUM_VEHICLES"].value_counts()\
  .sort_index()
NUM_VEHICLES
1     1822
2     4151
3+     976
Name: count, dtype: int64

Bag of words for the top 1,000 words

Code
def vectorise_dataset(X, vect, txt_col="SUMMARY_EN", dataframe=False):
    X_vects = vect.transform(X[txt_col]).toarray()
    X_other = X.drop(txt_col, axis=1)

    if not dataframe:
        return np.concatenate([X_vects, X_other], axis=1)                           
    else:
        # Add column names and indices to the combined dataframe.
        vocab = list(vect.get_feature_names_out())
        X_vects_df = pd.DataFrame(X_vects, columns=vocab, index=X.index)
        return pd.concat([X_vects_df, X_other], axis=1) 
vect = CountVectorizer(max_features=1_000, stop_words="english")
vect.fit(X_train["SUMMARY_EN"])

X_train_bow = vectorise_dataset(X_train, vect)
X_val_bow = vectorise_dataset(X_val, vect)
X_test_bow = vectorise_dataset(X_test, vect)

vectorise_dataset(X_train, vect, dataframe=True).head()
10 105 113 12 15 150 16 17 18 180 ... yield zone WEATHER1 WEATHER2 WEATHER3 WEATHER4 WEATHER5 WEATHER6 WEATHER7 WEATHER8
2532 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
6209 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2561 1 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
6664 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4214 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 1008 columns

Trained a basic neural network on that

Code
def build_model(num_features, num_cats):
    random.seed(42)
    
    model = Sequential([
        Input((num_features,)),
        Dense(100, activation="relu"),
        Dense(num_cats, activation="softmax")
    ])
    
    topk = SparseTopKCategoricalAccuracy(k=2, name="topk")
    model.compile("adam", "sparse_categorical_crossentropy",
        metrics=["accuracy", topk])
    
    return model
num_features = X_train_bow.shape[1]
num_cats = df_raw["NUM_VEHICLES"].nunique()
model = build_model(num_features, num_cats)
es = EarlyStopping(patience=1, restore_best_weights=True, monitor="val_accuracy")
model.fit(X_train_bow, y_train, epochs=10,
    callbacks=[es], validation_data=(X_val_bow, y_val), verbose=0)
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 100)            │       100,900 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 3)              │           303 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 303,611 (1.16 MB)
 Trainable params: 101,203 (395.32 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 202,408 (790.66 KB)
model.evaluate(X_train_bow, y_train, verbose=0)
[0.021850384771823883, 0.9992803931236267, 1.0]
model.evaluate(X_val_bow, y_val, verbose=0)
[3.338824987411499, 0.9762589931488037, 0.9971222877502441]

Permutation importance algorithm

Taken directly from scikit-learn documentation:

  • Inputs: fitted predictive model m, tabular dataset (training or validation) D.

  • Compute the reference score s of the model m on data D (for instance the accuracy for a classifier or the R^2 for a regressor).

  • For each feature j (column of D):

    • For each repetition k in {1, \dots, K}:

      • Randomly shuffle column j of dataset D to generate a corrupted version of the data named \tilde{D}_{k,j}.
      • Compute the score s_{k,j} of model m on corrupted data \tilde{D}_{k,j}.
    • Compute importance i_j for feature f_j defined as:

      i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}

Code
def permutation_test(model, X, y, num_reps=1, seed=42):
    """
    Run the permutation test for variable importance.
    Returns matrix of shape (X.shape[1], len(model.evaluate(X, y))).
    """
    rnd.seed(seed)
    scores = []    

    for j in range(X.shape[1]):
        original_column = np.copy(X[:, j])
        col_scores = []

        for r in range(num_reps):
            rnd.shuffle(X[:,j])
            col_scores.append(model.evaluate(X, y, verbose=0))

        scores.append(np.mean(col_scores, axis=0))
        X[:,j] = original_column
    
    return np.array(scores)

Run the permutation test

all_perm_scores = permutation_test(model, X_val_bow, y_val)
Code
perm_scores = all_perm_scores[:,1]
plt.plot(perm_scores)
plt.xlabel("Input index")
plt.ylabel("Accuracy when shuffled");

Find the most significant inputs

Code
vocab = vect.get_feature_names_out()
input_cols = list(vocab) + weather_cols

best_input_inds = np.argsort(perm_scores)[:100]
best_inputs = [input_cols[idx] for idx in best_input_inds]

print(best_inputs)
['v3', 'v2', 'vehicle', 'lane', 'harmful', 'right', 'divided', 'south', 'motor', 'dry', 'event', 'left', 'parked', 'WEATHER4', 'related', 'stop', 'impact', 'v4', 'crash', 'direction', 'involved', 'internal', 'stated', 'barrier', 'dodge', 'asphalt', 'chevrolet', 'higher', 'forward', 'pre', 'precrash', 'pushed', 'corner', 'hand', 'prior', 'door', 'WEATHER8', 'factor', 'work', 'mph', 'year', 'critical', 'WEATHER1', 'WEATHER3', 'single', 'WEATHER5', 'straight', 'ahead', 'turning', 'honda', 'hours', 'type', 'daylight', 'possible', 'ford', 'male', 'facing', 'actions', 'consists', 'unknown', 'uphill', 'pick', 'stopped', 'point', 'alcohol', 'high', 'pull', 'proceeded', 'encroaching', 'morning', 'trailer', 'grand', 'associated', 'blood', 'meters', 'basis', 'experience', 'prescription', 'moved', 'small', 'steered', 'maneuver', 'medication', 'heart', 'rotate', 'old', 'pain', 'weekday', 'clear', 'seconds', 'civic', 'started', 'northbound', '2006', 'noon', 'miles', '44', 'injuries', 'vehicles', 'saw']

How about a simple decision tree?

from sklearn import tree

clf = tree.DecisionTreeClassifier(random_state=0, max_leaf_nodes=3)
clf.fit(X_train_bow[:, best_input_inds], y_train);
print(clf.score(X_train_bow[:, best_input_inds], y_train))
print(clf.score(X_val_bow[:, best_input_inds], y_val))
0.9275605660829935
0.939568345323741

The decision tree ends up giving pretty good results.

Decision tree

tree.plot_tree(clf, feature_names=best_inputs, filled=True);
print(np.where(clf.feature_importances_ > 0)[0])
[best_inputs[ind] for ind in np.where(clf.feature_importances_ > 0)[0]]
[ 0 41]
['v3', 'critical']

This is why we replace “v1”, “v2”, “v3”

Code
# Go through every summary and find the words "V1", "V2" and "V3".
# For each summary, replace "V1" with a random number like "V1623", and "V2" with a different random number like "V1234".
rnd.seed(123)

df = df_raw.copy()
for i, summary in enumerate(df["SUMMARY_EN"]):
    word_numbers = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
    num_cars = 10
    new_car_nums = [f"V{rnd.randint(100, 10000)}" for _ in range(num_cars)]
    num_spaces = 4

    for car in range(1, num_cars+1):
        new_num = new_car_nums[car-1]
        summary = summary.replace(f"V-{car}", new_num)
        summary = summary.replace(f"Vehicle {word_numbers[car-1]}", new_num).replace(f"vehicle {word_numbers[car-1]}", new_num)
        summary = summary.replace(f"Vehicle #{word_numbers[car-1]}", new_num).replace(f"vehicle #{word_numbers[car-1]}", new_num)
        summary = summary.replace(f"Vehicle {car}", new_num).replace(f"vehicle {car}", new_num)
        summary = summary.replace(f"Vehicle #{car}", new_num).replace(f"vehicle #{car}", new_num)
        summary = summary.replace(f"Vehicle # {car}", new_num).replace(f"vehicle # {car}", new_num)

        for j in range(num_spaces+1):
            summary = summary.replace(f"V{' '*j}{car}", new_num).replace(f"V{' '*j}#{car}", new_num).replace(f"V{' '*j}# {car}", new_num)
            summary = summary.replace(f"v{' '*j}{car}", new_num).replace(f"v{' '*j}#{car}", new_num).replace(f"v{' '*j}# {car}", new_num)
         
    df.loc[i, "SUMMARY_EN"] = summary

There was a slide in the NLP deck titled “Just ignore this for now…”. That code went through each summary and replaced the words “V1”, “V2” and “V3” with random identifiers, to check whether the model was overfitting to those words.

Code
features = df[["SUMMARY_EN"] + weather_cols]
X_main, X_test, y_main, y_test = train_test_split(features, target, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_main, y_main, test_size=0.25, random_state=1)

vect = CountVectorizer(max_features=1_000, stop_words="english")
vect.fit(X_train["SUMMARY_EN"])

X_train_bow = vectorise_dataset(X_train, vect)
X_val_bow = vectorise_dataset(X_val, vect)
X_test_bow = vectorise_dataset(X_test, vect)

model = build_model(num_features, num_cats)

es = EarlyStopping(patience=1, restore_best_weights=True,
    monitor="val_accuracy", verbose=2)
model.fit(X_train_bow, y_train, epochs=10,
    callbacks=[es], validation_data=(X_val_bow, y_val), verbose=0);

Retraining on the fixed dataset gives us a more realistic (lower) accuracy.

model.evaluate(X_train_bow, y_train, verbose=0)
[0.1021684780716896, 0.9815303683280945, 0.9990405440330505]
model.evaluate(X_val_bow, y_val, verbose=0)
[2.4335880279541016, 0.9381294846534729, 0.9942445755004883]

Permutation importance accuracy plot

Code
perm_scores = permutation_test(model, X_val_bow, y_val)[:,1]
plt.plot(perm_scores)
plt.xlabel("Input index"); plt.ylabel("Accuracy when shuffled");

Find the most significant inputs

vocab = vect.get_feature_names_out()
input_cols = list(vocab) + weather_cols

best_input_inds = np.argsort(perm_scores)[:100]
best_inputs = [input_cols[idx] for idx in best_input_inds]

print(best_inputs)
['involved', 'harmful', 'event', 'struck', 'motor', 'higher', 'direction', 'line', 'ford', 'edge', 'single', 'lane', 'contacted', 'left', 'old', 'parked', 'vehicle', 'turned', 'guardrail', 'injured', 'coded', 'legally', 'encroachment', 'critical', 'associated', 'driving', 'intersection', 'location', 'traffic', 'limit', 'rear', 'pushed', 'police', 'stopped', 'include', 'approaching', 'WEATHER5', 'brakes', 'clear', 'WEATHER4', 'way', 'continued', 'counterclockwise', 'turning', 'hit', 'departed', 'driven', 'afternoon', 'northwest', 'stop', 'steered', 'female', 'saw', 'passenger', 'pickup', 'truck', 'actions', 'WEATHER8', '1993', 'properly', 'sky', 'roof', 'sleep', 'road', 'causing', 'approached', 'applied', 'mercury', 'tree', '23', 'possible', 'large', 'check', 'wearing', 'advisory', 'later', 'door', 'divided', 'distraction', '2006', 'treated', 'travels', 'green', 'gmc', 'past', 'path', 'initial', 'final', 'school', 'towed', 'tire', 'time', 'crossing', 'unsuccessful', 'outside', 'route', 'unable', 'adjacent', 'moved', 'southbound']

Inherently Interpretable Models

Lecture Outline

  • Introduction

  • Illustrative Example

  • Inherently Interpretable Models

  • Post-hoc Explanations

  • Belgian Motor Dataset

  • Interpreting Inherently Interpretable Models

  • Explaining a Neural Network with Partial Dependence Plots

  • Explaining Specific Models

What is inherent interpretability?

“Interpretability by design is decided on the level of the machine learning algorithm. If you want a machine learning algorithm that produces interpretable models, the algorithm has to constrain the search of models to those that are interpretable. The simplest example is linear regression: When you use ordinary least squares to fit/train a linear regression model, you are using an algorithm that will produce … models that are linear in the input features. Models that are interpretable by design are also called intrinsically or inherently interpretable models.” (Molnar 2020)



Christoph Molnar

Examples of interpretable models




  • Linear regression
  • Generalised linear models
  • Decision trees
  • Decision rules

graph TD
  A{vpdmax8 < 27}
  A -->|true| B{pcpn8 ≥ 37}
  A -->|false| C{vpdmax6 < 29}

  B -->|true| D{tmax9 ≥ 21}
  B -->|false| E{tmax6 ≥ 26}

  D -->|true| F[11]
  D -->|false| G[118]

  E -->|true| H[60]
  E -->|false| I[133]

  C -->|true| J{tmax7 ≥ 31}
  C -->|false| M{tmax7 < 31}

  J -->|true| K[81]
  J -->|false| L[131]

  M -->|true| N[82]
  M -->|false| O[186]

E.g. decision tree for payouts of index insurance (Chen et al. 2024).

Better trees

“The optimization over the node parameters (exact for axis-aligned trees, approximate for oblique trees) assumes the rest of the tree (structure and parameters) is fixed. The greedy nature of the algorithm means that once a node is optimized, it is fixed forever.” (Carreira-Perpinán and Tavallali 2018)

Non-greedy search can improve the accuracy of the tree, without sacrificing interpretability.

Some processes don’t fit the tree structure

Train prices

Full train pricing

Decision rules

Make predictions using if-then statements related to the inputs.

“Decision rules can be as expressive as decision trees, while being more compact. Decision trees often also suffer from replicated sub-trees, that is, when the splits in a left and a right child node have the same structure.” (Molnar 2020)

def rail_cost(peak_hours, distance):
    if peak_hours:
        if distance <= 10:
            cost = 3.79
        elif distance <= 20:
            cost = 4.71
        elif distance <= 35:
            cost = 5.42
        elif distance <= 65:
            cost = 7.24
        else:
            cost = 9.31
    else:
        if distance <= 10:
            cost = 2.65
        elif distance <= 20:
            cost = 3.29
        elif distance <= 35:
            cost = 3.79
        elif distance <= 65:
            cost = 5.06
        else:
            cost = 6.51
    return cost

Scoring rules

Example from Rudin (2019).

When to choose inherent interpretability?

Rudin (2019)

Cynthia Rudin

Article 22 GDPR – Automated individual decision-making, including profiling

  1. The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.

  2. Paragraph 1 shall not apply if the decision:

    (a) is necessary for entering into, or performance of, a contract between the data subject and a data controller;
    (b) is authorised by Union or Member State law to which the controller is subject and which also lays down suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests; or
    (c) is based on the data subject’s explicit consent.
  3. In the cases referred to in points (a) and (c) of paragraph 2, the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.

  4. Decisions referred to in paragraph 2 shall not be based on special categories of personal data referred to in Article 9(1), unless point (a) or (g) of Article 9(2) applies and suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests are in place.

Linear models & LocalGLMNet

A GLM has the form

\hat{y} = g^{-1}\bigl( \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \bigr)

where \beta_0, \dots, \beta_p are the model parameters.

Global & local interpretations are easy to obtain.


LocalGLMNet extends this to a neural network (Richman and Wüthrich 2023).

\hat{y}_i = g^{-1}\bigl( \beta_0(\boldsymbol{x}_i) + \beta_1(\boldsymbol{x}_i) x_{i1} + \dots + \beta_p(\boldsymbol{x}_i) x_{ip} \bigr)

A GLM with local parameters \beta_0(\boldsymbol{x}_i), \dots, \beta_p(\boldsymbol{x}_i) for each observation \boldsymbol{x}_i. The local parameters are the output of a neural network.
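
A minimal Keras sketch of this idea (not the exact architecture of Richman and Wüthrich 2023); the layer sizes, activations and log link are illustrative assumptions:

from keras import Model
from keras.layers import Input, Dense, Dot, Add, Activation

p = 7  # number of covariates (illustrative)

x = Input(shape=(p,), name="covariates")

# A small network outputs the local intercept beta_0(x) and the local coefficients beta_j(x).
h = Dense(64, activation="tanh")(x)
h = Dense(64, activation="tanh")(h)
beta0 = Dense(1, name="local_intercept")(h)
betas = Dense(p, name="local_betas")(h)

# Linear predictor beta_0(x) + <beta(x), x>, then the inverse link (exp for a log link).
eta = Add()([beta0, Dot(axes=1)([betas, x])])
y_hat = Activation("exponential")(eta)

local_glm_net = Model(x, y_hat)
local_glm_net.compile(optimizer="adam", loss="poisson")

After fitting, the "local_betas" output can be read off per observation to see which covariates drive each individual prediction.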

Neural Additive Models (NAMs)

Each covariate (or selected interaction) receives its own subnetwork, and the subnetworks’ contributions are added together.
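
A minimal Keras sketch of a NAM without interaction subnetworks; the subnetwork sizes and log link are illustrative assumptions:

from keras import Model
from keras.layers import Input, Dense, Add, Activation, Lambda

p = 7  # number of covariates (illustrative)

x = Input(shape=(p,))
contributions = []
for j in range(p):
    # Each covariate x_j gets its own small subnetwork f_j(x_j).
    x_j = Lambda(lambda t, j=j: t[:, j:j + 1])(x)
    h = Dense(32, activation="relu")(x_j)
    contributions.append(Dense(1)(h))

# The per-covariate contributions are summed before the inverse link.
eta = Add()(contributions)
y_hat = Activation("exponential")(eta)

nam = Model(x, y_hat)
nam.compile(optimizer="adam", loss="poisson")

Plotting each subnetwork’s output against its own covariate gives the model’s version of a GAM effect plot.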

Post-hoc Explanations

Lecture Outline

  • Introduction

  • Illustrative Example

  • Inherently Interpretable Models

  • Post-hoc Explanations

  • Belgian Motor Dataset

  • Interpreting Inherently Interpretable Models

  • Explaining a Neural Network with Partial Dependence Plots

  • Explaining Specific Models

One variable’s effect on the prediction

Various plots showing the effect of one variable on the prediction; from (Molnar 2020, sec. 12)

The different plots

A Ceteris Paribus plot shows how the model’s prediction changes as we vary the value of one covariate/input while keeping all other features constant at their original values for some observation.

An Individual Conditional Expectation (ICE) plot is the same thing as a ceteris paribus plot, but for each observation in the dataset.

A Partial Dependence Plot (PDP) shows how the model’s prediction changes as we vary the value of one covariate/input while averaging over all other features in the dataset.
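
All three can be drawn with scikit-learn’s PartialDependenceDisplay. A minimal sketch, assuming a fitted scikit-learn regressor model and a feature DataFrame X with an ageph column (hypothetical names here):

from sklearn.inspection import PartialDependenceDisplay

# ICE curves (one per observation) with the PDP (their average) overlaid.
PartialDependenceDisplay.from_estimator(
    model,                # any fitted scikit-learn regressor (hypothetical)
    X,                    # the corresponding feature DataFrame (hypothetical)
    features=["ageph"],   # the covariate to vary
    kind="both",          # "individual" = ICE only, "average" = PDP only
)

A ceteris paribus plot is simply the ICE curve for a single row, e.g. passing X.iloc[[0]] with kind="individual".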

Useful for Hyperparameter Tuning

Partial dependence plots for hyperparameter tuning

Permutation importance

  • Inputs: fitted model m, tabular dataset D.

  • Compute the reference score s of the model m on data D (for instance the accuracy for a classifier or the R^2 for a regressor).

  • For each feature j (column of D):

    • For each repetition k in {1, \dots, K}:

      • Randomly shuffle column j of dataset D to generate a corrupted version of the data named \tilde{D}_{k,j}.
      • Compute the score s_{k,j} of model m on corrupted data \tilde{D}_{k,j}.
    • Compute importance i_j for feature f_j defined as:

      i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}

Originally proposed by Breiman (2001), extended by Fisher, Rudin, and Dominici (2019).

Permutation importance

def permutation_test(model, X, y, num_reps=1, seed=42):
    """
    Run the permutation test for variable importance.
    Returns matrix of shape (X.shape[1], len(model.evaluate(X, y))).
    """
    rnd.seed(seed)
    scores = []    

    for j in range(X.shape[1]):
        original_column = np.copy(X[:, j])
        col_scores = []

        for r in range(num_reps):
            rnd.shuffle(X[:,j])
            col_scores.append(model.evaluate(X, y, verbose=0))

        scores.append(np.mean(col_scores, axis=0))
        X[:,j] = original_column
    
    return np.array(scores)

LIME

Local Interpretable Model-agnostic Explanations employs an interpretable surrogate model to explain locally how the black-box model makes predictions for individual instances.

E.g. a black-box model predicts Bob’s premium as the highest among all policyholders. LIME uses an interpretable model (a linear regression) to explain how Bob’s features influence the black-box model’s prediction.

LIME Algorithm

Suppose we want to explain the instance \boldsymbol{x}_{\text{Bob}}=(1, 2, 0.5).

  1. Generate perturbed examples of \boldsymbol{x}_{\text{Bob}} and use the trained gamma MDN f to make predictions: \begin{align*} \boldsymbol{x}^{'(1)}_{\text{Bob}} &= (1.1, 1.9, 0.6), \quad f\big(\boldsymbol{x}^{'(1)}_{\text{Bob}}\big)=34000 \\ \boldsymbol{x}^{'(2)}_{\text{Bob}} &= (0.8, 2.1, 0.4), \quad f\big(\boldsymbol{x}^{'(2)}_{\text{Bob}}\big)=31000 \\ &\vdots \quad \quad \quad \quad\quad \quad\quad \quad\quad \quad \quad \vdots \end{align*} We can then construct a dataset of N_{\text{Examples}} perturbed examples: \mathcal{D}_{\text{LIME}} = \big(\big\{\boldsymbol{x}^{'(i)}_{\text{Bob}},f\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big)\big\}\big)_{i=0}^{N_{\text{Examples}}}.

LIME Algorithm

  2. Fit an interpretable model g, e.g. a linear regression, using \mathcal{D}_{\text{LIME}} and the following loss function: \mathcal{L}_{\text{LIME}}(f,g,\pi_{\boldsymbol{x}_{\text{Bob}}})=\sum_{i=1}^{N_{\text{Examples}}}\pi_{\boldsymbol{x}_{\text{Bob}}}\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big)\cdot \bigg(f\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big)-g\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big)\bigg)^2, where \pi_{\boldsymbol{x}_{\text{Bob}}}\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big) is a proximity weight that is larger the closer the perturbed example \boldsymbol{x}^{'(i)}_{\text{Bob}} is to the instance being explained, \boldsymbol{x}_{\text{Bob}}.
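
A minimal, hand-rolled sketch of these two steps; the Gaussian perturbations, the kernel width and all names are illustrative assumptions, and f stands in for the black-box model:

import numpy as np
from sklearn.linear_model import LinearRegression

def lime_explain(f, x_star, n_samples=1000, scale=0.1, kernel_width=0.75, seed=0):
    # Local surrogate model around the instance x_star to be explained.
    rng = np.random.default_rng(seed)

    # Step 1: perturb the instance and collect the black-box predictions.
    X_pert = x_star + scale * rng.standard_normal((n_samples, len(x_star)))
    y_pert = np.array([f(x) for x in X_pert])

    # Proximity kernel pi(x'): closer perturbations receive larger weights.
    dists = np.linalg.norm(X_pert - x_star, axis=1)
    weights = np.exp(-dists**2 / kernel_width**2)

    # Step 2: fit a weighted linear surrogate g; its coefficients are the explanation.
    g = LinearRegression().fit(X_pert, y_pert, sample_weight=weights)
    return g.intercept_, g.coef_

For Bob, lime_explain(f, np.array([1, 2, 0.5])) would return a local intercept and one slope per feature.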

“Explaining” to Bob

The bold red cross is the instance being explained. LIME samples instances (grey nodes), gets predictions using f (gamma MDN) and weighs them by the proximity to the instance being explained (represented here by size). The dashed line g is the learned local explanation.

“Again the approximation must be imperfect, otherwise one would throw out the black box and instead use the explanation as an inherently interpretable model.” (Rudin et al. 2022)

SHAP Values

The SHapley Additive exPlanations (SHAP) value helps to quantify the contribution of each feature to the prediction for a specific instance (Lundberg and Lee 2017).

The SHAP value for the jth feature is defined as \begin{align*} \text{SHAP}^{(j)}(\boldsymbol{x}) &= \sum_{U\subset \{1, ..., p\} \backslash \{j\}} \frac{1}{p} \binom{p-1}{|U|}^{-1} \big(\mathbb{E}[Y| \boldsymbol{x}^{(U\cup \{j\})}] - \mathbb{E}[Y|\boldsymbol{x}^{(U)}]\big), \end{align*} where p is the number of features. A positive SHAP value indicates that the variable increases the prediction value.
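
In practice SHAP values are estimated with the shap package rather than by summing over all subsets. A minimal sketch, where predict_fn, X_background and X_explain are hypothetical placeholders for a prediction function, a background sample and the rows to explain:

import shap

# Model-agnostic explainer around a prediction function, using background data
# to stand in for "missing" features.
explainer = shap.Explainer(predict_fn, X_background)

shap_values = explainer(X_explain)     # SHAP values for the rows to explain
shap.plots.waterfall(shap_values[0])   # local explanation for the first row
shap.plots.beeswarm(shap_values)       # global summary across all rows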

SHAP waterfall plot

SHAP waterfall plot

Belgian Motor Dataset

Lecture Outline

  • Introduction

  • Illustrative Example

  • Inherently Interpretable Models

  • Post-hoc Explanations

  • Belgian Motor Dataset

  • Interpreting Inherently Interpretable Models

  • Explaining a Neural Network with Partial Dependence Plots

  • Explaining Specific Models

beMTPL97 dataset

data = pd.read_csv('data/raw/beMTPL97.csv')
claims = data.drop(columns = ["id", "claim", "amount", "average"])
claims.shape
(163212, 14)
postcode = claims.pop("postcode")
train_raw, test_raw = train_test_split(claims, test_size=0.2, random_state=2000, stratify=postcode)

X_train_raw = train_raw.drop(columns='nclaims')
X_test_raw = test_raw.drop(columns='nclaims')
y_train_raw = train_raw['nclaims']
y_test_raw = test_raw['nclaims']

num_vars = ['expo', 'ageph', 'bm', 'power', 'agec', 'lat', 'long']
cat_vars = ['coverage', 'sex', 'fuel', 'use', 'fleet']

Covariates

Numerical variables:

Variable  Description
expo      exposure to risk.
ageph     policyholder’s age.
bm        an integer for the level on the former Belgian bonus-malus scale (0 to 22; higher = worse claims history).
power     vehicle’s horsepower in kilowatts.
agec      vehicle’s age in years.
long      longitude of the policyholder’s municipality center.
lat       latitude of the policyholder’s municipality center.

Target is nclaims.

Categorical variables:

Variable  Description
coverage  insurance coverage level: "TPL" (third party liability), "TPL+" (TPL + limited material damage), "TPL++" (TPL + comprehensive material damage).
sex       policyholder’s gender: "female", "male".
fuel      vehicle’s fuel type: "gasoline" or "diesel".
use       vehicle’s use: "private" or "work".
fleet     vehicle is part of a fleet (1 = yes, 0 = no).

Exposure & frequency

claims["expo"].plot(kind='hist', title='Exposure Distribution')
claims["nclaims"].value_counts()
nclaims
0    144936
1     16539
2      1556
3       162
4        17
5         2
Name: count, dtype: int64

Map of claims

Code
import geopandas as gpd
import contextily as ctx
from shapely.geometry import Point

# Compute average claims per location
avg = train_raw.groupby(['lat', 'long'], as_index=False)['nclaims'].mean()

# Build GeoDataFrame in Web Mercator
gdf = gpd.GeoDataFrame(
    avg,
    geometry=[Point(xy) for xy in zip(avg.long, avg.lat)],
    crs="EPSG:4326"
).to_crs(epsg=3857)

# Plot
ax = gdf.plot(
    column='nclaims',
    cmap='OrRd',
    markersize=8,
    edgecolor='grey',
    linewidth=0.5,
    alpha=0.8,
    legend=True,
    figsize=(10, 5)
)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
ax.set_axis_off()
plt.title("Average Claims per Location")
plt.show()

Fitting using statsmodels

import statsmodels.api as sm
import statsmodels.formula.api as smf

formula = "nclaims ~ expo + ageph + bm + power + agec + lat + long " + \
    " + C(coverage) + C(sex) + C(fuel) + C(use) + C(fleet)"

glm_model = smf.glm(
    formula=formula,
    data=train_raw,
    family=sm.families.Poisson()
).fit()

Question

What do you expect to be the relationship between ageph and nclaims?

X_train_first = X_train_raw.iloc[0:1].copy()
X_train_first
expo coverage ageph sex bm power agec fuel use fleet long lat
25776 1.0 TPL 59 female 0 51 15 gasoline private 0 4.387146 51.216042

Summary

glm_model.summary().tables[0]
Generalized Linear Model Regression Results
Dep. Variable: nclaims No. Observations: 130569
Model: GLM Df Residuals: 130555
Model Family: Poisson Df Model: 13
Link Function: Log Scale: 1.0000
Method: IRLS Log-Likelihood: -49908.
Date: Wed, 30 Jul 2025 Deviance: 69690.
Time: 23:00:35 Pearson chi2: 1.39e+05
No. Iterations: 6 Pseudo R-squ. (CS): 0.01842
Covariance Type: nonrobust

Summary

glm_model.summary().tables[1]
coef std err z P>|z| [0.025 0.975]
Intercept -7.0204 1.344 -5.222 0.000 -9.655 -4.386
C(coverage)[T.TPL+] -0.0765 0.020 -3.889 0.000 -0.115 -0.038
C(coverage)[T.TPL++] -0.0715 0.027 -2.660 0.008 -0.124 -0.019
C(sex)[T.male] -0.0259 0.018 -1.429 0.153 -0.061 0.010
C(fuel)[T.gasoline] -0.1793 0.017 -10.459 0.000 -0.213 -0.146
C(use)[T.work] -0.0864 0.037 -2.306 0.021 -0.160 -0.013
C(fleet)[T.1] -0.0886 0.048 -1.832 0.067 -0.183 0.006
expo 0.9893 0.041 24.196 0.000 0.909 1.069
ageph -0.0065 0.001 -10.724 0.000 -0.008 -0.005
bm 0.0642 0.002 33.035 0.000 0.060 0.068
power 0.0036 0.000 8.309 0.000 0.003 0.004
agec -0.0019 0.002 -0.872 0.383 -0.006 0.002
lat 0.0777 0.026 2.969 0.003 0.026 0.129
long 0.0279 0.011 2.542 0.011 0.006 0.049

Interpreting Inherently Interpretable Models

Lecture Outline

  • Introduction

  • Illustrative Example

  • Inherently Interpretable Models

  • Post-hoc Explanations

  • Belgian Motor Dataset

  • Interpreting Inherently Interpretable Models

  • Explaining a Neural Network with Partial Dependence Plots

  • Explaining Specific Models

GLM Ceteris Paribus Plots

ageph_range = np.linspace(train_raw['ageph'].min(), train_raw['ageph'].max(), 20)
y_pred_batch = []
for ageph in ageph_range:
    X_train_first['ageph'] = ageph
    y_pred_batch.append(glm_model.predict(X_train_first))
Code
plt.plot(ageph_range, y_pred_batch, 'r')

real_age = X_train_raw['ageph'].iloc[0:1].item()
orig_prediction = glm_model.predict(X_train_raw.iloc[0:1]).item()
plt.plot(real_age, orig_prediction, 'o', color='r', markersize=5)

plt.xlabel('Age of Policyholder')
plt.ylabel('Predicted Number of Claims')
plt.title('GLM Predictions for Varying Age of Policyholder #1');

Zoomed out

We can see the trend if we zoom out to ages between 0 & 1,000.

Code
ageph_range = np.linspace(0, 1000, 20)

X_train_first = X_train_raw.iloc[0:1].copy()
y_pred_batch = []
for ageph in ageph_range:
    X_train_first['ageph'] = ageph
    y_pred_ageph = glm_model.predict(X_train_first)
    y_pred_batch.append(y_pred_ageph)

plt.plot(ageph_range, y_pred_batch, 'r')
plt.plot(real_age, orig_prediction, 'o', markersize=5, color='r')

plt.xlabel('Age of Policyholder')
plt.ylabel('Predicted Number of Claims')
plt.title('GLM Predictions for Varying Age of Policyholder #1');

What if we look at multiple policyholders?

Code
ageph_range = np.linspace(train_raw['ageph'].min(), train_raw['ageph'].max(), 20)

# Overlay the plots for the first 5 training observations
for i in range(5):
    X_train_first = X_train_raw.iloc[i:i+1].copy()
    y_pred_batch = []

    real_age = X_train_first['ageph'].item()
    orig_prediction = glm_model.predict(X_train_first).item()

    for ageph in ageph_range:
        X_train_first['ageph'] = ageph
        y_pred_ageph = glm_model.predict(X_train_first)
        y_pred_batch.append(y_pred_ageph)

    res = plt.plot(ageph_range, y_pred_batch, label=f'Policyholder {i+1}')
    plt.plot(real_age, orig_prediction, 'o', markersize=5, color=res[0].get_color())

plt.xlabel('Age of Policyholder')
plt.ylabel('Predicted Number of Claims')
plt.title('GLM Predictions for Varying Age of Policyholders')
plt.legend(loc='upper left', bbox_to_anchor=(1, 1), ncol=1);

Logarithmic scale

Code
ageph_range = np.linspace(train_raw['ageph'].min(), train_raw['ageph'].max(), 100)

# Overlay the plots for the first 5 training observations
for i in range(5):
    X_train_first = X_train_raw.iloc[i:i+1].copy()
    log_y_pred_batch = []

    real_age = X_train_first['ageph'].item()
    orig_prediction = np.log(glm_model.predict(X_train_first).item())


    for ageph in ageph_range:
        X_train_first['ageph'] = ageph
        y_pred_ageph = glm_model.predict(X_train_first)
        log_y_pred_batch.append(np.log(y_pred_ageph))

    res = plt.plot(ageph_range, log_y_pred_batch, label=f'Policyholder {i+1}')
    plt.plot(real_age, orig_prediction, 'o', markersize=5, color=res[0].get_color())

plt.xlabel('Age of Policyholder')
plt.ylabel('Log-Predicted Num. Claims')
plt.title('GLM Log-Predictions for Varying Age of Policyholders')
plt.legend(loc='upper left', bbox_to_anchor=(1, 1), ncol=1);

Do it for ageph and agec

Code
import plotly.graph_objects as go

# Create ranges
ageph_range = np.linspace(train_raw['ageph'].min(), train_raw['ageph'].max(), 100)
agec_range = np.linspace(train_raw['agec'].min(), train_raw['agec'].max(), 100)

# Create grid
ageph_grid, agec_grid = np.meshgrid(ageph_range, agec_range)
ageph_flat = ageph_grid.ravel()
agec_flat = agec_grid.ravel()

# Get the first training observation as a dictionary
base_row = X_train_raw.iloc[0].to_dict()

# Create a DataFrame with repeated base_row and overwrite ageph/agec
df_batch = pd.DataFrame([base_row] * len(ageph_flat))
df_batch['ageph'] = ageph_flat
df_batch['agec'] = agec_flat

# Predict all at once
y_pred_flat = glm_model.predict(df_batch)

# Reshape predictions into 2D grid
Z = y_pred_flat.values.reshape(agec_grid.shape)  # shape = (100, 100)

# Plot
fig = go.Figure(data=[
    go.Surface(x=agec_grid, y=ageph_grid, z=Z, colorscale='Viridis')
])

fig.update_layout(
    title='GLM-predicted Number of Claims',
    scene=dict(
        xaxis_title='agec',
        yaxis_title='ageph',
        zaxis_title='Predicted Response'
    ),
    width=800,          # Wider figure
    height=500,         # Taller figure
    margin=dict(l=0, r=0, t=50, b=50),  # Reduce whitespace cropping
)

Generalised Additive Models (GAMs)

Transform a GLM’s inputs nonlinearly, i.e.,

\hat{y}_i = g^{-1}\bigl( \beta_0 + f_1(x_{i,1}) + f_2(x_{i,2}) + ... + f_p(x_{i,p}) \bigr)

The f_j are typically polynomials, step functions, or splines.

ct = make_column_transformer(
    ("passthrough", num_vars),
    (OrdinalEncoder(), cat_vars),
    verbose_feature_names_out = False
)
X_train_gam = ct.fit_transform(X_train_raw)
X_test_gam = ct.transform(X_test_raw)
y_train_gam = y_train_raw
y_test_gam = y_test_raw
Code
# Need to monkey patch to get pygam working currently
import scipy.sparse

def to_array(self):
    return self.toarray()

scipy.sparse.spmatrix.A = property(to_array)
from pygam import PoissonGAM, s, f
formula = s(0) + s(1) + s(2) + s(3) + s(4) + s(5) + s(6) + \
    f(7) + f(8) + f(9) + f(10) + f(11)
gam_model = PoissonGAM(formula).fit(X_train_gam, y_train_gam)

Ceteris paribus plot for ageph

Code
# Define range of ageph values
ageph_range = np.linspace(X_train_gam['ageph'].min(), X_train_gam['ageph'].max(), 100)

# Start with a copy of the first encoded training row
X_first = X_train_gam.iloc[0:1].copy()
# Set the dtype of 'ageph' to float
X_first.loc[:, 'ageph'] = X_first.loc[:, 'ageph'].astype(float)
gam_preds = []

real_age = X_first['ageph'].item()
orig_prediction = gam_model.predict(X_first).item()

for ageph in ageph_range:
    X_first.loc[:, 'ageph'] = ageph
    y_pred = gam_model.predict(X_first)[0]
    gam_preds.append(y_pred)

plt.plot(ageph_range, gam_preds, label='GAM', color='red')

# Add original GLM prediction for reference
plt.plot(real_age, orig_prediction, 'o', color='red', markersize=5)

plt.xlabel('Age of Policyholder')
plt.ylabel('Predicted Number of Claims')
plt.title('GAM Predictions for Varying Age')
plt.legend();

GAM bivariate effects for ageph and agec

Code
import plotly.graph_objects as go

# Create ranges
ageph_range = np.linspace(X_train_gam['ageph'].min(), X_train_gam['ageph'].max(), 100)
agec_range = np.linspace(X_train_gam['agec'].min(), X_train_gam['agec'].max(), 100)

# Create grid
ageph_grid, agec_grid = np.meshgrid(ageph_range, agec_range)
ageph_flat = ageph_grid.ravel()
agec_flat = agec_grid.ravel()

# Get the first training observation as a dictionary
base_row = X_train_gam.iloc[0].to_dict()

# Create a DataFrame with repeated base_row and overwrite ageph/agec
df_batch = pd.DataFrame([base_row] * len(ageph_flat))
df_batch['ageph'] = ageph_flat
df_batch['agec'] = agec_flat

# Predict all at once
y_pred_flat = gam_model.predict(df_batch)

# Reshape predictions into 2D grid
Z = y_pred_flat.reshape(agec_grid.shape)  # shape = (100, 100)

# Plot
fig = go.Figure(data=[
    go.Surface(x=agec_grid, y=ageph_grid, z=Z, colorscale='Viridis')
])

fig.update_layout(
    title='GAM-predicted Number of Claims',
    scene=dict(
        xaxis_title='agec',
        yaxis_title='ageph',
        zaxis_title='Predicted Response'
    ),
    width=800,          # Wider figure
    height=500,         # Taller figure
    margin=dict(l=0, r=0, t=50, b=50),  # Reduce whitespace cropping
)

Explaining a Neural Network with Partial Dependence Plots

Lecture Outline

  • Introduction

  • Illustrative Example

  • Inherently Interpretable Models

  • Post-hoc Explanations

  • Belgian Motor Dataset

  • Interpreting Inherently Interpretable Models

  • Explaining a Neural Network with Partial Dependence Plots

  • Explaining Specific Models

Build a neural network

ct = make_column_transformer(
    (StandardScaler(), num_vars),
    (OneHotEncoder(drop="first", sparse_output=False), cat_vars),
    verbose_feature_names_out=False
)
ct.fit(X_train_raw)

X_train_nn = ct.transform(X_train_raw)
X_test_nn = ct.transform(X_test_raw)
y_train_nn = y_train_raw.values
y_test_nn = y_test_raw.values


random.seed(123)

nn_model = Sequential(
    [
        Input(shape=(X_train_nn.shape[1],)),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(1, activation="exponential")
    ]
)
nn_model.compile(
    optimizer="adam",
    loss="poisson",
    metrics=["mae"]
)

history = nn_model.fit(X_train_nn, y_train_nn,
    epochs=25, batch_size=64, verbose=0,
)
Code
plt.plot(history.history['loss'])

Metrics

from sklearn.metrics import mean_poisson_deviance, mean_absolute_error

# Evaluate the neural network model
y_pred_nn = nn_model.predict(X_test_nn, verbose=0, batch_size=X_test_nn.shape[0]).flatten()
print(f"NN mean Poisson deviance  : {mean_poisson_deviance(y_test_nn, y_pred_nn):.4f}")
print(f"NN MAE                    : {mean_absolute_error(y_test_nn, y_pred_nn):.4f}")

# Compare to the GAM
y_pred_gam = gam_model.predict(X_test_gam).ravel()
print(f"GAM mean Poisson deviance : {mean_poisson_deviance(y_test_gam, y_pred_gam):.4f}")
print(f"GAM MAE                   : {mean_absolute_error(y_test_gam, y_pred_gam):.4f}")

## Compare to the GLM
y_pred_glm = glm_model.predict(X_test_raw).values.ravel()
print(f"GLM mean Poisson deviance : {mean_poisson_deviance(y_test_raw, y_pred_glm):.4f}")
print(f"GLM MAE                   : {mean_absolute_error(y_test_raw, y_pred_glm):.4f}")
NN mean Poisson deviance  : 0.5483
NN MAE                    : 0.2253
GAM mean Poisson deviance : 0.5254
GAM MAE                   : 0.2149
GLM mean Poisson deviance : 0.5326
GLM MAE                   : 0.2164

Permutation importance example

scores = permutation_test(nn_model, X_train_nn.values, y_train_nn)
plt.plot(scores[:,0], label='Loss')
plt.xticks(ticks=np.arange(len(X_train_nn.columns)), labels=X_train_nn.columns, rotation=90);

PDP: Training data

Partial dependence plots start by looking at the training data.

X_train_raw
expo coverage ageph sex bm power agec fuel use fleet long lat
25776 1.000000 TPL 59 female 0 51 15 gasoline private 0 4.387146 51.216042
63185 0.819178 TPL++ 40 female 0 96 3 gasoline private 0 5.500567 50.583188
130175 1.000000 TPL 31 female 8 40 8 gasoline private 0 3.721116 50.535314
... ... ... ... ... ... ... ... ... ... ... ... ...
24617 1.000000 TPL 75 male 0 29 17 gasoline private 0 4.387146 51.216042
61581 1.000000 TPL 63 male 0 55 11 gasoline private 0 5.612566 50.680020
10699 1.000000 TPL+ 50 male 6 74 3 gasoline private 0 4.678745 50.687562

130569 rows × 12 columns

nn_model.predict(ct.transform(X_train_raw), verbose=0).mean()
0.13736986

PDP: Alternate Reality

X_train_pd = X_train_raw.copy()
X_train_pd['ageph'] = 18
X_train_pd
expo coverage ageph sex bm power agec fuel use fleet long lat
25776 1.000000 TPL 18 female 0 51 15 gasoline private 0 4.387146 51.216042
63185 0.819178 TPL++ 18 female 0 96 3 gasoline private 0 5.500567 50.583188
130175 1.000000 TPL 18 female 8 40 8 gasoline private 0 3.721116 50.535314
... ... ... ... ... ... ... ... ... ... ... ... ...
24617 1.000000 TPL 18 male 0 29 17 gasoline private 0 4.387146 51.216042
61581 1.000000 TPL 18 male 0 55 11 gasoline private 0 5.612566 50.680020
10699 1.000000 TPL+ 18 male 6 74 3 gasoline private 0 4.678745 50.687562

130569 rows × 12 columns

nn_model.predict(ct.transform(X_train_pd), verbose=0).mean()
0.1750744

PDP: Alternate Reality

X_train_pd = X_train_raw.copy()
X_train_pd['ageph'] = 19
X_train_pd
expo coverage ageph sex bm power agec fuel use fleet long lat
25776 1.000000 TPL 19 female 0 51 15 gasoline private 0 4.387146 51.216042
63185 0.819178 TPL++ 19 female 0 96 3 gasoline private 0 5.500567 50.583188
130175 1.000000 TPL 19 female 8 40 8 gasoline private 0 3.721116 50.535314
... ... ... ... ... ... ... ... ... ... ... ... ...
24617 1.000000 TPL 19 male 0 29 17 gasoline private 0 4.387146 51.216042
61581 1.000000 TPL 19 male 0 55 11 gasoline private 0 5.612566 50.680020
10699 1.000000 TPL+ 19 male 6 74 3 gasoline private 0 4.678745 50.687562

130569 rows × 12 columns

nn_model.predict(ct.transform(X_train_pd), verbose=0).mean()
0.17299955

PDP: Alternate Reality

X_train_pd = X_train_raw.copy()
X_train_pd['ageph'] = 90
X_train_pd
expo coverage ageph sex bm power agec fuel use fleet long lat
25776 1.000000 TPL 90 female 0 51 15 gasoline private 0 4.387146 51.216042
63185 0.819178 TPL++ 90 female 0 96 3 gasoline private 0 5.500567 50.583188
130175 1.000000 TPL 90 female 8 40 8 gasoline private 0 3.721116 50.535314
... ... ... ... ... ... ... ... ... ... ... ... ...
24617 1.000000 TPL 90 male 0 29 17 gasoline private 0 4.387146 51.216042
61581 1.000000 TPL 90 male 0 55 11 gasoline private 0 5.612566 50.680020
10699 1.000000 TPL+ 90 male 6 74 3 gasoline private 0 4.678745 50.687562

130569 rows × 12 columns

nn_model.predict(ct.transform(X_train_pd), verbose=0).mean()
0.10095821

Question

What could go wrong?

Partial dependence plots

# Make partial dependence plot for ageph by hand
ageph_range = np.linspace(X_train_raw['ageph'].min(), X_train_raw['ageph'].max(), 100)
y_preds = []
X_train_pd = X_train_raw.copy()

for age in ageph_range:
    X_train_pd['ageph'] = age
    X_train_pd_ct = ct.transform(X_train_pd)
    y_pred_batch = nn_model.predict(X_train_pd_ct, verbose=0, batch_size=X_train_pd_ct.shape[0])
    y_preds.append(y_pred_batch.mean())
Code
plt.plot(ageph_range, y_preds, label='PDP for ageph')
plt.xlabel('Age of Policyholder')
plt.ylabel('Predicted Number of Claims')
plt.title('Partial Dependence Plot for Age of Policyholder');

Using scikit-learn’s PDP

(First, we need to trick scikit-learn a little.)

Code
class SklearnModel:
    _estimator_type = "regressor"
    
    def __init__(self, ct, model, batch_size=1000):
        # Add the trailing underscores so sklearn knows we're 'fitted'
        self.ct_ = ct
        self.model_ = model
        self.batch_size = batch_size

    def predict(self, X):
        X_trans = self.ct_.transform(X)
        preds = self.model_.predict(
            X_trans,
            verbose=0,
            batch_size=self.batch_size
        )
        return preds

    def fit(self, X=None, y=None): ...

nn_model_skl = SklearnModel(ct, nn_model)
from sklearn.inspection import PartialDependenceDisplay

PartialDependenceDisplay.from_estimator(
    nn_model_skl,
    X_train_raw,
    features=[0],
    feature_names=X_train_raw.columns,
)

Explaining the ages and exposure

Code
cols = X_train_raw.columns.to_list() 
inds = [cols.index(col) for col in ["ageph", "agec", "expo"]]
PartialDependenceDisplay.from_estimator(
    nn_model_skl,
    X_train_raw,
    features=inds,
    feature_names=X_train_raw.columns,
)

Explaining the bonus-malus and location

Code
cols = X_train_raw.columns.to_list() 
inds = [cols.index(col) for col in ["bm", "lat", "long"]]
PartialDependenceDisplay.from_estimator(
    nn_model_skl,
    X_train_raw,
    features=inds,
    feature_names=X_train_raw.columns,
)

Bivariate age effects

cols = X_train_raw.columns.to_list() 
inds = [cols.index(col) for col in ["ageph", "agec"]]

disp = PartialDependenceDisplay.from_estimator(
    nn_model_skl,
    X_train_raw,
    features=[tuple(inds)],
    feature_names=X_train_raw.columns,
)

Explaining Specific Models

Lecture Outline

  • Introduction

  • Illustrative Example

  • Inherently Interpretable Models

  • Post-hoc Explanations

  • Belgian Motor Dataset

  • Interpreting Inherently Interpretable Models

  • Explaining a Neural Network with Partial Dependence Plots

  • Explaining Specific Models

Grad-CAM

Original image

Grad-CAM

See, e.g., Keras tutorial.
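
A condensed sketch in the spirit of that tutorial; model, img_batch and last_conv_layer_name are hypothetical placeholders for a convolutional classifier, a preprocessed image batch and the name of its last convolutional layer:

import tensorflow as tf

def grad_cam_heatmap(model, img_batch, last_conv_layer_name):
    # Map the input image to the last conv layer's activations and the predictions.
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(last_conv_layer_name).output, model.output]
    )

    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(img_batch)
        top_class = tf.argmax(preds[0])
        top_score = preds[:, top_class]

    # Gradient of the top class score w.r.t. the feature maps, pooled spatially
    # to give one weight per channel.
    grads = tape.gradient(top_score, conv_out)
    channel_weights = tf.reduce_mean(grads, axis=(0, 1, 2))

    # Weighted sum of the feature maps, rectified and normalised to [0, 1].
    heatmap = tf.reduce_sum(conv_out[0] * channel_weights, axis=-1)
    heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()

The heatmap is then resized to the image and overlaid to show which regions drove the classification.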

Criticism

“Rather than trying to create models that are inherently interpretable, there has been a recent explosion of work on ‘explainable ML’, where a second (post hoc) model is created to explain the first black box model. This is problematic. Explanations are often not reliable, and can be misleading, as we discuss below. If we instead use models that are inherently interpretable, they provide their own explanations, which are faithful to what the model actually computes.” (Rudin 2019)

Criticism II

Multiple conflicting explanations (Rudin 2019)

Conclusion

“Figure 20.2: Explainability techniques allow strengthening the feedback extracted from a model. A, data and domain knowledge allow building the model. B, predictions are obtained from the model. C, by analyzing the predictions, we learn more about the model. D, better understanding of the model allows better understanding of the data and, sometimes, broadens domain knowledge.” Biecek and Burzykowski (2021)

Package Versions

Code
from watermark import watermark
print(watermark(python=True, packages="contextily,geopandas,keras,matplotlib,numpy,pandas,plotly,pygam,seaborn,scipy,shapely,sklearn,statsmodels,torch,tensorflow,tf_keras"))
Python implementation: CPython
Python version       : 3.11.12
IPython version      : 9.3.0

contextily : 1.6.2
geopandas  : 1.1.1
keras      : 3.8.0
matplotlib : 3.10.0
numpy      : 1.26.4
pandas     : 2.2.2
plotly     : 5.24.1
pygam      : 0.10.1
seaborn    : 0.13.2
scipy      : 1.16.0
shapely    : 2.1.1
sklearn    : 1.6.1
statsmodels: 0.14.5
torch      : 2.6.0
tensorflow : 2.18.0
tf_keras   : 2.18.0

Glossary

  • global interpretability
  • Grad-CAM
  • inherent interpretability
  • LIME
  • local interpretability
  • permutation importance
  • post-hoc interpretability
  • SHAP values

References

Ben-Zion, Ziv, Kristin Witte, Akshay K Jagadish, Or Duek, Ilan Harpaz-Rotem, Marie-Christine Khorsandian, Achim Burrer, et al. 2025. “Assessing and Alleviating State Anxiety in Large Language Models.” Npj Digital Medicine 8 (1): 132.
Biecek, Przemyslaw, and Tomasz Burzykowski. 2021. Explanatory Model Analysis. Chapman; Hall/CRC, New York. https://pbiecek.github.io/ema/.
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32.
Carreira-Perpinán, Miguel A, and Pooya Tavallali. 2018. “Alternating Optimization of Decision Trees, with Application to Learning Sparse Oblique Trees.” Advances in Neural Information Processing Systems 31.
Charpentier, Arthur. 2024. Insurance, Biases, Discrimination and Fairness. Springer.
Chen, Zhanhui, Yang Lu, Jinggong Zhang, and Wenjun Zhu. 2024. “Managing Weather Risk with a Neural Network-Based Index Insurance.” Management Science 70 (7): 4306–27.
Delcaillau, Dimitri, Antoine Ly, Alize Papp, and Franck Vermet. 2022. “Model Transparency and Interpretability: Survey and Application to the Insurance Industry.” European Actuarial Journal 12 (2): 443–84.
Doshi-Velez, Finale, and Been Kim. 2017. “Towards a Rigorous Science of Interpretable Machine Learning.”
Fisher, Aaron, Cynthia Rudin, and Francesca Dominici. 2019. “All Models Are Wrong, but Many Are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously.” Journal of Machine Learning Research 20 (177): 1–81.
Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” arXiv Preprint arXiv:1412.6572.
Li, Cheng, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, and Xing Xie. 2023. “Large Language Models Understand and Can Be Enhanced by Emotional Stimuli.” arXiv Preprint arXiv:2307.11760.
Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” Advances in Neural Information Processing Systems 30.
Molnar, Christoph. 2020. Interpretable Machine Learning.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. Association for Computing Machinery.
Richman, Ronald, and Mario V. Wüthrich. 2023. “LocalGLMnet: Interpretable Deep Learning for Tabular Data.” Scandinavian Actuarial Journal 2023 (1): 71–95.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15.
Rudin, Cynthia, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, and Chudi Zhong. 2022. “Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges.” Statistic Surveys 16: 1–85.