Basic Concepts¶
This page introduces the fundamental concepts behind Distributional Refinement Networks (DRN) and distributional regression modeling.
What is Distributional Regression?¶
Traditional regression models predict a single value (the mean) for each input. Distributional regression goes beyond this by modeling the entire conditional distribution of the response variable given the features.
Traditional Regression¶
ŷ(x) = E[Y | X = x], a single point prediction for each input.
Distributional Regression¶
p(y | x), the full conditional distribution of the response given the features.
This allows us to predict:
- Mean: Expected value
- Quantiles: Risk measures (e.g., the 95th percentile)
- Density: Full probability distribution
- Uncertainty: Prediction intervals and confidence bounds
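To make the contrast concrete, here is a minimal sketch. The point-prediction half is plain scikit-learn; the distributional half assumes a fitted drn-style model whose predict returns a distribution object, as shown later on this page:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=200)

# Traditional regression: one number (the conditional mean) per observation
point_model = LinearRegression().fit(X, y)
y_hat = point_model.predict(X)  # shape (200,)

# Distributional regression (drn-style): a distribution object per observation
# dist = model.predict(X)
# dist.mean, dist.quantiles([5, 95]), dist.density(grid) are all available
```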
The DRN Architecture¶
Core Philosophy¶
DRN addresses three key challenges in actuarial and statistical modeling:
- Flexible Covariate Effects: Features should be able to affect different parts of the distribution differently
- Interpretability vs. Performance: Maintain model transparency while leveraging ML advances
- Distributional Focus: Model the entire distribution, not just the mean
Two-Stage Approach¶
```mermaid
graph LR
    A[Input Features X] --> B[Baseline Model]
    A --> C[Neural Network]
    B --> D[Baseline Distribution]
    C --> E[Refinement Factors]
    D --> F[DRN Distribution]
    E --> F
    F --> G[Predictions]
```
Stage 1: Baseline Model¶
- Usually a Generalized Linear Model (GLM)
- Provides interpretable foundation
- Captures main relationships in data
- Well-understood statistical properties
Stage 2: Neural Refinement¶
- Deep neural network refines the baseline
- Operates on discretized regions (cutpoints)
- Constrained by regularization terms
- Maintains distributional coherence
Mathematical Framework¶
For a given input x, the DRN produces a distribution where:
- Baseline: P₀(y|x) comes from a GLM or another interpretable model
- Refinement: A neural network produces per-region adjustment factors
- Final Distribution: The refined distribution respects the baseline structure
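One way to write the refinement, as a sketch consistent with the description above rather than the paper's exact notation: within each cutpoint interval, the baseline density is scaled by a network-produced factor and renormalized,

$$
P_{\text{DRN}}(y \mid x) \;\propto\; P_0(y \mid x)\, a_k(x), \qquad y \in (c_{k-1}, c_k],
$$

where a_k(x) is the adjustment factor for the interval containing y and the proportionality constant ensures the density integrates to one over the refinement region.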
The refinement is controlled by three key regularization terms:
- KL Divergence (kl_alpha): Controls deviation from the baseline distribution
- Roughness Penalty (dv_alpha): Ensures smooth density functions
- Mean Penalty (mean_alpha): Controls deviation from the baseline mean
Key Components¶
1. Cutpoints System¶
DRN operates on discretized regions of the response variable:
- c_0: Lower bound of the refinement region
- c_K: Upper bound of the refinement region
- Number of intervals: Determined by the cutpoints-to-observation ratio p
Why discretize?
- Makes neural network training stable
- Allows flexible density shapes
- Enables efficient computation
- Maintains probabilistic coherence
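As a sketch of how a refinement region might be set up (the drn_cutpoints signature follows the Training Framework example below; the interpretations of p and min_obs mirror the bullets above, and the import path is an assumption):

```python
import numpy as np
from drn import drn_cutpoints  # assuming a top-level export

# Simulated positive responses, e.g. claim sizes
y = np.random.default_rng(0).gamma(shape=2.0, scale=1.0, size=1000)

c_0 = 0.0                   # lower bound of the refinement region
c_K = float(y.max()) * 1.1  # upper bound, just above the largest observation
p = 0.1                     # aiming for roughly p * 1000 = 100 intervals
min_obs = 10                # assumed: minimum observations per interval

cutpoints = drn_cutpoints(c_0, c_K, p, y, min_obs)
```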
2. Distribution Objects¶
All models return distribution objects with methods:
```python
dist = model.predict(X)

# Point estimates
mean = dist.mean                     # Expected value
mode = dist.mode                     # Most likely value

# Distributional properties
pdf = dist.density(y_grid)           # Probability density
cdf = dist.cdf(y_grid)               # Cumulative distribution
quantiles = dist.quantiles([5, 95])  # Risk measures

# Evaluation
log_prob = dist.log_prob(y_true)     # Log-likelihood
```
3. Training Framework¶
DRN training follows a specific pattern:
```python
# Assuming top-level exports from the drn package
from drn import GLM, DRN, drn_cutpoints, drn_loss, train

# 1. Train baseline
baseline = GLM('gaussian').fit(X, y)

# 2. Define refinement region
cutpoints = drn_cutpoints(c_0, c_K, p, y, min_obs)

# 3. Initialize DRN
drn = DRN(baseline, cutpoints, hidden_size=128)

# 4. Train with custom loss
train(drn, drn_loss, train_data, val_data)
```
Model Types in DRN¶
Baseline Models¶
GLM (Generalized Linear Models)¶
- Gaussian: Normal distribution with linear mean
- Gamma: Gamma distribution for positive responses
- Interpretable coefficients
- Well-established statistical theory
Constant Model¶
- Simple baseline predicting constant distribution
- Useful for ablation studies
- Minimal computational overhead
Advanced Models¶
DRN (Distributional Refinement Network)¶
- Main model of the package
- Combines interpretable baseline + neural refinement
- Flexible distribution shapes
- Controlled regularization
CANN (Combined Actuarial Neural Network)¶
- Actuarial-focused architecture
- Separate networks for different distributional parameters
- Industry-standard approach
MDN (Mixture Density Network)¶
- Models multimodal distributions
- Mixture of simple distributions
- Good for complex, multi-peaked data
DDR (Deep Distribution Regression)¶
- Pure neural approach to distributional regression
- No baseline constraint
- Maximum flexibility
Regularization and Control¶
KL Divergence Control (kl_alpha)¶
Controls how much the final distribution can deviate from the baseline:
- Small values (1e-5 to 1e-4): Gentle pull toward the baseline, allowing more deviation
- Larger values: Keep the refined distribution close to the baseline
- Direction: Forward or reverse KL divergence
Roughness Penalty (dv_alpha)¶
Ensures smooth density functions:
- Larger values: Smoother densities
- Smaller values: Allow more complex shapes
- Balance: Trade-off between flexibility and stability
Mean Penalty (mean_alpha)¶
Controls deviation of predicted mean from baseline:
- Zero: No constraint on mean
- Small values (1e-5 to 1e-4): Gentle constraint
- Larger values: Force mean to stay close to baseline
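How these penalties are wired into training is package-specific; as a hypothetical sketch, the keyword names below mirror the bullet points above, but the exact drn_loss signature is an assumption:

```python
from functools import partial

# Hypothetical wiring: the keyword names are assumptions, not a confirmed API.
loss_fn = partial(
    drn_loss,
    kl_alpha=1e-4,   # gentle pull toward the baseline distribution
    dv_alpha=1e-3,   # penalize rough, jagged density shapes
    mean_alpha=0.0,  # leave the predicted mean unconstrained
)
train(drn, loss_fn, train_data, val_data)  # as in the Training Framework above
```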
Evaluation Metrics¶
Distribution-Aware Metrics¶
Unlike traditional regression, distributional models need special evaluation:
CRPS (Continuous Ranked Probability Score)¶
- Measures difference between predicted CDF and observed outcome
- Lower is better
- Rewards both accuracy and calibration
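Because the distribution objects above expose a cdf method, CRPS can be approximated numerically without any package-specific helper. A minimal sketch, assuming dist.cdf broadcasts a grid across observations and returns an (n, grid) array-like:

```python
import numpy as np

def crps_numeric(dist, y_true, y_grid):
    """Approximate CRPS by integrating (F(t) - 1{y <= t})^2 over y_grid."""
    F = np.asarray(dist.cdf(y_grid))                       # assumed (n, G)
    step = np.asarray(y_true)[:, None] <= y_grid[None, :]  # empirical CDF
    return np.trapz((F - step) ** 2, y_grid, axis=-1)      # one score per obs

# Lower mean CRPS is better:
# score = crps_numeric(dist, y_test, np.linspace(0, 10, 500)).mean()
```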
Quantile Loss¶
- Evaluates specific quantile predictions
- Asymmetric loss function
- Important for risk management
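The quantile (pinball) loss makes the asymmetry explicit: under-predictions are weighted by q and over-predictions by (1 - q). A minimal self-contained sketch:

```python
import numpy as np

def pinball_loss(y_true, q_pred, q):
    """Quantile loss at level q in (0, 1); lower is better."""
    diff = np.asarray(y_true) - np.asarray(q_pred)
    # diff > 0 is an under-prediction (costs q); diff < 0 costs (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

# e.g., score 95th-percentile predictions from a distribution object:
# loss = pinball_loss(y_test, dist.quantiles([95]), q=0.95)
```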
Negative Log-Likelihood (NLL)¶
- Measures how well the model assigns probability to observed outcomes
- Lower NLL (higher likelihood) = better fit
- Can overfit if not regularized
Traditional Metrics¶
- RMSE: Still useful for mean predictions
- MAE: Less sensitive to outliers
- R²: Explained variance (for means only)
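Since every distribution object exposes its mean, the usual point metrics follow directly. A minimal sketch, continuing from dist = model.predict(X) and a held-out y_test:

```python
import numpy as np

mu = np.asarray(dist.mean)                          # point predictions
rmse = float(np.sqrt(np.mean((mu - y_test) ** 2)))  # penalizes large errors
mae = float(np.mean(np.abs(mu - y_test)))           # less outlier-sensitive
```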
Next Steps¶
Now that you understand the concepts, you can:
- Try the Quick Start Guide - Hands-on experience
- Explore Advanced Usage - Step-by-step examples