# Bayesian Deep Learning Guide

Bayesian deep learning quantifies epistemic uncertainty: uncertainty due to limited training data. This is crucial for distinguishing when a model is uncertain because it lacks knowledge from when the data itself is inherently noisy.
## Types of Uncertainty

- Aleatoric (data) uncertainty: irreducible noise in the data. Example: the image is blurry, so the label is inherently ambiguous.
- Epistemic (model) uncertainty: reducible; it decreases with more training data. Example: the model has not seen examples of this class.

Bayesian approach: maintain a distribution over the model weights instead of a single point estimate.
## Methods

### MC Dropout

Best for: easy retrofitting, practical epistemic uncertainty

Approximate Bayesian inference by keeping dropout active at test time:
```python
from incerto.bayesian import MCDropout

# Use dropout during inference
mc_dropout = MCDropout(model, num_samples=10)

# Get predictions with uncertainty
mean_pred, variance = mc_dropout.predict(test_data)

# Variance captures epistemic (model) uncertainty
print(f"Epistemic uncertainty: {variance.mean():.4f}")

# Entropy and mutual information
entropy = mc_dropout.predict_entropy(test_data)
mutual_info = mc_dropout.predict_mutual_information(test_data)
print("Model is uncertain where it hasn't seen data")
```
- How it works:
  - Enable dropout during inference
  - Run multiple forward passes (Monte Carlo sampling)
  - Aggregate predictions to estimate uncertainty
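The three steps above can be sketched in plain PyTorch, independent of incerto. The toy model below is an assumption for illustration; note also that in a model with batch norm you would switch only the dropout modules to training mode, not the whole model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model; any architecture with dropout layers works the same way.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(16, 3))

def mc_dropout_predict(model, x, num_samples=20):
    # Step 1: keep dropout sampling active at inference time.
    # (In practice, toggle only nn.Dropout modules if the model has batch norm.)
    model.train()
    with torch.no_grad():
        # Step 2: multiple stochastic forward passes.
        samples = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(num_samples)]
        )  # shape: (num_samples, batch, num_classes)
    # Step 3: aggregate into a mean prediction and a per-class variance.
    return samples.mean(dim=0), samples.var(dim=0)

x = torch.randn(8, 4)
mean_pred, variance = mc_dropout_predict(model, x)
```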
- Advantages:
  - Works with any model that already uses dropout
  - No retraining needed
  - Fast and practical
- Disadvantages:
  - Approximate (not the true Bayesian posterior)
  - Requires dropout layers in the model
  - Quality depends on num_samples
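The predict_entropy and predict_mutual_information outputs decompose total predictive uncertainty into aleatoric and epistemic parts. Incerto's internals are not shown here, but the standard formulation can be computed directly from stacked softmax samples; the `samples` tensor below is a random stand-in for real MC-dropout outputs:

```python
import torch

torch.manual_seed(0)

# Stand-in for S stochastic forward passes: shape (S, batch, classes).
samples = torch.softmax(torch.randn(20, 8, 3), dim=-1)

mean_probs = samples.mean(dim=0)
# Predictive entropy: total uncertainty of the averaged prediction.
predictive_entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
# Expected entropy: the average per-pass (aleatoric) uncertainty.
expected_entropy = -(samples * samples.clamp_min(1e-12).log()).sum(dim=-1).mean(dim=0)
# Mutual information (BALD score): the epistemic remainder.
mutual_information = predictive_entropy - expected_entropy
```

Mutual information is high exactly where the stochastic passes disagree, which is why it is the preferred epistemic signal for active learning and OOD detection.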
Reference: Gal & Ghahramani, “Dropout as a Bayesian Approximation” (ICML 2016)
### Deep Ensembles

Best for: best empirical performance, production systems

Train multiple models from different random initializations:
```python
from incerto.bayesian import DeepEnsemble

# Create ensemble with a model factory function
def create_model():
    return MyModel()

ensemble = DeepEnsemble(create_model, num_models=5)

# Train each model separately
for i, model in enumerate(ensemble.models):
    train_model(model, seed=i)

# Get predictions with uncertainty
mean_pred, variance = ensemble.predict(test_data)

# Or get all individual predictions as well
mean_pred, variance, all_preds = ensemble.predict(test_data, return_samples=True)

# Measure diversity across ensemble members
diversity = ensemble.diversity(test_data)
print(f"Ensemble diversity: {diversity.mean():.4f}")
```
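Under the hood, an ensemble prediction is just an average over members. The exact definition of incerto's `diversity` is not shown here; one common proxy, sketched below with toy untrained models standing in for trained ensemble members, is disagreement with the majority vote:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Five toy models standing in for independently trained ensemble members.
models = [nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3)) for _ in range(5)]

x = torch.randn(8, 4)
with torch.no_grad():
    probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])  # (5, 8, 3)

mean_pred = probs.mean(dim=0)   # ensemble prediction
variance = probs.var(dim=0)     # epistemic uncertainty proxy

# Diversity proxy: fraction of members that disagree with the majority vote.
votes = probs.argmax(dim=-1)           # (5, 8) hard labels per member
majority = votes.mode(dim=0).values    # (8,) majority vote per input
disagreement = (votes != majority).float().mean(dim=0)
```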
- Advantages:
  - State-of-the-art uncertainty estimates
  - Simple to implement
  - Reliable
- Disadvantages:
  - 5-10x training cost
  - 5-10x inference cost
  - Significant memory overhead
Reference: Lakshminarayanan et al., “Simple and Scalable Predictive Uncertainty” (NeurIPS 2017)
### SWAG (Stochastic Weight Averaging Gaussian)

Best for: efficient approximation, low overhead

Approximates the posterior with a Gaussian fit to weight statistics collected during training:
```python
from incerto.bayesian import SWAG

swag = SWAG(model, num_samples=20)

# Collect model snapshots during training (after warmup)
for epoch in range(epochs):
    for batch in train_loader:
        # Train normally
        loss = train_step(model, batch)
    # Collect model statistics (typically late in the learning rate schedule)
    if epoch >= warmup_epochs:
        swag.collect_model(model)

# Sample from the approximate posterior
mean_pred, variance = swag.predict(test_data)

# Variance captures epistemic uncertainty
print(f"Epistemic uncertainty: {variance.mean():.4f}")
```
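The "weight statistics" SWAG collects are running first and second moments of the flattened parameters; sampling from the fitted Gaussian then yields weight sets to average predictions over. A minimal diagonal-covariance sketch (the full method also keeps a low-rank covariance term, and the random weight perturbation below merely stands in for real SGD updates):

```python
import torch
import torch.nn as nn
import torch.nn.utils as U  # parameters_to_vector / vector_to_parameters

torch.manual_seed(0)
model = nn.Linear(4, 3)

# Running moments of the flattened weights, updated once per "epoch".
n, mean, sq_mean = 0, None, None
for step in range(10):
    # ... a real training epoch would go here; we perturb weights as a stand-in.
    with torch.no_grad():
        for p in model.parameters():
            p.add_(0.01 * torch.randn_like(p))
    w = U.parameters_to_vector(model.parameters()).detach()
    n += 1
    mean = w.clone() if mean is None else mean + (w - mean) / n
    sq_mean = w**2 if sq_mean is None else sq_mean + (w**2 - sq_mean) / n

# Diagonal SWAG posterior: weights ~ N(mean, sq_mean - mean^2).
var = (sq_mean - mean**2).clamp_min(1e-8)
sampled = mean + var.sqrt() * torch.randn_like(mean)
U.vector_to_parameters(sampled, model.parameters())  # load one posterior sample
```

Prediction then averages the outputs of several such sampled weight sets, exactly as in MC dropout.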
- Advantages:
  - Low overhead (a single training run)
  - Good uncertainty estimates
  - Efficient inference
- Disadvantages:
  - Requires a specific training procedure
  - Less accurate than deep ensembles
  - Needs careful tuning
Reference: Maddox et al., “A Simple Baseline for Bayesian Uncertainty” (NeurIPS 2019)
### Laplace Approximation

Best for: post-hoc uncertainty without retraining

Fits a Gaussian approximation to the posterior around the MAP estimate:
```python
from incerto.bayesian import LaplaceApproximation

# Train model normally
model = train_model(train_loader)

# Fit the Laplace approximation around the trained (MAP) weights
laplace = LaplaceApproximation(
    model,
    likelihood='classification',
    num_samples=20,
)
laplace.fit(train_loader, device='cuda')

# Get predictions with uncertainty
mean_pred, variance = laplace.predict(test_data)
```

- Advantages:
  - Works with pre-trained models
  - Theoretically motivated
  - Efficient
- Disadvantages:
  - Requires Hessian computation (expensive)
  - The Gaussian assumption may be poor
  - Less accurate than MC Dropout or ensembles
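In practice the Hessian is approximated rather than computed exactly; one common choice is the diagonal empirical Fisher (per-parameter squared gradients). A toy sketch of fitting such a Gaussian around a model's weights and drawing one posterior sample; the untrained `nn.Linear` and `prior_precision` value are illustrative assumptions, with the prior precision playing the role of weight decay:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(4, 3)  # stands in for a trained (MAP) model
data = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(5)]

# Diagonal empirical-Fisher approximation to the Hessian at the MAP estimate.
prior_precision = 1.0
fisher = [torch.zeros_like(p) for p in model.parameters()]
for x, y in data:
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()
    for f, p in zip(fisher, model.parameters()):
        f += p.grad**2  # accumulate squared gradients per parameter

# Posterior std of each weight under the Gaussian approximation.
posterior_std = [1.0 / (f + prior_precision).sqrt() for f in fisher]

# Draw one weight sample from the approximate posterior for prediction.
with torch.no_grad():
    for p, s in zip(model.parameters(), posterior_std):
        p.add_(s * torch.randn_like(p))
```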
### Variational Inference

Best for: a true Bayesian approach, research

Learn a distribution over the weights via variational inference:
```python
import torch
from incerto.bayesian import VariationalBayesNN

# Create a Bayesian neural network
model = VariationalBayesNN(
    in_features=784,
    hidden_sizes=[256, 128],
    out_features=10,
    num_samples=20,
)

# Training with the variational loss (combines NLL + KL)
optimizer = torch.optim.Adam(model.parameters())
for inputs, labels in train_loader:
    optimizer.zero_grad()
    loss = model.variational_loss(inputs, labels, num_samples=5)
    loss.backward()
    optimizer.step()

# Inference with uncertainty
mean_pred, variance = model.predict(test_data)
```
- Advantages:
  - Principled Bayesian approach
  - Learns weight distributions explicitly
- Disadvantages:
  - Requires redesigning the model
  - Computationally expensive
  - Difficult to tune
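The core ingredients (weights sampled via the reparameterization trick, plus a KL penalty toward a standard-normal prior) can be sketched as a single mean-field Gaussian layer. This is a generic Bayes-by-backprop-style layer, not incerto's VariationalBayesNN, and the KL weight of 1e-3 is an illustrative assumption:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class BayesLinear(nn.Module):
    """Mean-field Gaussian linear layer using the reparameterization trick."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.w_rho = nn.Parameter(torch.full((d_out, d_in), -3.0))  # softplus -> std
        self.b = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        std = nn.functional.softplus(self.w_rho)
        w = self.w_mu + std * torch.randn_like(std)  # fresh weight sample per call
        return x @ w.t() + self.b

    def kl(self):
        # KL( N(mu, std^2) || N(0, 1) ), summed over all weights.
        std = nn.functional.softplus(self.w_rho)
        return (0.5 * (std**2 + self.w_mu**2 - 1) - std.log()).sum()

layer = BayesLinear(4, 3)
x = torch.randn(8, 4)
nll = nn.functional.cross_entropy(layer(x), torch.randint(0, 3, (8,)))
loss = nll + 1e-3 * layer.kl()  # ELBO-style objective: NLL + scaled KL
```

Because the forward pass samples new weights each call, running it several times and averaging yields uncertainty estimates, just as with MC dropout.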
## Complete Workflow

```python
import torch
import matplotlib.pyplot as plt

from incerto.bayesian import MCDropout

# 1. Train a model with dropout
model = create_model_with_dropout(p=0.1)
train_model(model, train_loader)

# 2. Create the MC Dropout predictor
mc_dropout = MCDropout(model, num_samples=20)

# 3. Get predictions with uncertainty
all_variance = []
all_correct = []
for inputs, labels in test_loader:
    mean_pred, variance = mc_dropout.predict(inputs)
    predictions = mean_pred.argmax(dim=-1)
    # Average variance across classes as a per-sample uncertainty measure
    uncertainty = variance.mean(dim=-1)
    correct = (predictions == labels).float()
    all_variance.append(uncertainty)
    all_correct.append(correct)

uncertainty = torch.cat(all_variance)
correct = torch.cat(all_correct)

# 4. Analyze uncertainty vs. correctness
# (high uncertainty should correlate with incorrect predictions)
plt.hist(uncertainty[correct == 1].numpy(), alpha=0.5, label='Correct')
plt.hist(uncertainty[correct == 0].numpy(), alpha=0.5, label='Incorrect')
plt.xlabel('Uncertainty (variance)')
plt.legend()

# 5. Use for selective prediction
threshold = uncertainty.quantile(0.8)  # abstain on the top 20% most uncertain
# Samples with uncertainty > threshold should be reviewed by a human
```
## Evaluation

- Negative Log-Likelihood (NLL): measures both accuracy and uncertainty calibration; lower is better.

  ```python
  from incerto.bayesian.metrics import negative_log_likelihood
  nll = negative_log_likelihood(predictions, labels)
  ```

- Brier Score: a proper scoring rule for probabilistic predictions.

  ```python
  from incerto.calibration import brier_score
  bs = brier_score(predictions, labels)
  ```

- Expected Calibration Error (ECE): checks whether the predicted uncertainties are calibrated.

  ```python
  from incerto.calibration import ece_score
  ece = ece_score(predictions, labels)
  ```
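For reference, each of these metrics is simple to compute by hand from predicted class probabilities. A sketch (not the incerto implementations) with random stand-in predictions; the bin count of 10 for ECE is a common default, not a requirement:

```python
import torch

torch.manual_seed(0)
probs = torch.softmax(torch.randn(100, 3), dim=-1)  # predicted probabilities
labels = torch.randint(0, 3, (100,))

# NLL: mean negative log-probability of the true class (lower is better).
nll = -probs[torch.arange(100), labels].clamp_min(1e-12).log().mean()

# Brier score: mean squared error against the one-hot labels.
one_hot = torch.zeros_like(probs).scatter_(1, labels.unsqueeze(1), 1.0)
brier = ((probs - one_hot) ** 2).sum(dim=1).mean()

# ECE: weighted average gap between confidence and accuracy per bin.
conf, pred = probs.max(dim=1)
acc = (pred == labels).float()
ece, bins = 0.0, 10
for i in range(bins):
    in_bin = (conf > i / bins) & (conf <= (i + 1) / bins)
    if in_bin.any():
        ece += in_bin.float().mean() * (acc[in_bin].mean() - conf[in_bin].mean()).abs()
```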
## Best Practices

- Start with MC Dropout: easiest to implement, and works with existing models.
- Use enough samples: at least 10-20 for MC Dropout, 20-30 for SWAG.
- Combine with calibration: Bayesian uncertainties can still be miscalibrated.
- Monitor epistemic uncertainty: it should be high on out-of-distribution data.
- Use it for active learning: query the samples with the highest epistemic uncertainty.
- Validate uncertainty quality: plot uncertainty vs. correctness.
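Beyond the histogram in the workflow above, one simple summary of uncertainty-vs-correctness is the AUROC of uncertainty used as an error detector. The random scores below are a stand-in; for a useful model this value should sit well above 0.5:

```python
import torch

torch.manual_seed(0)
# Hypothetical per-sample uncertainty scores and correctness indicators.
uncertainty = torch.rand(200)
correct = (torch.rand(200) > 0.3).float()

# Rank-based AUROC: probability that a wrong prediction receives higher
# uncertainty than a correct one (0.5 = uninformative, 1.0 = perfect).
wrong_u = uncertainty[correct == 0]
right_u = uncertainty[correct == 1]
pairs = (wrong_u.unsqueeze(1) > right_u.unsqueeze(0)).float()
ties = (wrong_u.unsqueeze(1) == right_u.unsqueeze(0)).float()
auroc = (pairs + 0.5 * ties).mean()
```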
## Comparison

| Method | Training Cost | Inference Cost | Quality |
|---|---|---|---|
| MC Dropout | 1x | 10-20x | Good |
| Deep Ensembles | 5-10x | 5-10x | Excellent |
| SWAG | ~1.2x | 10-30x | Good |
| Laplace | 1x + Hessian | 10x | Moderate |
| Variational | 1-2x | 10x | Good (in theory) |
## References

- Gal & Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" (ICML 2016)
- Lakshminarayanan et al., "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles" (NeurIPS 2017)
- Maddox et al., "A Simple Baseline for Bayesian Uncertainty in Deep Learning" (NeurIPS 2019)
- Wilson & Izmailov, "Bayesian Deep Learning and a Probabilistic Perspective of Generalization" (NeurIPS 2020)
## See Also

- Bayesian Deep Learning - complete API reference
- Active Learning Guide - use epistemic uncertainty for active learning
- Selective Prediction Guide - use uncertainty for selective prediction