Bayesian Deep Learning Guide#
Bayesian deep learning quantifies epistemic uncertainty, the uncertainty that comes from having limited training data. This is crucial for telling whether a model is uncertain because it lacks knowledge or because the data is inherently noisy.
Types of Uncertainty#
- Aleatoric (data) uncertainty:
Irreducible noise in the data. Example: the image is blurry or the label is inherently ambiguous.
- Epistemic (model) uncertainty:
Reducible; it decreases with more training data. Example: the model has not seen examples of this class.
Bayesian approach: Maintain a distribution over the model weights instead of a single point estimate.
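The epistemic, aleatoric, and total values reported by the methods below can be understood through the standard entropy-based decomposition: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the individual predictions, and epistemic uncertainty is their difference (the mutual information). A minimal sketch of that decomposition, assuming probs is a hypothetical tensor of class probabilities from repeated stochastic forward passes with shape (n_samples, batch, n_classes):
import torch

# Entropy-based decomposition of predictive uncertainty from MC samples.
def decompose_uncertainty(probs, eps=1e-12):
    mean_probs = probs.mean(dim=0)                                   # (batch, n_classes)
    # Total: entropy of the averaged predictive distribution.
    total = -(mean_probs * (mean_probs + eps).log()).sum(dim=-1)
    # Aleatoric: expected entropy of the individual predictions.
    aleatoric = -(probs * (probs + eps).log()).sum(dim=-1).mean(dim=0)
    # Epistemic: mutual information = total - aleatoric.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic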
Methods#
MC Dropout#
Best for: Easy retrofitting, practical epistemic uncertainty
Approximate Bayesian inference by using dropout at test time:
from incerto.bayesian import MCDropout
# Use dropout during inference
mc_dropout = MCDropout(model, n_samples=10)
# Get predictions with uncertainty
result = mc_dropout.predict(test_data)
mean_pred = result['mean'] # Average prediction
epistemic = result['epistemic'] # Model uncertainty
aleatoric = result['aleatoric'] # Data uncertainty
total_unc = result['total'] # Total uncertainty
# Entropy and mutual information
entropy = result['predictive_entropy']
mutual_info = result['mutual_information']
print(f"Epistemic uncertainty: {epistemic.mean():.4f}")
print("Model is uncertain where it hasn't seen data")
- How it works:
Enable dropout during inference
Run multiple stochastic forward passes (Monte Carlo sampling)
Aggregate the predictions to estimate uncertainty (see the plain-PyTorch sketch after the reference below)
- Advantages:
Works with any model with dropout
No retraining needed
Fast and practical
- Disadvantages:
Approximate (not true Bayesian posterior)
Requires dropout in model
Quality depends on n_samples
Reference: Gal & Ghahramani, “Dropout as a Bayesian Approximation” (ICML 2016)
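For intuition, a minimal sketch of the three steps above in plain PyTorch (not the incerto API), assuming a classifier whose dropout layers are standard torch.nn.Dropout modules:
import torch

# MC dropout from scratch: keep dropout active at test time and average
# several stochastic forward passes.
def mc_dropout_predict(model, inputs, n_samples=10):
    model.eval()
    # Re-enable only the dropout layers.
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(inputs), dim=-1) for _ in range(n_samples)
        ])                                    # (n_samples, batch, n_classes)
    mean_pred = probs.mean(dim=0)             # average prediction
    epistemic = probs.var(dim=0).sum(dim=-1)  # simple variance-based proxy
    return mean_pred, epistemic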
Deep Ensembles#
Best for: Strongest empirical performance, production systems
Train multiple models with different initializations:
from incerto.bayesian import DeepEnsemble
# Train multiple models
models = [train_model(seed=i) for i in range(5)]
ensemble = DeepEnsemble(models)
# Get predictions
result = ensemble.predict(test_data, return_all=True)
mean_pred = result['mean']
epistemic = result['epistemic']
# Measure diversity
diversity = ensemble.diversity(test_data)
print(f"Ensemble diversity: {diversity:.4f}")
- Advantages:
State-of-the-art uncertainty estimates
Simple to implement
Reliable
- Disadvantages:
5-10x training cost
5-10x inference cost
Significant memory overhead
Reference: Lakshminarayanan et al., “Simple and Scalable Predictive Uncertainty” (NeurIPS 2017)
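Under the hood, ensemble aggregation is simple. A hedged plain-PyTorch sketch (not the incerto API), using the variance of member probabilities as a rough epistemic signal:
import torch

# Average member probabilities; use member disagreement as an
# epistemic-uncertainty proxy.
def ensemble_predict(models, inputs):
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(m(inputs), dim=-1) for m in models
        ])                                    # (n_members, batch, n_classes)
    mean_pred = probs.mean(dim=0)
    epistemic = probs.var(dim=0).sum(dim=-1)  # disagreement across members
    return mean_pred, epistemic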
SWAG (Stochastic Weight Averaging Gaussian)#
Best for: Efficient approximation, low overhead
Approximates posterior using weight statistics during training:
from incerto.bayesian import SWAG
swag = SWAG(model, num_classes=10)
# Collect weight snapshots during training
for epoch in range(epochs):
    for batch in train_loader:
        # Train normally
        loss = train_step(model, batch)
    # Collect weight statistics (e.g. once per epoch)
    swag.collect_model(model)
# Sample from the approximate posterior
result = swag.predict(test_data, n_samples=10)
epistemic = result['epistemic']
- Advantages:
Low overhead (one training run)
Good uncertainty estimates
Efficient inference
- Disadvantages:
Requires specific training procedure
Less accurate than deep ensembles
Needs careful tuning
Reference: Maddox et al., “A Simple Baseline for Bayesian Uncertainty” (NeurIPS 2019)
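To illustrate the idea behind collect_model, here is a minimal sketch of the diagonal part of SWAG in plain PyTorch; the full method also keeps a low-rank deviation matrix, and this is not the incerto implementation:
import copy
import torch

# Track running first and second moments of the flattened weights, then
# sample weights from the implied diagonal Gaussian.
class DiagonalSWAG:
    def __init__(self, model):
        self.model = model
        self.n = 0
        self.mean = None
        self.sq_mean = None

    def collect(self, model):
        flat = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
        if self.mean is None:
            self.mean = torch.zeros_like(flat)
            self.sq_mean = torch.zeros_like(flat)
        self.n += 1
        self.mean += (flat - self.mean) / self.n
        self.sq_mean += (flat ** 2 - self.sq_mean) / self.n

    def sample(self):
        var = (self.sq_mean - self.mean ** 2).clamp(min=1e-12)
        flat = self.mean + var.sqrt() * torch.randn_like(self.mean)
        sampled = copy.deepcopy(self.model)
        torch.nn.utils.vector_to_parameters(flat, sampled.parameters())
        return sampled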
Laplace Approximation#
Best for: Post-hoc uncertainty without retraining
Gaussian approximation around MAP estimate:
from incerto.bayesian import LaplaceApproximation
# Train model normally
model = train_model(train_loader)
# Fit Laplace approximation
laplace = LaplaceApproximation(model, num_classes=10)
laplace.fit(train_loader)
# Get predictions with uncertainty
result = laplace.predict(test_data, n_samples=10)
- Advantages:
Works with pre-trained models
Theoretically motivated
Efficient
- Disadvantages:
Requires Hessian computation (expensive)
Gaussian assumption may be poor
Less accurate than MC Dropout or ensembles
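For intuition, a minimal sketch of a diagonal last-layer Laplace fit in plain PyTorch (not the incerto implementation). Here feature_fn is a hypothetical helper that returns penultimate-layer features, and the Hessian is replaced by the empirical Fisher (squared gradients):
import torch
import torch.nn.functional as F

# Fit a diagonal Gaussian posterior over the output-layer weights around
# the trained (MAP) values.
def fit_diag_laplace(last_layer, train_loader, feature_fn, prior_precision=1.0):
    params = list(last_layer.parameters())
    fisher = [torch.zeros_like(p) for p in params]
    for inputs, labels in train_loader:
        feats = feature_fn(inputs)          # hypothetical: penultimate features
        logits = last_layer(feats)
        loss = F.cross_entropy(logits, labels, reduction='sum')
        grads = torch.autograd.grad(loss, params)
        for f, g in zip(fisher, grads):
            f += g.detach() ** 2            # empirical Fisher diagonal
    # Posterior variance: 1 / (Fisher diagonal + prior precision).
    return [1.0 / (f + prior_precision) for f in fisher]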
Variational Inference#
Best for: True Bayesian approach, research
Learn distribution over weights via variational inference:
import torch.nn.functional as F
from incerto.bayesian import VariationalBayesNN
# Create a Bayesian neural network
model = VariationalBayesNN(
    input_dim=784,
    hidden_dim=256,
    output_dim=10,
)
# Training with the VI loss
for inputs, labels in train_loader:
    optimizer.zero_grad()
    # Each forward pass samples from the weight distribution
    outputs = model(inputs)
    # VI loss = negative log-likelihood + scaled KL divergence
    kl_div = model.kl_divergence()
    nll = F.cross_entropy(outputs, labels)
    loss = nll + kl_div / len(train_loader)
    loss.backward()
    optimizer.step()
# Inference with multiple weight samples
result = model.predict(test_data, n_samples=10)
- Advantages:
Principled Bayesian approach
Learns weight distributions explicitly
- Disadvantages:
Requires model redesign
Computationally expensive
Difficult to tune
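To make the "distribution over weights" concrete, here is a minimal sketch of one mean-field Bayesian linear layer with the reparameterization trick (not the incerto VariationalBayesNN, which may be structured differently):
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Each weight has a learned mean and log-variance; a fresh weight sample is
# drawn on every forward pass, and kl_divergence() returns the KL to a
# standard-normal prior.
class BayesianLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_logvar = nn.Parameter(torch.full((out_features, in_features), -6.0))
        nn.init.kaiming_uniform_(self.w_mu, a=math.sqrt(5))

    def forward(self, x):
        std = torch.exp(0.5 * self.w_logvar)
        w = self.w_mu + std * torch.randn_like(std)   # reparameterized sample
        return F.linear(x, w)

    def kl_divergence(self):
        # KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights.
        return 0.5 * (self.w_mu ** 2 + self.w_logvar.exp()
                      - self.w_logvar - 1.0).sum()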
Complete Workflow#
import torch
from incerto.bayesian import MCDropout
# 1. Train model with dropout
model = create_model_with_dropout(p=0.1)
train_model(model, train_loader)
# 2. Create MC Dropout predictor
mc_dropout = MCDropout(model, n_samples=20)
# 3. Get predictions with uncertainty on the test set
all_predictions = []
all_epistemic = []
all_correct = []
for inputs, labels in test_loader:
    result = mc_dropout.predict(inputs)
    predictions = result['mean'].argmax(dim=-1)
    epistemic = result['epistemic']
    correct = (predictions == labels).float()
    all_predictions.append(predictions)
    all_epistemic.append(epistemic)
    all_correct.append(correct)
predictions = torch.cat(all_predictions)
epistemic = torch.cat(all_epistemic)
correct = torch.cat(all_correct)
# 4. Analyze uncertainty vs. correctness
# High epistemic uncertainty → more likely incorrect
import matplotlib.pyplot as plt
plt.hist(epistemic[correct == 1].numpy(), bins=30, alpha=0.5, label='Correct')
plt.hist(epistemic[correct == 0].numpy(), bins=30, alpha=0.5, label='Incorrect')
plt.xlabel('Epistemic Uncertainty')
plt.legend()
# 5. Use for selective prediction
threshold = epistemic.quantile(0.8) # Abstain on top 20% uncertain
predictions[epistemic > threshold] = -1 # Abstain
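As a quick sanity check on the abstention step, the retained (most certain) predictions should be noticeably more accurate than the overall accuracy. Continuing with the variables from the workflow above:
# Compare accuracy on retained predictions against overall accuracy.
keep = epistemic <= threshold
coverage = keep.float().mean()
selective_acc = correct[keep].mean()
print(f"Coverage: {coverage:.2%}, accuracy on retained: {selective_acc:.2%}")
print(f"Overall accuracy: {correct.mean():.2%}")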
Evaluation#
- Negative Log-Likelihood (NLL):
Measures both accuracy and uncertainty calibration
from incerto.bayesian.metrics import negative_log_likelihood
nll = negative_log_likelihood(predictions, labels)
# Lower is better
- Brier Score:
Proper scoring rule for probabilistic predictions
from incerto.calibration import brier_score
bs = brier_score(predictions, labels)
- Expected Calibration Error (ECE):
Check if uncertainties are calibrated
from incerto.calibration import ece_score
ece = ece_score(predictions, labels)
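If you want to cross-check the library helpers, NLL and the Brier score are easy to compute directly. A minimal sketch, assuming probs is a hypothetical (n, n_classes) tensor of predicted probabilities and labels holds integer class indices:
import torch
import torch.nn.functional as F

# Direct computation of NLL and Brier score from predicted probabilities.
def nll_brier(probs, labels, eps=1e-12):
    nll = -(probs[torch.arange(len(labels)), labels] + eps).log().mean()
    onehot = F.one_hot(labels, probs.shape[-1]).float()
    brier = ((probs - onehot) ** 2).sum(dim=-1).mean()
    return nll, brier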
Best Practices#
- Start with MC Dropout
Easiest to implement, works with existing models
- Use enough samples
MC Dropout: at least 10-20 samples; SWAG: 20-30 samples
- Combine with calibration
Bayesian uncertainties can still be miscalibrated
- Monitor epistemic uncertainty
High on out-of-distribution data
- Use for active learning
Query samples with high epistemic uncertainty
- Validate uncertainty quality
Plot uncertainty vs. correctness
Comparison#
Method | Training Cost | Inference Cost | Quality
---|---|---|---
MC Dropout | 1x | 10-20x | Good
Deep Ensembles | 5-10x | 5-10x | Excellent
SWAG | ~1.2x | 10-30x | Good
Laplace | 1x + Hessian | 10x | Moderate
Variational | 1-2x | 10x | Good (theory)
References#
Gal & Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning” (ICML 2016)
Lakshminarayanan et al., “Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles” (NeurIPS 2017)
Maddox et al., “A Simple Baseline for Bayesian Uncertainty in Deep Learning” (NeurIPS 2019)
Wilson & Izmailov, “Bayesian Deep Learning and a Probabilistic Perspective of Generalization” (NeurIPS 2020)
See Also#
Bayesian Deep Learning - Complete API reference
Active Learning Guide - Use epistemic uncertainty for active learning
Selective Prediction Guide - Use uncertainty for selective prediction