Calibration
The calibration module provides methods for calibrating neural network predictions to ensure that confidence scores accurately reflect the true probability of correctness.
Base Class

- Abstract base class for all calibration methods.
Post-hoc Calibration Methods

- Temperature scaling: scales logits by a single learned temperature.
- Vector scaling (Guo et al., 2017).
- Matrix scaling (Guo et al., 2017).
- Platt scaling: per-class logistic regression calibration (one-vs-rest).
- Multi-class isotonic regression calibration (per-class fitting).
- Histogram binning: bins predicted probabilities and uses empirical per-bin frequencies.
- Dirichlet calibration (Kull et al., 2019).
- Beta calibration for binary classification (Kull et al., 2017).
- No-op calibrator that returns the original softmax probabilities.
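To make the simplest of these concrete, here is a minimal numpy sketch of post-hoc temperature scaling: a single temperature T is chosen on held-out data to minimize the negative log-likelihood of the rescaled softmax. The function names below (`fit_temperature`, `softmax`) are illustrative, not this module's API; a real implementation would typically optimize T with gradient descent rather than a grid search.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.25, 10.0, 400)):
    """Pick the temperature T minimizing validation NLL of softmax(logits / T)."""
    def nll(t):
        p = softmax(logits / t)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(grid, key=nll)

# Toy overconfident model: huge logit margins but only ~80% accuracy.
rng = np.random.default_rng(0)
n, k = 1000, 3
preds = rng.integers(0, k, size=n)
logits = rng.normal(0.0, 1.0, size=(n, k))
logits[np.arange(n), preds] += 8.0                      # near-certain predictions
labels = np.where(rng.random(n) < 0.8, preds, (preds + 1) % k)

T = fit_temperature(logits, labels)   # expect T > 1: scaling softens overconfidence
calibrated = softmax(logits / T)      # rows still sum to 1; ranking is unchanged
```

Because dividing all logits by the same T preserves their ordering, temperature scaling changes confidences but never changes the predicted class.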
Training-time Calibration

- Label smoothing for improved calibration.
- Focal loss for down-weighting easy examples and focusing on hard ones.
- Confidence penalty to discourage overconfident predictions.
- Evidential deep learning loss.
- Uncertainty measures computed from evidential outputs.
- Temperature-aware training with a learnable temperature.
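As one example of a training-time technique, label smoothing replaces each one-hot target with a slightly softened distribution, which discourages the network from driving confidences to exactly 1. A minimal numpy sketch (the helper name `smooth_labels` is illustrative, not this module's API):

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Mix one-hot targets with the uniform distribution: the true class gets
    (1 - eps) + eps / num_classes, every other class gets eps / num_classes."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes

targets = smooth_labels(np.array([0, 2]), num_classes=3, eps=0.1)
# Each row still sums to 1, so the usual cross-entropy loss applies unchanged.
```

Training then minimizes cross-entropy against these smoothed targets instead of the hard labels.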
Metrics

- Expected Calibration Error (ECE).
- Maximum Calibration Error (MCE).
- Class-wise ECE: ECE computed separately for each class, then averaged.
- Adaptive Expected Calibration Error (Nixon et al., 2019).
- Smooth Expected Calibration Error (smECE).
- Brier score: mean squared error between one-hot labels and predicted probabilities.
- Negative log-likelihood (cross-entropy) averaged over samples.
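To illustrate what the binned metrics compute, here is a minimal equal-width ECE in numpy: samples are grouped by confidence into bins, and the metric is the count-weighted average gap between each bin's accuracy and its mean confidence. The function name is illustrative, not this module's API.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binning ECE: weighted average of |accuracy - confidence| per bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()          # empirical accuracy in this bin
            avg_conf = confidences[in_bin].mean() # mean predicted confidence
            ece += (in_bin.sum() / n) * abs(acc - avg_conf)
    return ece

# Perfectly calibrated toy case: confidence 0.75 and accuracy 0.75 -> ECE of 0.
conf = np.full(100, 0.75)
corr = np.zeros(100, dtype=bool)
corr[:75] = True
ece = expected_calibration_error(conf, corr)
```

MCE follows the same binning but takes the maximum per-bin gap instead of the weighted average.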
Visualization

- Plot a reliability diagram comparing confidence and accuracy.
- Plot a smooth reliability diagram using kernel smoothing.
- Plot a histogram of model confidences (maximum softmax probability).
- Plot a calibration curve: accuracy vs. confidence.
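A reliability diagram is just the per-bin accuracies from the ECE computation drawn as bars against the diagonal of perfect calibration. A matplotlib sketch, assuming a headless environment (the function name and default output path are illustrative, not this module's API):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt

def plot_reliability_diagram(confidences, correct, n_bins=10, path="reliability.png"):
    """Bar chart of per-bin accuracy against the diagonal of perfect calibration."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2.0
    accs = np.full(n_bins, np.nan)
    for i in range(n_bins):
        in_bin = (confidences > edges[i]) & (confidences <= edges[i + 1])
        if in_bin.any():
            accs[i] = correct[in_bin].mean()
    fig, ax = plt.subplots()
    ax.bar(centers, np.nan_to_num(accs), width=1.0 / n_bins,
           edgecolor="black", label="accuracy")
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="perfect calibration")
    ax.set_xlabel("confidence")
    ax.set_ylabel("accuracy")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return accs

rng = np.random.default_rng(1)
conf = rng.uniform(0.3, 1.0, 2000)
corr = rng.random(2000) < conf  # calibrated by construction: P(correct) == confidence
accs = plot_reliability_diagram(conf, corr)
```

For a well-calibrated model the bars hug the diagonal; bars below it indicate overconfidence.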
Utilities

- Helper to compute per-bin average confidence, accuracy, and sample counts.
- Extract confidence scores and predicted labels from logits.
- Convert logits to probabilities using softmax.
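The last two utilities amount to a numerically stable softmax followed by a max/argmax over classes. A minimal numpy sketch (function names are illustrative, not this module's API):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)  # shift to avoid exp overflow
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidences_and_predictions(logits):
    """Return the max softmax probability and the argmax class for each row."""
    probs = softmax(logits)
    return probs.max(axis=-1), probs.argmax(axis=-1)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 3.0, 0.0]])
conf, pred = confidences_and_predictions(logits)
# pred is [0, 1]; each row of softmax(logits) sums to 1
```

Subtracting the row maximum before exponentiating leaves the result unchanged mathematically but prevents overflow for large logits.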