Calibration#

The calibration module provides methods for calibrating neural network predictions to ensure that confidence scores accurately reflect the true probability of correctness.

Post-hoc Calibration Methods#

TemperatureScaling([init_temp])

Temperature scaling (Guo et al., 2017): divides the logits by a single learned scalar temperature before the softmax.

VectorScaling(n_classes)

Vector Scaling (Guo et al., 2017).

MatrixScaling(n_classes)

Matrix Scaling (Guo et al., 2017).

DirichletCalibrator(n_classes[, mu])

Dirichlet Calibration (Kull et al., 2019).

BetaCalibrator([method])

Beta Calibration for binary classification (Kull et al., 2017).

IdentityCalibrator()

No-op calibrator that returns the original softmax probabilities.
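
The post-hoc calibrators above are fit on a held-out validation set and then applied to test-time predictions. For orientation, the sketch below is a minimal from-scratch illustration of the temperature-scaling recipe (Guo et al., 2017) in PyTorch; it is not the TemperatureScaling class itself, and val_logits / val_labels are placeholder tensors.

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> torch.Tensor:
    """Fit a single scalar temperature on held-out logits by minimizing NLL."""
    log_temp = torch.zeros(1, requires_grad=True)  # learn log(T) so T stays positive
    optimizer = torch.optim.LBFGS([log_temp], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_temp.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_temp.exp().detach()

# Placeholder held-out data standing in for real validation logits/labels.
val_logits = torch.randn(1000, 10) * 3.0
val_labels = torch.randint(0, 10, (1000,))
temperature = fit_temperature(val_logits, val_labels)
calibrated_probs = F.softmax(val_logits / temperature, dim=1)
```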

Training-time Calibration#

LabelSmoothingLoss([smoothing, reduction])

Label Smoothing for improved calibration.

FocalLoss([alpha, gamma, reduction])

Focal Loss: down-weights easy examples so training focuses on hard ones.

ConfidencePenalty([beta])

Confidence Penalty: penalizes low-entropy output distributions during training to discourage overconfidence.

TemperatureAwareTraining(backbone[, ...])

Temperature-aware training with learnable temperature.
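
The losses above replace or augment plain cross-entropy during training. The following sketch shows the idea behind label smoothing from scratch; it illustrates the concept rather than the actual LabelSmoothingLoss implementation, and the smoothing argument simply mirrors the parameter listed above.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits: torch.Tensor,
                         targets: torch.Tensor,
                         smoothing: float = 0.1) -> torch.Tensor:
    """Cross-entropy against smoothed targets: weight (1 - smoothing) on the
    true class and smoothing / (n_classes - 1) spread over the other classes."""
    n_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    smooth_targets = torch.full_like(log_probs, smoothing / (n_classes - 1))
    smooth_targets.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    return -(smooth_targets * log_probs).sum(dim=1).mean()

# Placeholder batch; similar in spirit to
# torch.nn.CrossEntropyLoss(label_smoothing=0.1), up to the exact convention
# for how the smoothing mass is distributed.
logits = torch.randn(32, 10)
targets = torch.randint(0, 10, (32,))
loss = label_smoothing_loss(logits, targets, smoothing=0.1)
```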

Metrics#

ece_score(logits, labels[, n_bins])

Expected Calibration Error (ECE).

mce_score(logits, labels[, n_bins])

Maximum Calibration Error (MCE).

adaptive_ece_score(logits, labels[, n_bins, ...])

Adaptive Expected Calibration Error (Nixon et al., 2019).

brier_score(logits, labels)

Brier score: mean squared error between one-hot labels and predicted probabilities.
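
All of these metrics start from the same ingredients: per-sample confidence (max softmax probability) and correctness. For reference, here is a minimal NumPy version of equal-width-bin ECE; the library's ece_score may differ in binning and signature details.

```python
import numpy as np

def expected_calibration_error(logits: np.ndarray,
                               labels: np.ndarray,
                               n_bins: int = 15) -> float:
    """Equal-width-bin ECE: weighted average of |accuracy - confidence| per bin."""
    # Softmax over logits to get predicted probabilities.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            weight = in_bin.mean()
            ece += weight * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return float(ece)

# Placeholder data: random logits are badly calibrated, so ECE is noticeably nonzero.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
labels = rng.integers(0, 10, size=1000)
print(expected_calibration_error(logits, labels, n_bins=15))
```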

Visualization#

plot_reliability_diagram(logits, labels[, ...])

Plot a reliability diagram comparing confidence vs accuracy.

plot_confidence_histogram(logits[, n_bins, ...])

Plot a histogram of model confidences (max softmax probability).

plot_calibration_curve(logits, labels[, ...])

Plot a calibration curve: accuracy vs. confidence per bin.
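
A reliability diagram is simply per-bin accuracy plotted against per-bin confidence, with the diagonal marking perfect calibration. The sketch below builds one with matplotlib under that assumption; it is illustrative and not the code behind plot_reliability_diagram.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_reliability(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 10):
    """Bar chart of per-bin accuracy against the diagonal of perfect calibration."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    accs = np.zeros(n_bins)
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            accs[i] = correct[in_bin].mean()

    fig, ax = plt.subplots()
    ax.bar(centers, accs, width=1.0 / n_bins, edgecolor="black", label="accuracy")
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="perfect calibration")
    ax.set_xlabel("confidence")
    ax.set_ylabel("accuracy")
    ax.legend()
    return fig

# Example with placeholder predictions that are roughly calibrated by construction.
rng = np.random.default_rng(0)
confidences = rng.uniform(0.1, 1.0, size=1000)
correct = (rng.uniform(size=1000) < confidences).astype(float)
plot_reliability(confidences, correct, n_bins=10)
```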

Utilities#