Selective Prediction Guide#
Selective prediction (also called prediction with rejection) allows models to abstain from uncertain predictions, improving accuracy on predictions that are made.
Why Selective Prediction#
Key idea: don’t predict when uncertain.

- Benefits:
  - Higher accuracy on accepted predictions
  - Explicit uncertainty communication
  - Safer deployment in critical applications
- Trade-off:
  - Coverage: fraction of samples on which a prediction is made
  - Risk: error rate on the predictions that are made

Goal: maximize accuracy on accepted samples while maintaining acceptable coverage.
Basic Concepts#
- Selective classifier: a pair (f, g), where f predicts and g decides whether to abstain
- Coverage: φ = P(g(x) = 1), the fraction of samples on which we predict
- Selective risk: R = E[ℓ(f(x), y) | g(x) = 1], the error rate on accepted samples
- Selective accuracy: accuracy on the samples where a prediction is made
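These definitions can be computed directly from model outputs; a minimal sketch in plain PyTorch, with a hypothetical helper `selective_metrics` (not part of incerto) and a confidence threshold as the selection function g:

```python
import torch

def selective_metrics(confidence, predictions, labels, threshold):
    """Coverage and selective risk for a confidence threshold.

    g(x) = 1 (predict) when confidence >= threshold, else abstain.
    """
    selected = confidence >= threshold
    coverage = selected.float().mean()
    if selected.any():
        # 0/1 loss on the accepted samples only
        risk = (predictions[selected] != labels[selected]).float().mean()
    else:
        risk = torch.tensor(0.0)  # nothing accepted; risk is undefined
    return coverage, risk

# Toy example: four samples, one low-confidence mistake
confidence = torch.tensor([0.9, 0.8, 0.6, 0.4])
predictions = torch.tensor([0, 1, 2, 1])
labels = torch.tensor([0, 1, 2, 0])
cov, risk = selective_metrics(confidence, predictions, labels, threshold=0.5)
# cov = 0.75 (3 of 4 accepted), risk = 0.0 (the one error was rejected)
```

Note how rejecting the least confident sample removes the only error, so selective risk drops to zero at the cost of 25% coverage.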
Methods#
Softmax Threshold#
Best for: Simple baseline, post-hoc application
Abstain when max softmax probability < threshold:
```python
import torch
from incerto.sp import SoftmaxThreshold

# Wrap your trained model
selector = SoftmaxThreshold(model)
selector.eval()

# Forward pass with confidence scores
with torch.no_grad():
    logits, confidence = selector(test_data, return_confidence=True)
    predictions = logits.argmax(dim=-1)

# Set threshold and reject low-confidence samples
threshold = confidence.quantile(0.2)  # reject bottom 20%
rejected = selector.reject(confidence, threshold)
selected = ~rejected

accuracy = (predictions[selected] == labels[selected]).float().mean()
coverage = selected.float().mean()
print(f"Coverage: {coverage:.2%}")
print(f"Selective accuracy: {accuracy:.2%}")
```
- Advantages:
  - Simple and fast
  - No retraining needed
  - Interpretable
- Disadvantages:
  - Threshold requires tuning
  - Max softmax probability is not always a reliable confidence signal, so selection may be suboptimal
Deep Gambler#
Best for: Learning when to abstain during training
Adds an extra abstain logit and trains with the gambler’s loss:
```python
import torch
from incerto.sp import DeepGambler

# Create model with abstain head
gambler = DeepGambler(backbone, num_classes=10, num_features=128)

# Training loop
for inputs, labels in train_loader:
    optimizer.zero_grad()
    logits = gambler(inputs)  # shape: (batch, num_classes + 1)
    loss = gambler.gambler_loss(logits, labels, reward=2.2)
    loss.backward()
    optimizer.step()

# Inference — confidence is 1 - P(abstain)
with torch.no_grad():
    logits, confidence = gambler(test_data, return_confidence=True)
```
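For intuition, one common formulation of the gambler’s loss (following Ziyin et al., 2019) is −log(p_y + p_abstain / o), where o is the reward. This is a self-contained sketch of that formulation, not necessarily how `DeepGambler.gambler_loss` is implemented internally:

```python
import torch
import torch.nn.functional as F

def gambler_loss(logits, targets, reward):
    """Gambler's loss sketch: -log(p_y + p_abstain / o).

    logits has shape (batch, num_classes + 1); the last column is the
    abstain logit. The reward o > 1 controls how costly abstention is:
    a larger o makes abstaining less attractive.
    """
    probs = F.softmax(logits, dim=-1)
    class_probs, abstain_prob = probs[:, :-1], probs[:, -1]
    p_y = class_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return -torch.log(p_y + abstain_prob / reward).mean()

# A confident abstention keeps the loss bounded even when the
# predicted class probability for the true label is tiny.
logits = torch.tensor([[0.0, 0.0, 0.0, 5.0]])  # mass on the abstain logit
loss = gambler_loss(logits, torch.tensor([0]), reward=2.2)
```

Because the abstain term is divided by the reward, betting on the correct class pays off more than abstaining whenever the model is genuinely confident.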
Self-Adaptive Training (SAT)#
Best for: Improving calibration during training for better selective prediction
Train model with adaptive soft labels that blend ground truth and model predictions:
```python
from incerto.sp import SelfAdaptiveTraining

sat = SelfAdaptiveTraining(
    backbone,
    num_classes=10,
    alpha_start=0.0,
    alpha_end=0.9,
    warmup_epochs=5,
)

# Training loop
for epoch in range(total_epochs):
    alpha = sat.get_alpha(epoch, total_epochs)
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        logits = sat(inputs)
        loss = sat.sat_loss(logits, labels, alpha)
        loss.backward()
        optimizer.step()

# Inference — uses MSP confidence like SoftmaxThreshold
logits, confidence = sat(test_data, return_confidence=True)
```
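The core of SAT is the blended soft target. Below is a simplified single-step sketch of that idea (the full method of Huang et al. maintains a running per-sample target across epochs, which `SelfAdaptiveTraining` presumably handles internally; `sat_loss_step` is a hypothetical name):

```python
import torch
import torch.nn.functional as F

def sat_loss_step(logits, labels, alpha, num_classes):
    """One step of a simplified self-adaptive loss.

    The soft target blends the one-hot label with the model's own
    (detached) prediction; alpha grows during training, so the model
    gradually trusts its predictions more than potentially noisy labels.
    """
    probs = F.softmax(logits, dim=-1).detach()
    one_hot = F.one_hot(labels, num_classes).float()
    soft_targets = alpha * probs + (1.0 - alpha) * one_hot
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

logits = torch.tensor([[2.0, 0.0, 0.0]], requires_grad=True)
loss = sat_loss_step(logits, torch.tensor([0]), alpha=0.3, num_classes=3)
loss.backward()  # gradients flow only through log_probs
```

With alpha = 0 this reduces to standard cross-entropy, which is why the schedule starts at `alpha_start=0.0`.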
SelectiveNet#
Best for: Learning a dedicated selection function
Adds a selection head g(x) that outputs a selection probability:
```python
from incerto.sp import SelectiveNet

snet = SelectiveNet(backbone, num_classes=10, num_features=128)

# Training loop — use the SelectiveNet loss
for inputs, labels in train_loader:
    optimizer.zero_grad()
    logits, selection = snet(inputs, return_confidence=True)
    loss = snet.selective_loss(logits, labels, selection, coverage_target=0.8)
    loss.backward()
    optimizer.step()

# Inference — confidence comes from the selection head g(x)
logits, confidence = snet(test_data, return_confidence=True)
rejected = snet.reject(confidence, threshold=0.5)
```
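The SelectiveNet objective of Geifman & El-Yaniv (2019) combines the empirical selective risk with a quadratic penalty when coverage falls below the target. A sketch of that objective (the paper also adds an auxiliary prediction head, omitted here, and `selective_loss` may differ in details such as the penalty weight):

```python
import torch
import torch.nn.functional as F

def selective_loss_sketch(logits, labels, selection, coverage_target, lam=32.0):
    """Selective risk + coverage penalty (auxiliary head omitted).

    selection holds g(x) in (0, 1). The first term is cross-entropy
    weighted by g and normalized by empirical coverage; the second
    penalizes coverage falling below the target c.
    """
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    emp_coverage = selection.mean()
    selective_risk = (per_sample * selection).mean() / emp_coverage
    penalty = lam * torch.clamp(coverage_target - emp_coverage, min=0.0) ** 2
    return selective_risk + penalty

logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
selection = torch.full((8,), 0.9)
loss = selective_loss_sketch(logits, labels, selection, coverage_target=0.8)
```

The penalty is one-sided: exceeding the coverage target costs nothing, but dropping below it is punished quadratically, which pressures g(x) to stay near the requested coverage.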
Complete Workflow#
```python
import torch
from incerto.sp import SoftmaxThreshold, coverage, risk, aurc

# 1. Train model normally
model = train_model(train_loader)

# 2. Wrap with selective predictor
selector = SoftmaxThreshold(model)
selector.eval()

# 3. Get predictions and confidence on the validation set
with torch.no_grad():
    logits, confidence = selector(val_data, return_confidence=True)
    predictions = logits.argmax(dim=-1)

# 4. Evaluate at different thresholds
for threshold in [0.7, 0.8, 0.9, 0.95]:
    rejected = selector.reject(confidence, threshold)
    selected = ~rejected
    cov = coverage(rejected)
    sel_acc = (predictions[selected] == val_labels[selected]).float().mean()
    print(f"Threshold {threshold}: coverage={cov:.2%}, accuracy={sel_acc:.2%}")

# 5. Compute AURC (sort by decreasing confidence first)
sorted_conf, idx = confidence.sort(descending=True)
sorted_errors = (predictions[idx] != val_labels[idx]).float()
score = aurc(sorted_conf, sorted_errors)
print(f"AURC: {score:.4f}")
```
Metrics#
- Coverage-Risk Curve: plot risk vs. coverage across thresholds

```python
import matplotlib.pyplot as plt
from incerto.sp import plot_risk_coverage

fig, ax = plt.subplots()
plot_risk_coverage(logits, labels, confidence, ax=ax, show_aurc=True)
```

- Area Under the Risk-Coverage Curve (AURC): lower is better (0 only for a model that makes no errors; between models of equal accuracy, a better confidence ranking gives a lower AURC)

```python
from incerto.sp import aurc

sorted_conf, idx = confidence.sort(descending=True)
sorted_errors = (predictions[idx] != labels[idx]).float()
score = aurc(sorted_conf, sorted_errors)
```
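If you want the metric without the library, AURC can be computed as the average of the running error rate when samples are accepted in order of decreasing confidence. This from-scratch sketch may differ from incerto’s exact convention (e.g., interpolation at low coverage):

```python
import torch

def aurc_from_scratch(confidence, errors):
    """AURC: mean of the running error rate as coverage grows.

    Samples are accepted in order of decreasing confidence; after
    accepting k samples, the selective risk is the mean error among
    them. AURC averages that risk over k = 1..n.
    """
    order = confidence.argsort(descending=True)
    sorted_errors = errors[order].float()
    n = len(sorted_errors)
    cumulative_risk = sorted_errors.cumsum(0) / torch.arange(1, n + 1)
    return cumulative_risk.mean()

# Perfect ranking: the only error has the lowest confidence
conf = torch.tensor([0.9, 0.8, 0.7, 0.1])
good = aurc_from_scratch(conf, torch.tensor([0.0, 0.0, 0.0, 1.0]))
# Worst ranking: the error is the most confident sample
bad = aurc_from_scratch(conf, torch.tensor([1.0, 0.0, 0.0, 0.0]))
# good < bad: lower AURC means a better confidence ranking
```

The two calls use the same error count, so the gap between them isolates ranking quality, which is exactly what AURC measures.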
Best Practices#
- Tune the threshold on validation data: never use test data for threshold selection
- Consider deployment constraints: what coverage rate is acceptable?
- Combine with calibration: calibrated models give better selection
- Monitor in production: track coverage and accuracy over time
- Plan for abstention: what happens when the model abstains? (Human review, fallback model, etc.)
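Threshold tuning on a validation set can be as simple as scanning a grid and picking the lowest threshold whose selective accuracy meets a target, so coverage stays as high as possible. A sketch with a hypothetical helper name:

```python
import torch

def pick_threshold(confidence, predictions, labels, target_accuracy, grid=None):
    """Smallest threshold whose selective accuracy meets the target.

    Lower thresholds keep coverage high, so we scan from low to high
    and stop at the first threshold that is accurate enough on the
    validation set.
    """
    if grid is None:
        grid = torch.linspace(0.0, 0.99, 100)
    for t in grid:
        selected = confidence >= t
        if not selected.any():
            break  # everything rejected; nothing left to evaluate
        acc = (predictions[selected] == labels[selected]).float().mean()
        if acc >= target_accuracy:
            return t.item()
    return None  # target unattainable on this validation set

conf = torch.tensor([0.95, 0.9, 0.6, 0.5])
preds = torch.tensor([0, 1, 2, 3])
labels = torch.tensor([0, 1, 2, 0])  # the least confident sample is wrong
t = pick_threshold(conf, preds, labels, target_accuracy=1.0)
# rejecting the low-confidence sample reaches 100% selective accuracy
```

The same scan should then be validated once on held-out data, since a threshold chosen this way is itself fit to the validation set.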
Trade-offs#
- High threshold (e.g., 0.95):
  - Lower coverage (perhaps ~70%)
  - Higher accuracy on accepted samples
  - More abstentions
- Low threshold (e.g., 0.7):
  - Higher coverage (perhaps ~95%)
  - Lower accuracy on accepted samples
  - Fewer abstentions
- Choose based on:
  - Cost of errors vs. cost of abstention
  - Availability of a fallback (human expert, simpler model)
  - Application requirements
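The “cost of errors vs. cost of abstention” comparison can be made explicit: at each threshold, the expected per-sample cost is P(error and accepted) · c_error + P(abstain) · c_abstain. A sketch with hypothetical costs and helper name:

```python
import torch

def expected_cost(confidence, errors, threshold, cost_error, cost_abstain):
    """Expected per-sample cost of a thresholded selective classifier.

    errors is the 0/1 error indicator of the underlying model.
    Accepted mistakes cost cost_error; every abstention costs
    cost_abstain (e.g. a human review).
    """
    accepted = confidence >= threshold
    p_mistake = (errors.float() * accepted.float()).mean()
    p_abstain = (~accepted).float().mean()
    return p_mistake * cost_error + p_abstain * cost_abstain

conf = torch.tensor([0.95, 0.85, 0.75, 0.55])
errs = torch.tensor([0.0, 0.0, 1.0, 1.0])  # the two shakiest samples are wrong
# When errors are 10x more expensive than abstentions,
# a high threshold wins despite rejecting most samples.
high = expected_cost(conf, errs, 0.9, cost_error=10.0, cost_abstain=1.0)
low = expected_cost(conf, errs, 0.5, cost_error=10.0, cost_abstain=1.0)
```

Sweeping the threshold and minimizing this cost (on validation data) turns the qualitative trade-off above into a concrete operating point.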
References#
- Chow, “An Optimum Character Recognition System Using Decision Functions” (1957)
- Geifman & El-Yaniv, “Selective Classification for Deep Neural Networks” (NeurIPS 2017)
- Geifman & El-Yaniv, “SelectiveNet: A Deep Neural Network with an Integrated Reject Option” (ICML 2019)
- Ziyin et al., “Deep Gamblers: Learning to Abstain with Portfolio Theory” (NeurIPS 2019)
- Huang et al., “Self-Adaptive Training: Beyond Empirical Risk Minimization” (NeurIPS 2020)
See Also#
- Selective Prediction: complete API reference
- Calibration Guide: calibration for better confidence
- Conformal Prediction Guide: prediction sets with guarantees
- Out-of-Distribution Detection Guide: OOD detection