Active Learning Guide#

Active learning reduces labeling costs by strategically selecting which samples to label. Instead of random sampling, query the most informative examples.

Why Active Learning#

Labeling is expensive:
  • Medical image annotation requires expert radiologists

  • NLP tasks need careful human review

  • Robotics needs real-world interaction

Active learning can achieve the same performance with 10-100x less labeled data.

Core Idea#

  1. Train model on small labeled set

  2. Query strategy: Select most informative unlabeled samples

  3. Get labels for selected samples (human annotation)

  4. Add to training set, retrain

  5. Repeat until budget exhausted or performance adequate

Acquisition Functions#

Uncertainty Sampling#

Best for: Starting point, simple and effective

Query samples where model is most uncertain:

from incerto.active import entropy_acquisition, UncertaintySampling

strategy = UncertaintySampling(
    model,
    acquisition_fn=entropy_acquisition
)

# Query most uncertain samples
query_indices = strategy.query(
    unlabeled_pool,
    n_samples=100
)

# Label these samples
labeled_samples = label_samples(unlabeled_pool[query_indices])

Variants (see the sketch after this list for the underlying formulas):
  • Least confidence: Query samples with lowest max probability

  • Margin sampling: Query samples with smallest difference between top-2 classes

  • Entropy: Query samples with highest entropy
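
These variants can also be written down directly in a few lines of PyTorch. A minimal from-scratch sketch (illustrative only, not the incerto API), assuming probs is an [N, C] tensor of softmax probabilities:

import torch

def uncertainty_scores(probs, eps=1e-12):
    # probs: [N, C] softmax outputs over the unlabeled pool.
    max_prob, _ = probs.max(dim=-1)
    top2 = probs.topk(2, dim=-1).values
    return {
        "least_confidence": 1.0 - max_prob,                 # higher = less confident
        "margin": -(top2[:, 0] - top2[:, 1]),               # higher = smaller top-2 gap
        "entropy": -(probs * (probs + eps).log()).sum(-1),  # higher = more uncertain
    }

Note that all three scores are oriented here so that larger means more informative; the incerto helpers shown below keep their natural orientation, which is why the least-confidence and margin examples sort in ascending order.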

Entropy Acquisition#

import torch
import torch.nn.functional as F

from incerto.active import entropy_acquisition

# Compute entropy for each sample
logits = model(unlabeled_data)
probs = F.softmax(logits, dim=-1)
entropy_scores = entropy_acquisition(probs)

# Higher entropy = more uncertain = higher priority
top_k_indices = torch.argsort(entropy_scores, descending=True)[:k]

Least Confidence#

from incerto.active import least_confidence_acquisition

logits = model(unlabeled_data)
probs = F.softmax(logits, dim=-1)
confidence_scores = least_confidence_acquisition(probs)

# Lower confidence = higher priority
top_k_indices = torch.argsort(confidence_scores)[:k]

Margin Sampling#

from incerto.active import margin_acquisition

logits = model(unlabeled_data)
probs = F.softmax(logits, dim=-1)
margin_scores = margin_acquisition(probs)

# Smaller margin = more uncertain = higher priority
top_k_indices = torch.argsort(margin_scores)[:k]

BALD (Bayesian Active Learning by Disagreement)#

Best for: When using Bayesian methods (MC Dropout, ensembles)

Query samples with highest mutual information:

from incerto.active import BALDAcquisition
from incerto.bayesian import MCDropout

# Use MC Dropout for Bayesian uncertainty
mc_dropout = MCDropout(model, n_samples=10)

strategy = BALDAcquisition(mc_dropout)
query_indices = strategy.query(unlabeled_pool, n_samples=100)

Intuition: Query where predictions under different model weights disagree most (high epistemic uncertainty).
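
Concretely, BALD scores each sample by the mutual information between the label and the model parameters, estimated from stochastic forward passes. A minimal sketch of the underlying score (not the incerto API), assuming mc_probs is an [S, N, C] tensor of softmax outputs from S stochastic passes:

import torch

def bald_scores(mc_probs, eps=1e-12):
    # mc_probs: [S, N, C] probabilities from S stochastic forward passes.
    mean_probs = mc_probs.mean(dim=0)                                     # [N, C]
    predictive_entropy = -(mean_probs * (mean_probs + eps).log()).sum(-1)
    expected_entropy = -(mc_probs * (mc_probs + eps).log()).sum(-1).mean(dim=0)
    return predictive_entropy - expected_entropy  # mutual information I[y; w | x]

The first term is the predictive entropy (total uncertainty) and the second is the average entropy of the individual passes; their difference is the disagreement BALD queries for.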

Advantages:
  • Theoretically motivated

  • Considers model uncertainty

  • Often outperforms entropy

Disadvantages:
  • Requires Bayesian model

  • More computationally expensive

Reference: Houlsby et al., “Bayesian Active Learning for Classification and Preference Learning” (arXiv 2011)

Complete Active Learning Loop#

import torch
from incerto.active import UncertaintySampling, entropy_acquisition

# Initial setup
labeled_data = initial_labeled_set  # Small labeled set
unlabeled_pool = large_unlabeled_set
budget = 1000  # Number of labels we can afford

# Initial model
model = train_model(labeled_data)

# Active learning loop
n_queries = budget // 100  # Query 100 samples at a time

for round_idx in range(n_queries):
    print(f"Round {round_idx + 1}/{n_queries}")

    # 1. Select samples to label
    strategy = UncertaintySampling(
        model,
        acquisition_fn=entropy_acquisition
    )

    query_indices = strategy.query(
        unlabeled_pool,
        n_samples=100
    )

    # 2. Get labels (human annotation or oracle)
    query_samples = unlabeled_pool[query_indices]
    query_labels = get_labels(query_samples)  # Human labeling

    # 3. Add to labeled set
    labeled_data.add(query_samples, query_labels)

    # 4. Remove from unlabeled pool
    unlabeled_pool.remove(query_indices)

    # 5. Retrain model
    model = train_model(labeled_data)

    # 6. Evaluate
    accuracy = evaluate(model, test_set)
    print(f"Accuracy: {accuracy:.2%}")
    print(f"Labeled samples: {len(labeled_data)}")

Practical Tips#

Batch mode:

Query multiple samples at once for efficiency

# Query 100 samples in batch
query_indices = strategy.query(unlabeled_pool, n_samples=100)

Diversity:

Combine uncertainty with diversity to avoid querying similar samples

from incerto.active import diverse_batch_query

# Select diverse AND uncertain samples
query_indices = diverse_batch_query(
    model,
    unlabeled_pool,
    n_samples=100,
    diversity_weight=0.5
)

Cold start:

Begin with random sampling or stratified sampling

# Initial random sample
initial_size = 100
initial_indices = torch.randperm(len(dataset))[:initial_size]
labeled_data = dataset[initial_indices]
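
If class information is available for the pool (e.g. in a benchmark where labels exist but are treated as hidden), a stratified seed set can be drawn instead. A rough sketch with a hypothetical pool_labels tensor:

import torch

def stratified_initial_indices(pool_labels, per_class):
    # Sample an equal number of seed points from each class.
    chosen = []
    for c in pool_labels.unique():
        class_idx = (pool_labels == c).nonzero(as_tuple=True)[0]
        chosen.append(class_idx[torch.randperm(len(class_idx))[:per_class]])
    return torch.cat(chosen)

initial_indices = stratified_initial_indices(pool_labels, per_class=10)
labeled_data = dataset[initial_indices]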
Stopping criteria:

Stop when performance plateaus or the budget is exhausted (a simple plateau check is sketched after the code below)

if accuracy > target_accuracy:
    print("Target accuracy reached!")
    break

if len(labeled_data) >= max_budget:
    print("Budget exhausted!")
    break
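
For the plateau check, a minimal sketch inside the same loop, assuming a hypothetical history list of per-round test accuracies and patience / min_delta thresholds:

if len(history) > patience:
    recent_best = max(history[-patience:])
    earlier_best = max(history[:-patience])
    # Plateau: the best recent accuracy barely improves on the earlier best.
    if recent_best - earlier_best < min_delta:
        print("Performance plateaued!")
        break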

Evaluation#

Learning curve:

Plot accuracy vs. number of labeled samples

import matplotlib.pyplot as plt

plt.plot(n_labeled_samples, accuracies, label='Active')
plt.plot(n_labeled_samples, random_accuracies, label='Random')
plt.xlabel('Number of Labeled Samples')
plt.ylabel('Test Accuracy')
plt.legend()

Area Under Learning Curve (AULC):

Higher is better
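
One simple way to compute it is a trapezoidal integral of the learning curve, normalised by the range of labeled-set sizes (a sketch, reusing n_labeled_samples and accuracies from the plot above):

def aulc(n_labeled_samples, accuracies):
    # Trapezoidal area under accuracy vs. number of labels, normalised
    # so curves with different label budgets are comparable.
    area = 0.0
    for i in range(1, len(n_labeled_samples)):
        width = n_labeled_samples[i] - n_labeled_samples[i - 1]
        area += 0.5 * (accuracies[i] + accuracies[i - 1]) * width
    return area / (n_labeled_samples[-1] - n_labeled_samples[0])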

Reduction ratio:

How much data saved to reach target accuracy

# E.g., active learning reaches 95% accuracy with 1000 samples
# Random sampling needs 5000 samples for same accuracy
# Reduction ratio = 5000 / 1000 = 5x
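
A sketch of the same calculation from two learning curves (hypothetical helper; assumes both curves share the n_labeled_samples grid):

def reduction_ratio(n_labeled_samples, active_accs, random_accs, target):
    def labels_needed(accs):
        # First labeled-set size at which the curve reaches the target accuracy.
        for n, acc in zip(n_labeled_samples, accs):
            if acc >= target:
                return n
        return None
    n_active, n_random = labels_needed(active_accs), labels_needed(random_accs)
    if n_active is None or n_random is None:
        return None  # one of the strategies never reached the target
    return n_random / n_active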

Best Practices#

  1. Start with uncertainty sampling

    Simple, effective baseline

  2. Use batch queries

    Query 50-100 samples at a time for efficiency

  3. Consider diversity

    Prevent querying redundant samples

  4. Retrain frequently

    Model needs to adapt to new labels

  5. Use Bayesian methods when possible

    BALD often outperforms simple uncertainty

  6. Compare to random baseline

    Always benchmark against random sampling

  7. Monitor labeling quality

    Human labels may be noisy or biased

Common Pitfalls#

Querying only hardest samples

Can lead to noisy/outlier labels

Not using diversity

Queries may be redundant

Infrequent retraining

Model doesn’t benefit from new labels

Wrong initial set

Cold start matters - use stratified sampling

Ignoring label noise

Uncertain samples may have unreliable labels

Advanced Topics#

Query by committee:

Use ensemble disagreement instead of single model uncertainty
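
A common disagreement measure is vote entropy over the committee's hard predictions. A minimal sketch, assuming prob_list is a list of [N, C] softmax outputs, one per committee member:

import torch

def vote_entropy(prob_list, eps=1e-12):
    # Hard votes from each committee member: [M, N]
    votes = torch.stack([p.argmax(dim=-1) for p in prob_list])
    n_members, n_samples = votes.shape
    n_classes = prob_list[0].shape[-1]
    # Fraction of members voting for each class: [N, C]
    vote_frac = torch.zeros(n_samples, n_classes)
    for member_votes in votes:
        vote_frac[torch.arange(n_samples), member_votes] += 1.0 / n_members
    # High entropy = committee disagrees = informative sample.
    return -(vote_frac * (vote_frac + eps).log()).sum(dim=-1)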

Expected model change:

Query samples that change model most
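
One common instantiation is the expected gradient length (EGL): score a candidate by the expected norm of the loss gradient, with the expectation taken over the model's own predictive distribution. A rough sketch for a single candidate x and a classification model (illustrative only):

import torch
import torch.nn.functional as F

def expected_gradient_length(model, x):
    # x: a single input tensor (no batch dimension).
    logits = model(x.unsqueeze(0))                # [1, C]
    probs = F.softmax(logits, dim=-1).squeeze(0)  # [C]
    params = [p for p in model.parameters() if p.requires_grad]
    score = 0.0
    for c in range(probs.shape[0]):
        # Gradient of the loss if the label turned out to be class c.
        loss = F.cross_entropy(logits, torch.tensor([c]))
        grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
        grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads if g is not None))
        score += probs[c].item() * grad_norm.item()
    return score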

Expected error reduction:

Query samples that reduce expected error most

References#

  1. Settles, “Active Learning Literature Survey” (University of Wisconsin–Madison, Computer Sciences Technical Report 1648, 2009)

  2. Houlsby et al., “Bayesian Active Learning for Classification and Preference Learning” (arXiv 2011)

  3. Gal et al., “Deep Bayesian Active Learning with Image Data” (ICML 2017)

  4. Ash et al., “Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds” (ICLR 2020)

See Also#