incerto.llm.f1_score_tokens
- incerto.llm.f1_score_tokens(pred_tokens, true_tokens, mask=None)
Compute precision, recall, and F1 at token level.
This treats token prediction as a retrieval problem where:
- True Positive: correct token at a valid (masked-in) position
- False Positive: wrong token at a valid position
- False Negative: true token at a masked-out position (not predicted)
When the mask covers all positions (the default), FN = 0 and recall = 1.0, so F1 reduces to 2 * precision / (1 + precision). In that case, consider using token_level_accuracy() instead.
- Parameters:
  - pred_tokens – predicted token sequence
  - true_tokens – ground-truth token sequence
  - mask – optional boolean mask of valid (masked-in) positions; None means all positions are valid
- Return type:
  dict
- Returns:
Dictionary with precision, recall, F1, and token counts
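The counting scheme above can be sketched as follows. This is a minimal, hypothetical re-implementation of the described behavior, not the actual incerto source; the real function may differ in input types and the exact keys it returns.

```python
def f1_score_tokens_sketch(pred_tokens, true_tokens, mask=None):
    """Sketch of token-level precision/recall/F1 per the docstring above.

    TP: correct token at a masked-in position.
    FP: wrong token at a masked-in position.
    FN: true token at a masked-out position (never predicted).
    """
    if mask is None:
        # Default: every position is valid (masked-in), so FN = 0.
        mask = [True] * len(true_tokens)
    tp = sum(1 for p, t, m in zip(pred_tokens, true_tokens, mask) if m and p == t)
    fp = sum(1 for p, t, m in zip(pred_tokens, true_tokens, mask) if m and p != t)
    fn = sum(1 for m in mask if not m)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1,
            "tp": tp, "fp": fp, "fn": fn}
```

With a full mask, recall is always 1.0, so the F1 carries no information beyond precision; this is why plain accuracy is recommended in that case.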