incerto.llm.LexicalSimilarity

class incerto.llm.LexicalSimilarity

Bases: object

Measure lexical similarity across samples.

Compute the exact match rate, token overlap, or edit distance to quantify how similar a set of generated responses is.

__init__()

Methods

__init__()

exact_match_rate(responses[, normalize_fn])

Compute the fraction of responses that exactly match the most common response.

pairwise_token_overlap(responses[, normalize_fn])

Average pairwise token overlap (Jaccard similarity).

static exact_match_rate(responses, normalize_fn=None)

Compute the fraction of responses that exactly match the most common response.

Parameters:
  • responses (List[str]) – List of generated text responses

  • normalize_fn (Optional[Callable[[str], str]]) – Optional function to normalize responses before comparison

Return type:

float

Returns:

Exact match rate, a value between 0 and 1
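Example

The following is a minimal standalone sketch of the behavior described above, not the library's source. The function name exact_match_rate_sketch is hypothetical, and returning 0.0 for an empty list is an assumption; the actual method may handle that case differently.

```python
from collections import Counter
from typing import Callable, List, Optional


def exact_match_rate_sketch(
    responses: List[str],
    normalize_fn: Optional[Callable[[str], str]] = None,
) -> float:
    """Fraction of responses equal to the most common response."""
    if not responses:
        # Assumption: an empty input yields 0.0.
        return 0.0
    if normalize_fn is not None:
        responses = [normalize_fn(r) for r in responses]
    # most_common(1) returns a one-element list: [(response, count)].
    _, top_count = Counter(responses).most_common(1)[0]
    return top_count / len(responses)


samples = ["Paris", "paris ", "Paris", "Lyon"]

# Without normalization, only the two identical "Paris" strings match.
raw_rate = exact_match_rate_sketch(samples)

# Stripping whitespace and lowercasing merges "paris " with "Paris".
norm_rate = exact_match_rate_sketch(samples, lambda s: s.strip().lower())

print(raw_rate, norm_rate)  # 0.5 0.75
```

The comparison shows why normalize_fn matters: trivial differences in casing or whitespace otherwise count as disagreement between samples.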

static pairwise_token_overlap(responses, normalize_fn=None)

Average pairwise token overlap (Jaccard similarity).

Parameters:
  • responses (List[str]) – List of generated text responses

  • normalize_fn (Optional[Callable[[str], str]]) – Optional function to normalize responses before comparison

Return type:

float

Returns:

Average Jaccard similarity across all pairs of responses
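Example

The Jaccard similarity of two token sets A and B is |A ∩ B| / |A ∪ B|. The sketch below illustrates one way to average it over all unordered pairs of responses. It is not the library's source: the name pairwise_token_overlap_sketch is hypothetical, and whitespace tokenization, a score of 1.0 for two empty responses, and a result of 1.0 for fewer than two responses are all assumptions.

```python
from itertools import combinations
from typing import Callable, List, Optional


def pairwise_token_overlap_sketch(
    responses: List[str],
    normalize_fn: Optional[Callable[[str], str]] = None,
) -> float:
    """Mean Jaccard similarity over all unordered pairs of responses."""
    if normalize_fn is not None:
        responses = [normalize_fn(r) for r in responses]
    # Assumption: tokens are whitespace-separated words.
    token_sets = [set(r.split()) for r in responses]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        # Assumption: with fewer than two responses there is nothing
        # to disagree with, so report full overlap.
        return 1.0
    scores = []
    for a, b in pairs:
        union = a | b
        # Assumption: two empty token sets count as identical.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


samples = ["the cat sat", "the cat ran", "a dog ran"]
score = pairwise_token_overlap_sketch(samples)
# Pair scores are 2/4, 0/6, and 1/5, so the mean is 0.7 / 3.
print(round(score, 4))  # 0.2333
```

Unlike the exact match rate, this measure gives partial credit to responses that share some wording, so it degrades more gradually as generations diverge.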