loki2.retrieve

Retrieval utilities for cross-modal embedding search in Loki2.

This module retrieves similar embeddings across modalities and aggregates the retrieved neighbors through k-NN majority voting and numeric pooling.

Module Contents

class loki2.retrieve.RetrievalResult

Result of a retrieval operation containing scores and indices.

Parameters:

scores – Similarity scores tensor of shape (M, K) where M is the number of queries and K is the number of neighbors.
indices – Indices tensor of shape (M, K) pointing to neighbors in the reference pool.

scores: torch.Tensor

indices: torch.Tensor

to_tuple() → Tuple[torch.Tensor, torch.Tensor]

save(path: str | pathlib.Path) → None

Persist scores/indices to disk via torch.save.

Parameters: path – File path to which the retrieval result is written.

classmethod load(path: str | pathlib.Path) → RetrievalResult

Restore a RetrievalResult saved by save.

Parameters: path – Path to the saved retrieval result file.
Returns: Loaded retrieval result.
Return type: RetrievalResult

Run k-NN majority voting using stored indices and optional scores.

Parameters:

source_labels – Labels for the reference pool.
weighted – Whether to weight votes by similarity scores. Defaults to False.
scores – Optional custom scores to use for weighting. Defaults to None.
temperature – Optional temperature for softmax weighting. Defaults to None.
return_counts – Whether to return vote counts. Defaults to False.

Returns:

Predicted labels
Vote counts if return_counts=True, otherwise None

Return type:

Tuple[np.ndarray, Optional[torch.Tensor]]
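The unweighted case described above can be sketched in plain Python. This is only an illustration of the voting idea, not the library's implementation: `majority_vote_sketch` is a hypothetical helper, it works on nested lists rather than tensors, and its tie-breaking rule (first label seen in the row wins) is an assumption.

```python
from collections import Counter

def majority_vote_sketch(indices, source_labels):
    """Hypothetical helper: return (predicted_labels, per_query_counts)."""
    predictions, counts = [], []
    for row in indices:
        # Map each neighbor index to its label in the reference pool.
        votes = Counter(source_labels[i] for i in row)
        # most_common(1) yields the label with the highest count; equal
        # counts keep the order in which labels were first encountered.
        predictions.append(votes.most_common(1)[0][0])
        counts.append(dict(votes))
    return predictions, counts

source_labels = ["T cell", "B cell", "T cell", "NK cell"]  # (N,) pool labels
indices = [[0, 2, 1], [3, 1, 1]]                           # (M, K) neighbors
predicted, counts = majority_vote_sketch(indices, source_labels)
print(predicted)  # ['T cell', 'B cell']
```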

Pool neighbor-associated numeric values with optional weighting.

Parameters:

source_values – Numeric values associated with reference pool.
weighted – Whether to weight by similarity scores. Defaults to False.
scores – Optional custom scores for weighting. Defaults to None.
temperature – Optional temperature for softmax weighting. Defaults to None.
reduction – Reduction method ('mean', 'weighted_mean', 'median', 'sum', 'max', 'min'). Defaults to 'mean'.
eps – Numerical stability constant. Defaults to 1e-12.
return_weights – Whether to return applied weights. Defaults to False.

Returns:

Pooled values, and optionally the weights used.

Return type:

Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]]
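As a rough illustration of the unweighted reductions listed above, the following standard-library sketch gathers the values of each query's neighbors and reduces them. `numeric_pool_sketch` and `REDUCTIONS` are hypothetical names, scalar values in nested lists stand in for tensors, and 'weighted_mean' is left out because it needs per-neighbor weights.

```python
import statistics

# Hypothetical lookup table covering the unweighted reductions only.
REDUCTIONS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "sum": sum,
    "max": max,
    "min": min,
}

def numeric_pool_sketch(indices, source_values, reduction="mean"):
    """Reduce the values of each query's neighbors to one number."""
    reduce_fn = REDUCTIONS[reduction]
    # Gather the K neighbor values for each query, then reduce across them.
    return [reduce_fn([source_values[i] for i in row]) for row in indices]

source_values = [1.0, 2.0, 4.0, 10.0]   # (N,) values aligned with the pool
indices = [[0, 1, 2], [1, 2, 3]]        # (M, K) neighbor indices
pooled_mean = numeric_pool_sketch(indices, source_values, "mean")
pooled_median = numeric_pool_sketch(indices, source_values, "median")
pooled_max = numeric_pool_sketch(indices, source_values, "max")
print(pooled_median, pooled_max)  # [2.0, 4.0] [4.0, 10.0]
```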

loki2.retrieve.cross_modal_retrieve(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, topk: int = 1, *, as_result: bool = False) → RetrievalResult | Tuple[torch.Tensor, torch.Tensor]

Perform cross-modal retrieval between query embeddings and the embedding pool.

Parameters:

query_embeddings – (M, D) tensor of query embeddings.
embedding_pool – (N, D) tensor of database embeddings.
topk – Number of neighbors to keep per query.
as_result – If True, return a RetrievalResult instead of a tuple.

Returns:

A RetrievalResult if as_result is True; otherwise a tuple (values, indices).
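To make the (M, K) output layout concrete, here is a dependency-free sketch of top-k retrieval. It assumes cosine similarity as the scoring function, which the reference above does not state, and `topk_retrieve_sketch` is a hypothetical stand-in that uses nested lists instead of tensors.

```python
import math

def l2_normalize(vec):
    # Fall back to 1.0 so an all-zero vector does not divide by zero.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def topk_retrieve_sketch(queries, pool, topk=1):
    """Return (scores, indices), each holding M rows of K entries."""
    pool_n = [l2_normalize(p) for p in pool]
    all_scores, all_indices = [], []
    for q in queries:
        qn = l2_normalize(q)
        # Assumed scoring: cosine similarity against every pool entry.
        sims = [sum(a * b for a, b in zip(qn, p)) for p in pool_n]
        # Keep the positions of the K most similar pool entries.
        order = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)[:topk]
        all_indices.append(order)
        all_scores.append([sims[j] for j in order])
    return all_scores, all_indices

pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # (N, D)
queries = [[2.0, 0.1], [0.0, 3.0]]            # (M, D)
scores, indices = topk_retrieve_sketch(queries, pool, topk=2)
print(indices)  # [[0, 2], [1, 2]]
```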

loki2.retrieve.retrieve_with_celltype_filter(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, pool_labels: Sequence[Any] | numpy.ndarray | torch.Tensor, topk: int = 20, *, normalize_centroids: bool = True, return_assignments: bool = True) → RetrievalResult | Tuple[RetrievalResult, numpy.ndarray, torch.Tensor]

Assign queries to cell types via centroid similarity and retrieve neighbors only from matching cell types.

Parameters:

query_embeddings – (M, D) tensor of query embeddings.
embedding_pool – (N, D) tensor of reference embeddings.
pool_labels – Sequence of N cell type labels, one per row of embedding_pool.
topk – Number of neighbors to retrieve per query.
normalize_centroids – Whether to L2-normalize cell type centroids.
return_assignments – If True, also return the predicted labels and centroid similarities.

Returns:

RetrievalResult if return_assignments is False, otherwise a tuple of (RetrievalResult, predicted_labels, centroid_similarities).
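The two-stage idea (assign each query to a cell type by centroid similarity, then search only that cell type's pool entries) might look like the sketch below. It assumes cosine similarity, mean centroids that are always L2-normalized, and nested lists in place of tensors; `filtered_retrieve_sketch` is a hypothetical name, and none of these details are confirmed by the reference above.

```python
import math
from collections import defaultdict

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def filtered_retrieve_sketch(queries, pool, pool_labels, topk=1):
    # Group pool positions by label, then average each group into a centroid.
    members = defaultdict(list)
    for j, label in enumerate(pool_labels):
        members[label].append(j)
    centroids = {
        label: normalize([sum(col) / len(rows)
                          for col in zip(*(pool[j] for j in rows))])
        for label, rows in members.items()
    }
    assigned, neighbors = [], []
    for q in queries:
        qn = normalize(q)
        # Stage 1: pick the label whose centroid is most similar to the query.
        best = max(centroids, key=lambda lab: dot(qn, centroids[lab]))
        # Stage 2: rank only the pool entries carrying that label.
        ranked = sorted(members[best],
                        key=lambda j: dot(qn, normalize(pool[j])),
                        reverse=True)
        assigned.append(best)
        neighbors.append(ranked[:topk])  # indices refer to the full pool
    return assigned, neighbors

pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pool_labels = ["tumor", "tumor", "stroma", "stroma"]
queries = [[0.8, 0.2], [0.1, 1.0]]
assigned, neighbors = filtered_retrieve_sketch(queries, pool, pool_labels, topk=2)
print(assigned, neighbors)  # ['tumor', 'stroma'] [[1, 0], [3, 2]]
```

Note that the returned neighbor indices point into the full pool, not into the per-label subset, so they can still be used to look up labels or values aligned with embedding_pool.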

loki2.retrieve.knn_majority_vote(indices: torch.Tensor, source_labels: Sequence | numpy.ndarray | torch.Tensor, *, scores: torch.Tensor | None = None, temperature: float | None = None, return_counts: bool = False) → Tuple[numpy.ndarray, torch.Tensor | None]

Compute the majority label among retrieved neighbors.

Parameters:

indices – (M, K) neighbor indices pointing into the source pool.
source_labels – (N,) labels (ints or strings).
scores – optional (M, K) similarity scores to weight votes.
temperature – optional temperature for softmax weighting.
return_counts – whether to return per-class vote totals.
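When scores and temperature are supplied, votes are weighted rather than counted. The sketch below assumes the weights are a per-query softmax of scores divided by temperature, which is one common reading of "temperature for softmax weighting" and not a confirmed detail of this function. `softmax` and `weighted_vote_sketch` are hypothetical helpers.

```python
import math
from collections import defaultdict

def softmax(row, temperature=1.0):
    scaled = [s / temperature for s in row]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_vote_sketch(indices, source_labels, scores, temperature=1.0):
    predictions, totals = [], []
    for idx_row, score_row in zip(indices, scores):
        weights = softmax(score_row, temperature)
        tally = defaultdict(float)
        # Each neighbor adds its softmax weight to its label's total.
        for i, w in zip(idx_row, weights):
            tally[source_labels[i]] += w
        predictions.append(max(tally, key=tally.get))
        totals.append(dict(tally))
    return predictions, totals

source_labels = ["A", "B", "B"]
indices = [[0, 1, 2]]
scores = [[0.9, 0.2, 0.1]]
sharp, _ = weighted_vote_sketch(indices, source_labels, scores, temperature=0.1)
flat, _ = weighted_vote_sketch(indices, source_labels, scores, temperature=10.0)
print(sharp, flat)  # ['A'] ['B']
```

Under this assumption, a low temperature lets the single closest neighbor dominate, while a high temperature flattens the weights until the result approaches a plain majority vote.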

loki2.retrieve.knn_numeric_pool(indices: torch.Tensor, source_values: Sequence | numpy.ndarray | torch.Tensor, *, scores: torch.Tensor | None = None, temperature: float | None = None, reduction: NumericReductionT = 'mean', eps: float = 1e-12, return_weights: bool = False) → torch.Tensor | Tuple[torch.Tensor, torch.Tensor | None]

Aggregate numeric attributes associated with retrieved items.

Parameters:

indices – (M, K) neighbor indices pointing into the source pool.
source_values – tensor-like of shape (N, …) aligned with the embedding pool.
scores – optional custom scores to weight contributions.
temperature – softmax temperature applied to scores before weighted averaging.
reduction – reduction to apply across neighbors.
eps – numerical stability constant used by weighted_mean.
return_weights – whether to return the weights that were applied.
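The role of eps in a weighted mean can be illustrated for a single query as follows. The normalization shown (dividing by the weight sum plus eps) is an assumption about how such a constant is typically used, and `weighted_mean_sketch` is a hypothetical helper, not the library's code.

```python
import math

def weighted_mean_sketch(values, scores, temperature=None, eps=1e-12):
    """Pool one query's neighbor values; returns (pooled, weights)."""
    if temperature is not None:
        # Softmax-style weights: exponentiate scores scaled by temperature.
        peak = max(scores)
        weights = [math.exp((s - peak) / temperature) for s in scores]
    else:
        weights = list(scores)
    total = sum(weights) + eps  # eps guards against division by zero
    weights = [w / total for w in weights]
    pooled = sum(w * v for w, v in zip(weights, values))
    return pooled, weights

pooled, weights = weighted_mean_sketch([10.0, 20.0], [3.0, 1.0])
zero_pooled, _ = weighted_mean_sketch([10.0, 20.0], [0.0, 0.0])
print(round(pooled, 6), zero_pooled)  # 12.5 0.0
```

Without eps, the second call would divide by zero because every score is zero.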

loki2.retrieve.compare_labelings(met1: Sequence, met2: Sequence, *, labels: Sequence | None = None, plot: bool = True, normalize: NormalizeT = 'row', title: str | None = None, xlabel: str = 'Method 2', ylabel: str = 'Method 1', cmap: str = 'viridis', figsize=(8, 7), annotate: bool = True, fontsize_ticks: int = 10, fontsize_text: int = 7, savepath: str | None = None) → Dict[str, Any]

Compute similarity metrics between two label arrays and optionally plot the confusion matrix.
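For orientation, a row-normalized confusion matrix between two labelings can be built as in the sketch below. The reference above does not list which metrics the returned dictionary contains, so the simple agreement rate here is only an example. `confusion_matrix_sketch` is a hypothetical helper and handles only the 'row' normalization.

```python
from collections import Counter

def confusion_matrix_sketch(met1, met2, labels=None, normalize="row"):
    if labels is None:
        labels = sorted(set(met1) | set(met2))
    # Count how often label a (method 1) co-occurs with label b (method 2).
    pair_counts = Counter(zip(met1, met2))
    matrix = [[pair_counts[(a, b)] for b in labels] for a in labels]
    if normalize == "row":
        # Divide each row by its total; empty rows are left as zeros.
        matrix = [[c / (sum(row) or 1) for c in row] for row in matrix]
    return labels, matrix

met1 = ["a", "a", "b", "b"]
met2 = ["a", "b", "b", "b"]
labels, matrix = confusion_matrix_sketch(met1, met2)
agreement = sum(x == y for x, y in zip(met1, met2)) / len(met1)
print(matrix, agreement)  # [[0.5, 0.5], [0.0, 1.0]] 0.75
```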

loki2.retrieve.labels_to_hex(pred: numpy.ndarray, color_dict: dict, default: str = '#808080') → numpy.ndarray

Return an array of hex colors with the same shape as pred.
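The mapping amounts to a dictionary lookup with a fallback color. The sketch below shows the idea on a flat list; `labels_to_hex_sketch` is a hypothetical stand-in, and the real function operates on NumPy arrays and preserves their shape.

```python
def labels_to_hex_sketch(pred, color_dict, default="#808080"):
    # Labels missing from color_dict fall back to the default gray.
    return [color_dict.get(label, default) for label in pred]

color_dict = {"tumor": "#ff0000", "stroma": "#0000ff"}
colors = labels_to_hex_sketch(["tumor", "immune", "stroma"], color_dict)
print(colors)  # ['#ff0000', '#808080', '#0000ff']
```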