loki2.retrieve
Retrieval utilities for cross-modal embedding search in Loki2.
This module retrieves nearest-neighbor embeddings across modalities and aggregates what the neighbors carry: labels through k-NN majority voting, and numeric values through pooling.
Module Contents
- class loki2.retrieve.RetrievalResult
Result of a retrieval operation containing scores and indices.
- Parameters:
scores – Similarity scores tensor of shape (M, K) where M is the number of queries and K is the number of neighbors.
indices – Indices tensor of shape (M, K) pointing to neighbors in the reference pool.
- scores: torch.Tensor
- indices: torch.Tensor
- to_tuple() → Tuple[torch.Tensor, torch.Tensor]
- save(path: str | pathlib.Path) → None
Persist scores/indices to disk via torch.save.
- Parameters:
path – Path where to save the retrieval result.
- classmethod load(path: str | pathlib.Path) → RetrievalResult
Restore a RetrievalResult saved by save.
- Parameters:
path – Path to the saved retrieval result file.
- Returns:
Loaded retrieval result.
- Return type:
RetrievalResult
- majority_vote(source_labels: Sequence | numpy.ndarray | torch.Tensor, *, weighted: bool | None = False, scores: torch.Tensor | None = None, temperature: float | None = None, return_counts: bool = False) → Tuple[numpy.ndarray, torch.Tensor | None]
Run k-NN majority voting using stored indices and optional scores.
- Parameters:
source_labels – Labels for the reference pool.
weighted – Whether to weight votes by similarity scores. Defaults to False.
scores – Optional custom scores to use for weighting. Defaults to None.
temperature – Optional temperature for softmax weighting. Defaults to None.
return_counts – Whether to return vote counts. Defaults to False.
- Returns:
A tuple of (predicted labels, vote counts). The vote counts are None unless return_counts=True.
- Return type:
Tuple[np.ndarray, Optional[torch.Tensor]]
- numeric_pool(source_values: Sequence | numpy.ndarray | torch.Tensor, *, weighted: bool | None = False, scores: torch.Tensor | None = None, temperature: float | None = None, reduction: NumericReductionT = 'mean', eps: float = 1e-12, return_weights: bool = False) → torch.Tensor | Tuple[torch.Tensor, torch.Tensor | None]
Pool neighbor-associated numeric values with optional weighting.
- Parameters:
source_values – Numeric values associated with reference pool.
weighted – Whether to weight by similarity scores. Defaults to False.
scores – Optional custom scores for weighting. Defaults to None.
temperature – Optional temperature for softmax weighting. Defaults to None.
reduction – Reduction method (‘mean’, ‘weighted_mean’, ‘median’, ‘sum’, ‘max’, ‘min’). Defaults to ‘mean’.
eps – Numerical stability constant. Defaults to 1e-12.
return_weights – Whether to return applied weights. Defaults to False.
- Returns:
Pooled values, and optionally the weights used.
- Return type:
Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]]
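The save/load round trip described for RetrievalResult can be pictured with a small stand-in class. This is only a sketch: ToyRetrievalResult is a hypothetical name, and both the on-disk layout (a plain dict of tensors) and the (scores, indices) order of to_tuple are assumptions, not the documented Loki2 behavior.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path

import torch


@dataclass
class ToyRetrievalResult:
    """Hypothetical stand-in for RetrievalResult, for illustration only."""

    scores: torch.Tensor   # (M, K) similarity scores
    indices: torch.Tensor  # (M, K) indices into the reference pool

    def to_tuple(self):
        # Assumed order: scores first, then indices.
        return self.scores, self.indices

    def save(self, path):
        # Assumed on-disk layout: a plain dict of tensors.
        torch.save({"scores": self.scores, "indices": self.indices}, Path(path))

    @classmethod
    def load(cls, path):
        payload = torch.load(Path(path))
        return cls(scores=payload["scores"], indices=payload["indices"])


original = ToyRetrievalResult(
    scores=torch.tensor([[0.9, 0.7], [0.8, 0.2]]),
    indices=torch.tensor([[3, 1], [0, 2]]),
)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "result.pt"
    original.save(path)
    restored = ToyRetrievalResult.load(path)
```

After the round trip, restored holds tensors equal to those in original, so downstream voting or pooling can run without repeating the retrieval.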
- loki2.retrieve.cross_modal_retrieve(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, topk: int = 1, *, as_result: bool = False) → RetrievalResult | Tuple[torch.Tensor, torch.Tensor]
Perform cross-modal retrieval between query embeddings and the embedding pool.
- Parameters:
query_embeddings – Query embeddings of shape (M, D).
embedding_pool – Reference (database) embeddings of shape (N, D).
topk – Number of neighbors to keep per query. Defaults to 1.
as_result – If True, return a RetrievalResult instead of a tuple. Defaults to False.
- Returns:
A RetrievalResult if as_result is True, otherwise a (values, indices) tuple.
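As a rough illustration of top-k retrieval between two sets of embeddings, the sketch below scores every query against every pool entry and keeps the best topk. The similarity metric is not stated here, so cosine similarity is an assumption, and toy_cross_modal_retrieve is a hypothetical helper, not the library function.

```python
import torch
import torch.nn.functional as F


def toy_cross_modal_retrieve(query_embeddings, embedding_pool, topk=1):
    """Hypothetical sketch: cosine-similarity top-k retrieval."""
    # Assumption: similarity is the dot product of L2-normalized embeddings.
    q = F.normalize(query_embeddings, dim=-1)  # (M, D)
    p = F.normalize(embedding_pool, dim=-1)    # (N, D)
    similarity = q @ p.T                       # (M, N)
    values, indices = similarity.topk(topk, dim=-1)  # each (M, topk)
    return values, indices


pool = torch.eye(3)  # three orthogonal reference embeddings
queries = torch.tensor([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]])
values, indices = toy_cross_modal_retrieve(queries, pool, topk=2)
```

Each row of indices lists pool positions from most to least similar, which is the layout that the voting and pooling helpers in this module expect.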
- loki2.retrieve.retrieve_with_celltype_filter(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, pool_labels: Sequence[Any] | numpy.ndarray | torch.Tensor, topk: int = 20, *, normalize_centroids: bool = True, return_assignments: bool = True) → RetrievalResult | Tuple[RetrievalResult, numpy.ndarray, torch.Tensor]
Assign queries to cell types via centroid similarity and retrieve neighbors only from matching cell types.
- Parameters:
query_embeddings – (M, D) tensor of query embeddings.
embedding_pool – (N, D) tensor of reference embeddings.
pool_labels – Iterable of length N with cell type labels.
topk – Number of neighbors to retrieve per query.
normalize_centroids – Whether to L2-normalize cell type centroids.
return_assignments – If True, also return the predicted labels and centroid similarities.
- Returns:
RetrievalResult if return_assignments is False, otherwise a tuple of (RetrievalResult, predicted_labels, centroid_similarities).
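The two-stage idea (assign each query to a cell type by centroid similarity, then search only within that type) might look like the following. It is a sketch under stated assumptions: cosine similarity is assumed, every cell type is assumed to have at least topk members, and toy_celltype_filtered_retrieve is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def toy_celltype_filtered_retrieve(queries, pool, pool_labels, topk=2):
    """Hypothetical sketch of centroid assignment followed by filtered top-k."""
    classes = sorted(set(pool_labels))
    label_ids = torch.tensor([classes.index(label) for label in pool_labels])

    # One centroid per cell type: the mean of that type's pool embeddings.
    centroids = torch.stack(
        [pool[label_ids == c].mean(dim=0) for c in range(len(classes))]
    )
    centroids = F.normalize(centroids, dim=-1)  # mirrors normalize_centroids=True

    q = F.normalize(queries, dim=-1)
    p = F.normalize(pool, dim=-1)

    centroid_sim = q @ centroids.T          # (M, C)
    assigned = centroid_sim.argmax(dim=-1)  # (M,) predicted cell type ids

    # Mask out pool entries whose label differs from the assigned type.
    similarity = q @ p.T                    # (M, N)
    same_type = assigned[:, None] == label_ids[None, :]
    similarity = similarity.masked_fill(~same_type, float("-inf"))

    scores, indices = similarity.topk(topk, dim=-1)
    predicted = [classes[i] for i in assigned.tolist()]
    return scores, indices, predicted, centroid_sim


pool = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
pool_labels = ["T", "T", "B", "B"]
queries = torch.tensor([[1.0, 0.05], [0.05, 1.0]])
scores, indices, predicted, centroid_sim = toy_celltype_filtered_retrieve(
    queries, pool, pool_labels, topk=2
)
```

Masking with negative infinity keeps the index space identical to the unfiltered pool, so the returned indices can still be used to look up labels or values aligned with embedding_pool.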
- loki2.retrieve.knn_majority_vote(indices: torch.Tensor, source_labels: Sequence | numpy.ndarray | torch.Tensor, *, scores: torch.Tensor | None = None, temperature: float | None = None, return_counts: bool = False) → Tuple[numpy.ndarray, torch.Tensor | None]
Compute the majority label among retrieved neighbors.
- Parameters:
indices – (M, K) neighbor indices pointing into the source pool.
source_labels – (N,) labels (ints or strings).
scores – optional (M, K) similarity scores to weight votes.
temperature – optional temperature for softmax weighting.
return_counts – whether to return per-class vote totals.
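A minimal sketch of k-NN majority voting is shown below. It assumes that votes are tallied per class, that scores (when given) act as vote weights, and that a temperature turns the scores into softmax weights. The function toy_knn_majority_vote is hypothetical and returns a plain list rather than a NumPy array.

```python
import torch


def toy_knn_majority_vote(indices, source_labels, scores=None, temperature=None):
    """Hypothetical sketch: (optionally weighted) vote over neighbor labels."""
    classes = sorted(set(source_labels))
    class_ids = torch.tensor([classes.index(label) for label in source_labels])
    neighbor_ids = class_ids[indices]  # (M, K) class id of each neighbor

    if scores is None:
        weights = torch.ones(indices.shape)  # every neighbor counts once
    elif temperature is not None:
        weights = torch.softmax(scores / temperature, dim=-1)
    else:
        weights = scores

    # Accumulate each neighbor's weight into its class column.
    counts = torch.zeros(indices.shape[0], len(classes))
    counts.scatter_add_(1, neighbor_ids, weights.to(counts.dtype))

    winners = counts.argmax(dim=-1)
    return [classes[i] for i in winners.tolist()], counts


labels = ["a", "b", "b", "a", "c"]
indices = torch.tensor([[0, 1, 2], [3, 4, 0]])
predicted, counts = toy_knn_majority_vote(indices, labels)

# Weighting by similarity can overturn a plain majority.
scores = torch.tensor([[0.9, 0.1, 0.1], [0.5, 0.5, 0.5]])
weighted_predicted, _ = toy_knn_majority_vote(indices, labels, scores=scores)
```

In the unweighted call the first query goes to "b" (two of three neighbors), while the weighted call picks "a" because its single neighbor carries more weight than the two "b" neighbors combined.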
- loki2.retrieve.knn_numeric_pool(indices: torch.Tensor, source_values: Sequence | numpy.ndarray | torch.Tensor, *, scores: torch.Tensor | None = None, temperature: float | None = None, reduction: NumericReductionT = 'mean', eps: float = 1e-12, return_weights: bool = False) → torch.Tensor | Tuple[torch.Tensor, torch.Tensor | None]
Aggregate numeric attributes associated with retrieved items.
- Parameters:
indices – (M, K) neighbor indices pointing into the source pool.
source_values – tensor-like of shape (N, …) aligned with the embedding pool.
scores – optional custom scores to weight contributions.
temperature – softmax temperature applied to scores before weighted averaging.
reduction – reduction to apply across neighbors.
eps – numerical stability constant used by weighted_mean.
return_weights – whether to return the weights that were applied.
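Numeric pooling can be sketched as a gather followed by a reduction over the neighbor axis. The version below handles only 1-D source values of shape (N,), and its weighted_mean formula, sum(w * v) / (sum(w) + eps), is an assumption about how eps is used. The function toy_knn_numeric_pool is a hypothetical name.

```python
import torch


def toy_knn_numeric_pool(indices, source_values, scores=None, temperature=None,
                         reduction="mean", eps=1e-12):
    """Hypothetical sketch: reduce neighbor values to one number per query."""
    values = torch.as_tensor(source_values, dtype=torch.float32)[indices]  # (M, K)

    if reduction == "weighted_mean":
        if scores is None:
            raise ValueError("weighted_mean requires scores")
        weights = scores.to(torch.float32)
        if temperature is not None:
            weights = torch.softmax(weights / temperature, dim=-1)
        # eps guards against division by zero when all weights vanish.
        return (weights * values).sum(dim=-1) / (weights.sum(dim=-1) + eps)
    if reduction == "mean":
        return values.mean(dim=-1)
    if reduction == "sum":
        return values.sum(dim=-1)
    if reduction == "median":
        # torch.median returns the lower of the two middle values for even K.
        return values.median(dim=-1).values
    if reduction == "max":
        return values.max(dim=-1).values
    if reduction == "min":
        return values.min(dim=-1).values
    raise ValueError(f"unknown reduction: {reduction}")


source_values = [1.0, 2.0, 3.0, 10.0]
indices = torch.tensor([[0, 1, 2], [1, 2, 3]])
pooled_mean = toy_knn_numeric_pool(indices, source_values)
pooled_median = toy_knn_numeric_pool(indices, source_values, reduction="median")
```

The median is less sensitive than the mean to a single outlying neighbor, which is visible in the second query where one neighbor has the value 10.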
- loki2.retrieve.compare_labelings(met1: Sequence, met2: Sequence, *, labels: Sequence | None = None, plot: bool = True, normalize: NormalizeT = 'row', title: str | None = None, xlabel: str = 'Method 2', ylabel: str = 'Method 1', cmap: str = 'viridis', figsize=(8, 7), annotate: bool = True, fontsize_ticks: int = 10, fontsize_text: int = 7, savepath: str | None = None) → Dict[str, Any]
Compute similarity metrics between two label arrays and optionally plot the confusion matrix.
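The confusion matrix at the heart of this comparison can be sketched without any plotting. The example assumes that normalize='row' divides each row by its total, so each row shows how one Method 1 label is distributed over Method 2 labels. The metrics in the returned dictionary are not listed here, so none are reproduced, and toy_confusion_matrix is a hypothetical helper.

```python
import torch


def toy_confusion_matrix(met1, met2, labels=None, normalize="row"):
    """Hypothetical sketch: confusion matrix with rows = met1, columns = met2."""
    if labels is None:
        labels = sorted(set(met1) | set(met2))
    position = {label: i for i, label in enumerate(labels)}

    matrix = torch.zeros(len(labels), len(labels))
    for a, b in zip(met1, met2):
        matrix[position[a], position[b]] += 1

    if normalize == "row":
        # Assumption: row normalization; empty rows are left as zeros.
        matrix = matrix / matrix.sum(dim=1, keepdim=True).clamp(min=1)
    return labels, matrix


met1 = ["T", "T", "B", "B"]
met2 = ["T", "B", "B", "B"]
labels, matrix = toy_confusion_matrix(met1, met2)
```

Here the "B" row is fully concentrated on "B", while the "T" row is split evenly, showing that the two methods disagree on half of the cells that Method 1 calls "T".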
- loki2.retrieve.labels_to_hex(pred: numpy.ndarray, color_dict: dict, default: str = '#808080') → numpy.ndarray
Return an array of hex colors with the same shape as pred.
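A shape-preserving label-to-color lookup could be written as below. This is a sketch that assumes labels missing from color_dict fall back to default. The function toy_labels_to_hex is a hypothetical name and may differ from the library implementation.

```python
import numpy as np


def toy_labels_to_hex(pred, color_dict, default="#808080"):
    """Hypothetical sketch: map each label to a hex color, keeping pred's shape."""
    flat = [color_dict.get(label, default) for label in pred.ravel().tolist()]
    return np.array(flat, dtype=object).reshape(pred.shape)


pred = np.array([["T", "B"], ["NK", "T"]])
colors = toy_labels_to_hex(pred, {"T": "#ff0000", "B": "#0000ff"})
```

The unmapped label "NK" receives the gray default, so the output can be passed directly to a plotting call that expects one color per cell.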