loki2.retrieve

Retrieval utilities for cross-modal embedding search in Loki2.

This module retrieves similar embeddings across modalities and post-processes the retrieved neighbors with k-NN majority voting and numeric pooling.

Module Contents

class loki2.retrieve.RetrievalResult

Result of a retrieval operation containing scores and indices.

Parameters:
  • scores – Similarity scores tensor of shape (M, K) where M is the number of queries and K is the number of neighbors.

  • indices – Indices tensor of shape (M, K) pointing to neighbors in the reference pool.

scores: torch.Tensor
indices: torch.Tensor
to_tuple() -> Tuple[torch.Tensor, torch.Tensor]
save(path: str | pathlib.Path) -> None

Persist scores/indices to disk via torch.save.

Parameters:

path – Destination path for the saved retrieval result.

classmethod load(path: str | pathlib.Path) -> RetrievalResult

Restore a RetrievalResult previously written by save().

Parameters:

path – Path to the saved retrieval result file.

Returns:

Loaded retrieval result.

Return type:

RetrievalResult

majority_vote(source_labels: Sequence | numpy.ndarray | torch.Tensor, *, weighted: bool | None = False, scores: torch.Tensor | None = None, temperature: float | None = None, return_counts: bool = False) -> Tuple[numpy.ndarray, torch.Tensor | None]

Run k-NN majority voting using stored indices and optional scores.

Parameters:
  • source_labels – Labels for the reference pool.

  • weighted – Whether to weight votes by similarity scores. Defaults to False.

  • scores – Optional custom scores to use for weighting. Defaults to None.

  • temperature – Optional temperature for softmax weighting. Defaults to None.

  • return_counts – Whether to return vote counts. Defaults to False.

Returns:

  • Predicted labels

  • Vote counts if return_counts=True, otherwise None

Return type:

Tuple[np.ndarray, Optional[torch.Tensor]]
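The temperature parameter is described only as a softmax temperature. As a rough illustration of what such weighting usually means, the sketch below converts similarity scores into per-query weights. It assumes the weights are softmax(scores / temperature) taken across the K neighbors, which this page does not confirm; softmax_weights is a hypothetical helper, and NumPy stands in for torch to keep the example dependency-light.

```python
import numpy as np

def softmax_weights(scores, temperature=1.0):
    """Turn (M, K) similarity scores into per-query weights that sum to 1.

    Assumption: weights = softmax(scores / temperature) along the neighbor axis.
    """
    scaled = np.asarray(scores, dtype=np.float64) / temperature
    # Subtract the row maximum before exponentiating for numerical stability.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)

scores = np.array([[0.9, 0.8, 0.1]])
sharp = softmax_weights(scores, temperature=0.05)   # concentrates weight on the top neighbor
smooth = softmax_weights(scores, temperature=10.0)  # close to uniform weighting
```

Under this assumption, a low temperature lets the best-scoring neighbors dominate the vote, while a high temperature approaches an unweighted vote.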

numeric_pool(source_values: Sequence | numpy.ndarray | torch.Tensor, *, weighted: bool | None = False, scores: torch.Tensor | None = None, temperature: float | None = None, reduction: NumericReductionT = 'mean', eps: float = 1e-12, return_weights: bool = False) -> torch.Tensor | Tuple[torch.Tensor, torch.Tensor | None]

Pool neighbor-associated numeric values with optional weighting.

Parameters:
  • source_values – Numeric values associated with reference pool.

  • weighted – Whether to weight by similarity scores. Defaults to False.

  • scores – Optional custom scores for weighting. Defaults to None.

  • temperature – Optional temperature for softmax weighting. Defaults to None.

  • reduction – Reduction method ('mean', 'weighted_mean', 'median', 'sum', 'max', 'min'). Defaults to 'mean'.

  • eps – Numerical stability constant. Defaults to 1e-12.

  • return_weights – Whether to return applied weights. Defaults to False.

Returns:

Pooled values, and optionally the weights used.

Return type:

Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]]

loki2.retrieve.cross_modal_retrieve(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, topk: int = 1, *, as_result: bool = False) -> RetrievalResult | Tuple[torch.Tensor, torch.Tensor]

Perform cross-modal retrieval between query embeddings and the embedding pool.

Parameters:
  • query_embeddings – Query embeddings of shape (M, D).

  • embedding_pool – Reference (database) embeddings of shape (N, D).

  • topk – Number of neighbors to keep per query. Defaults to 1.

  • as_result – If True, return a RetrievalResult instead of a tuple. Defaults to False.

Returns:

A RetrievalResult if as_result is True; otherwise a tuple (values, indices) of similarity scores and pool indices.
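For orientation, top-k retrieval of this kind can be sketched as follows. The sketch assumes cosine similarity on L2-normalized embeddings, which this page does not state; topk_cosine is a hypothetical stand-in rather than the library function, and it uses NumPy instead of torch.

```python
import numpy as np

def topk_cosine(queries, pool, topk=1):
    """Return (scores, indices), each of shape (M, topk), best match first.

    Assumption: similarity is the cosine between L2-normalized rows.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    sim = q @ p.T  # (M, N) cosine similarities
    # Sort each row by descending similarity and keep the first topk columns.
    indices = np.argsort(-sim, axis=1)[:, :topk]
    scores = np.take_along_axis(sim, indices, axis=1)
    return scores, indices

pool = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queries = np.array([[2.0, 0.1], [0.1, 3.0]])
scores, indices = topk_cosine(queries, pool, topk=2)
```

Both outputs have shape (M, topk), matching the (M, K) layout that RetrievalResult documents for scores and indices.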

loki2.retrieve.retrieve_with_celltype_filter(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, pool_labels: Sequence[Any] | numpy.ndarray | torch.Tensor, topk: int = 20, *, normalize_centroids: bool = True, return_assignments: bool = True) -> RetrievalResult | Tuple[RetrievalResult, numpy.ndarray, torch.Tensor]

Assign queries to cell types via centroid similarity and retrieve neighbors only from matching cell types.

Parameters:
  • query_embeddings – (M, D) tensor of query embeddings.

  • embedding_pool – (N, D) tensor of reference embeddings.

  • pool_labels – Iterable of length N with cell type labels.

  • topk – Number of neighbors to retrieve per query.

  • normalize_centroids – Whether to L2-normalize cell type centroids.

  • return_assignments – If True, also return the predicted cell type labels and the centroid similarity scores.

Returns:

RetrievalResult if return_assignments is False, otherwise a tuple of (RetrievalResult, predicted_labels, centroid_similarities).
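The two-stage idea described here (assign each query to a cell type by centroid similarity, then search only within that type) might look like the sketch below. Everything in it is illustrative: cosine similarity, mean-based centroids, and the masking strategy are assumptions, and retrieve_within_assigned_type is a hypothetical name, not the library's implementation.

```python
import numpy as np

def _unit(x):
    """L2-normalize the rows of a 2-D array."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def retrieve_within_assigned_type(queries, pool, pool_labels, topk=2):
    pool_labels = np.asarray(pool_labels)
    classes = np.unique(pool_labels)
    q, p = _unit(queries), _unit(pool)
    # One centroid per cell type, L2-normalized (cf. normalize_centroids).
    centroids = _unit(np.stack([p[pool_labels == c].mean(axis=0) for c in classes]))
    centroid_sim = q @ centroids.T  # (M, C) query-to-centroid similarities
    assigned = classes[centroid_sim.argmax(axis=1)]
    sim = q @ p.T  # (M, N) query-to-pool similarities
    # Mask pool items whose label differs from the query's assigned type.
    # If a type has fewer than topk members, the surplus slots hold -inf.
    sim = np.where(pool_labels[None, :] == assigned[:, None], sim, -np.inf)
    indices = np.argsort(-sim, axis=1)[:, :topk]
    scores = np.take_along_axis(sim, indices, axis=1)
    return scores, indices, assigned

pool = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
pool_labels = ["T", "T", "B", "B"]
queries = np.array([[1.0, 0.2], [0.2, 1.0]])
scores, indices, assigned = retrieve_within_assigned_type(queries, pool, pool_labels)
```

In this toy pool every returned neighbor carries the label its query was assigned to, which is the guarantee the filter is meant to provide.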

loki2.retrieve.knn_majority_vote(indices: torch.Tensor, source_labels: Sequence | numpy.ndarray | torch.Tensor, *, scores: torch.Tensor | None = None, temperature: float | None = None, return_counts: bool = False) -> Tuple[numpy.ndarray, torch.Tensor | None]

Compute the majority label among retrieved neighbors.

Parameters:
  • indices – (M, K) neighbor indices pointing into the source pool.

  • source_labels – (N,) labels (ints or strings).

  • scores – optional (M, K) similarity scores to weight votes.

  • temperature – optional temperature for softmax weighting.

  • return_counts – whether to return per-class vote totals.
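A minimal sketch of the voting step, assuming each neighbor contributes either one vote or a vote equal to its score. The helper name vote and the use of raw scores as weights are illustrative assumptions, not the library's behavior, and NumPy is used in place of torch.

```python
import numpy as np

def vote(indices, source_labels, scores=None):
    """Return (predicted labels, class names, (M, C) vote totals)."""
    indices = np.asarray(indices)
    labels = np.asarray(source_labels)
    classes, codes = np.unique(labels, return_inverse=True)
    neighbor_codes = codes[indices]  # (M, K) class id of each neighbor
    # Each neighbor casts one vote, or a vote equal to its score if given.
    if scores is None:
        weights = np.ones(indices.shape)
    else:
        weights = np.asarray(scores, dtype=float)
    totals = np.zeros((indices.shape[0], len(classes)))
    rows = np.arange(indices.shape[0])[:, None]
    # Unbuffered add so repeated (row, class) pairs accumulate correctly.
    np.add.at(totals, (rows, neighbor_codes), weights)
    return classes[totals.argmax(axis=1)], classes, totals

source_labels = ["a", "a", "b", "b"]
indices = np.array([[0, 1, 2], [0, 2, 3]])
plain_pred, classes, plain_totals = vote(indices, source_labels)
scores = np.array([[0.2, 0.2, 0.9], [0.9, 0.3, 0.3]])
weighted_pred, _, weighted_totals = vote(indices, source_labels, scores=scores)
```

With these toy inputs the weighted vote overturns the plain majority for both queries, because a single high-scoring neighbor outweighs two low-scoring ones.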

loki2.retrieve.knn_numeric_pool(indices: torch.Tensor, source_values: Sequence | numpy.ndarray | torch.Tensor, *, scores: torch.Tensor | None = None, temperature: float | None = None, reduction: NumericReductionT = 'mean', eps: float = 1e-12, return_weights: bool = False) -> torch.Tensor | Tuple[torch.Tensor, torch.Tensor | None]

Aggregate numeric attributes associated with retrieved items.

Parameters:
  • indices – (M, K) neighbor indices pointing into the source pool.

  • source_values – tensor-like of shape (N, …) aligned with the embedding pool.

  • scores – optional custom scores to weight contributions.

  • temperature – softmax temperature applied to scores before weighted averaging.

  • reduction – reduction to apply across neighbors.

  • eps – numerical stability constant used by weighted_mean.

  • return_weights – whether to return the weights that were applied.
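The pooling step can be pictured as gathering each query's neighbor values and reducing across the K axis. The sketch below handles only 1-D source values and takes precomputed weights directly; pool_values and its weights argument are hypothetical, and the placement of eps in the denominator is an assumption about how a weighted mean is typically stabilized.

```python
import numpy as np

def pool_values(indices, source_values, weights=None, reduction="mean", eps=1e-12):
    """Reduce neighbor values across K for each of the M queries."""
    gathered = np.asarray(source_values, dtype=float)[np.asarray(indices)]  # (M, K)
    if reduction == "weighted_mean":
        w = np.asarray(weights, dtype=float)
        # eps guards against division by zero when all weights vanish.
        return (gathered * w).sum(axis=1) / (w.sum(axis=1) + eps)
    reducers = {
        "mean": np.mean,
        "median": np.median,
        "sum": np.sum,
        "max": np.max,
        "min": np.min,
    }
    return reducers[reduction](gathered, axis=1)

values = [10.0, 20.0, 30.0, 40.0]
indices = np.array([[0, 1, 3]])
mean_val = pool_values(indices, values)
median_val = pool_values(indices, values, reduction="median")
weighted_val = pool_values(
    indices, values, weights=np.array([[0.5, 0.25, 0.25]]), reduction="weighted_mean"
)
```

The median is less sensitive than the mean to a single outlying neighbor, which is one reason to prefer it when retrieved neighbors are noisy.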

loki2.retrieve.compare_labelings(met1: Sequence, met2: Sequence, *, labels: Sequence | None = None, plot: bool = True, normalize: NormalizeT = 'row', title: str | None = None, xlabel: str = 'Method 2', ylabel: str = 'Method 1', cmap: str = 'viridis', figsize=(8, 7), annotate: bool = True, fontsize_ticks: int = 10, fontsize_text: int = 7, savepath: str | None = None) -> Dict[str, Any]

Compute similarity metrics between two label arrays and optionally plot the confusion matrix.
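This page does not list which metrics the returned dictionary contains. As a rough picture of the confusion-matrix part only, the sketch below builds a row-normalized matrix (the default normalize='row') plus a simple agreement fraction. The helper confusion and the agreement measure are illustrative assumptions, not the function's documented output.

```python
import numpy as np

def confusion(met1, met2, normalize="row"):
    """Cross-tabulate two labelings; rows follow met1, columns follow met2."""
    met1, met2 = np.asarray(met1), np.asarray(met2)
    labels = np.unique(np.concatenate([met1, met2]))
    index = {label: i for i, label in enumerate(labels.tolist())}
    counts = np.zeros((len(labels), len(labels)))
    for a, b in zip(met1.tolist(), met2.tolist()):
        counts[index[a], index[b]] += 1
    if normalize == "row":
        row_sums = counts.sum(axis=1, keepdims=True)
        # Rows with no samples stay all-zero instead of dividing by zero.
        counts = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return labels, counts

met1 = ["a", "a", "b", "b"]
met2 = ["a", "b", "b", "b"]
labels, matrix = confusion(met1, met2)
agreement = float(np.mean(np.asarray(met1) == np.asarray(met2)))
```

With row normalization, each row shows how the items given one label by the first method are distributed across the second method's labels.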

loki2.retrieve.labels_to_hex(pred: numpy.ndarray, color_dict: dict, default: str = '#808080') -> numpy.ndarray

Return an array of hex colors with the same shape as pred.
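A shape-preserving lookup of this kind can be sketched as below. It assumes that labels missing from color_dict fall back to default, which the signature suggests but this page does not state; to_hex is a hypothetical stand-in for the library function.

```python
import numpy as np

def to_hex(pred, color_dict, default="#808080"):
    """Map each label in pred to a hex color, keeping pred's shape."""
    pred = np.asarray(pred)
    # Assumption: labels absent from color_dict receive the default color.
    flat = [color_dict.get(label, default) for label in pred.ravel().tolist()]
    return np.array(flat).reshape(pred.shape)

pred = np.array([["T", "B"], ["NK", "T"]])
color_dict = {"T": "#1f77b4", "B": "#ff7f0e"}
colors = to_hex(pred, color_dict)
```

Flattening before the lookup and reshaping afterwards lets the same helper handle 1-D label vectors and 2-D label grids alike.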