loki2.retrieve
==============

.. py:module:: loki2.retrieve

.. autoapi-nested-parse::

   Retrieval utilities for cross-modal embedding search in Loki2.

   This module provides functionality for retrieving similar embeddings across
   different modalities, including k-NN majority voting and numeric pooling
   operations.


Module Contents
---------------

.. py:class:: RetrievalResult

   Result of a retrieval operation containing scores and indices.

   :param scores: Similarity scores tensor of shape (M, K) where M is the number
                  of queries and K is the number of neighbors.
   :param indices: Indices tensor of shape (M, K) pointing to neighbors in the
                   reference pool.

   .. py:attribute:: scores
      :type: torch.Tensor

   .. py:attribute:: indices
      :type: torch.Tensor

   .. py:method:: to_tuple() -> Tuple[torch.Tensor, torch.Tensor]

   .. py:method:: save(path: Union[str, pathlib.Path]) -> None

      Persist scores/indices to disk via torch.save.

      :param path: Path where to save the retrieval result.

   .. py:method:: load(path: Union[str, pathlib.Path]) -> RetrievalResult
      :classmethod:

      Restore a RetrievalResult saved by `save`.

      :param path: Path to the saved retrieval result file.

      :returns: Loaded retrieval result.
      :rtype: RetrievalResult

   .. py:method:: majority_vote(source_labels: Union[Sequence, numpy.ndarray, torch.Tensor], *, weighted: Optional[bool] = False, scores: Optional[torch.Tensor] = None, temperature: Optional[float] = None, return_counts: bool = False) -> Tuple[numpy.ndarray, Optional[torch.Tensor]]

      Run k-NN majority voting using stored indices and optional scores.

      :param source_labels: Labels for the reference pool.
      :param weighted: Whether to weight votes by similarity scores. Defaults to False.
      :param scores: Optional custom scores to use for weighting. Defaults to None.
      :param temperature: Optional temperature for softmax weighting. Defaults to None.
      :param return_counts: Whether to return vote counts. Defaults to False.

      :returns: - Predicted labels
                - Vote counts if return_counts=True, otherwise None
      :rtype: Tuple[np.ndarray, Optional[torch.Tensor]]

   .. py:method:: numeric_pool(source_values: Union[Sequence, numpy.ndarray, torch.Tensor], *, weighted: Optional[bool] = False, scores: Optional[torch.Tensor] = None, temperature: Optional[float] = None, reduction: NumericReductionT = 'mean', eps: float = 1e-12, return_weights: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]]

      Pool neighbor-associated numeric values with optional weighting.

      :param source_values: Numeric values associated with reference pool.
      :param weighted: Whether to weight by similarity scores. Defaults to False.
      :param scores: Optional custom scores for weighting. Defaults to None.
      :param temperature: Optional temperature for softmax weighting. Defaults to None.
      :param reduction: Reduction method ('mean', 'weighted_mean', 'median', 'sum',
                        'max', 'min'). Defaults to 'mean'.
      :param eps: Numerical stability constant. Defaults to 1e-12.
      :param return_weights: Whether to return applied weights. Defaults to False.

      :returns: Pooled values, and optionally the weights used.
      :rtype: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]]


.. py:function:: cross_modal_retrieve(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, topk: int = 1, *, as_result: bool = False) -> Union[RetrievalResult, Tuple[torch.Tensor, torch.Tensor]]

   Perform cross-modal retrieval between query embeddings and the embedding pool.

   :param query_embeddings: (M, D) queries.
   :param embedding_pool: (N, D) database embeddings.
   :param topk: number of neighbors to keep.
   :param as_result: if True, return a RetrievalResult instead of a tuple.

   :returns: RetrievalResult if as_result else (values, indices).


.. py:function:: retrieve_with_celltype_filter(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, pool_labels: Union[Sequence[Any], numpy.ndarray, torch.Tensor], topk: int = 20, *, normalize_centroids: bool = True, return_assignments: bool = True) -> Union[RetrievalResult, Tuple[RetrievalResult, numpy.ndarray, torch.Tensor]]

   Assign queries to cell types via centroid similarity and retrieve neighbors
   only from matching cell types.

   :param query_embeddings: (M, D) tensor of query embeddings.
   :param embedding_pool: (N, D) tensor of reference embeddings.
   :param pool_labels: Iterable of length N with cell type labels.
   :param topk: Number of neighbors to retrieve per query.
   :param normalize_centroids: Whether to L2-normalize cell type centroids.
   :param return_assignments: If True, also return predicted labels and scores.

   :returns: RetrievalResult if return_assignments is False, otherwise a tuple of
             (RetrievalResult, predicted_labels, centroid_similarities).


.. py:function:: knn_majority_vote(indices: torch.Tensor, source_labels: Union[Sequence, numpy.ndarray, torch.Tensor], *, scores: Optional[torch.Tensor] = None, temperature: Optional[float] = None, return_counts: bool = False) -> Tuple[numpy.ndarray, Optional[torch.Tensor]]

   Compute the majority label among retrieved neighbors.

   :param indices: (M, K) neighbor indices pointing into the source pool.
   :param source_labels: (N,) labels (ints or strings).
   :param scores: optional (M, K) similarity scores to weight votes.
   :param temperature: optional temperature for softmax weighting.
   :param return_counts: whether to return per-class vote totals.


.. py:function:: knn_numeric_pool(indices: torch.Tensor, source_values: Union[Sequence, numpy.ndarray, torch.Tensor], *, scores: Optional[torch.Tensor] = None, temperature: Optional[float] = None, reduction: NumericReductionT = 'mean', eps: float = 1e-12, return_weights: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]]

   Aggregate numeric attributes associated with retrieved items.

   :param indices: (M, K) neighbor indices pointing into the source pool.
   :param source_values: tensor-like of shape (N, ...) aligned with the embedding pool.
   :param scores: optional custom scores to weight contributions.
   :param temperature: softmax temperature applied to scores before weighted averaging.
   :param reduction: reduction to apply across neighbors.
   :param eps: numerical stability constant used by weighted_mean.
   :param return_weights: whether to return the weights that were applied.


.. py:function:: compare_labelings(met1: Sequence, met2: Sequence, *, labels: Optional[Sequence] = None, plot: bool = True, normalize: NormalizeT = 'row', title: Optional[str] = None, xlabel: str = 'Method 2', ylabel: str = 'Method 1', cmap: str = 'viridis', figsize=(8, 7), annotate: bool = True, fontsize_ticks: int = 10, fontsize_text: int = 7, savepath: Optional[str] = None) -> Dict[str, Any]

   Compute similarity metrics between two label arrays and optionally plot the
   confusion matrix.


.. py:function:: labels_to_hex(pred: numpy.ndarray, color_dict: dict, default: str = '#808080') -> numpy.ndarray

   Return an array of hex colors with the same shape as pred.
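
The retrieve-then-vote flow described by ``cross_modal_retrieve`` and ``knn_majority_vote`` can be illustrated with a small, dependency-free sketch. It does not call Loki2 and is not its implementation: the helper names ``topk_neighbors`` and ``majority_vote`` are hypothetical, dot-product similarity is an assumption (the reference above does not name the similarity measure), and weighting votes by ``softmax(scores / temperature)`` is one plausible reading of the ``temperature`` parameter.

```python
import math
from collections import defaultdict


def topk_neighbors(queries, pool, k):
    """Return (scores, indices), each a list of M lists of length k.

    Assumption: similarity is a plain dot product; the documented API
    does not state which similarity measure it uses.
    """
    all_scores, all_indices = [], []
    for q in queries:
        sims = [sum(a * b for a, b in zip(q, p)) for p in pool]
        # Indices of the k most similar pool items, best first.
        order = sorted(range(len(pool)), key=lambda i: sims[i], reverse=True)[:k]
        all_indices.append(order)
        all_scores.append([sims[i] for i in order])
    return all_scores, all_indices


def majority_vote(indices, source_labels, scores=None, temperature=None):
    """Pick the label with the largest (optionally weighted) vote per query.

    Assumption: when scores are given, vote weights are
    softmax(scores / temperature), with temperature defaulting to 1.0.
    """
    predictions = []
    for row, idx_row in enumerate(indices):
        if scores is None:
            weights = [1.0] * len(idx_row)
        else:
            t = 1.0 if temperature is None else temperature
            scaled = [s / t for s in scores[row]]
            peak = max(scaled)  # subtract the max for numerical stability
            exps = [math.exp(s - peak) for s in scaled]
            total = sum(exps)
            weights = [e / total for e in exps]
        votes = defaultdict(float)
        for i, w in zip(idx_row, weights):
            votes[source_labels[i]] += w
        predictions.append(max(votes, key=votes.get))
    return predictions


# Toy reference pool (N=5, D=2) with one label per pool entry.
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
pool_labels = ["T cell", "T cell", "B cell", "B cell", "B cell"]
queries = [[1.0, 0.0], [0.0, 1.0]]

scores, indices = topk_neighbors(queries, pool, k=3)
unweighted = majority_vote(indices, pool_labels)
weighted = majority_vote(indices, pool_labels, scores=scores, temperature=0.1)
```

Here ``indices`` plays the role of the (M, K) index tensor and ``pool_labels`` the (N,) ``source_labels``; a lower ``temperature`` concentrates the vote on the closest neighbors.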