loki2.retrieve
==============

.. py:module:: loki2.retrieve

.. autoapi-nested-parse::

   Retrieval utilities for cross-modal embedding search in Loki2.

   This module provides functionality for retrieving similar embeddings across
   different modalities, including k-NN majority voting and numeric pooling operations.


Module Contents
---------------

.. py:class:: RetrievalResult

   Result of a retrieval operation containing scores and indices.

   :param scores: Similarity scores tensor of shape (M, K) where M is the number
                  of queries and K is the number of neighbors.
   :param indices: Indices tensor of shape (M, K) pointing to neighbors in the
                   reference pool.


   .. py:attribute:: scores
      :type:  torch.Tensor


   .. py:attribute:: indices
      :type:  torch.Tensor


   .. py:method:: to_tuple() -> Tuple[torch.Tensor, torch.Tensor]

      Return the stored score and index tensors as a tuple.


   .. py:method:: save(path: Union[str, pathlib.Path]) -> None

      Persist the scores and indices to disk via ``torch.save``.

      :param path: Destination path for the saved retrieval result.


   .. py:method:: load(path: Union[str, pathlib.Path]) -> RetrievalResult
      :classmethod:


      Restore a RetrievalResult saved by :py:meth:`save`.

      :param path: Path to the saved retrieval result file.

      :returns: Loaded retrieval result.
      :rtype: RetrievalResult


   .. py:method:: majority_vote(source_labels: Union[Sequence, numpy.ndarray, torch.Tensor], *, weighted: Optional[bool] = False, scores: Optional[torch.Tensor] = None, temperature: Optional[float] = None, return_counts: bool = False) -> Tuple[numpy.ndarray, Optional[torch.Tensor]]

      Run k-NN majority voting using stored indices and optional scores.

      :param source_labels: Labels for the reference pool.
      :param weighted: Whether to weight votes by similarity scores. Defaults to False.
      :param scores: Optional custom scores to use for weighting. Defaults to None.
      :param temperature: Optional temperature for softmax weighting. Defaults to None.
      :param return_counts: Whether to return vote counts. Defaults to False.

      :returns:     - Predicted labels
                    - Vote counts if return_counts=True, otherwise None
      :rtype: Tuple[np.ndarray, Optional[torch.Tensor]]


   .. py:method:: numeric_pool(source_values: Union[Sequence, numpy.ndarray, torch.Tensor], *, weighted: Optional[bool] = False, scores: Optional[torch.Tensor] = None, temperature: Optional[float] = None, reduction: NumericReductionT = 'mean', eps: float = 1e-12, return_weights: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]]

      Pool neighbor-associated numeric values with optional weighting.

      :param source_values: Numeric values associated with reference pool.
      :param weighted: Whether to weight by similarity scores. Defaults to False.
      :param scores: Optional custom scores for weighting. Defaults to None.
      :param temperature: Optional temperature for softmax weighting. Defaults to None.
      :param reduction: Reduction method ('mean', 'weighted_mean', 'median', 'sum',
                        'max', 'min'). Defaults to 'mean'.
      :param eps: Numerical stability constant. Defaults to 1e-12.
      :param return_weights: Whether to return applied weights. Defaults to False.

      :returns:     Pooled values, and optionally the weights used.
      :rtype: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]]
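
The ``(M, K)`` shape convention used by ``RetrievalResult`` can be illustrated with a small gather. This is a minimal sketch that uses NumPy arrays as stand-ins for the torch tensors; the sample values and the ``pool_labels`` array are made up for illustration and are not part of loki2.

```python
import numpy as np

# M = 3 queries, K = 2 neighbors each (hypothetical values).
scores = np.array([[0.9, 0.7], [0.8, 0.6], [0.5, 0.4]])
indices = np.array([[4, 1], [0, 2], [3, 3]])

# A reference pool of N = 5 items; indices point into this pool.
pool_labels = np.array(["B", "T", "T", "NK", "B"])

# Fancy indexing with an (M, K) index array yields an (M, K) result:
# row i holds the labels of the K neighbors of query i.
neighbor_labels = pool_labels[indices]
```

Any per-item attribute of the pool (labels, numeric values) can be looked up the same way, which is what the voting and pooling methods build on.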


.. py:function:: cross_modal_retrieve(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, topk: int = 1, *, as_result: bool = False) -> Union[RetrievalResult, Tuple[torch.Tensor, torch.Tensor]]

   Perform cross-modal retrieval between query embeddings and the embedding pool.

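   :param query_embeddings: Query embeddings of shape (M, D).
   :param embedding_pool: Reference (database) embeddings of shape (N, D).
   :param topk: Number of neighbors to keep per query.
   :param as_result: If True, return a RetrievalResult instead of a tuple.

   :returns: A RetrievalResult if ``as_result`` is True, otherwise a tuple
             ``(values, indices)`` holding the top-k similarity scores and the
             matching pool indices.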
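
The retrieval step can be sketched as a similarity matrix followed by a top-k selection. The block below is a NumPy stand-in, not the loki2 implementation: ``topk_retrieve_sketch`` is a hypothetical name, and cosine similarity is an assumption, since the similarity measure is not stated here.

```python
import numpy as np

def topk_retrieve_sketch(queries, pool, topk=1):
    """Hypothetical stand-in: cosine similarity followed by top-k selection."""
    # Assumption: cosine similarity; the real function may use another metric.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    sim = q @ p.T  # (M, N) similarity matrix

    # Sort each row by descending similarity and keep the first topk columns.
    idx = np.argsort(-sim, axis=1)[:, :topk]
    vals = np.take_along_axis(sim, idx, axis=1)
    return vals, idx  # both (M, topk)

queries = np.array([[1.0, 0.0], [0.0, 2.0]])           # M = 2, D = 2
pool = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # N = 3, D = 2
values, indices = topk_retrieve_sketch(queries, pool, topk=2)
```

Both outputs have shape ``(M, topk)``, matching the ``scores`` and ``indices`` attributes of ``RetrievalResult``.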


.. py:function:: retrieve_with_celltype_filter(query_embeddings: torch.Tensor, embedding_pool: torch.Tensor, pool_labels: Union[Sequence[Any], numpy.ndarray, torch.Tensor], topk: int = 20, *, normalize_centroids: bool = True, return_assignments: bool = True) -> Union[RetrievalResult, Tuple[RetrievalResult, numpy.ndarray, torch.Tensor]]

   Assign queries to cell types via centroid similarity and retrieve neighbors
   only from matching cell types.

   :param query_embeddings: (M, D) tensor of query embeddings.
   :param embedding_pool: (N, D) tensor of reference embeddings.
   :param pool_labels: Iterable of length N with cell type labels.
   :param topk: Number of neighbors to retrieve per query.
   :param normalize_centroids: Whether to L2-normalize cell type centroids.
   :param return_assignments: If True, also return the predicted labels and centroid similarities.

   :returns: RetrievalResult if return_assignments is False, otherwise a tuple of
             (RetrievalResult, predicted_labels, centroid_similarities).
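
The two-stage procedure described above (assign each query to a cell type by centroid similarity, then search only within that cell type) can be sketched as follows. This is an illustrative NumPy version under stated assumptions: ``filtered_retrieve_sketch`` and ``l2_normalize`` are hypothetical names, cosine similarity is assumed, and neighbors are returned as a list of index arrays rather than a ``RetrievalResult``.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def filtered_retrieve_sketch(queries, pool, pool_labels, topk=2):
    """Hypothetical stand-in for centroid assignment plus filtered k-NN."""
    pool_labels = np.asarray(pool_labels)
    classes = np.unique(pool_labels)

    # One centroid per cell type: the mean of its pool embeddings, L2-normalized.
    centroids = np.stack([pool[pool_labels == c].mean(axis=0) for c in classes])
    centroids = l2_normalize(centroids)

    q, p = l2_normalize(queries), l2_normalize(pool)
    centroid_sim = q @ centroids.T                     # (M, C)
    predicted = classes[centroid_sim.argmax(axis=1)]   # (M,)

    neighbors = []
    for i, label in enumerate(predicted):
        # Restrict the search to pool items carrying the predicted label.
        candidates = np.flatnonzero(pool_labels == label)
        sims = p[candidates] @ q[i]
        order = np.argsort(-sims)[:topk]
        neighbors.append(candidates[order])
    return predicted, neighbors, centroid_sim

pool = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
pool_labels = ["T", "T", "B", "B"]
queries = np.array([[1.0, 0.05], [0.05, 1.0]])
predicted, neighbors, centroid_sim = filtered_retrieve_sketch(queries, pool, pool_labels)
```

In this sketch a cell type with fewer than ``topk`` members simply yields fewer neighbors; how the library handles that case is not specified here.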


.. py:function:: knn_majority_vote(indices: torch.Tensor, source_labels: Union[Sequence, numpy.ndarray, torch.Tensor], *, scores: Optional[torch.Tensor] = None, temperature: Optional[float] = None, return_counts: bool = False) -> Tuple[numpy.ndarray, Optional[torch.Tensor]]

   Compute the majority label among retrieved neighbors.

   :param indices: (M, K) neighbor indices pointing into the source pool.
   :param source_labels: (N,) labels (ints or strings).
   :param scores: Optional (M, K) similarity scores used to weight votes.
   :param temperature: Optional temperature for softmax weighting.
   :param return_counts: Whether to return per-class vote totals.

   :returns: Predicted labels, plus the vote totals if ``return_counts`` is True
             (otherwise None).
   :rtype: Tuple[np.ndarray, Optional[torch.Tensor]]
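
A k-NN majority vote with optional score weighting can be sketched in NumPy as below. ``majority_vote_sketch`` is a hypothetical name, and the weighting rule is an assumption: raw scores are used as vote weights, or ``softmax(scores / temperature)`` per query when a temperature is given. The library may weight votes differently.

```python
import numpy as np

def majority_vote_sketch(indices, source_labels, scores=None, temperature=None):
    """Hypothetical stand-in for a (weighted) k-NN majority vote."""
    source_labels = np.asarray(source_labels)
    # Encode labels as integers so votes can be accumulated per class.
    classes, encoded = np.unique(source_labels, return_inverse=True)
    neighbor_classes = encoded[indices]  # (M, K)

    if scores is None:
        weights = np.ones(indices.shape, dtype=float)  # one vote per neighbor
    elif temperature is not None:
        # Assumption: row-wise softmax over scores / temperature.
        z = np.asarray(scores, dtype=float) / temperature
        z = z - z.max(axis=1, keepdims=True)
        weights = np.exp(z)
        weights /= weights.sum(axis=1, keepdims=True)
    else:
        weights = np.asarray(scores, dtype=float)

    m, k = indices.shape
    counts = np.zeros((m, len(classes)))
    rows = np.repeat(np.arange(m), k)
    # Accumulate each neighbor's weight into its class column.
    np.add.at(counts, (rows, neighbor_classes.ravel()), weights.ravel())
    return classes[counts.argmax(axis=1)], counts

labels = ["a", "b", "b", "a", "c"]
indices = np.array([[0, 1, 2], [3, 4, 0]])
plain_pred, plain_counts = majority_vote_sketch(indices, labels)

scores = np.array([[0.9, 0.1, 0.1], [0.2, 0.9, 0.2]])
weighted_pred, _ = majority_vote_sketch(indices, labels, scores=scores)
```

With equal votes the first query resolves to ``"b"`` (two of three neighbors); with the sample scores the single high-scoring ``"a"`` neighbor outweighs them, which shows how weighting can change the outcome.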


.. py:function:: knn_numeric_pool(indices: torch.Tensor, source_values: Union[Sequence, numpy.ndarray, torch.Tensor], *, scores: Optional[torch.Tensor] = None, temperature: Optional[float] = None, reduction: NumericReductionT = 'mean', eps: float = 1e-12, return_weights: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]]

   Aggregate numeric attributes associated with retrieved items.

   :param indices: (M, K) neighbor indices pointing into the source pool.
   :param source_values: Tensor-like of shape (N, ...) aligned with the embedding pool.
   :param scores: Optional custom scores used to weight contributions.
   :param temperature: Softmax temperature applied to scores before weighted averaging.
   :param reduction: Reduction to apply across neighbors ('mean', 'weighted_mean',
                     'median', 'sum', 'max', 'min').
   :param eps: Numerical stability constant used by weighted_mean.
   :param return_weights: Whether to return the weights that were applied.

   :returns: Pooled values, and optionally the weights used.
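
Pooling numeric values over retrieved neighbors can be sketched as a gather followed by a reduction along the neighbor axis. The block below is a NumPy illustration for one-dimensional ``source_values``; ``numeric_pool_sketch`` is a hypothetical name, and the ``weighted_mean`` formula ``sum(w * v) / (sum(w) + eps)`` is an assumption about how ``eps`` is used.

```python
import numpy as np

def numeric_pool_sketch(indices, source_values, scores=None,
                        reduction="mean", eps=1e-12):
    """Hypothetical stand-in: gather neighbor values, then reduce over K."""
    values = np.asarray(source_values, dtype=float)[indices]  # (M, K)

    if reduction == "weighted_mean":
        if scores is None:
            raise ValueError("weighted_mean needs scores")
        w = np.asarray(scores, dtype=float)
        # Assumption: eps guards against a zero weight sum.
        return (w * values).sum(axis=1) / (w.sum(axis=1) + eps)

    reducers = {"mean": np.mean, "median": np.median,
                "sum": np.sum, "max": np.max, "min": np.min}
    return reducers[reduction](values, axis=1)

source_values = [10.0, 20.0, 30.0, 40.0]
indices = np.array([[0, 1], [2, 3]])
scores = np.array([[3.0, 1.0], [1.0, 1.0]])

pooled_mean = numeric_pool_sketch(indices, source_values)
pooled_weighted = numeric_pool_sketch(indices, source_values,
                                      scores=scores, reduction="weighted_mean")
```

For the first query the plain mean is 15.0, while weights of 3 and 1 pull the weighted mean to 12.5, toward the higher-scoring neighbor.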


.. py:function:: compare_labelings(met1: Sequence, met2: Sequence, *, labels: Optional[Sequence] = None, plot: bool = True, normalize: NormalizeT = 'row', title: Optional[str] = None, xlabel: str = 'Method 2', ylabel: str = 'Method 1', cmap: str = 'viridis', figsize=(8, 7), annotate: bool = True, fontsize_ticks: int = 10, fontsize_text: int = 7, savepath: Optional[str] = None) -> Dict[str, Any]

   Compute similarity metrics between two label arrays and optionally plot the confusion matrix.
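
The confusion matrix underlying this comparison can be sketched without plotting. The block below is illustrative only: ``confusion_sketch`` is a hypothetical name, the plain agreement rate is a stand-in for whichever metrics the function reports, and ``'row'`` normalization is assumed to mean dividing each row by its sum.

```python
import numpy as np

def confusion_sketch(met1, met2, labels=None):
    """Hypothetical stand-in: raw and row-normalized confusion matrices."""
    met1, met2 = np.asarray(met1), np.asarray(met2)
    if labels is None:
        labels = np.unique(np.concatenate([met1, met2]))
    labels = np.asarray(labels)
    index = {lab: i for i, lab in enumerate(labels.tolist())}

    # Rows follow method 1, columns follow method 2.
    cm = np.zeros((len(labels), len(labels)))
    for a, b in zip(met1.tolist(), met2.tolist()):
        cm[index[a], index[b]] += 1

    # Row normalization; rows with no samples stay at zero.
    row_sums = cm.sum(axis=1, keepdims=True)
    cm_row = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

    agreement = float((met1 == met2).mean())
    return cm, cm_row, agreement

met1 = ["a", "a", "b", "b"]
met2 = ["a", "b", "b", "b"]
cm, cm_row, agreement = confusion_sketch(met1, met2)
```

Each row of ``cm_row`` then reads as the distribution of method 2 labels among the items that method 1 assigned to that row's label.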


.. py:function:: labels_to_hex(pred: numpy.ndarray, color_dict: dict, default: str = '#808080') -> numpy.ndarray

   Return an array of hex colors with the same shape as pred.
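
A shape-preserving label-to-color mapping can be sketched as below. ``labels_to_hex_sketch`` is a hypothetical name, and the fallback behavior (labels missing from ``color_dict`` receive ``default``) is an assumption inferred from the signature.

```python
import numpy as np

def labels_to_hex_sketch(pred, color_dict, default="#808080"):
    """Hypothetical stand-in: map each label to a hex color, keeping the shape."""
    pred = np.asarray(pred)
    # Assumption: labels missing from color_dict fall back to `default`.
    flat = [color_dict.get(label, default) for label in pred.ravel().tolist()]
    return np.array(flat, dtype=object).reshape(pred.shape)

pred = np.array([["T", "B"], ["NK", "T"]])
colors = {"T": "#ff0000", "B": "#0000ff"}
hex_colors = labels_to_hex_sketch(pred, colors)
```

Here ``"NK"`` has no entry in ``colors``, so it maps to the gray default.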