loki2.encode_trans
==================

.. py:module:: loki2.encode_trans

.. autoapi-nested-parse::

   Transcriptomics encoding module.

   This module provides functionality to encode transcriptomics data using
   pre-trained models for cross-modal analysis.


Module Contents
---------------

.. py:function:: encode_transcriptomics(ad_path: pathlib.Path, output_path: pathlib.Path, model_path: pathlib.Path, housekeeping_path: pathlib.Path, batch_size: int = 100, num_threads: Optional[int] = None, device: str = 'cpu') -> None

   Encode transcriptomics data using a pre-trained model.

   This function processes AnnData objects containing single-cell RNA-seq data,
   generates gene expression prompts, and encodes them using a pre-trained CLIP
   model for cross-modal analysis.

   :param ad_path: Path to the input AnnData (.h5ad) file.
   :param output_path: Path where encoded embeddings will be saved (.pt file).
   :param model_path: Path to the pre-trained model checkpoint.
   :param housekeeping_path: Path to CSV file containing housekeeping genes.
   :param batch_size: Batch size for encoding. Defaults to 100.
   :param num_threads: Number of threads for PyTorch operations. If None, uses default. Defaults to None.
   :param device: Device to use for encoding ('cpu' or 'cuda'). Defaults to 'cpu'.

   :raises ValueError: If observation names are not unique, if duplicate or missing cell identifiers are found, or if num_threads is invalid.


.. py:function:: load_model(model_path: pathlib.Path, device: str = 'cuda') -> Tuple[Any, Any, Any]

   Load a pre-trained CoCa model and tokenizer.

   :param model_path: Path to the pre-trained model checkpoint.
   :param device: Device to load the model on ('cpu' or 'cuda'). Defaults to 'cuda'.

   :returns: Tuple containing:

             - model: Loaded CoCa model
             - preprocess: Preprocessing function for images
             - tokenizer: Text tokenizer
   :rtype: Tuple[Any, Any, Any]


.. py:function:: load_prompts_csv(csv_path: pathlib.Path) -> pandas.DataFrame

   Load gene prompts from a CSV file.

   :param csv_path: Path to the CSV file containing gene prompts.

   :returns: DataFrame with 'cell_id' and 'label' columns.
   :rtype: pd.DataFrame

   :raises ValueError: If required columns are missing.


.. py:function:: generate_gene_df(ad: Any, house_keeping_genes: pandas.DataFrame, todense: bool = True) -> pandas.DataFrame

   Generate a DataFrame with the top 50 genes for each observation.

   Removes genes containing '.' or '-' in their names, as well as genes listed
   in the housekeeping genes DataFrame.

   :param ad: AnnData object containing gene expression data.
   :param house_keeping_genes: DataFrame with a 'genesymbol' column listing housekeeping genes to exclude.
   :param todense: Whether to convert the sparse matrix (ad.X) to a dense matrix before creating a DataFrame. Defaults to True.

   :returns: DataFrame with two columns: 'cell_id' and 'label'. Each label entry is a string with the top 50 gene names (space-separated) for that observation.
   :rtype: pd.DataFrame


.. py:function:: encode_texts(model, tokenizer, texts, batch_size=256, device='cuda')

.. py:function:: build_parser() -> argparse.ArgumentParser

.. py:function:: main(argv: Iterable[str] = None) -> None
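
The gene selection described for ``generate_gene_df`` can be illustrated with a small stand-alone sketch. It is not the module's implementation: it uses plain Python lists instead of AnnData and pandas, the helper name ``top_gene_labels`` and the toy data are hypothetical, and it assumes that "top" means highest expression value, with ties kept in their original gene order.

```python
def top_gene_labels(expression, gene_names, cell_ids, housekeeping, top_k=50):
    """Build one space-separated label of top genes per cell.

    Hypothetical helper for illustration only; assumes genes are ranked by
    descending expression value within each cell.
    """
    excluded = set(housekeeping)
    # Keep column indices of genes without '.' or '-' in the name and
    # not listed as housekeeping genes.
    keep = [
        i
        for i, gene in enumerate(gene_names)
        if "." not in gene and "-" not in gene and gene not in excluded
    ]
    labels = {}
    for cell_id, row in zip(cell_ids, expression):
        # sorted() is stable, so tied genes keep their original order.
        ranked = sorted(keep, key=lambda i: row[i], reverse=True)
        labels[cell_id] = " ".join(gene_names[i] for i in ranked[:top_k])
    return labels


# Toy data: two cells, six genes (values are made up).
genes = ["ACTB", "CD3E", "MT-CO1", "MS4A1", "RP11.1", "CD19"]
cells = ["cell_0", "cell_1"]
counts = [
    [90, 12, 70, 0, 5, 1],
    [85, 0, 60, 9, 2, 14],
]

labels = top_gene_labels(counts, genes, cells, housekeeping=["ACTB"], top_k=2)
for cell_id, label in labels.items():
    print(cell_id, label)
```

In this toy run, ``ACTB`` is dropped as a housekeeping gene and ``MT-CO1`` and ``RP11.1`` are dropped because of the ``-`` and ``.`` in their names, so each label is built only from the remaining genes. The documented function returns a DataFrame with ``cell_id`` and ``label`` columns rather than a dictionary.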