{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Loki Decompose - CRC Sample" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook demonstrates how to run *Loki Decompose* on a colorectal cancer (CRC) sample dataset. Running it takes about 2 minutes on a MacBook Pro." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import scanpy as sc\n", "import torch\n", "import numpy as np \n", "import pandas as pd\n", "import os \n", "from matplotlib import pyplot as plt\n", "import matplotlib\n", "\n", "import loki.decompose\n", "# set resolution to 80 dpi for showing in the notebook\n", "# matplotlib.rcParams['figure.dpi'] = 80\n", "sc.set_figure_params(dpi=80, facecolor='white', frameon=False, color_map='viridis')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first fine-tune the OmiCLIP model on the Visium colorectal cancer data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Comment this line to fine-tune the model on the CRC dataset.\n" ] } ], "source": [ "%%script echo \"Comment this line to fine-tune the model on the CRC dataset.\"\n", "\n", "import subprocess\n", "import open_clip\n", "\n", "model_name = 'coca_ViT-L-14'\n", "pretrained_weight_path = 'path to the omiclip pretrained weight'\n", "train_csv = 'visium_data/finetune_data.csv'\n", "name = 'finetune_crc'\n", "\n", "train_command = [\n", " 'python', '-m', 'training.main',\n", " '--name', name,\n", " '--save-frequency', '5',\n", " '--zeroshot-frequency', '10',\n", " '--report-to', 'wandb',\n", " '--train-data', train_csv,\n", " '--csv-img-key', 'img_path',\n", " '--csv-caption-key', 'label',\n", " '--warmup', '10',\n", " '--batch-size', '64',\n", " '--lr', '5e-6',\n", " '--wd', '0.1',\n", " '--epochs', '5',\n", " '--workers', '16',\n", " '--model', model_name,\n", " '--csv-separator', ',',\n", " '--pretrained', 
pretrained_weight_path,\n", " '--lock-text-freeze-layer-norm',\n", " '--lock-image-freeze-bn-stats',\n", " '--coca-caption-loss-weight', '0',\n", " '--coca-contrastive-loss-weight', '1',\n", " '--val-frequency', '10',\n", " '--aug-cfg', 'color_jitter=(0.32, 0.32, 0.32, 0.08)', 'color_jitter_prob=0.5', 'gray_scale_prob=0'\n", "]\n", "\n", "subprocess.run(train_command)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We provide the embeddings generated from the OmiCLIP model.\n", "The sample data and embeddings are stored in the directory `data/loki_decompose/CRC_data`, which can be downloaded from this [Google Drive link](https://drive.google.com/file/d/1aPK1nItsOEPxTihUAKMig-vLY-DMMIce/view?usp=sharing).\n", "\n", "The following files are needed to run the cell type decomposition on the pseudo Visium data:\n", "```\n", " . \n", " ├── checkpoint_finetune_crc \n", " │ ├── sc_text_emb.pt \n", " │ ├── val_data_all_val_img_emb.pt \n", " │ ├── val_data_val_img_emb.pt \n", " │ └── val_data_val_txt_emb.pt \n", " ├── scRNA_data \n", " │ └── sc_CRC2_labels.csv \n", " └── pseudo_visium_data \n", " ├── val_data_all.csv (# metadata for embedding extraction from the entire slide) \n", " ├── val_data.csv (# metadata for embedding extraction from the Visium capture region) \n", " ├── cell_centroids.npy \n", " └── Visium_HD_Human_Colon_Cancer_P2_8bin_simulated_100_55_processed.h5ad \n", "```" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "data_dir = \"./data/loki_decompose/CRC_data/pseudo_visium_data\"\n", "checkpoint_name = \"finetune_crc\"\n", "embedding_dir = f\"./data/loki_decompose/CRC_data/checkpoint_{checkpoint_name}\"\n", "cell_adata_labels_path = \"./data/loki_decompose/CRC_data/scRNA_data/sc_CRC2_labels.csv\"\n", "\n", "fig_dir = f\"{checkpoint_name}_figures\"\n", "os.makedirs(fig_dir, exist_ok=True)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": 
[ "AnnData object with n_obs × n_vars = 3593 × 18085\n", " obs: 'DeconvolutionLabel1', 'nonEmptySpots', 'n_counts', 'leiden', 'B cells', 'Endothelial', 'Fibroblast', 'Intestinal Epithelial', 'Myeloid', 'Neuronal', 'Smooth Muscle', 'T cells', 'Tumor', 'Unknown'\n", " var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm'\n", " uns: 'Celltype_colors', 'DeconvolutionLabel1_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'mapping', 'neighbors', 'pca', 'spatial'\n", " obsm: 'X_pca', 'deconvolution', 'spatial'\n", " varm: 'PCs'\n", " obsp: 'connectivities', 'distances'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The spatial coordinates and the H&E image from the spot data are used to visualize the prediction results.\n", "spot_adata_path = f\"{data_dir}/Visium_HD_Human_Colon_Cancer_P2_8bin_simulated_100_55_processed.h5ad\"\n", "spot_adata = sc.read_h5ad(spot_adata_path)\n", "spot_adata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load pre-computed spot image embeddings" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([3593, 768])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spot_im_embedding_path = f'{embedding_dir}/val_data_val_img_emb.pt'\n", "spot_im_embedding = torch.load(spot_im_embedding_path)\n", "spot_im_embedding.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load pre-computed spot ST embeddings" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([3593, 768])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spot_tx_embedding_path = f'{embedding_dir}/val_data_val_txt_emb.pt'\n", "spot_tx_embedding = torch.load(spot_tx_embedding_path)\n", "spot_tx_embedding.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load 
pre-computed single-cell RNA-seq embeddings" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([35695, 768])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cell_embedding_path = f\"{embedding_dir}/sc_text_emb.pt\"\n", "cell_embedding = torch.load(cell_embedding_path)\n", "cell_embedding.shape" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Level1 | \n", "
---|---|
AAACAAGCAACAGCTAACTTTAGG-1 | \n", "B cells | \n", "
AAACAAGCAACTGTTCACTTTAGG-1 | \n", "T cells | \n", "
AAACAAGCAAGGCCTGACTTTAGG-1 | \n", "Tumor | \n", "
AAACAAGCACATAGTGACTTTAGG-1 | \n", "Tumor | \n", "
AAACAAGCAGCATTTCACTTTAGG-1 | \n", "Myeloid | \n", "
... | \n", "... | \n", "
TTTGGCGGTGGCGTAGACTTTAGG-7 | \n", "B cells | \n", "
TTTGGCGGTTAGTGCTACTTTAGG-7 | \n", "Smooth Muscle | \n", "
TTTGTGAGTCCGCTAAACTTTAGG-7 | \n", "Tumor | \n", "
TTTGTGAGTCCGGGTTACTTTAGG-7 | \n", "Myeloid | \n", "
TTTGTGAGTCTTTATCACTTTAGG-7 | \n", "Tumor | \n", "
35695 rows × 1 columns
\n", "