{ "cells": [ { "cell_type": "markdown", "id": "90cc04bb", "metadata": {}, "source": [ "# Loki Decompose - TNBC Sample\n", "This notebook demonstrates how to run *Loki Decompose* on the in-house triple-negative breast cancer (TNBC) sample. The notebook takes about 1 minute to run on a MacBook Pro." ] }, { "cell_type": "code", "execution_count": 1, "id": "73875244-bbcd-4157-98a8-5a55b865fc82", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import os\n", "import scanpy as sc\n", "\n", "import loki.decompose\n", "# sc.settings.set_figure_params(dpi=, facecolor=\"white\")" ] }, { "cell_type": "markdown", "id": "4b071b1f", "metadata": {}, "source": [ "We first fine-tune the OmiCLIP model on the pseudo Visium data from one of the four sample regions." ] }, { "cell_type": "code", "execution_count": 2, "id": "993694fa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Comment this line to fine-tune the model on the TNBC data.\n" ] } ], "source": [ "%%script echo \"Comment this line to fine-tune the model on the TNBC data.\"\n", "\n", "import subprocess\n", "import open_clip\n", "\n", "model_name='coca_ViT-L-14'\n", "pretrained_weight_path='path to the omiclip pretrained weight'\n", "train_csv = 'visium_data/finetune_data.csv'\n", "name = 'finetune_tnbc'\n", "\n", "train_command = [\n", "    'python', '-m', 'training.main',\n", "    '--name', name,\n", "    '--save-frequency', '5',\n", "    '--zeroshot-frequency', '10',\n", "    '--report-to', 'wandb',\n", "    '--train-data', train_csv,\n", "    '--csv-img-key', 'img_path',\n", "    '--csv-caption-key', 'label',\n", "    '--warmup', '10',\n", "    '--batch-size', '64',\n", "    '--lr', '5e-6',\n", "    '--wd', '0.1',\n", "    '--epochs', '5',\n", "    '--workers', '16',\n", "    '--model', model_name,\n", "    '--csv-separator', ',',\n", "    '--pretrained', pretrained_weight_path,\n", "    '--lock-text-freeze-layer-norm',\n", "    '--lock-image-freeze-bn-stats',\n", "    '--coca-caption-loss-weight','0',\n", " 
'--coca-contrastive-loss-weight','1',\n", "    '--val-frequency', '10',\n", "    '--aug-cfg', 'color_jitter=(0.32, 0.32, 0.32, 0.08)', 'color_jitter_prob=0.5', 'gray_scale_prob=0'\n", "]\n", "\n", "subprocess.run(train_command)" ] }, { "cell_type": "markdown", "id": "dcccf838", "metadata": {}, "source": [ "We provide the embeddings generated from the OmiCLIP model.\n", "The sample data and embeddings are stored in the directory `data/loki_decompose/TNBC_data`, which can be downloaded from this [Google Drive link](https://drive.google.com/file/d/1aPK1nItsOEPxTihUAKMig-vLY-DMMIce/view?usp=sharing).\n", "\n", "The following files are needed to run the cell type decomposition on the pseudo Visium data:\n", "```\n", ".\n", "├── checkpoint_tnbc\n", "│   ├── TNBC_img_features_finetune.csv\n", "│   ├── TNBC_txt_features_finetune.csv\n", "│   └── TNBC_txt_features_sc_finetune.csv\n", "├── scRNA_data\n", "│   └── scRNA_data.h5ad\n", "└── pseudo_visium_data\n", "    ├── TNBC_pseudo_Visium.h5ad\n", "    └── TNBC_Xenium_celltype_spot.h5ad\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "id": "b73bc9f4", "metadata": {}, "outputs": [], "source": [ "data_path = './data/loki_decompose/TNBC_data/'\n", "sample_name = 'TNBC'" ] }, { "cell_type": "code", "execution_count": 6, "id": "e2f53072-02da-43f8-b133-735e1a79e8d9", "metadata": {}, "outputs": [], "source": [ "def generate_deconv_df(sc_ad, st_ad):\n", "    # Note: this function modifies st_ad.obsm['tangram_ct_pred'] in place.\n", "    # Scale each cell type's score by the number of cells of that type in sc_ad.\n", "    for cell_type in st_ad.obsm['tangram_ct_pred'].columns:\n", "        st_ad.obsm['tangram_ct_pred'][cell_type]=st_ad.obsm['tangram_ct_pred'][cell_type]*sc_ad.obs['cell_type'].value_counts()[cell_type]\n", "\n", "    # Merge macrophages, B cells, and T cells into a single Immune class.\n", "    st_ad.obsm['tangram_ct_pred']['Immune']=st_ad.obsm['tangram_ct_pred']['Macrophage']+ \\\n", "        st_ad.obsm['tangram_ct_pred']['B cell'] + \\\n", "        st_ad.obsm['tangram_ct_pred']['T cell']\n", "    st_ad.obsm['tangram_ct_pred'].drop(['Macrophage','B cell','T cell'],axis=1, inplace=True)\n", "    st_ad.obsm['tangram_ct_pred'] = st_ad.obsm['tangram_ct_pred'][['Epithelial', 'Immune', 'Stroma']]\n", "\n", "    # Normalize so that the proportions for each spot sum to 1.\n", "    deconv_df = st_ad.obsm['tangram_ct_pred'].T/st_ad.obsm['tangram_ct_pred'].T.sum()\n", "    deconv_df = deconv_df.T\n", "\n", "    return deconv_df" ] }, { "cell_type": "code", "execution_count": 7, "id": "8d6fc5a3-9c52-436a-a1f4-bddefde8696b", "metadata": {}, "outputs": [], "source": [ "sc_data_raw = sc.read_h5ad(os.path.join(data_path, 'scRNA_data', 'TNBC_SC_preprocess.h5ad'))\n", "ad_vis = sc.read_h5ad(os.path.join(data_path, 'pseudo_visium_data', 'TNBC_pseudo_Visium.h5ad'))" ] }, { "cell_type": "markdown", "id": "f3389af6-68af-49d8-b5e9-4b435501b43a", "metadata": { "jp-MarkdownHeadingCollapsed": true }, "source": [ "Visualize the cell type composition defined by the Xenium data (ground truth)." ] }, { "cell_type": "code", "execution_count": 8, "id": "3eab3382-91d1-4f09-abd2-70419ab62f82", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
cell_type | Epithelial | Immune | Stroma
---|---|---|---
TNBC_processed_adata_AACACGTGCATCGCAC-1 | 0.000000 | 1.000000 | 0.000000
TNBC_processed_adata_AACACTTGGCAAGGAA-1 | 0.750000 | 0.250000 | 0.000000
TNBC_processed_adata_AACAGGAAGAGCATAG-1 | 0.772727 | 0.000000 | 0.227273
TNBC_processed_adata_AACAGGATTCATAGTT-1 | 0.916667 | 0.083333 | 0.000000
TNBC_processed_adata_AACAGGTTATTGCACC-1 | 0.846154 | 0.153846 | 0.000000
... | ... | ... | ...
TNBC_processed_adata_TGTTGGAACCTTCCGC-1 | 1.000000 | 0.000000 | 0.000000
TNBC_processed_adata_TGTTGGAACGAGGTCA-1 | 0.968750 | 0.000000 | 0.031250
TNBC_processed_adata_TGTTGGAAGCTCGGTA-1 | 0.793103 | 0.000000 | 0.206897
TNBC_processed_adata_TGTTGGATGGACTTCT-1 | 1.000000 | 0.000000 | 0.000000
TNBC_processed_adata_TGTTGGCCTACACGTG-1 | 0.800000 | 0.200000 | 0.000000
3402 rows × 3 columns
\n", "