Loki PredEx

This notebook demonstrates how to run Loki PredEx on the normal human heart dataset. It takes about 1 minute to run on a MacBook Pro.

[1]:

import scanpy as sc
import pandas as pd
import numpy as np
import os

import loki.predex
sc.settings.set_figure_params(dpi=80, facecolor="white")

We provide the image-ST similarity matrix generated by the OmiCLIP model. The sample data and embeddings are stored in the directory data/loki_predex/, which can be downloaded from the Google Drive link.

The following files are needed to run ST gene expression prediction on the human heart dataset:

.
├── similarity_matrix
│   └── image_text_similarity.npy
├── training_data
│   ├── all_shared_genes.txt
│   ├── combined_expression_matrix.npy
│   ├── combined_obs.npy
│   └── train_df.csv
└── validation_data
    ├── HCAHeartST11702009.h5ad
    ├── top300_gene_list.npy
    └── val_df.csv
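
Before loading anything, it can help to confirm that the download is complete. The following is a minimal sketch, assuming the files were unpacked under ./data/loki_predex/ with the layout listed above. `REQUIRED_FILES` and `find_missing_files` are hypothetical names introduced for this example and are not part of Loki.

```python
from pathlib import Path

# Relative paths copied from the file listing above.
REQUIRED_FILES = [
    "similarity_matrix/image_text_similarity.npy",
    "training_data/all_shared_genes.txt",
    "training_data/combined_expression_matrix.npy",
    "training_data/combined_obs.npy",
    "training_data/train_df.csv",
    "validation_data/HCAHeartST11702009.h5ad",
    "validation_data/top300_gene_list.npy",
    "validation_data/val_df.csv",
]


def find_missing_files(root):
    """Return the required files not present under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


# Assumed download location; adjust if the data lives elsewhere.
missing = find_missing_files("./data/loki_predex/")
if missing:
    print("Missing files:", *missing, sep="\n  ")
else:
    print("All required files found.")
```

An empty result means every file in the listing was found; otherwise the printed paths show what still needs to be downloaded.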

[2]:

data_path = './data/loki_predex/'

[3]:

# Load the validation sample and keep only the genes listed in top300_gene_list.npy.
adata = sc.read_h5ad(os.path.join(data_path, 'validation_data', 'HCAHeartST11702009.h5ad'))
genelist = list(np.load(os.path.join(data_path, 'validation_data', 'top300_gene_list.npy'), allow_pickle=True))
ad = adata[:, adata.var_names.isin(genelist)]  # this is a view of adata, not a copy
# Plot the H&E image with the spot locations.
sc.pl.spatial(adata, img_key="hires", show=False, spot_size=10)
del adata
# Spot IDs of the validation set.
val_df = pd.read_csv(os.path.join(data_path, 'validation_data', 'val_df.csv'), index_col=0)
val_spots = val_df.index.tolist()

../_images/notebooks_Loki_PredEx_case_study_4_0.png

Loki PredEx by H&E image

Use Loki PredEx to predict ST gene expression from an H&E image.

[4]:

# Combined expression matrix (spots x shared genes) and the spot ID for each row.
combined_expression_array = np.load(os.path.join(data_path, 'training_data', 'combined_expression_matrix.npy'))
combined_obs_array = np.load(os.path.join(data_path, 'training_data', 'combined_obs.npy'))
# Spot IDs of the training set.
train_df = pd.read_csv(os.path.join(data_path, 'training_data', 'train_df.csv'), index_col=0)
train_spots = train_df.index.tolist()
# Gene names for the columns of the combined expression matrix.
with open(os.path.join(data_path, 'training_data', 'all_shared_genes.txt'), 'r') as f:
    shared_genes = [line.strip() for line in f]

# Boolean masks marking which rows belong to the training and validation spots.
train_indices = np.isin(combined_obs_array, train_spots)
val_indices = np.isin(combined_obs_array, val_spots)

# Split the expression matrix by row.
train_data = combined_expression_array[train_indices, :]
val_data = combined_expression_array[val_indices, :]

del combined_expression_array
del train_df
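
The split above relies on boolean masks built with `np.isin`. The toy example below is a minimal sketch of that pattern. The arrays are made-up stand-ins for combined_obs.npy and combined_expression_matrix.npy, and the shortened names `obs` and `expr` are introduced only for this illustration.

```python
import numpy as np

# Toy stand-ins: 5 spots, 2 genes (values are arbitrary).
obs = np.array(["s1", "s2", "s3", "s4", "s5"])
expr = np.arange(10, dtype=float).reshape(5, 2)

train_spots = ["s1", "s3", "s4"]
val_spots = ["s5", "s2"]  # deliberately not in the same order as `obs`

# True where the row's spot ID appears in the given list.
train_mask = np.isin(obs, train_spots)
val_mask = np.isin(obs, val_spots)

# A spot should not land in both sets.
assert not np.any(train_mask & val_mask)

train_data = expr[train_mask, :]
val_data = expr[val_mask, :]

# Masked rows keep the order of `obs`, not the order of `val_spots`.
print(obs[val_mask])  # ['s2' 's5']
```

Because a boolean mask preserves the row order of the array it indexes, the rows of the selected matrices follow the order of the spot-ID array rather than the order of the CSV index. Any later step that pairs these rows with another table should align on spot IDs rather than assume matching order.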

[5]:

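# Predict expression for the validation spots from the image-ST similarity matrix
# and the training expression matrix.
image_text_similarity = np.load(os.path.join(data_path, 'similarity_matrix', 'image_text_similarity.npy'))
predicted_image_text_matrix = loki.predex.predict_st_gene_expr(image_text_similarity, train_data)
# Label rows with the validation spot IDs and columns with the shared gene names.
prediction = pd.DataFrame(predicted_image_text_matrix, index=val_df.index, columns=shared_genes)
# Keep the genes in genelist and order the rows to match ad.
predict_data = prediction[genelist]
predict_data = predict_data.loc[ad.obs_names]
predict_data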

[5]:

	APLP2	BEX3	KIF1C	NFKBIA	NUCB1	JUN	PSMD8	PTGES3	EEF1B2	HADHA	...	TTN	CRYAB	DES	MYH6	TNNT2	TPM1	MYL7	ACTC1	MB	NPPA
spot_id
HCAHeartST11702009_AAACAACGAATAGTTC-1	0.873993	0.570633	0.809642	0.502104	0.748403	0.493271	0.748796	0.549227	0.932378	1.023527	...	3.124240	3.412881	3.790170	2.929568	3.479691	3.594158	3.225036	3.629994	4.128861	3.029146
HCAHeartST11702009_AAACAAGTATCTCCCA-1	0.886108	0.600569	0.823685	0.526464	0.767610	0.504052	0.797189	0.567454	0.975902	1.036880	...	3.118218	3.477846	3.808141	2.917475	3.476100	3.638908	3.224683	3.681861	4.140285	3.016911
HCAHeartST11702009_AAACACCAATAACTGC-1	0.888327	0.585750	0.832512	0.477843	0.753783	0.492416	0.765825	0.541675	0.917877	1.077905	...	3.162872	3.472018	3.868768	2.871716	3.553461	3.674515	3.118482	3.696733	4.202581	2.791825
HCAHeartST11702009_AAACAGAGCGACTCCT-1	0.856133	0.540058	0.783130	0.480710	0.730232	0.474532	0.718386	0.533094	0.908748	1.009164	...	3.102248	3.388016	3.767592	2.832028	3.452923	3.571324	3.136117	3.597626	4.109611	2.900072
HCAHeartST11702009_AAACAGCTTTCAGAAG-1	0.874205	0.568552	0.813926	0.492851	0.746465	0.484543	0.756364	0.541125	0.933277	1.039306	...	3.139337	3.427375	3.812937	2.882198	3.484470	3.606930	3.169055	3.631085	4.145370	2.900765
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
HCAHeartST11702009_TTGTTGTGTGTCAAGA-1	0.894089	0.579132	0.829608	0.511592	0.768178	0.499199	0.772891	0.574577	0.942139	1.041307	...	3.096955	3.393776	3.846852	2.862902	3.527897	3.593799	3.242400	3.615206	4.115737	2.915656
HCAHeartST11702009_TTGTTTCACATCCAGG-1	0.892658	0.595588	0.840894	0.474780	0.756744	0.500931	0.773873	0.546936	0.911552	1.095047	...	3.193938	3.510115	3.897340	2.844283	3.585431	3.717606	3.070820	3.733671	4.232878	2.710185
HCAHeartST11702009_TTGTTTCATTAGTCTA-1	0.885468	0.584947	0.826959	0.487564	0.755719	0.489815	0.775463	0.542257	0.937204	1.073276	...	3.143954	3.479612	3.848481	2.857775	3.520583	3.666855	3.107396	3.691140	4.184914	2.785052
HCAHeartST11702009_TTGTTTCCATACAACT-1	0.887217	0.581479	0.830952	0.470430	0.749562	0.489309	0.763408	0.535830	0.911418	1.086841	...	3.169456	3.481936	3.877080	2.833387	3.561459	3.688138	3.077145	3.700656	4.210742	2.712910
HCAHeartST11702009_TTGTTTGTATTACACG-1	0.880530	0.587205	0.813241	0.511512	0.758651	0.492912	0.782527	0.551987	0.965851	1.043346	...	3.107597	3.463045	3.794714	2.887498	3.463999	3.630590	3.172147	3.669116	4.137131	2.924880

3982 rows × 300 columns
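
This notebook does not show what `loki.predex.predict_st_gene_expr` does internally. Since it takes a similarity matrix and a training expression matrix and returns one row per validation spot and one column per shared gene, one plausible reading is a similarity-weighted average of the training expression. The sketch below illustrates that idea under this assumption only. `weighted_average_prediction` is a hypothetical function written for this example, not Loki's implementation.

```python
import numpy as np


def weighted_average_prediction(similarity, train_expr, eps=1e-10):
    """Similarity-weighted average of training expression (illustrative only).

    similarity: (n_val_spots, n_train_spots) array of non-negative weights.
    train_expr: (n_train_spots, n_genes) expression matrix.
    Returns an (n_val_spots, n_genes) array.
    """
    similarity = np.asarray(similarity, dtype=float)
    train_expr = np.asarray(train_expr, dtype=float)
    weighted_sum = similarity @ train_expr
    row_weights = similarity.sum(axis=1, keepdims=True)
    # Avoid division by zero for rows whose weights sum to zero.
    row_weights = np.where(row_weights == 0, eps, row_weights)
    return weighted_sum / row_weights


# Toy data: 2 validation spots, 3 training spots, 2 genes.
sim = np.array([[1.0, 1.0, 0.0],
                [0.0, 0.0, 2.0]])
expr = np.array([[1.0, 10.0],
                 [3.0, 30.0],
                 [5.0, 50.0]])

pred = weighted_average_prediction(sim, expr)
# Row 0 averages training spots 0 and 1; row 1 reproduces training spot 2.
print(pred)
```

Under this reading, every predicted value is a blend of training values for the same gene, so predictions cannot fall outside the range seen in the training data for that gene.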

[6]:

ad.layers['original'] = ad.X
ad.layers['loki'] = predict_data

/var/folders/f1/0m_1r9dx73dff178jp2t41900000gp/T/ipykernel_85458/3605322198.py:1: ImplicitModificationWarning: Setting element `.layers['original']` of view, initializing view as actual.
  ad.layers['original'] = ad.X

[7]:

ad.X=ad.layers['original']
sc.pl.spatial(ad, img_key="hires", color='MYH7', size=1.5, vmax='p90', vmin='p10', title='Ground Truth MYH7 Expression')

../_images/notebooks_Loki_PredEx_case_study_9_0.png

[8]:

ad.X = ad.layers['loki']
sc.pl.spatial(ad, img_key="hires", color='MYH7', size=1.5, vmax='p90', vmin='p10', title='Loki Predicted MYH7 Expression')

../_images/notebooks_Loki_PredEx_case_study_10_0.png
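
The two maps above compare a single gene by eye. If a numeric summary is wanted as well, one common option is the per-gene Pearson correlation between measured and predicted expression across spots. The following is a minimal sketch on toy arrays. `per_gene_pearson` is a hypothetical helper and this metric is not something the notebook itself computes.

```python
import numpy as np


def per_gene_pearson(truth, pred):
    """Pearson correlation per gene (column) between two spots x genes arrays."""
    truth = np.asarray(truth, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Center each gene across spots.
    t = truth - truth.mean(axis=0)
    p = pred - pred.mean(axis=0)
    denom = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    # Genes with zero variance in either array get NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom > 0, (t * p).sum(axis=0) / denom, np.nan)


# Toy data: 4 spots, 2 genes (made-up values).
truth = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])
pred = np.array([[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]])

r = per_gene_pearson(truth, pred)
# Gene 0 is perfectly correlated, gene 1 perfectly anti-correlated.
print(r)
```

To apply this to the data in this notebook, the inputs would be dense spots x genes arrays taken from the 'original' and 'loki' layers. A layer stored as a sparse matrix would need to be converted to a dense array first.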
