Case Study 2: Mouse hippocampus development¶

This tutorial shows how cellDancer analyzes RNA velocity. The process includes (1) estimating RNA velocity, (2) deriving cell fates on embedding space, (3) estimating pseudotime, and (4) displaying predicted kinetics rates (transcription, splicing, and degradation rates), spliced mRNA abundance, and unspliced mRNA abundance on embedding space.

Below is the case study for the mouse hippocampal dentate gyrus neurogenesis. We follow the preprocessing methods of La Manno et al. to filter genes and cells. 18,140 cells with 2,159 genes were selected.

Import packages¶

To run the notebook locally, Installation could be referred to install the environment and dependencies.

[1]:

# import packages
import os
import sys
import glob
import pandas as pd
import math
import matplotlib.pyplot as plt
import celldancer as cd
import celldancer.cdplt as cdplt
from celldancer.cdplt import colormap

Load data¶

The input data for cellDancer contains the preprocessed abundances of unspliced RNA and spliced RNA. The data of mouse hippocampal dentate gyrus neurogenesis can be ownloaded from DentateGyrus_cell_type_u_s.csv.zip. To load your own data, the dataframe should contain columns ‘gene_name’, ‘unsplice’, ‘splice’ ,‘cellID’ ,‘clusters’ ,‘embedding1’, and ‘embedding2.’ For a detailed description about the preprocessing and the data structure, refer to Data Preparation.

[2]:

cell_type_u_s_path="your_path/DentateGyrus_cell_type_u_s.csv"
cell_type_u_s=pd.read_csv(cell_type_u_s_path)
cell_type_u_s

[2]:

	gene_name	unsplice	splice	cellID	clusters	embedding1	embedding2
0	Rgs20	0.069478	0.021971	10X83_2:AAACGGGGTCTCTTTAx	ImmGranule2	18.931086	-1.862429
1	Rgs20	0.085834	0.016256	10X83_2:AACCATGGTTCAACCAx	ImmGranule2	18.419891	-1.282895
2	Rgs20	0.068644	0.047774	10X83_2:AACACGTTCTGAAAGAx	CA2-3-4	2.369887	16.868419
3	Rgs20	0.045387	0.018101	10X83_2:AAAGATGCATTGAGCTx	CA	-5.351040	10.676485
4	Rgs20	0.040457	0.012846	10X83_2:AACCATGTCTACTTACx	CA1-Sub	-6.189126	11.754900
...	...	...	...	...	...	...	...
39164255	Gpm6b	0.876650	1.276089	10X84_3:TTTCCTCCACCATCCTx	ImmGranule1	10.812611	-2.487668
39164256	Gpm6b	2.024897	5.152006	10X84_3:TTTGTCACATGAAGTAx	CA2-3-4	8.246204	23.482788
39164257	Gpm6b	1.848051	1.491445	10X84_3:TTTCCTCCACGGTAAGx	nIPC	-3.441272	-4.917364
39164258	Gpm6b	0.696361	1.189091	10X84_3:TTTGTCAAGCGTCAAGx	ImmGranule2	16.394199	-6.143549
39164259	Gpm6b	0.739348	1.239052	10X84_3:TTTCCTCGTGAAAGAGx	ImmGranule2	17.490857	-4.130190

39164260 rows × 7 columns

Estimate RNA velocity for sample genes¶

We use cd.velocity() to estimate the velocity. Here, 30 genes in gene_list are estimated as an example. The predicted unspliced and spliced reads, alpha, beta, and gamma are added to the dataframe.

[3]:

gene_list=['Psd3', 'Dcx', 'Syt11', 'Ntrk2', 'Gnao1', 'Gria1', 'Dctn3', 'Map1b', 'Camk2a', 'Gpm6b', 'Sez6l', 'Evl', 'Astn1', 'Ank2', 'Klf7', 'Tbc1d16', 'Atp1a3', 'Stxbp6', 'Scn2a1', 'Lhx9', 'Slc4a4', 'Ppfia2', 'Kcnip1', 'Ptpro', 'Diaph3', 'Slc1a3', 'Cadm1', 'Mef2c', 'Sptbn1', 'Ncald']

loss_df, cellDancer_df=cd.velocity(cell_type_u_s,
                                   gene_list=gene_list,
                                   permutation_ratio=0.1,
                                   norm_u_s=False,
                                   norm_cell_distribution=False,
                                   n_jobs=8)
cellDancer_df

Using /Users/shengyuli/Library/CloudStorage/OneDrive-HoustonMethodist/work/Velocity/bin/cellDancer_polish/analysis/CaseStudyNotebook/cellDancer_velocity_2022-06-27 16-37-44 as the output path.
Arranging genes for parallel job.
30  genes were arranged to  4  portions.


Velocity Estimation:   0%|          | 0/4 [00:00<?, ?it/s]
Velocity Estimation:  25%|██▌       | 1/4 [00:37<01:53, 37.68s/it]
Velocity Estimation:  50%|█████     | 2/4 [01:04<01:03, 31.53s/it]
Velocity Estimation:  75%|███████▌  | 3/4 [01:31<00:29, 29.08s/it]
Velocity Estimation: 100%|██████████| 4/4 [01:52<00:00, 26.11s/it]

[3]:

	cellIndex	gene_name	unsplice	splice	unsplice_predict	splice_predict	alpha	beta	gamma	loss	cellID	clusters	embedding1	embedding2
0	0	Klf7	0.408467	1.294797	0.444935	1.475828	0.454905	0.935128	0.015373	0.077489	10X83_2:AAACGGGGTCTCTTTAx	ImmGranule2	18.931086	-1.862429
1	1	Klf7	0.379136	1.256870	0.411796	1.424216	0.419835	0.935061	0.015772	0.077489	10X83_2:AACCATGGTTCAACCAx	ImmGranule2	18.419891	-1.282895
2	2	Klf7	0.893599	3.395004	0.969404	3.832591	1.033540	0.986942	0.001990	0.077489	10X83_2:AACACGTTCTGAAAGAx	CA2-3-4	2.369887	16.868419
3	3	Klf7	0.640505	2.739187	0.669036	3.047821	0.684626	0.979797	0.003759	0.077489	10X83_2:AAAGATGCATTGAGCTx	CA	-5.351040	10.676485
4	4	Klf7	0.662303	2.433943	0.712970	2.749427	0.745024	0.971894	0.005226	0.077489	10X83_2:AACCATGTCTACTTACx	CA1-Sub	-6.189126	11.754900
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
544195	18135	Kcnip1	0.018745	0.005679	0.243844	0.008779	0.463909	0.731405	1.322285	0.079982	10X84_3:TTTCCTCCACCATCCTx	ImmGranule1	10.812611	-2.487668
544196	18136	Kcnip1	0.148039	0.093618	0.380534	0.085685	0.572677	0.727422	1.319759	0.079982	10X84_3:TTTGTCACATGAAGTAx	CA2-3-4	8.246204	23.482788
544197	18137	Kcnip1	0.080708	0.032079	0.312729	0.040230	0.522807	0.728119	1.323665	0.079982	10X84_3:TTTCCTCCACGGTAAGx	nIPC	-3.441272	-4.917364
544198	18138	Kcnip1	0.078976	0.033418	0.310343	0.040067	0.520257	0.728360	1.323348	0.079982	10X84_3:TTTGTCAAGCGTCAAGx	ImmGranule2	16.394199	-6.143549
544199	18139	Kcnip1	0.050686	0.019074	0.279029	0.024949	0.493671	0.729688	1.322998	0.079982	10X84_3:TTTCCTCGTGAAAGAGx	ImmGranule2	17.490857	-4.130190

544200 rows × 14 columns

Visualize the phase portraits¶

We visualize the phase portrait of each gene with cdplt.scatter_gene().

[4]:

ncols=5
height=math.ceil(len(gene_list)/ncols)*4
fig = plt.figure(figsize=(20,height))

for i in range(len(gene_list)):
    ax = fig.add_subplot(math.ceil(len(gene_list)/ncols), ncols, i+1)
    cdplt.scatter_gene(
        ax=ax,
        x='splice',
        y='unsplice',
        cellDancer_df=cellDancer_df,
        custom_xlim=None,
        custom_ylim=None,
        colors=colormap.colormap_neuro,
        alpha=0.5,
        s = 10,
        velocity=True,
        gene=gene_list[i])

    ax.set_title(gene_list[i])
    ax.axis('off')

plt.show()

https://raw.githack.com/GuangyuWangLab2021/cellDancer_website/main/docs/_images/notebooks_case_study_neuro_13_0.png

Project the RNA velocity onto the embedding space¶

To project the prediction of RNA velocity to velocity onto the embedding space and to estimate pseudotime by using all genes, predicted result can be downloaded from DentateGyrus_cellDancer_estimation.csv.zip.

[5]:

# load the prediction result of all genes
cellDancer_df_file = 'your_path/DentateGyrus_cellDancer_estimation.csv'
cellDancer_df=pd.read_csv(cellDancer_df_file)

We calculate the projection of RNA velocity on the embedding with cd.compute_cell_velocity(). The projected direction on embedding space, i.e. columns ‘velocity1’ and ‘velocity2’ are added to the original dataframe. We use cdplt.scatter_cell() to display the predicted direction on embedding space.

[6]:

# compute cell velocity
cellDancer_df=cd.compute_cell_velocity(cellDancer_df=cellDancer_df)

# plot cell velocity
fig, ax = plt.subplots(figsize=(15,15))
im = cdplt.scatter_cell(ax,cellDancer_df,
                        colors=colormap.colormap_neuro,
                        alpha=0.3,
                        s=10,
                        velocity=True,
                        legend='on',
                        min_mass=2,
                        arrow_grid=(30,30))
ax.axis('off')
plt.show()

https://raw.githack.com/GuangyuWangLab2021/cellDancer_website/main/docs/_images/notebooks_case_study_neuro_18_0.png

Estimate pseudotime¶

Based on the projection of RNA velocity on embedding space, we estimate the pseudotime with cd.pseudo_time().

[7]:

%%capture

# set parameters
dt = 0.001
t_total = {dt: 10000}
n_repeats = 10

# estimate pseudotime
cellDancer_df = cd.pseudo_time(cellDancer_df=cellDancer_df,
                                          grid=(30, 30),
                                          dt=dt,
                                          t_total=t_total[dt],
                                          n_repeats=n_repeats,
                                          speed_up=(60,60),
                                          n_paths = 5,
                                          psrng_seeds_diffusion=[i for i in range(n_repeats)],
                                          n_jobs=8)

[8]:

# plot pseudotime
fig, ax = plt.subplots(figsize=(10,10))
im=cdplt.scatter_cell(ax,cellDancer_df, colors='pseudotime', alpha=0.5, velocity=False)
ax.axis('off')
plt.show()

https://raw.githack.com/GuangyuWangLab2021/cellDancer_website/main/docs/_images/notebooks_case_study_neuro_22_0.png

Display the abundance of spliced RNA along pseudotime¶

We visualize the spliced RNA abundance of some sample genes along pseudotime with cdplt.scatter_gene().

[9]:

gene_list=['Psd3', 'Gria1', 'Camk2a', 'Sez6l', 'Ank2', 'Atp1a3', 'Stxbp6', 'Scn2a1', 'Ppfia2', 'Kcnip1', 'Ntrk2', 'Mef2c', 'Sptbn1', 'Ncald','Dcx', 'Syt11','Slc1a3', 'Dctn3', 'Map1b', 'Gpm6b', 'Evl', 'Astn1', 'Tbc1d16','Slc4a4', 'Ptpro']

ncols=5
height=math.ceil(len(gene_list)/ncols)*4
fig = plt.figure(figsize=(20,height))

for i in range(len(gene_list)):
    ax = fig.add_subplot(math.ceil(len(gene_list)/ncols), ncols, i+1)
    cdplt.scatter_gene(
        ax=ax,
        x='pseudotime',
        y='splice',
        cellDancer_df=cellDancer_df,
        custom_xlim=None,
        custom_ylim=None,
        colors=colormap.colormap_neuro,
        alpha=0.5,
        s = 5,
        velocity=False,
        gene=gene_list[i])

    ax.set_title(gene_list[i])
    ax.axis('off')

https://raw.githack.com/GuangyuWangLab2021/cellDancer_website/main/docs/_images/notebooks_case_study_neuro_25_0.png

Visualize the reaction rates, the abundance of unspliced RNA, and the abundance of spliced RNA on embedding space¶

cellDancer provides other functions for visualization. we show the predicted kinetics rates (transcription, splicing, and degradation rates), spliced mRNA abundance, and unspliced mRNA abundance on embedding space with cdplt.scatter_cell().

[10]:

gene_samples=['Ntrk2','Psd3','Dcx']

for gene in gene_samples:
    fig, ax = plt.subplots(ncols=5, figsize=(15,3))
    cdplt.scatter_cell(ax[0],cellDancer_df, colors='alpha',
                 gene=gene, velocity=False)
    cdplt.scatter_cell(ax[1],cellDancer_df, colors='beta',
                 gene=gene, velocity=False)
    cdplt.scatter_cell(ax[2],cellDancer_df, colors='gamma',
                 gene=gene, velocity=False)
    cdplt.scatter_cell(ax[3],cellDancer_df, colors='splice',
                 gene=gene, velocity=False)
    cdplt.scatter_cell(ax[4],cellDancer_df, colors='unsplice',
                 gene=gene, velocity=False)
    ax[0].axis('off')
    ax[1].axis('off')
    ax[2].axis('off')
    ax[3].axis('off')
    ax[4].axis('off')
    ax[0].set_title('alpha-'+gene)
    ax[1].set_title('beta-'+gene)
    ax[2].set_title('gamma-'+gene)
    ax[3].set_title('spliced-'+gene)
    ax[4].set_title('unspliced-'+gene)
    plt.tight_layout()
    plt.show()

https://raw.githack.com/GuangyuWangLab2021/cellDancer_website/main/docs/_images/notebooks_case_study_neuro_28_0.png

https://raw.githack.com/GuangyuWangLab2021/cellDancer_website/main/docs/_images/notebooks_case_study_neuro_28_1.png

https://raw.githack.com/GuangyuWangLab2021/cellDancer_website/main/docs/_images/notebooks_case_study_neuro_28_2.png