Dentate Gyrus

The dataset used here for velocity analysis is from the dentate gyrus, a part of the hippocampus involved in learning, episodic memory formation, and spatial coding.

It was measured using 10x Genomics Chromium and is described in Hochgerner et al. (2018). The data consists of 25,919 genes across 2,930 cells forming multiple lineages.

In [1]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))
%matplotlib inline

import scvelo as scv
scv.logging.print_version()
Running scvelo 0.1.24.dev28+gcdb5a7a (python 3.7.3) on 2019-10-28 16:04.
In [2]:
scv.settings.verbosity = 3  # show errors(0), warnings(1), info(2), hints(3)
scv.settings.set_figure_params('scvelo')  # for beautified visualization

Load and cleanup the data

The following analysis is based on the in-built dentate gyrus dataset.

To run velocity analysis on your own data, read your file (loom, h5ad, xlsx, csv, tab, txt …) into an AnnData object with adata = scv.read('path/file.loom', cache=True).

If you want to merge your loom file into an already existing AnnData object, use scv.utils.merge(adata, adata_loom).

In [3]:
adata = scv.datasets.dentategyrus()
In [4]:
# show proportions of spliced/unspliced abundances
scv.utils.show_proportions(adata)
adata
Abundance of ['spliced', 'unspliced', 'ambiguous']: [0.69 0.08 0.23]
Out[4]:
AnnData object with n_obs × n_vars = 2930 × 13913
    obs: 'clusters', 'age(days)', 'clusters_enlarged'
    uns: 'clusters_colors'
    obsm: 'X_umap'
    layers: 'ambiguous', 'spliced', 'unspliced'

If you have a very large dataset, you can save memory by clearing attributes not required via scv.utils.cleanup(adata).

Preprocess the data

The necessary preprocessing consists of:

- gene selection by detection (a minimum number of counts) and high variability (dispersion),
- normalizing every cell by its initial size and logarithmizing X.

Filtering and normalization are applied in the same way to the spliced/unspliced counts and X. Logarithmizing is applied only to X. If X is already preprocessed from a former analysis, it will not be touched.

All of this is summarized in a single function pp.filter_and_normalize, which basically runs the following:

scv.pp.filter_genes(adata, min_shared_counts=10)
scv.pp.normalize_per_cell(adata)
scv.pp.filter_genes_dispersion(adata, n_top_genes=3000)
scv.pp.log1p(adata)

Further, we need the first and second order moments (basically mean and uncentered variance) computed among nearest neighbors in PCA space. First order is needed for deterministic velocity estimation, while stochastic estimation also requires second order moments.
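The moment computation can be sketched in plain numpy: average each cell's counts over its neighborhood using a row-normalized connectivity matrix (toy data and a hand-built neighbor matrix, purely for illustration; scVelo computes connectivities from a KNN graph in PCA space):

```python
import numpy as np

# Toy spliced/unspliced count matrices: 4 cells x 2 genes (hypothetical data).
S = np.array([[1., 2.], [2., 2.], [3., 4.], [4., 4.]])
U = np.array([[0., 1.], [1., 1.], [1., 2.], [2., 2.]])

# Neighbor connectivities including self (here: each cell and its
# index neighbors), row-normalized so each row sums to 1.
C = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
C /= C.sum(axis=1, keepdims=True)

# First-order moments: neighborhood means of spliced/unspliced counts
# (what ends up in adata.layers['Ms'] and ['Mu']).
Ms = C @ S
Mu = C @ U

# Second-order (uncentered) moments, needed for the stochastic model.
Mss = C @ (S * S)
```

The deterministic model only uses Ms and Mu; the stochastic model additionally exploits the second-order moments.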

In [5]:
scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

Filtered out 11019 genes that are detected in less than 30 counts (shared).
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
    finished (0:00:04) --> added to `.uns['neighbors']`
    'distances', weighted adjacency matrix
    'connectivities', weighted adjacency matrix
computing moments based on connectivities
    finished (0:00:00) --> added
    'Ms' and 'Mu', moments of spliced/unspliced abundances (adata.layers)

Compute velocity and velocity graph

The gene-specific velocities are obtained by fitting a ratio between precursor (unspliced) and mature (spliced) mRNA abundances that explains the steady states (constant transcriptional state) well, and then computing how the observed abundances deviate from what is expected at steady state. (A version that no longer relies on the steady-state assumption will be released soon.)
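The steady-state idea can be sketched in plain numpy: fit the ratio gamma as a least-squares regression of unspliced on spliced counts through the origin, and take the residual as the velocity (a toy illustration with made-up abundances, not scVelo's exact estimator, which e.g. restricts the fit to extreme quantiles):

```python
import numpy as np

# Hypothetical spliced (s) and unspliced (u) abundances for one gene across cells.
s = np.array([0.5, 1.0, 2.0, 3.0, 4.0])
u = np.array([0.30, 0.45, 1.10, 1.40, 2.10])

# Steady-state ratio: least-squares slope of u on s through the origin,
# gamma = <u, s> / <s, s>.
gamma = (u @ s) / (s @ s)

# Velocity = deviation of observed unspliced counts from steady state.
velocity = u - gamma * s

# velocity > 0 indicates induction (up-regulation),
# velocity < 0 indicates repression.
```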

Every tool has its plotting counterpart. The results from scv.tl.velocity, for instance, can be visualized using scv.pl.velocity.

In [6]:
scv.tl.velocity(adata)
computing velocities
    finished (0:00:00) --> added
    'velocity', velocity vectors for each individual cell (adata.layers)

scv.tl.velocity_graph computes the (cosine) correlation of potential cell transitions with the velocity vector in high-dimensional space. The resulting velocity graph has dimension \(n_{obs} \times n_{obs}\) and summarizes the possible cell state changes (given by a transition from one cell to another) that are well explained by the velocity vectors. If you set approx=True, it is computed on a reduced PCA space with 50 components.
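In essence, for a candidate transition from cell i to cell j, the graph stores the cosine similarity between the displacement x_j - x_i in gene space and cell i's velocity vector (a toy numpy sketch with hypothetical profiles):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical expression profiles of three cells (2 genes) and the
# velocity vector of cell 0.
x = np.array([[0., 0.], [1., 0.], [0., 1.]])
v0 = np.array([1., 0.])

# Correlate cell 0's velocity with displacements towards candidate targets.
corr_01 = cosine(x[1] - x[0], v0)  # displacement aligned with velocity
corr_02 = cosine(x[2] - x[0], v0)  # displacement orthogonal to velocity
```

A transition towards cell 1 is perfectly consistent with cell 0's velocity (correlation 1), whereas a transition towards cell 2 is not supported by it (correlation 0).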

The velocity graph can then be converted to a transition matrix by applying a Gaussian kernel to the cosine correlations, which assigns high probabilities to cell state changes that correlate with the velocity vector. You can access the Markov transition matrix via scv.tl.transition_matrix. The resulting transition matrix can be used for a variety of applications shown hereinafter. For instance, it is used to place the velocities into a low-dimensional embedding by applying the mean transition with respect to the transition probabilities, i.e. scv.tl.velocity_embedding. Further, we can trace cells back along the Markov chain to their origins and potential fates, thus obtaining root cells and end points within a trajectory, via scv.tl.terminal_states.
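The conversion from cosine correlations to transition probabilities can be sketched as an exponential kernel followed by row normalization (hypothetical correlation values and an assumed kernel width; scv.tl.transition_matrix additionally handles negative correlations, sparsity, and self-transitions):

```python
import numpy as np

# Hypothetical cosine correlations between transitions of 3 cells.
corr = np.array([[0.0, 0.9, 0.1],
                 [0.2, 0.0, 0.8],
                 [0.1, 0.3, 0.0]])

sigma = 0.25  # kernel width (assumed value)

# Kernel on the correlations: high correlation -> high transition probability.
T = np.exp(corr / sigma)
np.fill_diagonal(T, 0.0)           # no self-transitions in this sketch
T /= T.sum(axis=1, keepdims=True)  # row-normalize to a Markov matrix
```

Each row of T now sums to 1, so T can be iterated forwards (or transposed and iterated backwards) to trace potential fates and origins along the Markov chain.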

In [22]:
scv.tl.velocity_graph(adata)
computing velocity graph
    finished (0:00:05) --> added
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)

Plot results

Finally, the velocities are projected onto any embedding specified via basis and visualized in one of three ways: at the single-cell level, on a grid, or as a streamplot, as shown here.
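The projection amounts to taking, for each cell, the transition-probability-weighted mean of displacements towards the other cells in embedding coordinates (a toy sketch with hypothetical coordinates and probabilities):

```python
import numpy as np

# Hypothetical 2-D UMAP coordinates of three cells.
emb = np.array([[0., 0.], [1., 0.], [0., 2.]])

# Hypothetical transition probabilities from cell 0 to cells 1 and 2.
T0 = np.array([0.0, 0.75, 0.25])

# Embedded velocity of cell 0: expected displacement in the embedding
# under the transition probabilities.
v_emb = T0 @ (emb - emb[0])
```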

In [23]:
scv.pl.velocity_embedding_stream(adata, basis='umap', color=['clusters', 'age(days)'])

computing velocity embedding
    finished (0:00:00) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)
_images/DentateGyrus_16_1.png
In [9]:
scv.pl.velocity_embedding(adata, basis='umap', arrow_length=1.2, arrow_size=1.2, dpi=150)

_images/DentateGyrus_17_0.png
In [10]:
scv.pl.velocity_embedding_grid(adata, color='Tmsb10',
                               layer=['velocity', 'spliced'], arrow_size=1.5)
_images/DentateGyrus_18_0.png
In [11]:
scv.tl.rank_velocity_genes(adata, match_with='clusters', resolution=.4)

import pandas as pd
pd.DataFrame(adata.uns['rank_velocity_genes']['names']).head()
computing velocity clusters
    finished (0:00:00) --> added
    'velocity_clusters', clusters based on modularity on velocity field (adata.obs)
ranking velocity genes
    finished (0:00:00) --> added
    'rank_velocity_genes', sorted scores by group ids (adata.uns)
Out[11]:
Top 5 velocity genes per velocity cluster:

Granule mature 0:    Grasp, 2010300C02Rik, Wasf1, Rtn4rl1, Jph1
Granule immature 1:  Ppp3ca, Shisa9, Mtus2, Epha4, Pkia
Granule immature 2:  Jph1, Pgbd5, Adarb2, Sphkap, Pip5k1b
Neuroblast 3:        Bzw2, Gdpd1, Mt3, Dpysl3, Slc38a2
Astrocytes 4:        Ctnnd2, Lsamp, Qk, Myl6, 2810459M11Rik
Neuroblast 5:        Tenm2, Mt3, Nrp1, Tmsb10, Spock2
Endothelial 6:       Serpine2, Arhgap31, Tmsb10, Igf1r, Prex2
Microglia 7:         Srgap2, Clic4, Qk, Ssh2, Sirt2
Mossy 8:             Chgb, Pgm2l1, Fxyd1, Mapk6, Gria2
GABA 9:              Lancl1, Stmn2, Vsnl1, Mtus2, Elavl3
OPC 10:              Tnr, Hmgcs1, Luzp2, Elavl3, Ppp1r14c
OL 11:               Clmn, Gprc5b, Arrdc3, Pcdh9, Gatm
Cajal Retzius 12:    Utrn, Scg3, Tmem47, Dpysl3, Sh3glb1
Cck-Tox 13:          Golga7b, Irf9, Atp1b3, Cplx2, Stmn2
In [14]:
var_names = ['Tmsb10', 'Camk2a', 'Ppp3ca', 'Igfbpl1']
scv.pl.velocity(adata, var_names=var_names, colorbar=True, ncols=2)
_images/DentateGyrus_20_0.png
In [15]:
scv.pl.velocity_graph(adata)
_images/DentateGyrus_21_0.png