# Example: Dentate Gyrus¶

The dataset used in here for velocity analysis is from the dentate gyrus, a part of the hippocampus which is involved in learning, episodic memory formation and spatial coding.

It is measured using 10X Genomics Chromium and described in Hochgerner et al. (2018). The data consists of 25,919 genes across 2,930 cells forming multiple lineages.

```
In [1]:
```

```
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))
%matplotlib inline
import scvelo as scv
scv.logging.print_version()
```

```
Running scvelo 0.1.18 (python 3.7.3) on 2019-06-11 11:35.
```

```
In [2]:
```

```
scv.settings.verbosity = 3 # show errors(0), warnings(1), info(2), hints(3)
scv.settings.set_figure_params('scvelo') # for beautified visualization
```

## Load and cleanup the data¶

The following analysis is based on the in-built dentate gyrus dataset.

To run velocity analysis on your own data, read your file (loom, h5ad,
xlsx, csv, tab, txt …) to an AnnData object with
`adata = scv.read('path/file.loom', cache=True)`

.

If you want to merge your loom file into an already existing AnnData
object, use `scv.utils.merge(adata, adata_loom)`

.

```
In [3]:
```

```
adata = scv.datasets.dentategyrus()
```

```
In [4]:
```

```
# show proportions of spliced/unspliced abundances
scv.utils.show_proportions(adata)
adata
```

```
Abundance of ['spliced', 'unspliced', 'ambiguous']: [0.69 0.08 0.23]
```

```
Out[4]:
```

```
AnnData object with n_obs × n_vars = 2930 × 13913
obs: 'clusters', 'age(days)', 'clusters_enlarged'
obsm: 'X_umap'
layers: 'ambiguous', 'spliced', 'unspliced'
```

If you have a very large datasets, you can save memory by clearing
attributes not required via `scv.utils.cleanup(adata)`

.

## Preprocess the data¶

Preprocessing that is necessary consists of : - gene selection by
**detection** (detected with a minimum number of counts) and **high
variability** (dispersion). - **normalizing** every cell by its initial
size and **logarithmizing** X.

Filtering and normalization is applied in the same vein to spliced/unspliced counts and X. Logarithmizing is only applied to X. If X is already preprocessed from former analysis, it won’t touch it.

All of this is summarized in a single function
`pp.filter_and_normalize`

, which basically runs the following:

```
scv.pp.filter_genes(adata, min_shared_counts=10)
scv.pp.normalize_per_cell(adata)
scv.pp.filter_genes_dispersion(adata, n_top_genes=3000)
scv.pp.log1p(adata)
```

Further, we need the first and second order moments (basically mean and uncentered variance) computed among nearest neighbors in PCA space. First order is needed for deterministic velocity estimation, while stochastic estimation also requires second order moments.

```
In [5]:
```

```
scv.pp.filter_and_normalize(adata, min_shared_counts=10, n_top_genes=3000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
```

```
Filtered out 9174 genes that are detected in less than 10 counts (shared).
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
finished (0:00:03.94) --> added to `.uns['neighbors']`
'distances', weighted adjacency matrix
'connectivities', weighted adjacency matrix
computing moments based on connectivities
finished (0:00:00.47) --> added
'Ms' and 'Mu', moments of spliced/unspliced abundances (adata.layers)
```

## Compute velocity and velocity graph¶

The gene-specific velocities are obtained by fitting a ratio between precursor (unspliced) and mature (spliced) mRNA abundances that well explains the steady states (constant transcriptional state) and then computing how the observed abundances deviate from what is expected in steady state. (We will soon release a version that does not rely on the steady state assumption anymore).

Every tool has its plotting counterpart. The results from
`scv.tl.velocity`

, for instance, can be visualized using
`scv.pl.velocity`

.

```
In [6]:
```

```
scv.tl.velocity(adata)
```

```
computing velocities
finished (0:00:00.44) --> added
'velocity', velocity vectors for each individual cell (adata.layers)
```

This computes the (cosine) correlation of potential cell transitions
with the velocity vector in high dimensional space. The resulting
velocity graph has dimension \(n_{obs} \times n_{obs}\) and
summarizes the possible cell state changes (given by a transition from
one cell to another) that are well explained through the velocity
vectors. If you set `approx=True`

it is computed on a reduced PCA
space with 50 components.

The velocity graph can then be converted to a transition matrix by
applying a Gaussian kernel on the cosine correlation which assigns high
probabilities to cell state changes that correlate with the velocity
vector. You can access the Markov transition matrix via
`scv.tl.transition_matrix`

. The resulting transition matrix can be
used for a variety of applications shown hereinafter. For instance, it
is used to place the velocities into a low-dimensional embedding by
simply applying the mean transition with respect to the transition
probabilities, i.e. `scv.tl.velocity_embedding`

. Further, we can trace
cells back along the Markov chain to their origins and potential fates,
thus obtaining root cells and end points within a trajectory; via
`scv.tl.terminal_states`

.

```
In [7]:
```

```
scv.tl.velocity_graph(adata)
```

```
computing velocity graph
finished (0:00:04.28) --> added
'velocity_graph', sparse matrix with cosine correlations (adata.uns)
```

## Plot results¶

Finally, the velocities are projected onto any embedding specified in
`basis`

and visualized in one of three available ways: on single cell
level, on grid level, or as streamplot as shown here.

```
In [8]:
```

```
scv.pl.velocity_embedding_stream(adata, basis='umap', color=['clusters', 'age(days)'])
```

```
computing velocity embedding
finished (0:00:00.59) --> added
'velocity_umap', embedded velocity vectors (adata.obsm)
```

```
<Figure size 700x500 with 0 Axes>
```

```
In [9]:
```

```
scv.pl.velocity_embedding(adata, basis='umap', dpi=150)
```

```
In [10]:
```

```
scv.pl.velocity_embedding_grid(adata, color='Tmsb10',
layer=['velocity', 'spliced'], arrow_size=1.5)
```

```
<Figure size 700x500 with 0 Axes>
```

```
In [11]:
```

```
scv.tl.rank_velocity_genes(adata, match_with='clusters', resolution=.4)
import pandas as pd
pd.DataFrame(adata.uns['rank_velocity_genes']['names']).head()
```

```
computing velocity clusters
finished (0:00:00.88) --> added
'velocity_clusters', clusters based on modularity on velocity field (adata.obs)
ranking velocity genes
finished (0:00:00.27) --> added
'rank_velocity_genes', sorted scores by group ids (adata.uns)
```

```
Out[11]:
```

Granule mature 0 | Neuroblast 1 | Granule immature 2 | Granule immature 3 | Astrocytes 4 | Mossy 5 | Endothelial 6 | Microglia 7 | OPC 8 | OL 9 | Cajal Retzius 10 | |
---|---|---|---|---|---|---|---|---|---|---|---|

0 | 2010300C02Rik | Mt3 | Rpa3 | Ppp3ca | Myl6 | Slc35b2 | Serpine2 | Srgap2 | Hmgcs1 | Myl6 | Utrn |

1 | Slc17a7 | Luzp2 | Sept7 | Nell2 | Qk | Tubgcp2 | Ptprg | Clic4 | Tnr | Gatm | Tspan3 |

2 | Adarb2 | Slc38a2 | Fabp5 | Ptprd | Zbtb20 | Atp1b3 | Efr3b | Qk | Elavl3 | Arrdc3 | Tmem47 |

3 | Slc7a5 | Tenm2 | Shisa9 | Shisa9 | Ptchd4 | Fabp5 | Prex2 | Hn1 | Ppp1r14c | Sept7 | Dpysl3 |

4 | Gtf2h2 | Gria2 | Mpc2 | Arsb | Slc6a1 | Schip1 | Zdhhc20 | Reep5 | S100b | Clmn | Scg3 |

```
In [13]:
```

```
var_names = ['Tmsb10', 'Camk2a', 'Ppp3ca', 'Lsamp', 'Gabrb3', 'Igfbpl1']
scv.pl.velocity(adata, var_names=var_names, ncols=2, colorbar=True)
```

```
In [14]:
```

```
scv.pl.velocity_graph(adata)
```

```
In [15]:
```

```
lineage = ['nIPC', 'Neuroblast', 'Granule immature', 'Granule mature', 'Cck-Tox', 'Mossy']
adata.obs['lineages'] = ['granule_lineage' if c in lineage else c for c in adata.obs.clusters]
scv.tl.terminal_states(adata, groupby='lineages')
scv.tl.velocity_pseudotime(adata, groupby='lineages')
```

```
computing terminal states
identified 1 root cells and 1 end points (Astrocytes)
identified 1 root cells and 1 end points (Cajal Retzius)
identified 1 root cells and 1 end points (Endothelial)
identified 1 root cells and 1 end points (GABA)
identified 1 root cells and 1 end points (Microglia)
identified 1 root cells and 1 end points (OL)
identified 1 root cells and 1 end points (OPC)
identified 1 root cells and 1 end points (granule_lineage)
finished (0:00:00.68) --> added
'root_cells', root cells of Markov diffusion process (adata.obs)
'end_points', end points of Markov diffusion process (adata.obs)
```

```
In [16]:
```

```
scv.pl.scatter(adata, color=['velocity_pseudotime', 'root_cells', 'end_points'], size=60)
```