08 Interoperability - NICHES.Rmd
In the previous vignettes, NICHES was mainly demonstrated with Seurat framework. Here in this vignette, we demonstrate the interoperability of NICHES with other popular scRNA-seq frameworks, namely, Scanpy, Monocle3, and scater.
First, let’s start with Scanpy and use the pbmc3k dataset as a demonstration. Scanpy is python-based and uses AnnData as the primary data strcture. We will demonstrate how to extract the data from Scanpy to run NICHES, convert the NICHES output to AnnData, and use functions from Scanpy to perform downstream analysis.
Here besides NICHES, we will also load ALRA for imputation and reticulate for running python.
library(NICHES)
# Load ALRA
devtools::source_url('https://raw.githubusercontent.com/KlugerLab/ALRA/master/alra.R')
# Set up reticulate and the python env
library(reticulate)
# change to your corresponding python path
Sys.setenv(RETICULATE_PYTHON = "/Library/Frameworks/Python.framework/Versions/3.10/bin/python3.10")
We will also need a few python packages, in particular, scanpy.
Now let’s load the pbmc3k data from scanpy directly. Here we load both the raw and processed pbmc3k. The former will be used to run NICHES after a few processing steps and we will use the latter to add cell annotations.
# load the raw pbmc3k and the processed pbmc3k
adata = sc.datasets.pbmc3k()
#> WARNING: Failed to open the url with default certificates, trying with certifi.
#>
0%| | 0.00/5.58M [00:00<?, ?B/s]
1%| | 32.0k/5.58M [00:00<00:28, 203kB/s]
2%|1 | 88.0k/5.58M [00:00<00:17, 329kB/s]
3%|3 | 192k/5.58M [00:00<00:20, 278kB/s]
7%|6 | 384k/5.58M [00:00<00:09, 547kB/s]
8%|8 | 480k/5.58M [00:00<00:08, 618kB/s]
11%|# | 624k/5.58M [00:01<00:07, 681kB/s]
12%|#2 | 688k/5.58M [00:01<00:07, 669kB/s]
16%|#5 | 896k/5.58M [00:01<00:05, 874kB/s]
21%|## | 1.17M/5.58M [00:01<00:03, 1.35MB/s]
23%|##3 | 1.29M/5.58M [00:01<00:03, 1.30MB/s]
25%|##5 | 1.42M/5.58M [00:01<00:03, 1.25MB/s]
28%|##8 | 1.57M/5.58M [00:01<00:03, 1.24MB/s]
31%|### | 1.71M/5.58M [00:02<00:03, 1.21MB/s]
33%|###3 | 1.86M/5.58M [00:02<00:03, 1.21MB/s]
36%|###5 | 2.01M/5.58M [00:02<00:03, 1.21MB/s]
39%|###8 | 2.16M/5.58M [00:02<00:02, 1.23MB/s]
41%|####1 | 2.31M/5.58M [00:02<00:02, 1.23MB/s]
44%|####4 | 2.48M/5.58M [00:02<00:02, 1.25MB/s]
47%|####7 | 2.62M/5.58M [00:02<00:02, 1.24MB/s]
50%|####9 | 2.77M/5.58M [00:02<00:02, 1.30MB/s]
51%|#####1 | 2.88M/5.58M [00:03<00:02, 1.22MB/s]
53%|#####3 | 2.96M/5.58M [00:03<00:02, 1.10MB/s]
56%|#####6 | 3.13M/5.58M [00:03<00:02, 1.15MB/s]
58%|#####7 | 3.23M/5.58M [00:03<00:02, 1.10MB/s]
60%|###### | 3.35M/5.58M [00:03<00:02, 990kB/s]
63%|######3 | 3.52M/5.58M [00:03<00:01, 1.11MB/s]
66%|######5 | 3.68M/5.58M [00:03<00:02, 896kB/s]
69%|######8 | 3.85M/5.58M [00:04<00:01, 998kB/s]
71%|####### | 3.95M/5.58M [00:04<00:01, 880kB/s]
71%|#######1 | 3.99M/5.58M [00:04<00:02, 780kB/s]
75%|#######5 | 4.21M/5.58M [00:04<00:01, 1.07MB/s]
77%|#######7 | 4.32M/5.58M [00:04<00:01, 1.03MB/s]
80%|#######9 | 4.45M/5.58M [00:05<00:01, 618kB/s]
82%|########1 | 4.57M/5.58M [00:05<00:01, 678kB/s]
84%|########4 | 4.70M/5.58M [00:05<00:01, 710kB/s]
84%|########4 | 4.71M/5.58M [00:05<00:01, 606kB/s]
88%|########8 | 4.92M/5.58M [00:05<00:00, 890kB/s]
90%|########9 | 5.02M/5.58M [00:05<00:01, 565kB/s]
92%|#########1| 5.13M/5.58M [00:06<00:00, 624kB/s]
94%|#########3| 5.23M/5.58M [00:06<00:00, 676kB/s]
96%|#########5| 5.34M/5.58M [00:06<00:00, 715kB/s]
98%|#########7| 5.45M/5.58M [00:06<00:00, 750kB/s]
100%|#########9| 5.56M/5.58M [00:06<00:00, 761kB/s]
100%|##########| 5.58M/5.58M [00:06<00:00, 886kB/s]
adata2 = sc.datasets.pbmc3k_processed()
#> WARNING: Failed to open the url with default certificates, trying with certifi.
#>
0%| | 0.00/23.5M [00:00<?, ?B/s]
1%| | 136k/23.5M [00:00<00:18, 1.34MB/s]
2%|1 | 368k/23.5M [00:00<00:13, 1.80MB/s]
3%|2 | 640k/23.5M [00:00<00:11, 2.14MB/s]
4%|4 | 968k/23.5M [00:00<00:09, 2.57MB/s]
5%|5 | 1.24M/23.5M [00:00<00:09, 2.43MB/s]
7%|6 | 1.57M/23.5M [00:00<00:08, 2.71MB/s]
8%|8 | 1.90M/23.5M [00:00<00:07, 2.88MB/s]
10%|9 | 2.24M/23.5M [00:00<00:07, 3.07MB/s]
11%|# | 2.48M/23.5M [00:00<00:07, 2.88MB/s]
12%|#2 | 2.88M/23.5M [00:01<00:06, 3.25MB/s]
14%|#3 | 3.26M/23.5M [00:01<00:06, 3.44MB/s]
15%|#5 | 3.55M/23.5M [00:01<00:06, 3.34MB/s]
16%|#6 | 3.87M/23.5M [00:01<00:06, 3.28MB/s]
18%|#7 | 4.21M/23.5M [00:01<00:06, 3.33MB/s]
19%|#9 | 4.57M/23.5M [00:01<00:05, 3.43MB/s]
21%|## | 4.87M/23.5M [00:01<00:05, 3.33MB/s]
22%|##2 | 5.21M/23.5M [00:01<00:05, 3.41MB/s]
23%|##3 | 5.49M/23.5M [00:01<00:05, 3.25MB/s]
25%|##4 | 5.85M/23.5M [00:02<00:05, 3.33MB/s]
26%|##6 | 6.12M/23.5M [00:02<00:05, 3.14MB/s]
28%|##7 | 6.48M/23.5M [00:02<00:05, 3.30MB/s]
29%|##8 | 6.77M/23.5M [00:02<00:05, 3.15MB/s]
30%|### | 7.09M/23.5M [00:02<00:05, 3.17MB/s]
31%|###1 | 7.37M/23.5M [00:02<00:05, 3.04MB/s]
33%|###2 | 7.71M/23.5M [00:02<00:05, 3.19MB/s]
34%|###4 | 8.01M/23.5M [00:02<00:05, 3.16MB/s]
36%|###5 | 8.38M/23.5M [00:02<00:04, 3.38MB/s]
37%|###7 | 8.73M/23.5M [00:02<00:04, 3.41MB/s]
39%|###8 | 9.07M/23.5M [00:03<00:04, 3.46MB/s]
40%|#### | 9.41M/23.5M [00:03<00:04, 3.49MB/s]
42%|####1 | 9.77M/23.5M [00:03<00:04, 3.53MB/s]
43%|####2 | 10.1M/23.5M [00:03<00:04, 3.49MB/s]
44%|####4 | 10.4M/23.5M [00:03<00:04, 3.23MB/s]
45%|####4 | 10.6M/23.5M [00:03<00:04, 2.77MB/s]
46%|####5 | 10.8M/23.5M [00:03<00:05, 2.57MB/s]
47%|####6 | 11.0M/23.5M [00:03<00:05, 2.52MB/s]
48%|####7 | 11.2M/23.5M [00:03<00:05, 2.27MB/s]
48%|####8 | 11.4M/23.5M [00:04<00:06, 2.09MB/s]
49%|####9 | 11.6M/23.5M [00:04<00:05, 2.12MB/s]
50%|##### | 11.8M/23.5M [00:04<00:06, 1.98MB/s]
51%|#####1 | 12.1M/23.5M [00:04<00:05, 2.28MB/s]
53%|#####2 | 12.4M/23.5M [00:04<00:04, 2.61MB/s]
54%|#####4 | 12.8M/23.5M [00:04<00:03, 2.93MB/s]
56%|#####5 | 13.1M/23.5M [00:04<00:03, 3.08MB/s]
57%|#####7 | 13.4M/23.5M [00:04<00:03, 3.16MB/s]
59%|#####8 | 13.8M/23.5M [00:04<00:03, 3.31MB/s]
60%|###### | 14.1M/23.5M [00:04<00:02, 3.39MB/s]
62%|######1 | 14.5M/23.5M [00:05<00:02, 3.45MB/s]
63%|######2 | 14.8M/23.5M [00:05<00:02, 3.28MB/s]
64%|######4 | 15.1M/23.5M [00:05<00:02, 3.34MB/s]
65%|######5 | 15.4M/23.5M [00:05<00:02, 3.22MB/s]
67%|######6 | 15.7M/23.5M [00:05<00:02, 2.92MB/s]
69%|######8 | 16.1M/23.5M [00:05<00:02, 3.38MB/s]
70%|######9 | 16.4M/23.5M [00:05<00:02, 3.30MB/s]
71%|#######1 | 16.7M/23.5M [00:05<00:02, 3.09MB/s]
72%|#######2 | 17.0M/23.5M [00:05<00:02, 3.04MB/s]
74%|#######3 | 17.3M/23.5M [00:05<00:02, 3.16MB/s]
75%|#######5 | 17.7M/23.5M [00:06<00:01, 3.16MB/s]
76%|#######6 | 17.9M/23.5M [00:06<00:01, 2.97MB/s]
77%|#######7 | 18.2M/23.5M [00:06<00:01, 2.83MB/s]
79%|#######8 | 18.5M/23.5M [00:06<00:01, 3.08MB/s]
80%|######## | 18.9M/23.5M [00:06<00:01, 3.22MB/s]
81%|########1 | 19.1M/23.5M [00:06<00:01, 3.12MB/s]
83%|########2 | 19.5M/23.5M [00:06<00:01, 3.21MB/s]
84%|########4 | 19.8M/23.5M [00:06<00:01, 3.06MB/s]
85%|########5 | 20.0M/23.5M [00:06<00:01, 2.89MB/s]
86%|########6 | 20.2M/23.5M [00:07<00:01, 2.62MB/s]
87%|########6 | 20.4M/23.5M [00:07<00:01, 2.26MB/s]
88%|########7 | 20.6M/23.5M [00:07<00:01, 2.24MB/s]
88%|########8 | 20.8M/23.5M [00:07<00:01, 2.08MB/s]
89%|########9 | 20.9M/23.5M [00:07<00:01, 1.92MB/s]
90%|######### | 21.2M/23.5M [00:07<00:01, 2.01MB/s]
91%|#########1| 21.5M/23.5M [00:07<00:00, 2.47MB/s]
93%|#########2| 21.8M/23.5M [00:07<00:00, 2.74MB/s]
94%|#########4| 22.2M/23.5M [00:07<00:00, 2.97MB/s]
96%|#########5| 22.5M/23.5M [00:07<00:00, 3.10MB/s]
97%|#########7| 22.9M/23.5M [00:08<00:00, 3.25MB/s]
99%|#########8| 23.2M/23.5M [00:08<00:00, 3.33MB/s]
100%|#########9| 23.5M/23.5M [00:08<00:00, 3.12MB/s]
100%|##########| 23.5M/23.5M [00:08<00:00, 2.97MB/s]
# Filtering
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
adata = adata[adata.obs.n_genes_by_counts < 2500, :]
adata = adata[adata.obs.pct_counts_mt < 5, :]
# Log1p normalization
sc.pp.normalize_total(adata, target_sum=1e4)
#> /Users/mbr29/Library/Python/3.10/lib/python/site-packages/scanpy/preprocessing/_normalization.py:170: UserWarning: Received a view of an AnnData. Making a copy.
#> view_to_actual(adata)
sc.pp.log1p(adata)
# Assign labels
adata.obs['louvain'] = adata2.obs['louvain']
Here for demonstration, For let’s focus in on signaling within a simple 3-cell system:
Here we will extract the cell types metadata from anndata and impute the data to run NICHES. In particular, data.imputed will the data matrix to run NICHES.
meta.data.df <- data.frame("louvain" = py$adata$obs$louvain,row.names = py$cell_names)
# alra imputation
imputed.list <- alra(as.matrix(py$adata$X$toarray()))
#> Read matrix with 1966 cells and 13714 genes
#> Chose k=16
#> Getting nonzeros
#> Randomized SVD
#> Find the 0.001000 quantile of each gene
#> Sweep
#> Scaling all except for 393 columns
#> 0.00% of the values became negative in the scaling process and were set to zero
#> The matrix went from 5.89% nonzero to 40.75% nonzero
data.imputed <- t(imputed.list[[3]])
colnames(data.imputed) <- py$cell_names
rownames(data.imputed) <- py$gene_names
Here we will take the previously imputed data matrix as input to run NICHES and specify output_format to be “raw” for easier conversion to AnnData.
We will extract the output CellToCellMatrix from NICHES and convert it to AnnData.
adata_niches = sc.AnnData(X = r.niches_list['CellToCell']['CellToCellMatrix'].T)
#> <string>:1: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
adata_niches.obs = r.niches_list['CellToCell']['metadata']
adata_niches.var_names = r.lr_names
We will use the Scanpy functions to perform downstream analysis. Here we will scale the data, do PCA and UMAP.
We can use the visualization function from Scanpy to visualize the data in UMAP.
We can also use the function from Scanpy to perform differential analysis between different VectorTypes. Here we will rank the genes using a logistic regression classifier with L1 regularization and visualize the top markers.
sc.tl.rank_genes_groups(adata_niches, 'VectorType', method='logreg', solver='liblinear',penalty='l1')
with plt.rc_context({"figure.figsize": (8, 12), "figure.dpi": (300)}):
sc.pl.rank_genes_groups(adata_niches, n_genes=5, sharey=False,show=True,fontsize=20)
We can also check the interaction levels on violin plot for certain markers.
# check specific LRs
with plt.rc_context({"figure.figsize": (10,7), "figure.dpi": (300),"font.size":(8)}):
sc.pl.violin(adata_niches, marker_LRs[0], groupby='VectorType',rotation=20)
We can also visualize the interaction levels on top of UMAP.
with plt.rc_context({"figure.figsize": (8, 8), "figure.dpi": (300)}):
sc.pl.umap(adata_niches, color=marker_LRs[:2],ncols=2)
We can also visualize the interaction levels for the top markers of different groups using dot plot.
Here we will continue to use this pbmc3k dataset, but we will switch to Monocle3 (CellDataSet). This will be a simple demonstration of how to convert the output from NICHES to CellDataSet and use certain functions from Monocle3 to perform downstream analysis.
Here we use the same output from NICHES from previous run and convert it to CellDataSet.
library(monocle3)
lr_metadata <- data.frame(lr_names)
rownames(lr_metadata) <- lr_names
cds <- new_cell_data_set(niches_list$CellToCell$CellToCellMatrix,
cell_metadata = niches_list$CellToCell$metadata,
gene_metadata = lr_metadata)
Then we will process the data using Monocle3 and run PCA and UMAP on it.
# use all genes for pca
cds <- preprocess_cds(cds, num_dim = 30,
method = "PCA",norm_method = "none",scaling = TRUE,verbose = TRUE)
cds <- reduce_dimension(cds)
We can visualize the data on UMAP (Cells are colored by VectorTypes).
plot_cells(cds, color_cells_by="VectorType")
Here we will continue to use this pbmc3k dataset, but we will switch to Scater (SingleCellExperiment) and utilize certain functions to perform downstream analysis.
sce <- runPCA(sce,exprs_values="data")
plotExpression(sce,features = "GNAI2—C5AR1",exprs_values = "data",x = "VectorType",colour_by="VectorType")+
theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))