# Cantore_LiverDynamics2023_snRNAseq_RNAseq_vizgen
**Spatiotemporal liver dynamics shapes hepatocellular heterogeneity and impacts _in vivo_ gene engineering**

[Michela Milani](https://orcid.org/0000-0002-0363-678X), [Francesco Starinieri](https://orcid.org/0000-0003-0502-4442), [Marco Monti](https://orcid.org/0000-0003-1266-4325), [Stefano Beretta](https://orcid.org/0000-0003-4375-004X), [Ivan Merelli](https://orcid.org/0000-0003-3587-3680), [Alessio Cantore](https://orcid.org/0000-0002-9741-997X), _et al._, Journal of Hepatology, 2025 <https://doi.org/XXX>

Corresponding Author: Alessio Cantore. Email: [cantore.alessio@hsr.it](mailto:cantore.alessio@hsr.it)

Raw data are available on GEO:

- [GSE236425](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE236425) (Visium mouse liver 2-day-old, 2-week-old, 8-week-old & snRNAseq mouse liver 2-day-old)
- [GSE296287](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE296287) (snRNAseq mouse liver 2-week-old, 8-week-old & RNAseq 2-day-old _in vitro_ hepatocytes with or without cytokines)
- [GSE274042](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE274042) (MERFISH mouse liver 2-day-old)

Entire GitLab repository: <http://www.bioinfotiget.it/gitlab/custom/cantore_liverdynamics2023>
## Directories and Files
- environment_singlecell4.yml: conda environment that can be used to install all the dependencies (Seurat v4.1.1);
- environment_singlecell5.yml: conda environment that can be used to install all the dependencies (Seurat v5.0.1);
- scripts: folder with the R scripts used for the analyses
  - seurat_init.R: Seurat analysis pipeline, part 1;
  - seurat_final.R: Seurat analysis pipeline, part 2;
  - 01_seurat_sub_hepatocytes.R: manual annotation and subsetting of the hepatocytes;
  - 02_proliferating_hepatocytes.R: isolation of the proliferating hepatocytes and differential gene expression analysis;
  - pseudotime_monocle3.R: pseudotime analysis with Monocle3;
  - post-analysis-singlecell-fromFile.R: GSEA analysis;
  - paper_plots_snRNAseq.R: generates all the figures containing snRNAseq data;
  - paper_plots_RNAseq.R: generates all the figures containing RNAseq data;
  - paper_plots_vizgen.R: generates all the figures containing Vizgen MERFISH data;
  - useful_functions.R: functions for UMAP plots, feature plots and violin plots;
  - volcano_function.R: function for the volcano plots of the RNAseq data;
- data: results of the analyses
- plots: all the plots generated by these scripts
- tables
  - full_UMAP_and_metadata.csv.gz
  - hepatocytes_UMAP_and_metadata.csv.gz
  - diff_expr_hepatocytes_logfc.thr_0_H.prol-H.not.prol.tsv
  - endothelial_res0.1_Age_percentage_cells_FigS8M.csv
  - RNAseq_DGE_hepatocytes_all48H_vs_crl48H_Deseq2.xlsx
  - RNAseq_DGE_hepatocytes_all48H_vs_crl48H_Deseq2_top100_logFC_intersected_snRNAseq.xlsx
- reference
  - regev_lab_cell_cycle_mouse.rds: mouse cell-cycle genes;
  - Halpern_Layer_markers_full.xlsx: hepatocyte zonation genes described in _Halpern et al._;
  - GSEA: reference files for the GSEA analysis
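A minimal sketch of how the two environment files can be used, assuming conda is installed. The environment names `singlecell4` and `singlecell5` are hypothetical choices of ours (the `-n` option overrides whatever name is stored in each YAML file), and the `run` helper only prints each command unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Sketch: create one conda environment per Seurat version from the YAML files in this repository.
set -eu

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_envs() {
  # Hypothetical environment names; -n overrides the name defined inside each YAML file.
  run conda env create -f environment_singlecell4.yml -n singlecell4   # Seurat v4.1.1
  run conda env create -f environment_singlecell5.yml -n singlecell5   # Seurat v5.0.1
}

create_envs
```

Then activate the environment that matches the analysis to reproduce, e.g. `conda activate singlecell4` for the 2-day-old analysis (Seurat v4.1.1) or `conda activate singlecell5` for the three-age analysis (Seurat v5.0.1).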
## Analysis
The initial preprocessing of the 2-day-old snRNAseq data, including mapping against the _Mus musculus_ GRCm39 reference genome and gene counting, was done with the CeleScope software (v1.16.2) (<https://singleron.bio/products/celescope/>) using the scopeV3.0.1 chemistry (kit V2) and setting the expected number of cells to 60,000 (--expected_cell_num 60000). The resulting data were imported into R and analyzed with the Seurat package (v4.1.1).

Below is the code we used for the CeleScope (v1.16.2) analysis.
```sh
# Step 1: record sample information (--chemistry auto lets CeleScope detect the chemistry)
celescope rna sample --outdir .//Mouse_liver_1/00.sample --sample Mouse_liver_1 --thread 8 --chemistry auto --fq1 Mouse_liver_1_1.fastq.gz
# Step 2: extract barcodes/UMIs, trim the reads and map them to the genome with STAR
celescope rna prep_map --outdir .//Mouse_liver_1/01.prep_map --sample Mouse_liver_1 --thread 8 --chemistry auto --lowNum 2 --minimum_length 20 --nextseq_trim 20 --overlap 10 --genomeDir reference --outFilterMatchNmin 50 --outFilterMultimapNmax 1 --starMem 30 --fq1 Mouse_liver_1_1.fastq.gz --fq2 Mouse_liver_1_2.fastq.gz
# Step 3: assign the aligned reads to genes
celescope rna featureCounts --outdir .//Mouse_liver_1/02.featureCounts --sample Mouse_liver_1 --thread 8 --gtf_type gene --genomeDir reference --input .//Mouse_liver_1/01.prep_map/Mouse_liver_1_Aligned.out.bam
# Step 4: build the UMI count matrix and call cells (EmptyDrops_CR, 60,000 expected cells)
celescope rna count --outdir .//Mouse_liver_1/03.count --sample Mouse_liver_1 --thread 8 --genomeDir reference --expected_cell_num 60000 --cell_calling_method EmptyDrops_CR --count_detail .//Mouse_liver_1/02.featureCounts/Mouse_liver_1_count_detail.txt --force_cell_num None
# Step 5: run CeleScope's downstream analysis on the filtered matrix
celescope rna analysis --outdir .//Mouse_liver_1/04.analysis --sample Mouse_liver_1 --thread 8 --genomeDir reference --matrix_file .//Mouse_liver_1/03.count/Mouse_liver_1_filtered_feature_bc_matrix
```
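The five CeleScope steps above are run once per sample. A minimal sketch that wraps them in a function so that each technical replicate is processed with identical parameters; the second sample name (`Mouse_liver_2`) and the `<sample>_1/_2.fastq.gz` naming are hypothetical, paths are written `./` instead of the equivalent `.//`, and the `run` helper prints the commands unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Sketch: the CeleScope v1.16.2 steps used above, parameterized by sample.
set -eu

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: celescope_v1 SAMPLE FQ1 FQ2 [GENOME_DIR]
celescope_v1() {
  local s=$1 fq1=$2 fq2=$3 ref=${4:-reference}
  run celescope rna sample --outdir "./$s/00.sample" --sample "$s" --thread 8 --chemistry auto --fq1 "$fq1"
  run celescope rna prep_map --outdir "./$s/01.prep_map" --sample "$s" --thread 8 --chemistry auto \
    --lowNum 2 --minimum_length 20 --nextseq_trim 20 --overlap 10 --genomeDir "$ref" \
    --outFilterMatchNmin 50 --outFilterMultimapNmax 1 --starMem 30 --fq1 "$fq1" --fq2 "$fq2"
  run celescope rna featureCounts --outdir "./$s/02.featureCounts" --sample "$s" --thread 8 \
    --gtf_type gene --genomeDir "$ref" --input "./$s/01.prep_map/${s}_Aligned.out.bam"
  run celescope rna count --outdir "./$s/03.count" --sample "$s" --thread 8 --genomeDir "$ref" \
    --expected_cell_num 60000 --cell_calling_method EmptyDrops_CR \
    --count_detail "./$s/02.featureCounts/${s}_count_detail.txt" --force_cell_num None
  run celescope rna analysis --outdir "./$s/04.analysis" --sample "$s" --thread 8 --genomeDir "$ref" \
    --matrix_file "./$s/03.count/${s}_filtered_feature_bc_matrix"
}

# Mouse_liver_2 is a hypothetical name for the second technical replicate.
for s in Mouse_liver_1 Mouse_liver_2; do
  celescope_v1 "$s" "${s}_1.fastq.gz" "${s}_2.fastq.gz"
done
```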
Of the 30,216 cells recovered, we removed those with low sequencing quality, that is, cells with fewer than 200 or more than 6,000 detected features, or with a mitochondrial gene fraction above 1%, leaving 20,781 cells. To account for technical variation, UMI counts were normalized with Seurat's NormalizeData function (normalization method: "LogNormalize") and scaled with the ScaleData function, regressing out unwanted sources of variation: the number of detected transcripts per cell, the percentage of transcripts originating from mitochondria, and the difference between the S and G2/M cell-cycle phase scores of each cell. The data from the two technical replicates were then integrated with Harmony, and principal component analysis (PCA) was performed with 50 principal components (PCs). The top 20 PCs were used both to generate the UMAP embeddings and to identify cell clusters, with a resolution parameter of 0.6.
The commands below run our Seurat pipeline with the chosen parameters on either the full Seurat object "MouseLiverALL2.rds" or the hepatocyte subset "hepatocytes.rds".
MouseLiverALL2 (2-day-old)
```sh
Rscript seurat_init.R MouseLiverALL2.rds outdir MouseLiverALL2 200 6000 1 mt- 50 reference/regev_lab_cell_cycle_mouse.rds none 0 0 20
Rscript seurat_final.R MouseLiverALL2_init.rds outdir MouseLiverALL2 20 0.6 mouse 0.05 0 orig.ident 0
```
hepatocytes (2-day-old)
```sh
Rscript seurat_init.R hepatocytes.rds outdir hepatocytes 200 6000 1 mt- 25 reference/regev_lab_cell_cycle_mouse.rds none 0 0 20
Rscript seurat_final.R hepatocytes_init.rds outdir hepatocytes 13 0.6 mouse 0.05 0 orig.ident 0
```
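The positional arguments of seurat_init.R and seurat_final.R are not documented in this README. Matching the values above against the analysis description suggests that, after the input file, output directory and prefix, seurat_init.R takes the minimum and maximum number of features (200, 6000), the maximum mitochondrial percentage (1), the mitochondrial gene prefix (mt-), the number of PCs to compute (50, or 25 for hepatocytes) and the cell-cycle gene file, while seurat_final.R takes the number of PCs used downstream (20, or 13) and the clustering resolution (0.6). The sketch below is a hypothetical wrapper built on that reading, not the scripts' documented interface: the remaining arguments are passed through unchanged, and `run` prints the commands unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Hypothetical wrapper; argument names reflect our reading of the commands above.
set -eu

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: seurat_pipeline NAME MIN_FEATURES MAX_FEATURES MAX_MITO_PCT N_PCS_COMPUTED N_PCS_USED RESOLUTION
seurat_pipeline() {
  local name=$1 min_feat=$2 max_feat=$3 max_mito=$4 n_pcs=$5 n_pcs_used=$6 res=$7
  local cc_genes=reference/regev_lab_cell_cycle_mouse.rds
  # "none 0 0 20" and "mouse 0.05 0 orig.ident 0" are passed through exactly as in the commands above.
  run Rscript seurat_init.R "$name.rds" outdir "$name" "$min_feat" "$max_feat" "$max_mito" mt- \
    "$n_pcs" "$cc_genes" none 0 0 20
  run Rscript seurat_final.R "${name}_init.rds" outdir "$name" "$n_pcs_used" "$res" \
    mouse 0.05 0 orig.ident 0
}

# Regenerates the two 2-day-old runs shown above.
seurat_pipeline MouseLiverALL2 200 6000 1 50 20 0.6
seurat_pipeline hepatocytes    200 6000 1 25 13 0.6
```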
The snRNAseq analysis across the three ages (2-day-old, 2-week-old and 8-week-old) followed the same workflow, with two differences: newer software versions (CeleScope v2.0.7 and Seurat v5.0.1) and, during quality control, the exclusion of cells with a mitochondrial gene fraction above 5%.

Below is the code we used for the CeleScope (v2.0.7) analysis, shown for the 2-week-old sample.
```sh
# Step 1: record sample information (384 wells, auto-detected chemistry)
celescope rna sample --outdir .//Mouse_liver_2wks/00.sample --sample Mouse_liver_2wks --thread 8 --chemistry auto --wells 384 --fq1 /beegfs/scratch/ric.cantore/ric.cantore/runs/EN00007973_snRNAseq_Milani/EN00007973_hdd1/1-2wks_1.fastq.gz
# Step 2: barcode extraction, mapping, gene counting and cell calling (EmptyDrops_CR) with STARsolo
celescope rna starsolo --outdir .//Mouse_liver_2wks/01.starsolo --sample Mouse_liver_2wks --thread 8 --chemistry auto --adapter_3p AAAAAAAAAAAA --genomeDir /beegfs/scratch/ric.cantore/ric.cantore/EN00007973_snRNAseq_Milani/reference --outFilterMatchNmin 50 --soloCellFilter "EmptyDrops_CR 60000 0.99 10 45000 90000 500 0.01 20000 0.001 10000" --starMem 32 --soloFeatures "Gene GeneFull_Ex50pAS" --fq1 /beegfs/scratch/ric.cantore/ric.cantore/runs/EN00007973_snRNAseq_Milani/EN00007973_hdd1/1-2wks_1.fastq.gz --fq2 /beegfs/scratch/ric.cantore/ric.cantore/runs/EN00007973_snRNAseq_Milani/EN00007973_hdd1/1-2wks_2.fastq.gz
# Step 3: run CeleScope's downstream analysis on the filtered matrix
celescope rna analysis --outdir .//Mouse_liver_2wks/02.analysis --sample Mouse_liver_2wks --thread 8 --genomeDir /beegfs/scratch/ric.cantore/ric.cantore/EN00007973_snRNAseq_Milani/reference --matrix_file .//Mouse_liver_2wks/outs/filtered
```
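A minimal sketch that parameterizes the three CeleScope v2.0.7 steps above by sample name, FASTQ files and genome directory, so the same commands can be reused for each age. The `/path/to/...` locations are placeholders for the cluster paths above, and `run` prints the commands unless `DRY_RUN=0` is set; because the dry-run output joins arguments with spaces, it does not show the quotes around the multi-word --soloCellFilter and --soloFeatures values, which are still passed as single arguments when executing.

```sh
#!/bin/sh
# Sketch: the CeleScope v2.0.7 steps used above, parameterized by sample.
set -eu

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: celescope_v2 SAMPLE FQ1 FQ2 GENOME_DIR
celescope_v2() {
  local s=$1 fq1=$2 fq2=$3 ref=$4
  run celescope rna sample --outdir "./$s/00.sample" --sample "$s" --thread 8 --chemistry auto \
    --wells 384 --fq1 "$fq1"
  run celescope rna starsolo --outdir "./$s/01.starsolo" --sample "$s" --thread 8 --chemistry auto \
    --adapter_3p AAAAAAAAAAAA --genomeDir "$ref" --outFilterMatchNmin 50 \
    --soloCellFilter "EmptyDrops_CR 60000 0.99 10 45000 90000 500 0.01 20000 0.001 10000" \
    --starMem 32 --soloFeatures "Gene GeneFull_Ex50pAS" --fq1 "$fq1" --fq2 "$fq2"
  run celescope rna analysis --outdir "./$s/02.analysis" --sample "$s" --thread 8 \
    --genomeDir "$ref" --matrix_file "./$s/outs/filtered"
}

# Placeholder locations; replace with the actual FASTQ and reference directories.
FASTQ_DIR=/path/to/fastq
REF_DIR=/path/to/reference
celescope_v2 Mouse_liver_2wks "$FASTQ_DIR/1-2wks_1.fastq.gz" "$FASTQ_DIR/1-2wks_2.fastq.gz" "$REF_DIR"
```

One `celescope_v2` call per sample is then enough to process the other ages with the same settings.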
The commands below run our Seurat pipeline with the selected parameters on either the full Seurat object or specific subsets (2-week-old hepatocytes, 8-week-old hepatocytes, or endothelial cells).
MouseLiverALL (2-day-old, 2-week-old, 8-week-old)
```sh
Rscript seurat_init_v5.R MouseLiverALL.rds outdir MouseLiverALL 200 6000 5 mt- 50 reference/regev_lab_cell_cycle_mouse.rds full 0 0 20
Rscript seurat_final_v5_NO_fastmnn.R MouseLiverALL_init.rds outdir MouseLiverALL 20 0.6 mouse 0.05 0 orig.ident 0
```
hepatocytes_2wks
```sh
Rscript seurat_init_v5.R hepatocytes_2wks.rds outdir hepatocytes_2wks 200 6000 5 mt- 25 reference/regev_lab_cell_cycle_mouse.rds none 0 0 20
Rscript seurat_final_v5_NO_fastmnn.R hepatocytes_2wks_init.rds outdir hepatocytes_2wks 10 0.2 mouse 0.05 0 orig.ident 0
```
hepatocytes_8wks
```sh
Rscript seurat_init_v5.R hepatocytes_8wks.rds outdir hepatocytes_8wks 200 6000 5 mt- 25 reference/regev_lab_cell_cycle_mouse.rds none 0 0 20
Rscript seurat_final_v5_NO_fastmnn.R hepatocytes_8wks_init.rds outdir hepatocytes_8wks 7 0.2 mouse 0.05 0 orig.ident 0
```
Endothelial (2-day-old, 2-week-old, 8-week-old, 2,523 nuclei)
```sh
Rscript seurat_init_v5.R Endothelial.rds outdir Endothelial 200 6000 5 mt- 25 reference/regev_lab_cell_cycle_mouse.rds none 0 0 20
Rscript seurat_final_v5_NO_fastmnn.R Endothelial_init.rds outdir Endothelial 10 0.1 mouse 0.05 0 orig.ident 0
```
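The four Seurat v5 runs above differ only in the input name, the number of PCs, the clustering resolution and the full/none argument, whose meaning is not described in this README. A minimal, table-driven sketch that regenerates them; the column names are our own interpretation of the positional arguments, not documented parameter names, and `run` prints the commands unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Sketch: regenerate the Seurat v5 runs above from a small parameter table.
set -eu

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: seurat_v5 NAME N_PCS_COMPUTED MODE N_PCS_USED RESOLUTION
# Feature limits (200-6000) and the 5% mitochondrial cutoff are the same for all three-age runs.
seurat_v5() {
  local name=$1 n_pcs=$2 mode=$3 n_pcs_used=$4 res=$5
  run Rscript seurat_init_v5.R "$name.rds" outdir "$name" 200 6000 5 mt- "$n_pcs" \
    reference/regev_lab_cell_cycle_mouse.rds "$mode" 0 0 20
  run Rscript seurat_final_v5_NO_fastmnn.R "${name}_init.rds" outdir "$name" "$n_pcs_used" "$res" \
    mouse 0.05 0 orig.ident 0
}

# name  PCs_computed  mode  PCs_used  resolution
while read -r name n_pcs mode n_pcs_used res; do
  # </dev/null keeps the R scripts from reading the parameter table on standard input.
  seurat_v5 "$name" "$n_pcs" "$mode" "$n_pcs_used" "$res" </dev/null
done <<'EOF'
MouseLiverALL     50 full 20 0.6
hepatocytes_2wks  25 none 10 0.2
hepatocytes_8wks  25 none  7 0.2
Endothelial       25 none 10 0.1
EOF
```

Keeping the parameters in one table makes it easy to compare the choices made for each subset at a glance.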