# Single cell analyses This repository contains the key scripts used to generate the final tables and figures featured in the accompanying manuscript: **Zonari E, Naldini MM, Barcella M, Volpin M, et al**, _Ex Vivo Expansion of Hematopoietic Stem and Progenitor Cells from Human Mobilized Peripheral Blood for Gene Therapy Applications_, 2026. ## Abstract Ex vivo expansion of mobilized peripheral blood (mPB) hematopoietic stem cells (HSCs) represents a promising approach to advance cell and gene therapy strategies yet is hampered by loss of stem cell function when applying commonly used culture protocols. We performed in-depth characterization of mPB expansion cultures by single cell RNA sequencing, which highlighted differentiation trajectories with preservation of lineage fidelity in committed progenitors. Defining a putative HSC cluster allowed an estimation of transduction efficiency in ex vivo cultures, which correlated with long-term gene marking in xenografts and patients enrolled in a gene therapy study. We then developed a clinically translatable, GMP-compliant process to expand lentivirus (LV)-transduced HSCs from mPB of pediatric patients and adult donors, by biologically informed protocol improvements of cytokine supplementation, media choice, timing of LV transduction and combinations of small molecules preventing the activation of differentiation programs. Our optimized process outperforms validated state-of-the-art cord blood expansion protocols when applied to mPB. LV integration site analysis and genomic barcode-based clonal tracking provided definitive proof for symmetric HSC self-renewal divisions occurring during ex vivo culture. These results warrant clinical testing of this HSC transduction/expansion process in an upcoming clinical gene therapy trial for autosomal recessive osteopetrosis (EU CT 2024-518972-30). PubMed: DOI: GEO: [GSE292604](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE292604) and [GSE292606](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE292606) for Fig. 2H and Fig. 2C datasets respectively. ## Data analysis workflow ### Dataset Fig2C ![Fig2C](https://www.dropbox.com/scl/fi/lzb8u6oj4bvd0xf6977aw/Dataset2C.png?rlkey=t0g2g067pr7ukcdludv3rgh8j&dl=1) Figures 2C,D,E,F : [Figures_2CDEF.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/Figures_2CDEF.R). Figure 2G: [Figures_2G.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/Figures_2G.R). #### Basic analysis The **scRNAseq basic** analysis was conducted according to the workflow outlined below: 1. QC: min.cells = 20; min_feature: between 200 and 6000; max pct.mito = 20. Normalization (default seurat settings) Scaling (with following variables to regress out: percent.mt + nCount_RNA and CC.Difference calculated as show in [vignette](https://satijalab.org/seurat/articles/cell_cycle_vignette.html#alternate-workflow-1)) 3. Dimensionality reduction: PCA (top 25PCs using 15% most variable genes of whole gene list) 4. Harmony batch removal (sample/ orig.ident) 5. Clustering (Louvain improved: algorithm number 2 in FindCluster Seurat function) 6. Markers identification (resolution 0.6 + additional subclustering of clusters 5,7 and 9 with FindSubCluster Seurat function) 7. Automatic cell type annotation using singleR (v1.8.1) and Sakurai et al (23) as reference dataset. Custom annotation of cells was then performed according to Sakurai classifcation and markers inspection providing two different annotations levels: Classification variable (more refined) and Population variable - less granular as shown in Fig2C. The complete list of markers produced by FindAllMarkers seurat function for all resolutions and annotation varibles (logfc.threshold = 0, min.pct = 0.2) is included in Data file S2. see: [FindAllMarkers_Population_dataset_Fig2C.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/FindAllMarkers_Population_dataset_Fig2C.R) and result in .rds format: [Full_GSEA_markers_Population.rds](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/Full_GSEA_markers_Population.rds) #### Evaluation of expansion culture on mPB CD34+ cells To assess the effect of expansion culture on mPB CD34+ cells we identified genes significantly deregulated across culture timepoints starting from uncultured (day0) cells to day4 and day8 expansion timepoints. To correct for differences in cell population abundances, we performed all comparisons across timepoints within each cell population (Classification label). First, we identified differentially expressed genes between timepoints (day4 vs day0; day8 vs day4 and day8 vs day0) by the FindMarker function from Seurat R package. According to fold change direction we defined different patterns of modulation that could be simplified in up or down regulated across culture (consistently up/down, early or late up/down according to day4 and day8 significance and direction). We next looked for a more comprehensive set of genes which could summarize a global culture effect across most populations, and selected sets of common up and down regulated genes shared in at least 4 populations out of 7 identified by our Population label. Lists of common up/down regulated genes across culture were used as input for ORA using clusterprofiler R package (v4.7.1) Code details and input for Supplementary Figure 2 can be found in: - [Supplemental_fig2_part1.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/Supplemental_fig2_part1.R) - [Supplemental_fig2_part2.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/Supplemental_fig2_part2.R) - [Supplemental_fig2_input.rds](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/Supplemental_fig2_input.rds) ### Dataset Fig2H ![Fig2H](https://www.dropbox.com/scl/fi/h3jhl7q67vw74nfh72epx/UMAP_dataset2H.png?rlkey=jkznuqwbs5fqty8wkprbclheq&dl=1) Figure 2H: [scGate_Fig2C_dataset_to_Fig2H_dataset.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/scGate_Fig2C_dataset_to_Fig2H_dataset.R). Figure 2I: [scGate_Fig2C_dataset_to_Fig2I_dataset.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/scGate_Fig2C_dataset_to_Fig2I_dataset.R) #### Basic analysis The **scRNAseq basic** analysis was conducted according to the workflow outlined below: 1. QC: min.cells = 20; min_feature: between 200 and 6000; max pct.mito = 20. Normalization (default seurat settings) 2. ALRA (Imputation with default parameters in RunALRA function) 2. Scaling (with following variables to regress out: percent.mt + nCount_RNA and CC.Difference calculated as show in [vignette](https://satijalab.org/seurat/articles/cell_cycle_vignette.html#alternate-workflow-1)) 3. Dimensionality reduction: PCA (top 25PCs using 15% most variable genes of whole gene list) 4. Harmony batch removal (timepoint (tp variable) + patient(pt variable)) 5. Clustering (Louvain improved: algorithm number 2 in FindCluster Seurat function) 6. Markers identification (resolution 0.6 + RefinedAnnotation variable post scGATE annotation with refinement - see scripts below) #### scGate annotation using Fig2C as reference dataset Cell classification was performed by employing scGATE (v1.6.2) R package (61) using the Fig.2C datasetas reference. In particular we build a gating model by using the top 15 positive markers for each population (level 1). For details see [markersdir_scGate_input](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/tree/main/markersdir_scGate_input) folder. The workflow consists of two steps: - building of the model and the raw annotation using scGATE - refinement of multiple annotation labels into final label according to custom rules as shown in the code. All details can be found in the script [scGate_Fig2C_dataset_to_Fig2H_dataset.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/scGate_Fig2C_dataset_to_Fig2H_dataset.R). #### WPRE quantification and mapping To evaluate the transduction efficiency of Fig2H patients samples, we assessed for expression of WPRE (LV specific gene) transcripts as surrogate of the IDUA LV-derived transgene, which itself could not be discriminated by IDUA transcripts originating from the native gene. Alignment and quantification of WPRE transcripts was ensured by adding the WPRE sequence to the reference human genome in cellranger (10X Genomics). Transduction efficiency (TE) was hence calculated as the fraction of cells bearing the WPRE transcripts within each sample and/or cell population. Code for performing quantification and mapping can be found in: - [WPRE_quantification.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/WPRE_quantification.R) - [Fig2I.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_scrnaseq/-/blob/main/Fig2I.R)