# BulkRNAseq analyses This repository contains the key scripts used to generate the final tables and figures featured in the accompanying manuscript: **Zonari E, Naldini MM, Barcella M, Volpin M, et al**, _Ex Vivo Expansion of Hematopoietic Stem and Progenitor Cells from Human Mobilized Peripheral Blood for Gene Therapy Applications_, 2026. ## Abstract Ex vivo expansion of mobilized peripheral blood (mPB) hematopoietic stem cells (HSCs) represents a promising approach to advance cell and gene therapy strategies yet is hampered by loss of stem cell function when applying commonly used culture protocols. We performed in-depth characterization of mPB expansion cultures by single cell RNA sequencing, which highlighted differentiation trajectories with preservation of lineage fidelity in committed progenitors. Defining a putative HSC cluster allowed an estimation of transduction efficiency in ex vivo cultures, which correlated with long-term gene marking in xenografts and patients enrolled in a gene therapy study. We then developed a clinically translatable, GMP-compliant process to expand lentivirus (LV)-transduced HSCs from mPB of pediatric patients and adult donors, by biologically informed protocol improvements of cytokine supplementation, media choice, timing of LV transduction and combinations of small molecules preventing the activation of differentiation programs. Our optimized process outperforms validated state-of-the-art cord blood expansion protocols when applied to mPB. LV integration site analysis and genomic barcode-based clonal tracking provided definitive proof for symmetric HSC self-renewal divisions occurring during ex vivo culture. These results warrant clinical testing of this HSC transduction/expansion process in an upcoming clinical gene therapy trial for autosomal recessive osteopetrosis (EU CT 2024-518972-30). GEO: [GSE292605](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE292605). Below a brief description with Bulk and scRNAseq workflow adopted in this work. **Bulk RNAseq** analysis was performed using a standard pipeline that includes the follwing steps: 1. Quality control by [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) 2. Trimming of bad quality reads with [TrimGalore](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)

Running command

trim_galore --quality 20 --fastqc --length 25 --output_dir {outdir} --paired {input.r1} {inout.r2}

3. Alignment with [STAR](https://github.com/alexdobin/STAR)

Running command

"STAR " + "--runThreadN {threads} " + "--genomeDir {input.genome} " + "--readFilesIn {params.trim_seq} " + "--outSAMstrandField intronMotif " + "--outFileNamePrefix {params.aln_seq_prefix} " + "--outSAMtype BAM SortedByCoordinate " + "--outSAMmultNmax 1 " + "--outFilterMismatchNmax 10 " + "--outReadsUnmapped Fastx " + "--readFilesCommand zcat "

4. Gene expression quantification with [FeatureCounts](https://academic.oup.com/bioinformatics/article/30/7/923/232889)

Running command

"featureCounts " + "-a {input.annot} " + "-o {output.fcount} " + "-g gene_name " + "-p -B -C " + "-s {params.strand} " + "--minOverlap 10 " + "-T {threads} " + "{input.bams} "

5. DGE analysis with [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html): We employed a data analysis workflow that relied on edgeR DEGs identification setting custom biological variation coefficients (BCV) to 0.1-0.3 for the different 1vs1 comparisons performed (72h vs 24h – in each batch separately). Counts were normalized using TMM method and differential test was performed with exactTest function provided by the edgeR package (v3.32). Genes with an adjusted p.value (Benjiamini Hochberg FDR method) < 0.05 were considered differentially expressed. Details are provided in the script: [DGE.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_bulkrnaseq/-/blob/main/DGE/DGE.R) 6. Dowstream functional Analysis with [ClusterProfiler](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html). In order to retrieve functional annotation from DE analysis, we performed **O**ver **R**epresentation **A**nalysis and **G**ene **S**et **E**nrichment **A**nalysis by using the functions provided by the package. **ORA** analysis was performed in particular on Gene Ontology Terms (Biological Process sub-group) retrieved by the function. **GSEA** analysis was performed against MSigDB gene sets C1 to C6 + Hallmarks and Senescence signatures (*.gmt files) Details are provided in the script: [Fig5F_processing_and_plot.R](http://www.bioinfotiget.it/gitlab/custom/zonari_mpbhscexp_2025/zonari_mpbhscexp_2025_bulkrnaseq/-/blob/main/Fig5F_processing_and_plot.R)