**Bulk RNAseq** data analysis workflow includes the following steps:
1. Quality control by [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
2. Trimming of bad quality reads with [TrimGalore](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)
3. Alignment with [STAR](https://github.com/alexdobin/STAR)
Running command
"STAR " +
"--runThreadN {threads} " +
"--genomeDir {input.genome} " +
"--readFilesIn {params.trim_seq} " +
"--outSAMstrandField intronMotif " +
"--outFileNamePrefix {params.aln_seq_prefix} " +
"--outSAMtype BAM SortedByCoordinate " +
"--outSAMmultNmax 1 " +
"--outFilterMismatchNmax 10 " +
"--outReadsUnmapped Fastx " +
"--readFilesCommand zcat "
4. Gene expression quantification with [FeatureCounts](https://academic.oup.com/bioinformatics/article/30/7/923/232889)
Running command
"featureCounts " +
"-a {input.annot} " +
"-o {output.fcount} " +
"-g gene_name " +
"-p -B -C " +
"-s {params.strand} " +
"--minOverlap 10 " +
"-T {threads} " +
"{input.bams} "
5. Differential Expression analysis with [Deseq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and [edgeR](https://doi.org/10.1093/bioinformatics/btp616).
For Thal vs HD comparison intra population (HSC or MPP - n = 2) we used the simple design with counts ~ condition, whereas for assessing differencies between Thal and HD in HSC and MPP togheter we included the celltype covariate : count ~ celltype + condition. Genes with an FDR < 0.05 or < 0.001 were considered DEGs for intra population comparisons and combined (HSC + MPP) with covariate respectively.
6. Downstream analysis
We performed Over Representation Analyses (ORA) and Gene Set Enrichment Analyses (GSEA) using R/Bioconductor package [clusterProfiler](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) (v.3.8.1). Pre-ranked gene list according to log2FC was used to perform Gene Set Enrichment Analysis (GSEA). Terms with adjusted p-value (Benjiamini-Hochber correction) less than 0.05 were considered significantly enriched.