**Bulk RNAseq** data analysis workflow includes the following steps: 1. Quality control by [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) 2. Trimming of bad quality reads with [TrimGalore](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) 3. Alignment with [STAR](https://github.com/alexdobin/STAR)
Running command "STAR " + "--runThreadN {threads} " + "--genomeDir {input.genome} " + "--readFilesIn {params.trim_seq} " + "--outSAMstrandField intronMotif " + "--outFileNamePrefix {params.aln_seq_prefix} " + "--outSAMtype BAM SortedByCoordinate " + "--outSAMmultNmax 1 " + "--outFilterMismatchNmax 10 " + "--outReadsUnmapped Fastx " + "--readFilesCommand zcat "
4. Gene expression quantification with [FeatureCounts](https://academic.oup.com/bioinformatics/article/30/7/923/232889)
Running command "featureCounts " + "-a {input.annot} " + "-o {output.fcount} " + "-g gene_name " + "-p -B -C " + "-s {params.strand} " + "--minOverlap 10 " + "-T {threads} " + "{input.bams} "
5. Differential Expression analysis with [Deseq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and [edgeR](https://doi.org/10.1093/bioinformatics/btp616). For Thal vs HD comparison intra population (HSC or MPP - n = 2) we used the simple design with counts ~ condition, whereas for assessing differencies between Thal and HD in HSC and MPP togheter we included the celltype covariate : count ~ celltype + condition. Genes with an FDR < 0.05 or < 0.001 were considered DEGs for intra population comparisons and combined (HSC + MPP) with covariate respectively. 6. Downstream analysis We performed Over Representation Analyses (ORA) and Gene Set Enrichment Analyses (GSEA) using R/Bioconductor package [clusterProfiler](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) (v.3.8.1). Pre-ranked gene list according to log2FC was used to perform Gene Set Enrichment Analysis (GSEA). Terms with adjusted p-value (Benjiamini-Hochber correction) less than 0.05 were considered significantly enriched.