# Bonini_TIGIT2024_WES

Spiga M, Potenza A, Beretta S et al.
**Disrupting TIGIT by cytosine base editing enables effective and safe T-cell therapy for pancreatic cancer.**
2024.
- ENA [PRJEB78986]

---

### Analyses ###

Input data were analyzed following the GATK "Best Practice Workflows" to identify variants in each sample.
* Pre-processing:
  - FastQC (v0.11.9) to check the quality of the sequencing reads
  - Trim Galore (v0.6.6) to trim low-quality bases
  - Seqtk toolkit (v1.3) to randomly downsample the most deeply sequenced samples to 190M reads, avoiding imbalance in sequencing depth across samples
* Alignment:
  - BWA (v0.7.17) to align reads to the human genome assembly (GRCh38)
* Variant Calling (GATK):
  - Picard (v2.25.6) MarkDuplicates to mark duplicates
  - BaseRecalibrator + ApplyBQSR to recalibrate base quality scores, using dbSNP as the set of known sites
  - HaplotypeCaller to call variants in each sample by emitting condensed non-variant blocks (i.e., -ERC GVCF)
  - CombineGVCFs and GenotypeGVCFs to combine the per-sample GVCFs and jointly genotype them
* Variant Filtering and Annotation:
  - VariantFiltration to filter results based on their ‘QualityByDepth’ (i.e., --filter-expression 'QD < 2.0') and overall coverage ‘DP’ (i.e., --filter-expression 'DP < 500')
  - Additional per-sample filters to identify private variants: calls with low genotype quality (i.e., GQ < 80) and low coverage (i.e., DP < 50) were removed
  - The untreated sample of each donor was used as the germline reference: its variants were filtered out from the corresponding treated sample, since they were considered already present in the initial cell population rather than induced by the base editor
  - Multi-allelic variants (mainly involving repetitive sequences) were removed
  - SnpEff (v5.0) to annotate the resulting variants on the canonical isoform from the GRCh38.p13.RefSeq reference database
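
The filtering logic listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study. The `Call` record, its field names, and the helper functions are hypothetical. It also makes three assumptions: a call is kept only if it meets both per-sample thresholds (GQ >= 80 and DP >= 50), flagged sites are dropped rather than only marked, and every call in the untreated sample counts as germline regardless of its own quality.

```python
from typing import NamedTuple, Tuple


class Call(NamedTuple):
    """One variant call in one sample (hypothetical record, for illustration)."""
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]  # ALT alleles; more than one means multi-allelic
    qd: float              # site-level QualityByDepth
    site_dp: int           # site-level (overall) coverage
    gq: int                # per-sample genotype quality
    sample_dp: int         # per-sample coverage


def passes_site_filters(c: Call) -> bool:
    # Sites with QD < 2.0 or overall DP < 500 are flagged; here they are dropped.
    return c.qd >= 2.0 and c.site_dp >= 500


def passes_sample_filters(c: Call) -> bool:
    # Assumption: a call must meet both per-sample thresholds to be kept.
    return c.gq >= 80 and c.sample_dp >= 50


def variant_key(c: Call):
    return (c.chrom, c.pos, c.ref, c.alts)


def private_variants(treated, untreated):
    """Return treated-sample calls that pass all filters and are absent
    from the matched untreated (germline reference) sample."""
    germline = {variant_key(c) for c in untreated}
    return [
        c for c in treated
        if passes_site_filters(c)
        and passes_sample_filters(c)
        and len(c.alts) == 1               # drop multi-allelic variants
        and variant_key(c) not in germline
    ]


treated = [
    Call("chr1", 100, "C", ("T",), 12.0, 900, 99, 120),         # kept
    Call("chr1", 200, "G", ("A",), 15.0, 800, 99, 110),         # in untreated
    Call("chr2", 300, "C", ("T",), 1.5, 700, 99, 90),           # QD < 2.0
    Call("chr2", 400, "C", ("T",), 10.0, 650, 60, 80),          # GQ < 80
    Call("chr3", 500, "A", ("AT", "ATT"), 20.0, 900, 99, 100),  # multi-allelic
]
untreated = [Call("chr1", 200, "G", ("A",), 14.0, 800, 99, 100)]

print([(c.chrom, c.pos) for c in private_variants(treated, untreated)])
# prints [('chr1', 100)]
```

Matching on chromosome, position, REF and ALT together means that a different substitution at a germline position would still be reported as private. Whether that matches the published analysis is not stated here, so treat it as a design choice of the sketch.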
