MSc Thesis Defense: Ceren Yıldırım, PHYLOGENY-AWARE INFERENCE OF NUCLEOTIDE VARIANT TOLERANCE ACROSS THE GENOME, Date & Time: June 23, 2026 – 1:30 PM, Place: FENS L027
PHYLOGENY-AWARE INFERENCE OF NUCLEOTIDE VARIANT
TOLERANCE ACROSS THE GENOME
Ceren Yıldırım
Molecular Biology, Genetics, and Bioengineering, MSc Thesis, 2026
Thesis Jury
Assoc. Prof. Ogün Adebali (Thesis Advisor)
Assoc. Prof. Öznur Taştan
Prof. Dr. Uğur Özbek
Date & Time: June 23, 2026 – 1:30 PM
Place: FENS L027
Zoom: https://sabanciuniv.zoom.us/j/
Keywords: phylogenetics, Mendelian diseases, pathogenicity scoring, single-nucleotide variant effect prediction
Abstract
Accurate classification of single-nucleotide variants remains a challenge in genomics. We present PHACTn, a phylogeny-aware probabilistic method for scoring variant tolerance across the human genome. Using a 470-way mammalian alignment, PHACTn derives scores from an explicit model of nucleotide substitution histories across the phylogenetic tree, requiring no training data, no learned parameters, and no specialised hardware. On a benchmark combining pathogenic variants from ClinVar with benign variants from gnomAD, PHACTn outperformed all classical conservation scores, including phyloP, phastCons, and GERP. For non-coding variants, it outperformed all other evaluated tools on a curated set of variants associated with Mendelian diseases. On the hard-case subset, comprising clinically ambiguous variants on which established tools disagree, it ranked first in AUROC, F1, and MCC, suggesting that phylogenetic independence captures complementary information that existing methods largely miss. Despite using only four interpretable parameters, PHACTn remains competitive with or superior to large foundation models such as GPN-MSA and Evo2-7B across multiple variant categories, while requiring a fraction of their computational resources. Each prediction can be traced directly through the phylogenetic tree, offering a level of transparency that sequence-based deep learning models do not provide. PHACTn thus offers a principled, accessible, and interpretable framework for variant effect prediction, with particular strength in non-coding and clinically ambiguous genomic regions.
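The abstract states that scores come from an explicit model of nucleotide substitution across a phylogenetic tree, without training. As a generic illustration of that idea only, and not PHACTn's actual scoring function (which the abstract does not specify), the sketch below applies Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) substitution model to one alignment column. It computes the posterior probability of each nucleotide at the root of a small toy tree. The `Node` class, the function names, the tree shape, and the branch lengths are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

NUCS = ("A", "C", "G", "T")

def jc69_prob(i: int, j: int, t: float) -> float:
    """JC69 probability that state i becomes state j along a branch of length t
    (t in expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay

@dataclass
class Node:
    branch: float = 0.0                 # length of the branch leading to this node
    base: Optional[str] = None          # observed nucleotide (leaves only)
    children: List["Node"] = field(default_factory=list)

def partial_likelihoods(node: Node) -> List[float]:
    """Felsenstein pruning: entry x is P(observed leaves below node | node has state x)."""
    if not node.children:
        if node.base not in NUCS:        # gap or missing data: uninformative leaf
            return [1.0] * 4
        return [1.0 if b == node.base else 0.0 for b in NUCS]
    result = [1.0] * 4
    for child in node.children:
        child_l = partial_likelihoods(child)
        for x in range(4):
            # Sum over the unobserved state y at the child, weighted by the branch transition.
            result[x] *= sum(jc69_prob(x, y, child.branch) * child_l[y] for y in range(4))
    return result

def root_posterior(root: Node) -> dict:
    """Posterior of each root nucleotide, using JC69's uniform stationary prior."""
    weighted = [0.25 * l for l in partial_likelihoods(root)]
    total = sum(weighted)
    return {b: w / total for b, w in zip(NUCS, weighted)}

# Hypothetical toy column: three leaves carry A and one carries G.
tree = Node(children=[
    Node(branch=0.1, base="A"),
    Node(branch=0.1, base="A"),
    Node(branch=0.3, children=[Node(branch=0.1, base="A"), Node(branch=0.1, base="G")]),
])
post = root_posterior(tree)
print({b: round(p, 4) for b, p in post.items()})
```

In this toy setting, the nucleotide observed in a distant lineage (G) receives a higher posterior than nucleotides never observed (C, T). This is the kind of tree-aware signal a substitution model adds beyond simple column frequencies. How PHACTn turns substitution histories into tolerance scores, and what its four parameters are, is not described in this abstract.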