Research
Publications and preprints
bioRxiv · 2025
Millions of missense variants are present in human genomes, yet the functional consequences of most remain unknown. We introduce Human Spatial Constraint (HSC), a framework for quantifying intraspecies constraint on missense variants that integrates population-scale human genetic variation with 3D protein structures. HSC models the expected frequency of missense variation under neutral evolution and compares it to observed variation, accounting for both variation in mutational processes and structural context. In predicting pathogenic variants, HSC outperforms traditional inter- and intraspecies conservation metrics, as well as unsupervised protein language models such as ESM1b, and achieves performance comparable to AlphaMissense. Fine-tuning protein language models on HSC scores improves variant fitness prediction across diverse taxa and across diverse types of deep mutational scanning functional assays.
Microbiology Resource Announcements · 2020
Twenty-seven Salmonella isolates were collected from four locations within an ecological wastewater treatment system located at The Pennsylvania State University and were subjected to whole-genome sequencing. The sequences obtained were used for in silico characterization, including serotyping and phylogenetic relatedness analysis.