Research

Publications and preprints

2025
Preprint · Fine-tuning protein language models on human spatial constraint yields state-of-the-art variant effect prediction

Gyasu Bajracharya* and John A. Capra

bioRxiv · 2025

Millions of missense variants are present in human genomes, yet the functional consequences of most remain unknown. We introduce Human Spatial Constraint (HSC), a framework for quantifying intraspecies constraint on missense variants that integrates population-scale human genetic variation with 3D protein structures. HSC models the expected frequency of missense variation under neutral evolution and compares it to observed variation, accounting for both variation in mutational processes and structural context. HSC outperforms traditional inter- and intraspecies conservation metrics, as well as unsupervised protein language models such as ESM1b, in predicting pathogenic variants, achieving performance comparable to AlphaMissense. Fine-tuning protein language models on HSC scores improves the prediction of variant fitness across diverse taxa and deep mutational scanning functional assay types.

2020
Published · Whole-Genome Sequences of Salmonella Isolates from an Ecological Wastewater Treatment System

Charles J. Connolly, Laura Kaminsky, Gabriella N. Pinto, Priscilla C. Sinclair, Gyasu Bajracharya, Runan Yan, Erin M. Nawrocki, Edward G. Dudley, and Jasna Kovac

Microbiology Resource Announcements · 2020

Twenty-seven Salmonella isolates were collected from four locations within an ecological wastewater treatment system located at The Pennsylvania State University and were subjected to whole-genome sequencing. The sequences obtained were used for in silico characterization, including serotyping and phylogenetic relatedness analysis.