publications
publications in reverse chronological order.
2025
- Characterizing selection on complex traits through conditional frequency spectra. Roshni A. Patel, Clemens L. Weiß, Huisheng Zhu, and 4 more authors. Genetics, Apr 2025
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of its frequency and effect size – but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. Recognizing the biases inherent to GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insights into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
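The empirical side of conditional frequency spectra can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. It assumes per-variant allele-frequency estimates are already available for the GWAS cohort (`p_gwas`) and for a second population (`p_target`), and it uses simple histogram bins. The function name and binning scheme are hypothetical.

```python
import numpy as np


def conditional_frequency_spectra(p_gwas, p_target, gwas_edges, target_edges):
    """Empirical frequency spectra in a target population, conditional on
    the GWAS-cohort frequency bin of each variant (hypothetical helper).

    Returns an array of shape (len(gwas_edges) - 1, len(target_edges) - 1).
    Row k is the normalized histogram of target-population frequencies for
    variants whose GWAS-cohort frequency falls in bin k; rows for empty bins
    are left as zeros.
    """
    p_gwas = np.asarray(p_gwas, dtype=float)
    p_target = np.asarray(p_target, dtype=float)
    n_bins = len(gwas_edges) - 1
    spectra = np.zeros((n_bins, len(target_edges) - 1))
    for k in range(n_bins):
        lo, hi = gwas_edges[k], gwas_edges[k + 1]
        # Bins are half-open [lo, hi); the last bin also includes its upper edge.
        upper_ok = p_gwas <= hi if k == n_bins - 1 else p_gwas < hi
        in_bin = (p_gwas >= lo) & upper_ok
        counts, _ = np.histogram(p_target[in_bin], bins=target_edges)
        if counts.sum() > 0:
            spectra[k] = counts / counts.sum()
    return spectra


# Toy example with five hypothetical variants.
spectra = conditional_frequency_spectra(
    p_gwas=[0.05, 0.08, 0.30, 0.35, 0.90],
    p_target=[0.01, 0.50, 0.20, 0.25, 0.95],
    gwas_edges=[0.0, 0.1, 0.5, 1.0],
    target_edges=[0.0, 0.5, 1.0],
)
# Rows: [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]
print(spectra)
```

Conditioning on the GWAS-cohort bin is the point of the construction: each variant is compared only with others ascertained at a similar frequency in the discovery cohort.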
- Observational epidemiological studies can mitigate genetic confounding with the genetic relatedness matrix. Roshni A. Patel, Joshua G. Schraiber, Matt Pennell, and 1 more author. bioRxiv, Nov 2025
Observational studies are commonly used in psychology and epidemiology to identify risk factors correlated with health outcomes. However, these studies are vulnerable to confounding when shared genetic variation influences both the putative risk factor and the outcome. Researchers have historically controlled for this type of genetic confounding using polygenic scores, but these scores are often noisy and biased estimators of a trait’s genetic component. Here, we develop a method that draws on solutions to an analogous problem in causal inference in phylogenetics. We show that the genetic relatedness matrix (GRM) can be used to control for genetic confounding when testing non-genetic risk factors. In simulations, our method outperforms existing approaches, particularly at the sample sizes typical of datasets in psychology and epidemiology. We also demonstrate that while existing methods are susceptible to poor GWAS portability, our method is inherently robust to this problem. Finally, we apply our method to the UK Biobank to re-analyze social risk factors for health outcomes in previously understudied cohorts.
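One way to picture the phylogenetics analogy is phylogenetic generalized least squares (PGLS), where a shared-ancestry covariance matrix enters the residual covariance of a regression. The sketch below swaps that covariance for a GRM. It is an assumption-laden illustration, not the authors' estimator: it treats the genetic variance fraction `h2` as known (in practice it would have to be estimated, for example by REML), uses random standard-normal stand-ins for standardized genotypes, and `gls_with_grm` is a hypothetical name.

```python
import numpy as np


def gls_with_grm(y, X, K, h2):
    """GLS estimate of fixed effects when residuals share a GRM-structured
    covariance (hypothetical helper, PGLS-style).

    y  : (n,) outcome
    X  : (n, p) design matrix (e.g. intercept and candidate risk factor)
    K  : (n, n) genetic relatedness matrix
    h2 : fraction of residual variance attributed to K (assumed known here)

    The residual covariance is taken to be V = h2 * K + (1 - h2) * I.
    """
    n = len(y)
    V = h2 * K + (1.0 - h2) * np.eye(n)
    # Use linear solves instead of forming V^{-1} explicitly.
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)


# Simulated toy data: a GRM built from standard-normal stand-ins for genotypes.
rng = np.random.default_rng(0)
n, m = 200, 500
G = rng.standard_normal((n, m))
K = G @ G.T / m
exposure = rng.standard_normal(n)
X = np.column_stack([np.ones(n), exposure])
# Outcome with a true exposure effect of 0.5 plus GRM-structured noise.
L = np.linalg.cholesky(0.5 * K + 0.5 * np.eye(n))
y = X @ np.array([0.0, 0.5]) + L @ rng.standard_normal(n)
print(gls_with_grm(y, X, K, h2=0.5))
```

With `h2 = 0` the estimator reduces to ordinary least squares, so in this sketch the GRM term is the only component that adjusts for shared genetic structure.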
- Analytical expectations for ancestry junction accumulation in admixed genomes. Shirin Nataneli, Aydin Loid Karatas, Roshni A. Patel, and 2 more authors. bioRxiv, Oct 2025
Complex demographic events have shaped human history, leaving signatures in genetic variation across the genome. Here, we recover the recent evolutionary history of admixed populations formed from multiple ancestral sources. We present a discrete, generalizable model of admixture that leverages ancestry switches, which are recombination breakpoints that mark changes in ancestral origin along a chromosome. We derive analytical expectations for the number of ancestry switches within a genomic segment as functions of recombination rate, ancestry heterozygosity, and effective population size, and extend these expectations to incorporate population-specific recombination maps. Forward-time simulations tracing ancestry junctions for ten generations after admixture show close agreement with theoretical predictions under constant and variable recombination models. We observe minimal variability in switch counts across ten replicates, underscoring the robustness of the theoretical expectation. Furthermore, model-based switch counts agree with empirical observations from African American individuals in the 1000 Genomes Project. For example, when modeling human chromosome 1, we found a mean of approximately six switches per haplotype, which aligns with the theoretical expectation under an initial African ancestry proportion of 0.85 and agrees with published estimates from other African American cohorts. Overall, the model provides a new route for using ancestry switches to reconstruct how recombination and demography jointly shape ancestry patterns in admixed populations without requiring separation into parental sources.
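The joint dependence on recombination, ancestry heterozygosity, and effective population size can be illustrated with a standard junction-theory argument. The abstract does not give the paper's own expressions, so treat this as a sketch under simplifying assumptions: a single admixture pulse, random mating in a population of effective size N, no selection, and a segment of genetic length L Morgans. Each generation, crossovers occur at an average rate of one per Morgan, and a crossover creates a new switch only where the two recombining haplotypes carry different ancestries, which happens with probability H_s = H0 (1 - 1/(2N))^s as drift erodes ancestry heterozygosity. Summing over t generations gives E[J_t] = L * H0 * 2N * (1 - (1 - 1/(2N))^t). Both function names are hypothetical.

```python
def ancestry_heterozygosity(proportions):
    """Initial ancestry heterozygosity H0 = 1 - sum(p_i^2) for a pulse of
    admixture from sources with proportions p_i (must sum to 1)."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return 1.0 - sum(p * p for p in proportions)


def expected_switches(length_morgans, h0, n_e, generations):
    """Expected number of ancestry switches on one haplotype segment.

    E[J_t] = L * H0 * sum_{s=0}^{t-1} (1 - 1/(2N))^s
           = L * H0 * 2N * (1 - (1 - 1/(2N))^t)

    Assumes one crossover per Morgan per meiosis on average, a single
    admixture pulse, random mating with effective size N, and no selection.
    """
    decay = 1.0 - 1.0 / (2.0 * n_e)
    return length_morgans * h0 * 2.0 * n_e * (1.0 - decay ** generations)


# Hypothetical inputs: two-way admixture with an 85/15 split, a 1-Morgan
# segment, N = 10,000, and 10 generations since admixture.
h0 = ancestry_heterozygosity([0.85, 0.15])
print(h0, expected_switches(1.0, h0, 10_000, 10))
```

With a population-specific recombination map, `length_morgans` would be the segment's genetic length under that map. In the infinite-population limit the expression reduces to L * H0 * t, so switch counts grow roughly linearly with time since admixture.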
2024
- Increasing equity in science requires better ethics training: A course by trainees, for trainees. Roshni A. Patel, Rachel A. Ungar, Alanna L. Pyke, and 13 more authors. Cell Genomics, May 2024
Despite the profound impacts of scientific research, few scientists have received the necessary training to productively discuss the ethical and societal implications of their work. To address this critical gap, we—a group of predominantly human genetics trainees—developed a course on genetics, ethics, and society. We intend for this course to serve as a template for other institutions and scientific disciplines. Our curriculum positions human genetics within its historical and societal context and encourages students to evaluate how societal norms and structures impact the conduct of scientific research. We demonstrate the utility of this course via surveys of enrolled students and provide resources and strategies for others hoping to teach a similar course. We conclude by arguing that if we are to work toward rectifying the inequities and injustices produced by our field, we must first learn to view our own research as impacting and being impacted by society.
- Prediction and design of transcriptional repressor domains with large-scale mutational scans and deep learning. Raeline Valbuena, AkshatKumar Nigam, Josh Tycko, and 9 more authors. bioRxiv, Sep 2024
Regulatory proteins have evolved diverse repressor domains (RDs) to enable precise context-specific repression of transcription. However, our understanding of how sequence variation impacts the functional activity of RDs is limited. To address this gap, we generated a high-throughput mutational scanning dataset measuring the repressor activity of 115,000 variant sequences spanning more than 50 RDs in human cells. We identified thousands of clinical variants with loss or gain of repressor function, including TWIST1 HLH variants associated with Saethre-Chotzen syndrome and MECP2 domain variants associated with Rett syndrome. We also leveraged these data to annotate short linear interacting motifs (SLiMs) that are critical for repression in disordered RDs. Then, we designed a deep learning model called TENet (Transcriptional Effector Network) that integrates sequence, structure and biochemical representations of sequence variants to accurately predict repressor activity. We systematically tested generalization within and across domains with varying homology using the mutational scanning dataset. Finally, we employed TENet within a directed evolution sequence editing framework to tune the activity of both structured and disordered RDs and experimentally test thousands of designs. Our work highlights critical considerations for future dataset design and model training strategies to improve functional variant prioritization and precision design of synthetic regulatory proteins.
2022
- Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits. Roshni A. Patel, Shaila A. Musharoff, Jeffrey P. Spence, and 22 more authors. Am J Hum Genet, Jul 2022
Despite the growing number of genome-wide association studies (GWASs), it remains unclear to what extent gene-by-gene and gene-by-environment interactions influence complex traits in humans. The magnitude of genetic interactions in complex traits has been difficult to quantify because GWASs are generally underpowered to detect individual interactions of small effect. Here, we develop a method to test for genetic interactions that aggregates information across all trait-associated loci. Specifically, we test whether SNPs in regions of European ancestry shared between European American and admixed African American individuals have the same causal effect sizes. We hypothesize that in African Americans, the presence of genetic interactions will drive the causal effect sizes of SNPs in regions of European ancestry to be more similar to those of SNPs in regions of African ancestry. We apply our method to two traits: gene expression in 296 African Americans and 482 European Americans in the Multi-Ethnic Study of Atherosclerosis (MESA) and low-density lipoprotein cholesterol (LDL-C) in 74K African Americans and 296K European Americans in the Million Veteran Program (MVP). We find significant evidence for genetic interactions in our analysis of gene expression; for LDL-C, we observe a similar point estimate, although this is not significant, most likely due to lower statistical power. These results suggest that gene-by-gene or gene-by-environment interactions modify the effect sizes of causal variants in human complex traits.