publications
publications in reverse chronological order.
2026
- Observational epidemiological studies can mitigate genetic confounding with a genetic relatedness matrix. Roshni A. Patel, Joshua G. Schraiber, Matt Pennell, and 1 more author. PNAS, May 2026
Observational studies are commonly used in psychology and epidemiology to identify risk factors correlated with health outcomes. However, these studies are vulnerable to confounding when shared genetic variation influences both the putative risk factor and outcome. Researchers have often controlled for this type of genetic confounding using polygenic scores, but these scores are noisy and biased estimators of a trait’s genetic component. While some newer methods offer significant improvements over polygenic scores, they still rely on genome-wide association study (GWAS) summary statistics, which may be untenable for certain datasets. Here, we develop an analogous method that leverages a genetic relatedness matrix to control genetic confounding when testing for nongenetic risk factors. In simulations, we find that our method outperforms existing approaches, particularly at sample sizes that are large by the standards of much human research but smaller than datasets often used in human genetics. We also demonstrate that existing methods are susceptible to poor GWAS portability, whereas our method is inherently robust to such concerns, conditional on the availability of individual genotype data. Finally, we apply our method to the UK Biobank to reanalyze social risk factors for health outcomes in previously understudied cohorts.
- Analytical expectations for ancestry junction accumulation in admixed genomes. Shirin Nataneli, Aydin Loid Karatas, Tessa Ferrari, and 2 more authors. Genetics, Mar 2026
Complex demographic events have shaped human history, leaving signatures of genetic variation across the genome. Here, we investigate the recent evolutionary dynamics of admixed populations that descend from distinct ancestral sources. We present a discrete, generalizable model of admixture that leverages ancestry switches, which are recombination breakpoints that mark changes in ancestral origin along a chromosome. We derive analytical expectations for the number of ancestry switches within a genomic segment as functions of recombination rate, ancestry heterozygosity, and effective population size. We then extend these expectations to incorporate population-specific recombination maps. Our theoretical predictions are in close agreement with forward-in-time simulations that trace ancestry junction accumulation following an initial admixture event under both constant and variable recombination models. We observe minimal variability in switch counts across ten simulation replicates, underscoring the robustness of the theoretical expectation. Furthermore, model-based switch counts, parameterized using literature-informed demographic values, agree with empirical observations from African-American individuals in the 1000 Genomes Project. For example, when modeling human chromosome 1, we found a mean of approximately six switches per haplotype, which aligns with the theoretical expectation under an initial African ancestry proportion of 0.85 and agrees with published estimates from other African-American cohorts. Overall, the model provides a new route for using ancestry switches to understand how recombination and demography jointly shape ancestry patterns in admixed populations without requiring separation into parental sources.
2025
- Characterizing selection on complex traits through conditional frequency spectra. Roshni A. Patel, Clemens L. Weiß, Huisheng Zhu, and 4 more authors. Genetics, Apr 2025
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of its frequency and effect size – but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. Recognizing the biases inherent to GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insights into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
2024
- Increasing equity in science requires better ethics training: A course by trainees, for trainees. Roshni A. Patel, Rachel A. Ungar, Alanna L. Pyke, and 13 more authors. Cell Genomics, May 2024
Despite the profound impacts of scientific research, few scientists have received the necessary training to productively discuss the ethical and societal implications of their work. To address this critical gap, we—a group of predominantly human genetics trainees—developed a course on genetics, ethics, and society. We intend for this course to serve as a template for other institutions and scientific disciplines. Our curriculum positions human genetics within its historical and societal context and encourages students to evaluate how societal norms and structures impact the conduct of scientific research. We demonstrate the utility of this course via surveys of enrolled students and provide resources and strategies for others hoping to teach a similar course. We conclude by arguing that if we are to work toward rectifying the inequities and injustices produced by our field, we must first learn to view our own research as impacting and being impacted by society.
- Prediction and design of transcriptional repressor domains with large-scale mutational scans and deep learning. Raeline Valbuena, AkshatKumar Nigam, Josh Tycko, and 9 more authors. bioRxiv, Sep 2024
Regulatory proteins have evolved diverse repressor domains (RDs) to enable precise context-specific repression of transcription. However, our understanding of how sequence variation impacts the functional activity of RDs is limited. To address this gap, we generated a high-throughput mutational scanning dataset measuring the repressor activity of 115,000 variant sequences spanning more than 50 RDs in human cells. We identified thousands of clinical variants with loss or gain of repressor function, including TWIST1 HLH variants associated with Saethre-Chotzen syndrome and MECP2 domain variants associated with Rett syndrome. We also leveraged these data to annotate short linear interacting motifs (SLiMs) that are critical for repression in disordered RDs. Then, we designed a deep learning model called TENet (Transcriptional Effector Network) that integrates sequence, structure and biochemical representations of sequence variants to accurately predict repressor activity. We systematically tested generalization within and across domains with varying homology using the mutational scanning dataset. Finally, we employed TENet within a directed evolution sequence editing framework to tune the activity of both structured and disordered RDs and experimentally test thousands of designs. Our work highlights critical considerations for future dataset design and model training strategies to improve functional variant prioritization and precision design of synthetic regulatory proteins.
2022
- Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits. Roshni A. Patel, Shaila A. Musharoff, Jeffrey P. Spence, and 22 more authors. Am J Hum Genet, Jul 2022
Despite the growing number of genome-wide association studies (GWASs), it remains unclear to what extent gene-by-gene and gene-by-environment interactions influence complex traits in humans. The magnitude of genetic interactions in complex traits has been difficult to quantify because GWASs are generally underpowered to detect individual interactions of small effect. Here, we develop a method to test for genetic interactions that aggregates information across all trait-associated loci. Specifically, we test whether SNPs in regions of European ancestry shared between European American and admixed African American individuals have the same causal effect sizes. We hypothesize that in African Americans, the presence of genetic interactions will drive the causal effect sizes of SNPs in regions of European ancestry to be more similar to those of SNPs in regions of African ancestry. We apply our method to two traits: gene expression in 296 African Americans and 482 European Americans in the Multi-Ethnic Study of Atherosclerosis (MESA) and low-density lipoprotein cholesterol (LDL-C) in 74K African Americans and 296K European Americans in the Million Veteran Program (MVP). We find significant evidence for genetic interactions in our analysis of gene expression; for LDL-C, we observe a similar point estimate, although this is not significant, most likely due to lower statistical power. These results suggest that gene-by-gene or gene-by-environment interactions modify the effect sizes of causal variants in human complex traits.