How AlphaGenome Is Transforming the Prediction of Genetic Variant Effects

Understanding how genetic variants influence gene regulation remains a central challenge in human genomics. More than 98% of human genetic variation occurs outside protein-coding sequences, where variants can influence gene regulation through chromatin state, transcription, splicing, and three-dimensional genome organization.

To address this challenge, AlphaGenome integrates long-range genomic context, base-pair resolution, and multimodal prediction within a single unified framework. The model processes 1 Mb of DNA sequence and predicts thousands of genome-wide functional tracks at resolutions ranging from a single base pair to 2,048 bp.
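To give a sense of scale, the short sketch below counts how many bins a single output track would contain at each resolution mentioned in this article. It assumes that "1 Mb" means 2^20 = 1,048,576 bp, which divides evenly by each bin size; the constant and function names are illustrative and not part of any AlphaGenome interface.

```python
# Assumption: "1 Mb" is taken as 2**20 bp so that every bin size divides it evenly.
SEQUENCE_LENGTH_BP = 2**20

# Output resolutions mentioned in the article, in base pairs per bin.
RESOLUTIONS_BP = (1, 128, 2048)


def bins_per_track(sequence_length_bp: int, resolution_bp: int) -> int:
    """Return the number of bins in one track at the given resolution."""
    if sequence_length_bp % resolution_bp != 0:
        raise ValueError("resolution must divide the sequence length evenly")
    return sequence_length_bp // resolution_bp


bin_counts = {r: bins_per_track(SEQUENCE_LENGTH_BP, r) for r in RESOLUTIONS_BP}

for resolution, n_bins in bin_counts.items():
    print(f"{resolution:>5} bp/bin -> {n_bins:>9,} bins per track")
```

Under that assumption, a base-pair-resolution track holds about a million values per input window, while a 2,048-bp track holds only 512, which illustrates why coarser outputs are far cheaper to produce and store.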

AlphaGenome incorporates 11 data modalities, such as gene expression (RNA-seq, CAGE, PRO-cap), transcription initiation, chromatin accessibility (DNase-seq and ATAC-seq), histone modifications, transcription factor binding, chromatin contact maps (Hi-C and Micro-C), and detailed splicing outputs, which include splice sites, splice site usage, and splice junction coordinates and strength. In total, the model predicts 5,930 human tracks or 1,128 mouse tracks across diverse cell types and tissues.

Architecturally, AlphaGenome consists of a U-Net-style backbone composed of convolutional layers to extract local sequence features and transformer blocks to model long-range interactions, including enhancer-promoter communication. The model produces linear and spatial chromatin interaction features, as well as one-dimensional and two-dimensional embeddings at 1-bp, 128-bp, and 2,048-bp resolution.

Training on the full 1 Mb input at base-pair resolution is enabled by sequence parallelism across eight interconnected Tensor Processing Unit (v3) devices. Training proceeds in two phases. In the first, pretraining, a 4-fold cross-validation scheme is used to train fold-specific models: each model is trained on three-quarters of the reference genome and evaluated on the remaining quarter. In the second, distillation, a single student model is trained to reproduce the predictions of a set of all-fold teacher models on augmented input sequences. This distilled model can quantify variant effects across all modalities with a single device call, requiring less than one second on an NVIDIA H100 GPU.
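The 4-fold scheme can be illustrated with a minimal sketch. The article does not say how the genome is partitioned into folds, so the round-robin assignment of hypothetical region identifiers below is purely an assumption made to show the three-quarters versus one-quarter split.

```python
N_FOLDS = 4


def assign_folds(region_ids, n_folds=N_FOLDS):
    """Assign each region to a fold in round-robin order (illustrative only)."""
    return {region: index % n_folds for index, region in enumerate(region_ids)}


def split_for_fold(fold_of, held_out_fold):
    """Return (training regions, evaluation regions) for one fold-specific model."""
    train = [r for r, f in fold_of.items() if f != held_out_fold]
    evaluate = [r for r, f in fold_of.items() if f == held_out_fold]
    return train, evaluate


# Hypothetical region identifiers standing in for genomic intervals.
regions = [f"region_{i}" for i in range(8)]
fold_of = assign_folds(regions)

# One (train, evaluate) split per fold-specific model.
splits = {k: split_for_fold(fold_of, k) for k in range(N_FOLDS)}

for k, (train, evaluate) in splits.items():
    print(f"fold {k}: train on {len(train)} regions, evaluate on {len(evaluate)}")
```

Each fold-specific model sees a different three-quarters of the regions, so every region is held out exactly once across the four models.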

Performance assessment indicates that AlphaGenome generalizes well to previously unseen genomic regions and consistently outperforms existing methods. Across 24 genome-track prediction benchmarks, AlphaGenome exceeded the strongest competing model on 22 tasks. It achieved a 14.7% relative improvement in cell-type-specific gene-level expression log-fold-change prediction over Borzoi and a 6.3% relative improvement in chromatin contact map Pearson correlation over Orca, and it also achieved significant improvements over specialized models, including ProCapNet and ChromBPNet. For variant effect prediction, AlphaGenome outperformed other models on 25 of 26 benchmarks, covering gene expression, splicing, polyadenylation, enhancer-gene association, DNA accessibility, and transcription factor binding. Improvements were particularly pronounced and statistically significant for quantitative trait locus predictions, including a 25.5% increase in accurate expression QTL sign prediction over Borzoi and an 8.0% improvement in accessibility QTL prediction over ChromBPNet.
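For readers unfamiliar with the term, a relative improvement expresses the gain as a fraction of the baseline score rather than as an absolute difference. The sketch below shows the arithmetic; the baseline and improved scores are hypothetical values chosen only so that the result matches the 14.7% figure, and they are not numbers reported in the paper.

```python
def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Return the gain of new_score over baseline_score as a fraction of the baseline."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return (new_score - baseline_score) / abs(baseline_score)


# Hypothetical scores, chosen for illustration only.
baseline = 0.500
improved = 0.5735

pct = 100 * relative_improvement(improved, baseline)
print(f"relative improvement: {pct:.1f}%")
```

In this made-up case the absolute gain is only 0.0735, which is why relative and absolute improvements should not be compared directly.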

The ability to score variant effects simultaneously across multiple modalities enables mechanistic interpretation of clinically relevant variants, including those proximal to the TAL1 oncogene, and provides a unified view of how sequence variation disrupts regulatory processes. Overall, AlphaGenome marks a significant breakthrough in sequence-to-function modelling and provides a robust, scalable framework for decoding the regulatory genome.
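Conceptually, scoring a variant with a sequence-to-function model means predicting tracks for the reference and the alternate sequence and comparing the two. The sketch below illustrates that idea with a toy predictor; `toy_predict` and its two "tracks" are hypothetical stand-ins and bear no relation to the real AlphaGenome model or its interface.

```python
def apply_snv(sequence: str, position: int, ref: str, alt: str) -> str:
    """Return the sequence with a single-nucleotide substitution at a 0-based position."""
    if sequence[position] != ref:
        raise ValueError("reference allele does not match the sequence")
    return sequence[:position] + alt + sequence[position + 1:]


def toy_predict(sequence: str) -> dict:
    """Hypothetical stand-in for a model: returns two made-up scalar 'tracks'."""
    return {
        "toy_gc_fraction": (sequence.count("G") + sequence.count("C")) / len(sequence),
        "toy_cpg_count": float(sequence.count("CG")),
    }


def score_variant(sequence, position, ref, alt, predict=toy_predict) -> dict:
    """Score a variant as the alternate-minus-reference difference for every track."""
    ref_pred = predict(sequence)
    alt_pred = predict(apply_snv(sequence, position, ref, alt))
    return {name: alt_pred[name] - ref_pred[name] for name in ref_pred}


reference = "ACGTACGTAC"
scores = score_variant(reference, position=2, ref="G", alt="T")

for track, delta in scores.items():
    print(f"{track}: {delta:+.2f}")
```

Because every track is scored from the same pair of predictions, a single variant yields one effect estimate per modality, which is the property the article describes as a unified view.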

References: Avsec Z, Latysheva N, Cheng J, et al. Advancing regulatory variant effect prediction with AlphaGenome. Nature. 2026;649:1206–1218. doi:10.1038/s41586-025-10014-0
