Phenotype-Driven Discovery of Therapeutic Perturbations by Graph-Based Causal Modeling

Phenotype-driven drug discovery (PDD) aims to identify therapeutic treatments that reverse disease-like cellular states through direct observation of phenotypic responses to perturbations. Traditional methods rely on indirect algorithms, which are expensive and perform poorly when applied to large perturbation datasets. PDGrapher was developed to address these issues. It is a graph neural network (GNN) based system that reframes therapeutic lead discovery as a combinatorial prediction problem.

The aim of PDGrapher is to predict gene sets capable of transforming diseased cells into healthy or treated-like states. It enables the discovery of both single-gene and multi-gene perturbations with therapeutic potential. The model takes diseased gene expression profiles together with proxy causal graphs of gene-gene interactions, which are represented as gene regulatory networks (GRNs) and protein-protein interaction (PPI) networks.

PDGrapher integrates two specialized modules: a perturbagen discovery module that recommends therapeutic gene sets by comparing the treated and diseased states, and a response prediction module that assesses these perturbagens by simulating their effect on gene expression.
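The interplay between the two modules can be pictured with a small toy example. The sketch below is not PDGrapher's implementation: hand-written rules stand in for the trained GNN modules, and the graph, gene names, edge weights, one-hop propagation, and fixed perturbation size are all assumptions made for illustration.

```python
# Toy illustration only: simple rules stand in for the trained GNN modules.

# Hypothetical proxy causal graph: regulator -> {downstream gene: edge weight}.
GRAPH = {
    "G1": {"G2": 0.5, "G3": -0.5},
    "G2": {"G4": 1.0},
    "G3": {},
    "G4": {},
}

def one_hop_effect(gene, graph, delta):
    """Expression shift caused by changing `gene` by `delta`, passed one hop downstream."""
    effect = {gene: delta}
    for target, weight in graph[gene].items():
        effect[target] = effect.get(target, 0.0) + weight * delta
    return effect

def predict_response(expression, perturbed_genes, graph=GRAPH, delta=-2.0):
    """Stand-in for the response prediction module: simulate the perturbed expression."""
    out = dict(expression)
    for gene in perturbed_genes:
        for target, shift in one_hop_effect(gene, graph, delta).items():
            out[target] += shift
    return out

def discover_perturbagen(diseased, desired, graph=GRAPH, delta=-2.0, k=1):
    """Stand-in for the perturbagen discovery module: score each gene by how well
    its simulated effect lines up with the shift needed to reach the desired
    state, then return the top-k genes as the proposed set."""
    needed = {g: desired[g] - diseased[g] for g in diseased}

    def score(gene):
        effect = one_hop_effect(gene, graph, delta)
        return sum(shift * needed[target] for target, shift in effect.items())

    ranked = sorted(diseased, key=score, reverse=True)
    return set(ranked[:k])

# Disease is modeled here as overexpression of G1 (an assumption of the toy).
healthy = {gene: 1.0 for gene in GRAPH}
diseased = predict_response(healthy, {"G1"}, delta=2.0)

proposed = discover_perturbagen(diseased, healthy)   # discovery step
simulated = predict_response(diseased, proposed)     # assessment step
print(proposed, simulated == healthy)
```

In this toy case the discovery step proposes the upstream gene G1 rather than its downstream targets, and the simulated response returns the profile to the healthy state. Returning a set of size k rather than a single gene is what makes the prediction combinatorial.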

This dual-module design allows PDGrapher to directly infer targets rather than relying on phenotypic response libraries. The model was evaluated on 19 datasets involving both genetic and chemical interventions, under random-split and leave-cell-out conditions, to assess how well it generalizes.
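The difference between the two evaluation conditions can be sketched as follows. This is a minimal illustration, assuming that leave-cell-out means every sample from the evaluated cell line is absent from training; the summary does not describe the exact protocol, and the sample layout and cell line names are hypothetical.

```python
import random

def random_split(samples, test_fraction=0.2, seed=0):
    """Shuffle all samples and hold out a fraction, regardless of cell line."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def leave_cell_out_split(samples, held_out_cell_line):
    """Hold out every sample from one cell line, so it is unseen during training."""
    train = [s for s in samples if s["cell_line"] != held_out_cell_line]
    test = [s for s in samples if s["cell_line"] == held_out_cell_line]
    return train, test

# Hypothetical samples: 3 cell lines with 5 perturbation samples each.
samples = [
    {"cell_line": cell_line, "sample_id": i}
    for cell_line in ("cell_A", "cell_B", "cell_C")
    for i in range(5)
]

rand_train, rand_test = random_split(samples)
lco_train, lco_test = leave_cell_out_split(samples, "cell_C")
```

A random split can place samples from the same cell line on both sides, so it mainly tests generalization to new samples. The leave-cell-out split is the harder setting because the test cell line contributes nothing to training.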

The results demonstrated that PDGrapher consistently outperformed existing methods such as scGen and CellOT, which require training separate models for each perturbagen. In comparative analyses, PDGrapher was up to 25 times faster than scGen and over 100 times faster than CellOT, while also achieving superior accuracy in predicting therapeutic targets. PDGrapher’s predictions aligned with clinically validated targets and known drug mechanisms.

Importantly, it uncovered novel candidates such as CDK2 and TOP2A, linking them to experimental drug compounds and illustrating potential for both drug repurposing and new discovery. To assess its robustness, researchers tested PDGrapher under various network perturbations, such as progressive edge removal from PPI networks, and on synthetic datasets with missing graph components and confounding factors. Across all these conditions, it maintained stable performance and remained resilient to latent confounders and graph sparsity.
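A progressive edge removal test of this kind can be sketched as below. The sketch assumes edges are removed uniformly at random, which the summary does not specify; the function name, removal fractions, and toy edge list are illustrative.

```python
import random

def progressive_edge_removal(edges, fractions, seed=0):
    """Return {fraction: remaining edges} for increasingly sparse graphs.

    Edges are shuffled once and then truncated, so each sparser graph is a
    subset of the denser ones (the removal is cumulative).
    """
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)
    graphs = {}
    for fraction in fractions:
        n_removed = round(fraction * len(order))
        graphs[fraction] = order[: len(order) - n_removed]
    return graphs

# Hypothetical PPI network with 10 undirected edges between placeholder proteins.
ppi_edges = [(f"P{i}", f"P{i + 1}") for i in range(10)]
sparsified = progressive_edge_removal(ppi_edges, fractions=[0.0, 0.2, 0.5])
```

Each sparsified graph would then be passed to the model in place of the full network, and performance compared across removal fractions.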

Ablation experiments showed that including the cycle loss substantially improved model performance across multiple cell lines by enforcing causal consistency and regularizing the predictions. PDGrapher relies on transcriptomic data, so it captures only one layer of cellular response. It has further limitations, such as assuming the absence of unobserved confounders and relying on incomplete, noisy, and context-specific resources like GRNs and PPI networks. Representation learning may help mitigate these challenges, but the incompleteness of causal graph approximations may still affect the precision of predictions.
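The idea behind the cycle loss mentioned in the ablation results can be pictured as a round-trip check: the proposed perturbagen is fed to the response predictor, and the simulated outcome is compared with the observed treated state. The sketch below uses mean squared error and hand-written stand-in modules; both are assumptions for illustration and not the loss or models used in the paper.

```python
def mse(a, b):
    """Mean squared error between two equal-length expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cycle_loss(diseased, treated, discover, respond):
    """Round-trip penalty: propose a perturbagen, simulate its effect,
    and measure how far the simulated state lands from the treated state."""
    proposed = discover(diseased, treated)
    reconstructed = respond(diseased, proposed)
    return mse(reconstructed, treated)

# Hypothetical stand-in modules operating on gene indices.
def respond(expression, perturbed):
    """Lower each perturbed gene by one unit."""
    return [x - 1.0 if i in perturbed else x for i, x in enumerate(expression)]

def good_discover(diseased, treated):
    """Propose genes whose expression must drop to reach the treated state."""
    return {i for i, (d, t) in enumerate(zip(diseased, treated)) if d - t > 0.5}

def poor_discover(diseased, treated):
    """Propose nothing, leaving the diseased state unchanged."""
    return set()

diseased = [2.0, 1.0, 3.0]
treated = [1.0, 1.0, 2.0]

loss_good = cycle_loss(diseased, treated, good_discover, respond)
loss_poor = cycle_loss(diseased, treated, poor_discover, respond)
```

A proposal that actually moves the diseased profile toward the treated one yields a lower cycle loss than one that does not, which is the sense in which such a term ties the perturbagen prediction to its simulated consequences.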

PDGrapher could benefit from integrating multimodal data, such as high-content morphological profiling through Cell Painting, to enhance predictive accuracy. The recent availability of large-scale cell morphology datasets like JUMP highlights the potential for this integration. Despite potential biases due to differences in anatomical origin and baseline gene expression among cell lines, PDGrapher demonstrated robustness against such variability.

PDGrapher provides a strong foundation for advancing phenotype-based therapeutic discovery. By relaxing causal assumptions, improving robustness to network incompleteness, and incorporating multimodal data, it could evolve into a powerful tool for next-generation drug discovery that is interpretable, scalable, and capable of enabling personalized prediction of therapeutic targets.

Reference: Gonzalez G, Lin X, Herath I, et al. Combinatorial prediction of therapeutic perturbations using causally inspired neural networks. Nat Biomed Eng. 2025. doi:10.1038/s41551-025-01481-x
