Phenotype-Driven Discovery of Therapeutic Perturbations by Graph-Based Causal Modeling

September 11, 2025

Nirali Patel

Phenotype-driven drug discovery (PDD) aims to identify therapeutic treatments that reverse disease-like cellular states by directly observing phenotypic responses to perturbations. Traditional methods rely on indirect algorithms, which are expensive and perform poorly when applied to large perturbation datasets. PDGrapher was developed to address these issues. It is a graph neural network (GNN)-based system that reframes therapeutic lead discovery as a combinatorial prediction problem.

The aim of PDGrapher is to predict gene sets capable of transforming diseased cells into healthy or treated-like states. It enables the discovery of both single-gene and multi-gene perturbations with therapeutic potential. The model combines diseased gene expression profiles with proxy causal graphs of gene-gene interactions, represented as gene regulatory networks (GRNs) and protein-protein interaction (PPI) networks.

PDGrapher integrates two specialized modules: a perturbagen discovery module that recommends therapeutic gene sets by comparing the treated and diseased states, and a response prediction module that assesses these perturbagens by simulating their effect on gene expression.
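To make the division of labor between the two modules concrete, here is a minimal toy sketch. It is not PDGrapher's implementation: the real modules are trained GNNs, whereas the functions below are hand-written rules. The function names (`discover_perturbagen`, `predict_response`), the gene labels, the knockdown rule, and the tiny graph are all hypothetical and chosen only for illustration.

```python
# Toy stand-ins for the two modules; the real modules are trained GNNs.
# All names, values, and rules here are illustrative assumptions.

def discover_perturbagen(diseased, treated, k=1):
    """Hypothetical discovery step: rank genes by how far the diseased
    expression is from the treated expression and return the top k."""
    ranked = sorted(diseased, key=lambda g: (-abs(diseased[g] - treated[g]), g))
    return ranked[:k]


def predict_response(diseased, perturbed, graph, knockdown_value=0.0):
    """Hypothetical response step: set each perturbed gene to knockdown_value
    and move its direct downstream neighbors halfway toward that value."""
    predicted = dict(diseased)
    for gene in perturbed:
        predicted[gene] = knockdown_value
    for gene in perturbed:
        for neighbor in graph.get(gene, []):
            if neighbor not in perturbed:
                predicted[neighbor] = (predicted[neighbor] + knockdown_value) / 2
    return predicted


# Expression profiles keyed by (made-up) gene label.
diseased = {"A": 4.0, "B": 2.0, "C": 1.0}
treated = {"A": 0.0, "B": 1.0, "C": 1.0}
graph = {"A": ["B"]}  # directed edge A -> B in the proxy causal graph

candidates = discover_perturbagen(diseased, treated, k=1)
simulated = predict_response(diseased, candidates, graph)
print(candidates, simulated)
```

In this toy case the discovery step proposes gene "A", and simulating that perturbation reproduces the treated profile, which is the kind of agreement the response prediction module is meant to check.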

This dual-module design allows PDGrapher to infer targets directly rather than relying on phenotypic response libraries. The model was evaluated on 19 datasets involving both genetic and chemical interventions, under random-split and leave-cell-out conditions, to provide a rigorous assessment of its generalizability.
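The difference between the two evaluation conditions can be sketched as follows. This is an illustrative assumption about how such splits are typically built, not the paper's code: the sample records, the field names `cell_line` and `sample_id`, and the cell line labels are made up. The key point is that a leave-cell-out split keeps every sample from the held-out cell line out of training, so the model is tested on a cell line it has never seen.

```python
import random


def random_split(samples, test_fraction=0.25, seed=0):
    """Shuffle all samples and hold out a fraction, regardless of cell line."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def leave_cell_out_split(samples, held_out_cell_line):
    """Hold out every sample from one cell line; train on the rest."""
    train = [s for s in samples if s["cell_line"] != held_out_cell_line]
    test = [s for s in samples if s["cell_line"] == held_out_cell_line]
    return train, test


# Hypothetical records: three made-up cell lines with four samples each.
samples = [
    {"cell_line": line, "sample_id": f"{line}_{i}"}
    for line in ("line_1", "line_2", "line_3")
    for i in range(4)
]

rand_train, rand_test = random_split(samples)
lco_train, lco_test = leave_cell_out_split(samples, "line_2")
print(len(rand_train), len(rand_test), len(lco_train), len(lco_test))
```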

The results demonstrated that PDGrapher consistently outperformed existing methods such as scGen and CellOT, which require training separate models for each perturbagen. In comparative analyses, PDGrapher was up to 25 times faster than scGen and over 100 times faster than CellOT, while also achieving superior accuracy in predicting therapeutic targets. PDGrapher’s predictions aligned with clinically validated targets and known drug mechanisms.

Importantly, it uncovered novel candidates such as CDK2 and TOP2A, linking them to experimental drug compounds and illustrating its potential for both drug repurposing and new discovery. To assess its robustness, researchers tested PDGrapher under various network perturbations, including progressive edge removal from PPI networks and application to synthetic datasets with missing graph components and confounding factors. Across all these settings, it maintained stable performance and showed resilience to latent confounders and graph sparsity.
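A progressive edge-removal test of this kind can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name `progressively_remove_edges`, the removal fractions, and the ten-edge chain graph are hypothetical. Edges are shuffled once and then truncated, so each sparser graph is a subset of the denser one before it.

```python
import random


def progressively_remove_edges(edges, fractions, seed=0):
    """Return one sparsified edge list per removal fraction.

    Edges are shuffled once, then a prefix is kept for each fraction, so the
    graph with more edges removed is always nested inside the denser graphs.
    """
    order = list(edges)
    random.Random(seed).shuffle(order)
    sparsified = {}
    for fraction in fractions:
        n_keep = len(order) - int(round(len(order) * fraction))
        sparsified[fraction] = order[:n_keep]
    return sparsified


# Hypothetical chain of 10 edges standing in for a PPI network.
edges = [(f"g{i}", f"g{i + 1}") for i in range(10)]
graphs = progressively_remove_edges(edges, fractions=[0.0, 0.2, 0.5, 1.0])

for fraction, kept in graphs.items():
    # In a real robustness study the model would be retrained or re-evaluated
    # on each sparsified graph; here we only report how many edges remain.
    print(fraction, len(kept))
```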

Ablation experiments showed that including a cycle loss substantially improved model performance across multiple cell lines by enforcing causal consistency and regularizing the predictions. PDGrapher relies on transcriptomic data, which captures only one layer of cellular response. It also has other limitations, such as assuming the absence of unobserved confounders and relying on incomplete, noisy, and context-specific resources like GRNs and PPI networks. Representation learning may help to mitigate these challenges, but the incompleteness of causal graph approximations may still affect prediction precision.
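The idea behind a cycle loss can be sketched as follows. This is an assumed, simplified form and may differ from the loss in the paper: the proposed perturbagen is fed back through a response model, and the mismatch between the reconstructed and observed treated state is penalized. The helpers `toy_discover`, `bad_discover`, and `toy_respond` are hypothetical stand-ins for the trained modules, and mean squared error is used only as an example distance.

```python
def mse(a, b):
    """Mean squared error between two equal-length expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cycle_loss(diseased, treated, discover, respond):
    """Assumed form of a cycle loss: propose a perturbagen, simulate its
    effect on the diseased state, and compare with the treated state."""
    proposed = discover(diseased, treated)
    reconstructed = respond(diseased, proposed)
    return mse(reconstructed, treated)


def toy_discover(diseased, treated):
    """Hypothetical discovery rule: pick the gene index that changed most."""
    diffs = [abs(d - t) for d, t in zip(diseased, treated)]
    return diffs.index(max(diffs))


def bad_discover(diseased, treated):
    """Hypothetical inconsistent rule: always pick the last gene."""
    return len(diseased) - 1


def toy_respond(diseased, gene_index):
    """Hypothetical response rule: knock the chosen gene down to zero."""
    predicted = list(diseased)
    predicted[gene_index] = 0.0
    return predicted


diseased = [3.0, 1.0]
treated = [0.0, 1.0]

consistent = cycle_loss(diseased, treated, toy_discover, toy_respond)
inconsistent = cycle_loss(diseased, treated, bad_discover, toy_respond)
print(consistent, inconsistent)
```

A proposal that explains the observed treated state yields a low cycle loss, while an inconsistent proposal is penalized, which is how such a term can regularize the discovery module.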

PDGrapher could benefit from integrating multimodal data, such as high-content morphological profiling through Cell Painting, to enhance predictive accuracy. The recent availability of large-scale cell morphology datasets such as JUMP highlights the potential for this integration. Despite potential biases due to differences in anatomical origin and baseline gene expression, PDGrapher demonstrated robustness against such variability.

PDGrapher provides a strong foundation for advancing phenotype-based therapeutic discovery. By relaxing causal assumptions, improving robustness to network incompleteness, and incorporating multimodal data, it could evolve into a powerful tool for next-generation drug discovery: one that is interpretable, scalable, and capable of enabling personalized prediction of therapeutic targets.

References: Gonzalez G, Lin X, Herath I, et al. Combinatorial prediction of therapeutic perturbations using causally inspired neural networks. Nat Biomed Eng. 2025. doi:10.1038/s41551-025-01481-x