Waldo is a new AI system that is making waves in public health by finding adverse events (AEs) in online conversations about cannabis products with almost perfect accuracy. Developed by researchers from UC San Diego, Johns Hopkins University, and other institutions, it could change the way regulators, doctors, and consumers monitor safety issues with health products that aren’t regulated.
The study was published in PLOS Digital Health. Traditionally, consumers, doctors, and companies report cases of unexpected side effects to systems such as the FDA’s MedWatch. But some product categories, like cannabis-derived products, lack good post-market surveillance, and underreporting is common. Instead, many people discuss their medical concerns in online groups, where most of this information goes unnoticed.
The research team trained and tested three machine learning models on 10,000 Reddit posts about cannabis-derived products. The top performer, a RoBERTa model that the team named Waldo, found adverse events such as anxiety, paranoia, or physical discomfort related to cannabis use with 99.7% accuracy and an F1-score of 95.1%.
By comparison, ChatGPT achieved 94.4% accuracy but struggled with false positives and false negatives, resulting in an F1-score of only 38%. Waldo was 18 times better than ChatGPT at avoiding false alarms and 14 times better at not missing real events.
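A high accuracy alongside a low F1-score can look contradictory, but it is what typically happens when the event of interest is rare: a model can label most posts correctly simply because most posts contain no adverse event. The sketch below illustrates the effect with the standard metric definitions. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example (NOT study data): 1,000 posts, 50 of which
# describe a real adverse event. The classifier catches 20 of them,
# misses 30, and raises 30 false alarms.
metrics = classification_metrics(tp=20, fp=30, fn=30, tn=920)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

In this made-up case, accuracy is 94% while F1 is only 40%, because the 920 correctly ignored posts dominate the accuracy figure but do not enter the F1 calculation at all.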
Waldo flagged 28,832 possible adverse events among 437,132 Reddit posts from 20 cannabis-related subreddits. The subreddit r/Marijuana had the highest AE rate at 12.7%, followed by r/weed (10.5%) and r/AskTrees (10.0%). The lowest rates were in communities like r/macrogrowery (0.2%), r/weedstocks (0.1%), and r/weedbiz (0.2%).
Flagged cases following CBD use ranged from panic attacks after small doses to tinnitus. These are signals that may never reach regulators through official channels.
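The totals above imply an overall flag rate of about 6.6% across all posts. The short sketch below reproduces that arithmetic and shows one way per-subreddit rates could be tallied from classifier output. The function name, the `(subreddit, is_ae)` record shape, and the sample records are assumptions made for illustration, not the study's data format.

```python
from collections import Counter


def ae_rate_by_subreddit(posts):
    """Return the share of flagged posts per subreddit.

    `posts` is an iterable of (subreddit, is_ae) pairs, a hypothetical
    record shape used here only for illustration.
    """
    totals = Counter()
    flagged = Counter()
    for subreddit, is_ae in posts:
        totals[subreddit] += 1
        if is_ae:
            flagged[subreddit] += 1
    return {name: flagged[name] / totals[name] for name in totals}


# Overall rate from the figures reported in the article.
overall_rate = 28_832 / 437_132
print(f"Overall flag rate: {overall_rate:.1%}")

# Toy records (NOT study data) to show the per-subreddit tally.
sample = [
    ("r/example_a", True),
    ("r/example_a", False),
    ("r/example_b", False),
    ("r/example_b", False),
]
rates = ae_rate_by_subreddit(sample)
print(rates)
```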
Detecting safety signals early is crucial as the cannabis market grows without federal oversight. Unlike pharmaceuticals and medical devices, cannabis-derived products face minimal testing and regulation. This gap has led to risks such as vaping-related lung injuries in recent years.
Waldo’s capacity to process informal and extensive user narratives provides a much-needed layer of surveillance, particularly for recreational and wellness products that conventional systems overlook.
According to the study, Waldo is a significant step towards democratizing pharmacovigilance. The health community can use Waldo for free. The project’s official site, https://waldo-ae-detection.github.io/WALDO/, has the code and datasets that researchers can use.
The authors warn against relying too much on automation, even though Waldo is powerful. Human review of flagged cases remains essential to confirm their accuracy and prevent false signals. In the future, the system may include active learning, which would let it improve continuously based on human feedback.
Still, the promise is clear: Waldo provides a relatively inexpensive, scalable, and highly accurate way to monitor health risks that are often overlooked by regulatory agencies. The approach could extend beyond cannabis to other products with limited safety data, such as dietary supplements and e-cigarettes.
Reference: Desai KS, Tiyyala VM, Tiyyala P, Yeola A, Gallegos-Rangel A, Montiel-Torres A, et al. Waldo: Automated discovery of adverse events from unstructured self reports. PLOS Digit Health. 2025;4(9):e0001011. doi:10.1371/journal.pdig.0001011





