Paper mills, commercial entities that manufacture and sell academic manuscripts, have become a growing threat to research integrity, particularly in the biomedical sciences. These organisations produce large volumes of manuscripts using standardised templates, fabricated or manipulated data, and false authorship arrangements. Cancer research is especially vulnerable because of intense publication pressure, relatively uniform experimental designs, and fierce competition for funding and academic advancement. Although earlier estimates suggested that approximately 3% of the biomedical literature may involve paper mills, the scale of the problem in cancer research remains poorly quantified. The increasing sophistication of fabricated manuscripts and the rapid expansion of scientific output highlight the need for scalable, objective detection methods that can support traditional editorial and peer-review processes.
This study aimed to develop and validate a machine learning model to distinguish suspected paper-mill cancer research articles from genuine publications using titles and abstracts. The model was further applied to the global cancer research literature to estimate the prevalence of papers exhibiting textual characteristics similar to known paper-mill publications and to analyse temporal, geographic, disciplinary, and journal-level trends.
A fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model was developed for binary classification. The training dataset comprised 2,202 retracted cancer research articles identified as paper-mill products in the Retraction Watch database, paired with an equal number of control articles selected as proxies for legitimate research. Controls were drawn from high-impact journals and from countries with low representation among known paper-mill retractions to reduce misclassification. External validation used an independent dataset of 3,094 suspected paper-mill papers identified by image-integrity experts, along with matched controls. Only titles and abstracts were analysed, enabling large-scale screening without reliance on full texts. Texts were segmented into sentences and aggregated at the document level. Model performance was evaluated using accuracy, sensitivity, and specificity, with a design emphasis on high specificity. The validated model was applied to 2,647,471 original cancer research articles indexed in PubMed from 1999 to 2024. Prevalence estimates were accompanied by 95% confidence intervals (CIs) generated using bootstrap resampling.
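The classification step can be illustrated with a short sketch. The example below fine-tunes a BERT model for binary classification of concatenated titles and abstracts using the Hugging Face Transformers library; the base model ("bert-base-uncased"), column names, toy records, and hyperparameters are illustrative assumptions, and the study's sentence-level segmentation with document-level aggregation is not reproduced here.

```python
# Minimal sketch: fine-tune BERT to label title+abstract text as
# suspected paper-mill-like (1) or control (0). All data, model names,
# and settings below are placeholders, not the study's configuration.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical training records: title and abstract concatenated, binary label.
records = {
    "text": [
        "MiR-XYZ promotes proliferation of gastric cancer cells ... (abstract text)",
        "Population-based cohort study of colorectal cancer screening ... (abstract text)",
    ],
    "label": [1, 0],
}
dataset = Dataset.from_dict(records)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate to BERT's 512-token limit; titles plus abstracts usually fit.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=512)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="papermill-bert", num_train_epochs=3,
                           per_device_train_batch_size=16, learning_rate=2e-5),
    train_dataset=dataset,
)
trainer.train()
```

In practice a real training set, a held-out validation split, and threshold tuning toward high specificity would replace the toy records shown here.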
The model showed strong discrimination between suspected paper-mill papers and genuine papers. Internal validation achieved an accuracy of 0.91, a sensitivity of 0.87, and a specificity of 0.96. External validation showed comparable performance, with an accuracy of 0.93, a sensitivity of 0.87, and a specificity of 0.99, indicating good generalisability. In a supplementary validation, approximately 72% of previously reported problematic cancer papers were flagged, even though technical data such as images or nucleotide sequences were not available to the model.
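For reference, the reported metrics follow directly from a binary confusion matrix. The sketch below uses hypothetical labels, not the study's data, to show how accuracy, sensitivity, and specificity would be computed for such a classifier.

```python
# Hedged sketch of the validation metrics; the label vectors are invented.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 1])  # 1 = known paper-mill paper
y_pred = np.array([1, 1, 0, 0, 0, 0, 0, 1, 0, 1])  # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # proportion of paper-mill papers correctly flagged
specificity = tn / (tn + fp)   # proportion of genuine papers correctly cleared
print(accuracy, sensitivity, specificity)
```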
Application of the model to the full cancer literature flagged 261,245 papers as textually similar to known paper-mill publications, corresponding to an estimated prevalence of 9.87% (95% CI: 9.83 to 9.90). Temporal analysis revealed exponential growth in flagged papers from 1999 to 2022 (R² = 0.92), rising from about 1% of annual publications in the early 2000s to over 15% in the early 2020s, followed by a modest reduction in 2023 and 2024. China accounted for the largest share of flagged papers, with 36% of its cancer research output flagged. Higher proportions were observed in Iran, Saudi Arabia, Egypt, Pakistan, and Malaysia, whereas the United States showed a low proportion (2%) but a high absolute number. Some journals exhibited very high proportions of flagged papers, while major publishers showed lower percentages but large absolute volumes. Flagged papers were concentrated in basic and translational research on gastric, bone, and liver cancers, whereas epidemiology and supportive care showed minimal involvement. The proportion of flagged papers in top-decile impact factor journals exceeded 10% by 2022.
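The bootstrap confidence interval around a prevalence estimate of this kind can be sketched as follows; the simulated flag vector, sample size, and resample count are placeholders rather than the study's 2.6 million screened records or its exact resampling scheme.

```python
# Hedged sketch of a bootstrap 95% CI for the flagged-paper prevalence.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary flags for screened papers (1 = flagged by the model).
flags = rng.binomial(1, 0.0987, size=50_000)

def bootstrap_ci(x, n_boot=1_000, alpha=0.05):
    # Resample papers with replacement and recompute the prevalence each time.
    boots = [x[rng.integers(0, len(x), len(x))].mean() for _ in range(n_boot)]
    return np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])

print(flags.mean(), bootstrap_ci(flags))
```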
This study shows that machine learning analysis of titles and abstracts can effectively detect large-scale textual patterns associated with paper-mill activity in cancer research. The findings suggest that suspected paper-mill publications are far more prevalent than previously estimated and are increasingly widespread across countries, publishers, cancer types, and high-impact journals. Although flagged papers do not constitute definitive evidence of misconduct and require expert verification, the results highlight a substantial threat to the credibility of cancer research. This approach provides a scalable tool for research integrity surveillance and editorial triage, underscoring the urgent need for coordinated institutional- and publisher-level interventions to protect the scientific record.
Reference: Scancar B, Byrne JA, Causeur D, Barnett AG. Machine learning–based screening of potential paper mill publications in cancer research: methodological and cross-sectional study. BMJ. 2026;392:e087581. doi:10.1136/bmj-2025-087581





