Synthetic Histology Images for Training AI Models: A Novel Approach to Improve Prostate Cancer Diagnosis | bioRxiv
New Results
, Cheng-Bang Chen, Oleksander Kryvenko, Sanoj Punnen, Victor Sandoval, Sheetal Malpani, Ahmed Noman, Farhan Ismael, Andres Briseño, Yujie Wang, Himanshu Arora
ABSTRACT
Prostate cancer (PCa) poses significant challenges for timely diagnosis and prognosis, leading to high mortality rates and increased disease risk and treatment costs. Recent advances in machine learning and digital imagery show promise for developing automated, objective assessment pipelines that can reduce human capital and resource costs. However, AI models rely on large amounts of clinical data for training, and such data are often biased, lacking in diversity, and not readily available.
Here we aim to address this limitation by employing customized generative adversarial network (GAN) models to produce high-quality synthetic images of different PCa grades from radical prostatectomy (RP) sections and needle biopsies, with customization to account for the granularity associated with each Gleason grade. The generated images underwent multiple rounds of benchmarking, quantification, and quality control assessment before being used to train an AI model (EfficientNet) to grade digital histology images of adenocarcinoma specimens (RP sections) and needle biopsies obtained from the PANDA challenge repository.
For validation, the AI model trained with synthetic data was used to grade digital histology images from The Cancer Genome Atlas (TCGA; RP sections) and needle biopsy data from Radboud University Medical Center and Karolinska Institute. The AI model trained with a combination of image patches derived from original and enhanced synthetic images outperformed the model trained with original digital histology images.
Together, these results demonstrate the potential of customized GAN models to generate a large cohort of synthetic data that can train AI models to grade PCa specimens effectively. This approach could eliminate the need for extensive clinical data when training any AI model in the domain of digital imagery, leading to cost- and time-effective diagnosis and prognosis.
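The abstract reports training on a combination of patches derived from original and synthetic images but does not say how the two sources were mixed. As a rough illustration only, the sketch below shows one plausible way to assemble such a training set: synthetic patches top up each grade until it reaches a target count. The `Patch` record, the `build_training_set` function, the `target_per_grade` parameter, and the file names are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical record describing one image patch (not from the study)."""
    path: str    # illustrative file name of the patch
    grade: int   # integer label standing in for a Gleason grade
    source: str  # "original" or "synthetic"


def build_training_set(original, synthetic, target_per_grade, seed=0):
    """Keep every original patch, then add synthetic patches per grade.

    Assumption: synthetic patches are only used to fill the gap between the
    number of original patches for a grade and `target_per_grade`.
    """
    rng = random.Random(seed)
    combined = list(original)
    counts = Counter(p.grade for p in original)

    # Group the synthetic patches by grade so each grade is sampled separately.
    pools = {}
    for patch in synthetic:
        pools.setdefault(patch.grade, []).append(patch)

    for grade in sorted(pools):
        pool = pools[grade]
        needed = max(0, target_per_grade - counts[grade])
        # Never request more synthetic patches than the pool contains.
        combined.extend(rng.sample(pool, min(needed, len(pool))))

    rng.shuffle(combined)
    return combined
```

With four original patches of one grade, one original patch of another, and a target of four per grade, this sketch would leave the first grade untouched and add synthetic patches only to the under-represented grade, up to the number available.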