SignificanceOral cancer is one of the most prevalent cancers, especially in middle- and low-income countries such as India. Automatic segmentation of oral cancer images can improve the diagnostic workflow, which is a significant task in oral cancer image analysis. Despite the remarkable success of deep-learning networks in medical segmentation, they rarely provide uncertainty quantification for their output.AimWe aim to estimate uncertainty in a deep-learning approach to semantic segmentation of oral cancer images and to improve the accuracy and reliability of predictions.ApproachThis work introduced a UNet-based Bayesian deep-learning (BDL) model to segment potentially malignant and malignant lesion areas in the oral cavity. The model can quantify uncertainty in predictions. We also developed an efficient model that increased the inference speed, which is almost six times smaller and two times faster (inference speed) than the original UNet. The dataset in this study was collected using our customized screening platform and was annotated by oral oncology specialists.ResultsThe proposed approach achieved good segmentation performance as well as good uncertainty estimation performance. In the experiments, we observed an improvement in pixel accuracy and mean intersection over union by removing uncertain pixels. This result reflects that the model provided less accurate predictions in uncertain areas that may need more attention and further inspection. The experiments also showed that with some performance compromises, the efficient model reduced computation time and model size, which expands the potential for implementation on portable devices used in resource-limited settings.ConclusionsOur study demonstrates the UNet-based BDL model not only can perform potentially malignant and malignant oral lesion segmentation, but also can provide informative pixel-level uncertainty estimation. With this extra uncertainty information, the accuracy and reliability of the model’s prediction can be improved.
India has one of the highest rates of oral squamous cell carcinoma (OSCC) in the world, with an incidence of 15 per 100,000 and more than 70,000 deaths per year. The problem is exacerbated by lack of medical infrastructure and routine screening, especially in rural areas. This collaboration recently developed, and clinically validated, a low-cost, portable and easy-to-use platform for intraoral photodynamic therapy (PDT) specifically engineered for use in global health settings. Here, we explore the implementation of our low-cost PDT system in conjunction with a small, handheld smartphone-coupled, multichannel fluorescence and white-light oral cancer imaging probe, which was also developed for global health settings. Our study aimed to use this mobile intraoral imaging device for treatment guidance and monitoring PDT using 5-aminolevulinic acid (ALA)-induced protoporphyrin IX (PS; PpIX) fluorescence. A total of 12 patients with 14 lesions having moderately/well-differentiated micro-invasive OSCC lesions (<2 cm diameter, depth <5 mm) were systemically administered with three doses of 20mg/kg ALA (total 60mg/kg). Lesion site PpIX and auto fluorescence was analyzed before/after ALA administration, and again after light delivery (fractionated, total 100 J/cm2 of 630nm red LED light). Quantification of relative PpIX fluorescence enables lesion area segmentation to improve guidance of light delivery and reports extent of photobleaching. These results indicate the utility of this approach for image-guided PDT and treatment monitoring while also laying groundwork for an integrated approach, combining cancer screening and treatment with the same hardware.
Significance: Convolutional neural networks (CNNs) show the potential for automated classification of different cancer lesions. However, their lack of interpretability and explainability makes CNNs less than understandable. Furthermore, CNNs may incorrectly concentrate on other areas surrounding the salient object, rather than the network’s attention focusing directly on the object to be recognized, as the network has no incentive to focus solely on the correct subjects to be detected. This inhibits the reliability of CNNs, especially for biomedical applications.
Aim: Develop a deep learning training approach that could provide understandability to its predictions and directly guide the network to concentrate its attention and accurately delineate cancerous regions of the image.
Approach: We utilized Selvaraju et al.’s gradient-weighted class activation mapping to inject interpretability and explainability into CNNs. We adopted a two-stage training process with data augmentation techniques and Li et al.’s guided attention inference network (GAIN) to train images captured using our customized mobile oral screening devices. The GAIN architecture consists of three streams of network training: classification stream, attention mining stream, and bounding box stream. By adopting the GAIN training architecture, we jointly optimized the classification and segmentation accuracy of our CNN by treating these attention maps as reliable priors to develop attention maps with more complete and accurate segmentation.
Results: The network’s attention map will help us to actively understand what the network is focusing on and looking at during its decision-making process. The results also show that the proposed method could guide the trained neural network to highlight and focus its attention on the correct lesion areas in the images when making a decision, rather than focusing its attention on relevant yet incorrect regions.
Conclusions: We demonstrate the effectiveness of our approach for more interpretable and reliable oral potentially malignant lesion and malignant lesion classification.
KEYWORDS: Cancer, Image classification, Data modeling, Tumor growth modeling, Medical imaging, Neural networks, Medical research, Breast cancer, Biomedical optics, Mobile devices
Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is ideal for disease screening. However, current datasets collected from high-risk populations are unbalanced and often have detrimental effects on the performance of classification.
Aim: To reduce the class bias caused by data imbalance.
Approach: We collected 3851 polarized white light cheek mucosa images using our customized oral cancer screening device. We use weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve the neural network performance of oral cancer image classification with the imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings.
Results: By applying both data-level and algorithm-level approaches to the deep learning training process, the performance of the minority classes, which were difficult to distinguish at the beginning, has been improved. The accuracy of “premalignancy” class is also increased, which is ideal for screening applications.
Conclusions: Experimental results show that the class bias induced by imbalanced oral cancer image datasets could be reduced using both data- and algorithm-level methods. Our study may provide an important basis for helping understand the influence of unbalanced datasets on oral cancer deep learning classifiers and how to mitigate.
Significance: Oral cancer is among the most common cancers globally, especially in low- and middle-income countries. Early detection is the most effective way to reduce the mortality rate. Deep learning-based cancer image classification models usually need to be hosted on a computing server. However, internet connection is unreliable for screening in low-resource settings.
Aim: To develop a mobile-based dual-mode image classification method and customized Android application for point-of-care oral cancer detection.
Approach: The dataset used in our study was captured among 5025 patients with our customized dual-modality mobile oral screening devices. We trained an efficient network MobileNet with focal loss and converted the model into TensorFlow Lite format. The finalized lite format model is ∼16.3 MB and ideal for smartphone platform operation. We have developed an Android smartphone application in an easy-to-use format that implements the mobile-based dual-modality image classification approach to distinguish oral potentially malignant and malignant images from normal/benign images.
Results: We investigated the accuracy and running speed on a cost-effective smartphone computing platform. It takes ∼300 ms to process one image pair with the Moto G5 Android smartphone. We tested the proposed method on a standalone dataset and achieved 81% accuracy for distinguishing normal/benign lesions from clinically suspicious lesions, using a gold standard of clinical impression based on the review of images by oral specialists.
Conclusions: Our study demonstrates the effectiveness of a mobile-based approach for oral cancer screening in low-resource settings.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.