KEYWORDS: Image segmentation, Video, Ear, Education and training, Diagnostics, Visual process modeling, Transformers, Semantics, Diseases and disorders, Data modeling
Hearing loss is a significant global health concern, imposing costs across society through its impacts on healthcare, education, and productivity. Traditional otoscopic diagnostic methods pose challenges, prompting the development of computer-aided diagnosis (CAD) systems. Tympanic membrane (TM) segmentation is a crucial task for early diagnosis and intervention in middle ear diseases, and automatic TM segmentation in CAD systems improves diagnostic accuracy. This study presents a method for the automatic segmentation of the TM from video-otoscopic frames based on the Segment Anything Model Adapter (SAM-Adapter). To the best of our knowledge, this is the first application of a SAM-Adapter segmentation model to segmenting TM areas in otoscopic frames. A total of 765 video frames from 36 otoscopic videos were used to train and test the model. The experimental results show that the SAM-Adapter achieves high segmentation performance, with a Dice similarity coefficient of 0.9486, without any pre- or post-processing steps. Empirical results also show that the SAM-Adapter outperforms U-Net-based models on our dataset.
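The Dice similarity coefficient reported above compares a predicted binary mask against a ground-truth mask. A minimal NumPy sketch is given below; the function name and the smoothing constant are illustrative choices, not details taken from the paper.

```python
import numpy as np

def dice_coefficient(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary masks of identical shape.

    eps is a small constant that avoids division by zero when both masks are empty.
    """
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)
```

A score of 1.0 indicates perfect overlap between the predicted and annotated TM regions; the 0.9486 value above would be the average of this measure over the test frames.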
Tympanic membrane (TM) diseases are among the most frequent pathologies, affecting the majority of the pediatric population. Video otoscopy is an effective tool for diagnosing TM diseases. However, access to Ear, Nose, and Throat (ENT) physicians is limited in many sparsely populated regions worldwide. Moreover, high inter- and intra-reader variability impairs accurate diagnosis. This study proposes a digital otoscopy video summarization and automated diagnostic label assignment model that benefits from the synergy of deep learning and natural language processing (NLP). Our main motivation is to obtain the key visual features of TM diseases from their short descriptive reports. Our video database consisted of 173 otoscopy records covering three different TM diseases. To generate composite images, we utilized our previously developed semantic segmentation-based stitching framework, SelectStitch. An ENT expert reviewed these composite images and wrote a short report for each ear describing the TM's visual landmarks and the disease. Based on NLP and a bag-of-words (BoW) model, we determined the five most frequent words characterizing each TM diagnostic category. A neighborhood components analysis was used to predict the diagnostic label of each test instance. The proposed model achieved an overall F1-score of 90.2%. To the best of our knowledge, this is the first study to utilize textual information in computerized ear diagnostics. Our model has the potential to become a telemedicine application that automatically diagnoses the TM by analyzing visual descriptions provided by a healthcare provider from a mobile device.
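A bag-of-words representation of the textual reports followed by neighborhood components analysis and a nearest-neighbor decision can be assembled with scikit-learn. The sketch below is a plausible reconstruction under assumed settings (vocabulary size, number of neighbors, pipeline structure); the paper does not specify these hyperparameters.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def build_bow_nca_classifier(max_features: int = 100) -> Pipeline:
    """Bag-of-words features, NCA metric learning, then a k-NN diagnostic decision."""
    return Pipeline([
        # Word-count features from the short descriptive reports
        ("bow", CountVectorizer(stop_words="english", max_features=max_features)),
        # NCA requires a dense array, so convert the sparse counts
        ("densify", FunctionTransformer(lambda X: X.toarray(), accept_sparse=True)),
        ("nca", NeighborhoodComponentsAnalysis(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
    ])

# clf = build_bow_nca_classifier()
# clf.fit(train_reports, train_labels)        # lists of report strings and diagnostic labels
# predicted_labels = clf.predict(test_reports)
```

The learned NCA transformation stretches the feature space so that reports of the same diagnostic category fall close together before the nearest-neighbor vote is taken.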
Ear diseases are frequently occurring conditions affecting the majority of the pediatric population, potentially resulting in hearing loss and communication disabilities. The current standard of care in diagnosing ear diseases includes a visual examination of the tympanic membrane (TM) by a medical expert with a range of available otoscopes. However, visual examination is subjective and depends on various factors, including the experience of the expert. This work proposes a decision fusion mechanism that combines predictions obtained from digital otoscopy images and biophysical measurements (obtained through tympanometry) for the detection of eardrum abnormalities. Our database consisted of 73 tympanometry records along with digital otoscopy videos. For the tympanometry aspect, we trained a random forest (RF) classifier using raw tympanometry attributes. Additionally, we mimicked a clinician’s decision on tympanometry findings using the normal range of tympanogram values provided by a clinical guide. Moreover, we re-trained Inception-ResNet-v2 to classify TM images selected from each otoscopic video. After obtaining predictions from each of the three sources, we applied a majority voting-based decision fusion technique to reach the final decision. Experimental results show that the proposed decision fusion method improved the classification accuracy, positive predictive value, and negative predictive value compared to the individual classifiers. The accuracies are 64.4% for the clinical evaluation of tympanometry, 76.7% for the computerized analysis of tympanometry data, and 74.0% for the TM image analysis, while our decision fusion methodology increases the classification accuracy to 84.9%. To the best of our knowledge, this is the first study to fuse data from digital otoscopy and tympanometry. Preliminary results suggest that fusing information from different sensor sources may provide complementary information for accurate, computerized diagnosis of TM-related abnormalities.
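Because exactly three binary decisions are fused (clinical tympanometry evaluation, RF on tympanometry attributes, and the image classifier), majority voting reduces to checking whether at least two sources agree. A minimal sketch follows; the function and argument names are illustrative, not from the paper.

```python
import numpy as np

def majority_vote(pred_clinical, pred_rf, pred_image):
    """Fuse three per-ear binary predictions (0 = normal, 1 = abnormal) by majority vote."""
    votes = np.vstack([pred_clinical, pred_rf, pred_image]).astype(int)  # shape (3, n_ears)
    # Abnormal if at least two of the three sources predict abnormal
    return (votes.sum(axis=0) >= 2).astype(int)
```

With binary labels and an odd number of voters, ties cannot occur, so no tie-breaking rule is needed.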
In this study, we propose an approach to report the condition of the eardrum as “normal” or “abnormal” by ensembling two different deep learning architectures. In the first network (Network 1), we applied transfer learning to the Inception V3 network using 409 labeled samples. As a second network (Network 2), we designed a convolutional neural network that takes advantage of auto-encoders, using an additional 673 unlabeled eardrum samples. The individual classification accuracies of Network 1 and Network 2 were 84.4% (±12.1%) and 82.6% (±11.3%), respectively. Only 32% of the errors of the two networks were the same, making it possible to combine the two approaches to achieve better classification accuracy. The proposed ensemble method allows us to achieve robust classification because it maintains high accuracy (84.4%) with the lowest standard deviation (±10.3%).
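A typical transfer-learning setup for the Inception V3 branch (Network 1) replaces the ImageNet classification head with a new binary head and freezes the pre-trained backbone. The sketch below assumes TensorFlow/Keras and illustrative hyperparameters (input size, dropout, learning rate); the abstract does not specify the framework or these settings.

```python
import tensorflow as tf

def build_transfer_model(num_classes: int = 2, input_shape=(299, 299, 3)) -> tf.keras.Model:
    """Inception V3 backbone pre-trained on ImageNet with a new classification head."""
    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False  # freeze the backbone; only the new head is trained initially

    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(base.input, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Network 2, by contrast, would pre-train an auto-encoder on the unlabeled eardrum images and reuse its encoder as a feature extractor before the supervised classification stage.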
In this study, we propose an automated otoscopy image analysis system called Autoscope. To the best of our knowledge, Autoscope is the first system designed to detect a wide range of eardrum abnormalities from high-resolution otoscope images and report the condition of the eardrum as “normal” or “abnormal.” To achieve this goal, we first developed a preprocessing step to reduce camera-specific problems, detect the region of interest in the image, and prepare the image for further analysis. Subsequently, we designed a new set of clinically motivated eardrum features (CMEF). Furthermore, we evaluated the potential of the visual MPEG-7 descriptors for the task of tympanic membrane image classification. We then fused the information extracted from the CMEF and state-of-the-art computer vision features (CVF), which included the MPEG-7 descriptors and two additional features, using a state-of-the-art classifier. In our experiments, 247 tympanic membrane images with 14 different types of abnormality were used, and Autoscope was able to classify the given tympanic membrane images as normal or abnormal with 84.6% accuracy.
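One straightforward way to fuse the two feature families is early fusion: concatenating each image's CMEF and CVF vectors and training a single classifier on the combined representation. The sketch below uses an SVM purely for illustration, since the abstract does not name the specific classifier, and all function and variable names are assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fuse_and_classify(cmef_train, cvf_train, y_train, cmef_test, cvf_test):
    """Early fusion of clinically motivated (CMEF) and computer-vision (CVF) features.

    Each argument is an (n_images, n_features) array; per-image vectors are concatenated
    before a single normal/abnormal classifier is trained on the fused representation.
    """
    X_train = np.hstack([cmef_train, cvf_train])
    X_test = np.hstack([cmef_test, cvf_test])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X_train, y_train)
    return clf.predict(X_test)
```

Scaling before the SVM matters here because the hand-crafted CMEF values and the MPEG-7 descriptor values live on very different numeric ranges.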