Purpose: Self-supervised pre-training can reduce the amount of labeled training data needed by pre-learning fundamental visual characteristics of the medical imaging data. We investigate several self-supervised training strategies for chest computed tomography exams and their effects on downstream applications.
Approach: We benchmark five well-known self-supervision strategies (masked image region prediction, next slice prediction, rotation prediction, flip prediction, and denoising) on 15M chest computed tomography (CT) slices collected from four sites of the Mayo Clinic enterprise, United States. These models were evaluated for two downstream tasks on public datasets: pulmonary embolism (PE) detection (classification) and lung nodule segmentation. Image embeddings generated by these models were also evaluated for prediction of patient age, race, and gender to study inherent biases in the models' understanding of chest CT exams.
Results: The use of pre-training weights, especially masked region prediction-based weights, improved performance and reduced the computational effort needed for downstream tasks compared with task-specific state-of-the-art (SOTA) models. Performance improvement for PE detection was observed for training dataset sizes as large as ∼380K, with a maximum gain of 5% over SOTA. The segmentation model initialized with pre-training weights learned twice as fast as the randomly initialized model. While gender and age predictors built using self-supervised training weights showed no performance improvement over randomly initialized predictors, the race predictor experienced a 10% performance boost when using self-supervised training weights.
Conclusion: We released the self-supervised models and weights under an open-source academic license. These models can be fine-tuned with limited task-specific annotated data for a variety of downstream imaging tasks, thus accelerating research in biomedical imaging informatics.
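As an illustration of one of the benchmarked pretext tasks, the following is a minimal sketch of rotation prediction in PyTorch. The backbone, head, and square-slice assumption are ours, not the paper's exact configuration: each slice is rotated by a random multiple of 90 degrees and the network is trained to classify which rotation was applied.

import torch
import torch.nn.functional as F

def rotation_pretext_step(model, slices):
    # slices: (B, 1, H, W) chest CT slices; assumes H == W so rotation keeps shape.
    k = torch.randint(0, 4, (slices.shape[0],), device=slices.device)
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(slices, k)])
    logits = model(rotated)  # model outputs (B, 4) rotation logits
    return F.cross_entropy(logits, k)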
Development of kidney segmentation models has largely focused on contrast-enhanced CT exams. The KiTS segmentation challenge, in particular, has provided a benchmark of 300 annotated arterial-phase CT scans. A review of the best-performing entries identifies 3D U-Net models with residual connections as the top performers for kidney segmentation. Li et al. likewise found a U-Net architecture with residual connections to provide the best performance for their task of segmenting kidney parenchyma alongside kidney stones, using a recently released dataset of 257 studies. Yu et al. investigated training a multi-organ nnU-Net model on contrast and non-contrast images simultaneously. The authors found the model to achieve high Dice scores for kidney segmentation, with an average Dice score of 0.96 [4], although inspection of the output segmentations showed the model underperforming on non-contrast images as measured by lower quality-assessment scores. Tang et al. took a similar approach in which early and late arterial-phase scans were used to train a patch-based network to segment renal structures; the model performed adequately on test data, with no distinction between late and early arterial-phase performance. Lee et al. attempted to reduce the dependency on labeling multiple phases by using paired samples in which only the contrast-enhanced volume was annotated. They were able to outperform existing models but were limited by the need for correct anatomical correspondence between scans. Ananda et al. removed the dependency on paired samples by training a dual-discriminator network in three phases: one phase ensures consistent segmentation of contrast-phase images, and the following two phases ensure that the image encoding and output maps do not differ significantly between contrast and non-contrast images. Dinsdale et al. implemented a similar multi-step approach to improve segmentation robustness to age-related physiological changes in brain MRI, in which a cross-entropy (CE) loss is used to train a discriminator on the patient's age from bottleneck features and the segmentation map, and a final phase uses a confusion loss that penalizes the model for being more confident about any particular age [8]. Despite all this modeling effort, evaluating patients with impaired kidney function remains challenging, since imaging appearance varies not only with contrast phase but also with the level of renal function, so consistency of appearance is not guaranteed. This establishes the need for techniques that improve the robustness of kidney segmentation models. Within the scope of this work, we propose various techniques to generate models resilient to different contrast phases and externally validate the models.
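The confusion-loss idea attributed to Dinsdale et al. above can be sketched as follows; this is our hedged reading, not their exact implementation. A discriminator is first trained with cross-entropy to predict a confound (for example, age bin or contrast phase) from bottleneck features, and the encoder is then penalized with a confusion loss that pushes the discriminator's output toward a uniform distribution:

import torch.nn.functional as F

def discriminator_loss(disc_logits, confound_labels):
    # Phase 1: teach the discriminator to recognize the confound.
    return F.cross_entropy(disc_logits, confound_labels)

def confusion_loss(disc_logits):
    # Final phase: update the encoder so the discriminator is equally unsure
    # about every class, i.e., minimize cross-entropy against a uniform target.
    log_probs = F.log_softmax(disc_logits, dim=1)
    return -log_probs.mean()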
Despite the expert-level performance of artificial intelligence (AI) models for various medical imaging tasks, real-world performance failures, with disparate outputs for minority subgroups, limit the usefulness of AI in improving patients' lives. AI has shown a remarkable ability to detect the protected attributes of age, sex, and race, while the same models demonstrate bias against historically underserved subgroups of age, sex, and race in disease diagnosis. An AI model may therefore take prediction shortcuts from these correlations and generate outcomes biased toward certain subgroups even when protected attributes are not explicitly used as model inputs. This talk will discuss the types of bias from shortcut learning that may occur at different phases of AI model development. I will also summarize current techniques for mitigating bias in preprocessing (data-centric solutions), during model development (computational solutions), and in postprocessing (recalibration of learning).
The biological age of a person represents their cellular-level health, which may be affected by extrinsic factors indicating socioeconomic disadvantage. Biological age (BA) can provide better estimates of age-related comorbidities than chronological age, but it normally requires well-established laboratory tests for estimation. As an alternative, we designed an image-processing model for estimating biological age from computed tomography scans of the head. We analyzed the relation between the gap between biological and chronological age and socioeconomic status, with social determinants of health estimated by the social deprivation index (SDI). Our model for BA estimation achieved a mean absolute error (MAE) of approximately 9 years between estimated biological and chronological age, with a correlation coefficient of −0.11 with SDI. With the fusion of imaging and SDI in the age-estimation process, MAE was reduced by 11%.
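A minimal sketch of the imaging + SDI fusion step, assuming a late-fusion design in which the scalar SDI value is concatenated with a CNN embedding of the head CT; the feature dimension and head architecture are illustrative assumptions, not the paper's reported model:

import torch
import torch.nn as nn

class AgeFusionHead(nn.Module):
    def __init__(self, img_feat_dim=512):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(img_feat_dim + 1, 64),  # +1 for the scalar SDI value
            nn.ReLU(),
            nn.Linear(64, 1),                 # regress biological age
        )

    def forward(self, img_features, sdi):
        # img_features: (B, img_feat_dim) CT embedding; sdi: (B, 1) deprivation index
        return self.fc(torch.cat([img_features, sdi], dim=1))

Trained with an L1 objective, the network's output can be compared with chronological age to compute the MAE reported above.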
Self-supervised pretraining can reduce the amount of labeled training data needed by pre-learning fundamental visual characteristics of the imaging data. We developed a foundation model for chest computed tomography exams using the self-supervised training strategy of masked image region prediction on 1M chest CT slices. The model was evaluated on two downstream tasks: pulmonary embolism (PE) detection (classification) and lung nodule segmentation. Using the foundation model as a backbone improved performance and reduced the computational effort needed for downstream tasks compared with task-specific state-of-the-art (SOTA) models. PE detection was improved for training dataset sizes as large as 380K, with a maximum gain of 5% over SOTA. A segmentation model initialized with the foundation model weights learned twice as fast as a randomly initialized model. The foundation model can be fine-tuned with limited task-specific annotated data for a variety of downstream imaging tasks, thus accelerating research in biomedical imaging informatics.
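A hedged sketch of the masked image region prediction objective named above, in PyTorch; the patch size, masking ratio, and reconstruction loss are illustrative assumptions rather than the released model's exact settings:

import torch
import torch.nn.functional as F

def mask_patches(slices, patch=16, mask_ratio=0.5):
    # slices: (B, 1, H, W); zero out a random subset of non-overlapping patches.
    B, _, H, W = slices.shape
    keep = torch.rand(B, H // patch, W // patch, device=slices.device) > mask_ratio
    mask = keep.repeat_interleave(patch, 1).repeat_interleave(patch, 2)
    return slices * mask.unsqueeze(1), mask

def pretrain_step(model, slices):
    # The model reconstructs the full slice from its masked version; the loss is
    # computed only over the hidden regions, as in masked-image-modeling setups.
    masked, mask = mask_patches(slices)
    recon = model(masked)
    hidden = ~mask.unsqueeze(1)
    return F.mse_loss(recon[hidden], slices[hidden])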
KEYWORDS: Data modeling, Education and training, RGB color model, Echocardiography, Performance modeling, Deep learning, Motion models, Ablation, 3D modeling, Image classification
Purpose: The inherent characteristics of transthoracic echocardiography (TTE) images, such as low signal-to-noise ratio and acquisition variations, can limit the direct use of TTE images in the development and generalization of deep learning models. We propose an automated framework that addresses common challenges in generalizing echocardiography deep learning models, applied to the challenging task of differentiating constrictive pericarditis (CP) from cardiac amyloidosis (CA).
Approach: Patients with a confirmed diagnosis of CP or CA, along with normal cases, from Mayo Clinic Rochester and Arizona were identified to extract baseline demographics and the apical 4-chamber view from TTE studies. We propose a preprocessing and image-generalization framework to process the images for training ResNet50, ResNeXt101, and EfficientNetB2 models. Ablation studies were conducted to justify the effect of each proposed processing step on the final classification performance.
Results: The models were initially trained and validated on 720 unique TTE studies from Mayo Rochester and further validated on 225 studies from Mayo Arizona. With our proposed generalization framework, EfficientNetB2 generalized best, with average areas under the curve (AUC) of 0.96 (±0.01) and 0.83 (±0.03) on the Rochester and Arizona test sets, respectively.
Conclusions: Leveraging the proposed generalization techniques, we developed an echocardiography-based deep learning model that accurately differentiates CP from CA and normal cases and applied it to images from two sites. The proposed framework can be further extended to the development of other echocardiography-based deep learning models.
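For context, fine-tuning one of the named backbones for the three-way CP / CA / normal task can be set up as below; the torchvision weights and the replaced head are our assumptions, not the paper's exact training configuration:

import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained EfficientNetB2 and swap the classifier head
# for the three classes of interest: CP, CA, and normal.
model = models.efficientnet_b2(weights=models.EfficientNet_B2_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 3)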
KEYWORDS: Data modeling, Data fusion, Diseases and disorders, Image fusion, Chest imaging, Performance modeling, Modeling, Machine learning, COVID 19, Education and training
Purpose: Our study investigates whether graph-based fusion of imaging data with non-imaging electronic health record (EHR) data can improve the prediction of disease trajectories for patients with coronavirus disease 2019 (COVID-19) beyond the performance of imaging or non-imaging EHR data alone.
Approach: We present a fusion framework for fine-grained clinical outcome prediction [discharge, intensive care unit (ICU) admission, or death] that fuses imaging and non-imaging information using a similarity-based graph structure. Node features are represented by image embeddings, and edges are encoded with clinical or demographic similarity.
Results: Experiments on data collected from the Emory Healthcare Network indicate that our fusion modeling scheme performs consistently better than predictive models developed using only imaging or non-imaging features, with areas under the receiver operating characteristic curve of 0.76, 0.90, and 0.75 for discharge from hospital, mortality, and ICU admission, respectively. External validation was performed on data collected from the Mayo Clinic. Our scheme also highlights known biases in the model's predictions, such as bias against patients with a history of alcohol abuse and bias based on insurance status.
Conclusions: Our study signifies the importance of fusing multiple data modalities for accurate prediction of clinical trajectories. The proposed graph structure can model relationships between patients based on non-imaging EHR data, and graph convolutional networks can fuse this relationship information with imaging data to predict future disease trajectories more effectively than models employing only imaging or non-imaging data. Our graph-based fusion modeling framework can easily be extended to other prediction tasks that combine imaging data with non-imaging clinical data.
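The graph-fusion scheme can be sketched as follows, under our assumptions about sizes and thresholds: patients are nodes carrying image embeddings, edges connect patients with similar non-imaging EHR vectors, and a graph convolution propagates information along those edges. This is a minimal illustration, not the paper's exact architecture:

import torch
import torch.nn as nn
import torch.nn.functional as F

def build_adjacency(ehr, threshold=0.8):
    # Connect patients whose EHR feature vectors are cosine-similar; add
    # self-loops and row-normalize so each node averages over its neighbors.
    ehr = F.normalize(ehr, dim=1)
    adj = ((ehr @ ehr.T) > threshold).float() + torch.eye(len(ehr), device=ehr.device)
    return adj / adj.sum(dim=1, keepdim=True)

class SimpleGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) node features (image embeddings); adj: (N, N) from above
        return torch.relu(self.lin(adj @ x))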
The use of artificial intelligence (AI) in healthcare has become a very active research area in the last few years. While significant progress has been made on image classification tasks, only a few AI methods are actually deployed in clinical settings. A major hurdle to actively using clinical AI models is their trustworthiness. These complex models are often used as black boxes that generate promising results; when scrutinized, however, they reveal implicit biases in decision-making, such as unintended bias against particular ethnic groups and sub-populations. In our study, we develop a two-step adversarial debiasing approach with partial learning that can reduce disparity while preserving the performance of the targeted diagnosis/classification task. The methodology was evaluated on two independent medical imaging case studies, chest X-rays and mammograms, and showed promise in reducing bias while preserving the targeted performance on both internal and external datasets.
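Our reading of the two-step scheme can be sketched as below; the loss weighting and the choice of which encoder layers receive the second update ("partial learning") are assumptions for illustration:

import torch.nn.functional as F

def step1_adversary_loss(adv_logits, protected):
    # Step 1: train the adversary to predict the protected attribute
    # from the classifier's intermediate features.
    return F.cross_entropy(adv_logits, protected)

def step2_debias_loss(task_logits, labels, adv_logits, protected, lam=0.5):
    # Step 2: update only a subset of the classifier's layers to keep the
    # diagnosis loss low while degrading the adversary's ability to recover
    # the protected attribute.
    task = F.cross_entropy(task_logits, labels)
    adv = F.cross_entropy(adv_logits, protected)
    return task - lam * adv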
Purpose: In recent years, the development and exploration of deeper and more complex deep learning models has been on the rise. However, large heterogeneous datasets to support efficient training of deep learning models remain scarce. While linear image transformations have traditionally been used for augmentation, the recent development of generative adversarial networks (GANs) could in theory allow us to generate an unlimited amount of data from the real distribution to support deep learning model training. The Radiological Society of North America (RSNA) recently curated a multiclass hemorrhage detection challenge dataset that includes over 800,000 images, but all high-performing models were trained using traditional data augmentation techniques. Given the wide variety of options, augmentation for image classification often follows a trial-and-error policy.
Approach: We designed a conditional DCGAN (cDCGAN) and, in parallel, trained multiple popular GAN models for use as online augmentation, comparing them against traditional augmentation methods on the hemorrhage case study.
Results: Our experiments show that for the super-minority class, epidural hemorrhage, cDCGAN augmentation yielded at least a 2× performance improvement over the traditionally augmented model using the same classifier configuration.
Conclusion: This shows that for complex and imbalanced datasets, traditional class-imbalance solutions may not be sufficient, and more complex and diverse data augmentation methods such as GANs may be required.
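A minimal sketch of the online cDCGAN augmentation described in the Approach, assuming a trained conditional generator that takes a latent vector and a class label; the function names and latent size are illustrative:

import torch

def augment_batch(images, labels, generator, minority_class, n_synth=8, z_dim=100):
    # Sample synthetic minority-class (e.g., epidural) images from the trained
    # generator and append them to the real batch at every training step.
    z = torch.randn(n_synth, z_dim, device=images.device)
    cond = torch.full((n_synth,), minority_class, dtype=torch.long, device=images.device)
    with torch.no_grad():
        synth = generator(z, cond)
    return torch.cat([images, synth]), torch.cat([labels, cond])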
KEYWORDS: Data modeling, Medical imaging, Binary data, Performance modeling, Data centers, Visualization, Sensors, Visual process modeling, Tumor growth modeling, Distortion
Purpose: Existing anomaly detection methods focus on detecting interclass variations, while medical image novelty identification is more challenging in the presence of intraclass variations. For example, a model trained on normal chest X-rays and common lung abnormalities is expected to discover and flag idiopathic pulmonary fibrosis, a rare lung disease unseen during training. The nuances of intraclass variations and the lack of relevant training data in medical image analysis pose great challenges for existing anomaly detection methods.
Approach: We address the above challenges by proposing a hybrid model, transformation-based embedding learning for novelty detection (TEND), which combines the merits of classifier-based and autoencoder (AE)-based approaches. Training TEND consists of two stages. In the first stage, we learn in-distribution embeddings with an AE via unsupervised reconstruction. In the second stage, we learn a discriminative classifier to distinguish in-distribution data from its transformed counterparts. Additionally, we propose a margin-aware objective that pulls in-distribution data into a hypersphere while pushing away the transformed data. The anomaly score is the weighted sum of the class probability and the distance to the margin.
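A hedged sketch of the anomaly score just described; the sigmoid head, margin, and weighting are our assumptions, and the authors' released code (linked in the Conclusion) holds the exact formulation:

import torch

def tend_anomaly_score(embedding, logits, center, margin=1.0, alpha=0.5):
    # embedding: (B, D) AE embedding; logits: (B,) classifier output where higher
    # means "looks transformed"; center: (D,) center of the in-distribution hypersphere.
    p_transformed = torch.sigmoid(logits)
    dist_to_margin = (embedding - center).norm(dim=1) - margin
    return alpha * p_transformed + (1 - alpha) * dist_to_margin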
Results: Extensive experiments are performed on three public medical image datasets with the one-vs-rest setup (one class as in-distribution data and the rest as intraclass out-of-distribution data) and the rest-vs-one setup. Additional experiments on generated intraclass out-of-distribution data with unused transformations are conducted on the same datasets. The quantitative results show competitive performance compared with state-of-the-art approaches, and the qualitative examples provided further demonstrate the effectiveness of TEND.
Conclusion: Our anomaly detection model TEND can effectively identify challenging intraclass out-of-distribution medical images in an unsupervised fashion. It can be applied to discover unseen medical image classes and to screen abnormal data for downstream medical tasks. The corresponding code is available at https://github.com/XiaoyuanGuo/TEND_MedicalNoveltyDetection.
KEYWORDS: Breast cancer, Performance modeling, Tumors, Magnetic resonance imaging, Data modeling, Wavelets, Statistical modeling, Image filtering, Statistical analysis, Breast
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is sensitive but not specific for determining treatment response in early-stage triple-negative breast cancer (TNBC) patients. We propose an efficient computerized technique for assessing treatment response to neoadjuvant chemotherapy, specifically residual tumor (RT) status and pathological complete response (pCR). The proposed approach is based on Riesz wavelet analysis of pharmacokinetic maps derived from noninvasive DCE-MRI scans obtained before and after treatment. We compared the performance of Riesz features against traditional gray-level co-occurrence matrices (GLCM) and a comprehensive characterization of the lesion that includes a wide range of quantitative features (e.g., shape and boundary). We investigated a set of ∼96 predictive models incorporating distinct combinations of quantitative characterizations and statistical models at different time points of treatment, several of which achieved area under the receiver operating characteristic curve (AUC) values above 0.8. The most efficient models are based on first-order statistics and Riesz wavelets, which predicted RT with an AUC of 0.85 and pCR with an AUC of 0.83, improving on results reported in a previous study by ∼13%. Our findings suggest that Riesz texture analysis of TNBC lesions is a potential framework for optimizing TNBC patient care.
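For reference, the GLCM baseline mentioned above can be computed on a 2D slice of a pharmacokinetic map with scikit-image as sketched below; the distances, angles, and property set are illustrative choices, and the Riesz wavelet features themselves require a dedicated filter bank not shown here:

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(roi_u8):
    # roi_u8: 2D uint8 lesion region of interest extracted from the map
    glcm = graycomatrix(roi_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}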