Purpose: Self-supervised pre-training can reduce the amount of labeled training data needed by pre-learning fundamental visual characteristics of the medical imaging data. We investigate several self-supervised training strategies for chest computed tomography exams and their effects on downstream applications.

Approach: We benchmark five well-known self-supervision strategies (masked image region prediction, next slice prediction, rotation prediction, flip prediction, and denoising) on 15M chest computed tomography (CT) slices collected from four sites of the Mayo Clinic enterprise, United States. These models were evaluated on two downstream tasks using public datasets: pulmonary embolism (PE) detection (classification) and lung nodule segmentation. Image embeddings generated by these models were also evaluated for prediction of patient age, race, and gender to study inherent biases in the models' understanding of chest CT exams.

Results: The use of pre-training weights, especially masked region prediction-based weights, improved performance and reduced the computational effort needed for downstream tasks compared with task-specific state-of-the-art (SOTA) models. Performance improvement for PE detection was observed for training dataset sizes as large as ∼380K, with a maximum gain of 5% over SOTA. The segmentation model initialized with pre-training weights learned twice as fast as the randomly initialized model. While gender and age predictors built using self-supervised training weights showed no performance improvement over randomly initialized predictors, the race predictor experienced a 10% performance boost when using self-supervised training weights.

Conclusion: We released the self-supervised models and weights under an open-source academic license. These models can be fine-tuned with limited task-specific annotated data for a variety of downstream imaging tasks, thus accelerating research in biomedical imaging informatics.
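For orientation, the following is a minimal sketch of masked image region prediction, the best-performing strategy above. The encoder/decoder, patch size, and masking ratio here are illustrative assumptions, not the study's configuration.

```python
# Sketch of masked region prediction pretraining: hide random patches of a CT
# slice and train the network to reconstruct only the hidden regions.
import torch
import torch.nn as nn

class MaskedRegionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def mask_patches(x, patch=32, ratio=0.4):
    # Zero out a random fraction of non-overlapping patches; return image and mask.
    b, _, h, w = x.shape
    m = torch.rand(b, 1, h // patch, w // patch, device=x.device) < ratio
    m = m.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return x.masked_fill(m, 0.0), m

model = MaskedRegionModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
slices = torch.randn(8, 1, 256, 256)                 # stand-in CT slice batch
masked, m = mask_patches(slices)
recon = model(masked)
loss = ((recon - slices) ** 2 * m).sum() / m.sum()   # reconstruct masked regions only
loss.backward()
opt.step()
```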
Development of kidney segmentation models has largely focused on contrast-enhanced CT exams. The KiTS segmentation challenge, in particular, has provided a benchmark of 300 annotated arterial-phase CT scans. A review of the best performing entries identifies 3D U-Net models with residual connections as the top performers for kidney segmentation. Li et al. likewise found a U-Net architecture with residual connections to provide the best performance for their segmentation task; their work focused on segmenting kidney parenchyma alongside kidney stones using a recently released dataset of 257 studies. Yu et al. investigated training a multi-organ nnU-Net model on contrast and non-contrast images simultaneously. The authors found the model to achieve high Dice scores for kidney segmentation, with an average Dice score of 0.96 [4]. Inspection of the output segmentations, however, found the model to underperform on non-contrast images, as measured by lower quality assessment scores. Tang et al. took a similar approach, using early and late arterial-phase scans to train a patch-based network to segment renal structures. The group found the model to perform adequately on test data, with no distinction between late and early arterial-phase performance. Lee et al. attempted to reduce the dependency on labeling multiple phases by using paired samples in which only the contrast-enhanced volume was annotated. They were able to outperform existing models but were limited by the need for correct anatomical correspondence between scans. Ananda et al. removed the dependency on paired samples by training a dual-discriminator network in three phases: one phase ensures consistent segmentation of contrast-phase images, and the following two phases ensure that the image encodings and output maps do not differ significantly between contrast and non-contrast images. Dinsdale et al. implemented a similar multi-step approach to improve segmentation quality by improving resiliency to age-related physiological changes in brain MRI: a cross-entropy (CE) loss is used to train a discriminator that predicts the patient's age from the bottleneck features and segmentation map, while a final phase uses a confusion loss that penalizes the model for being more confident about any particular age [8]. Despite all this modeling effort, evaluating patients with impaired kidney function remains challenging, since imaging appearance varies not only with contrast phase but also with the level of renal function, so consistency of appearance is not guaranteed. This establishes the need for techniques that improve the robustness of kidney segmentation models. Within the scope of this work, we proposed various techniques to generate models resilient to different contrast phases and externally validated the models.
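The CE plus confusion-loss scheme attributed to Dinsdale et al. can be sketched as two alternating losses. This is a minimal illustration only; the assumptions (age discretized into bins, a `disc` module mapping features to per-bin logits) are ours, not details from the cited work.

```python
# Alternating adversarial scheme: (1) train the discriminator to predict age,
# (2) train the segmenter so the discriminator cannot be confident about age.
import torch.nn.functional as F

def discriminator_loss(disc, features, age_bins):
    # Updates the discriminator only: predict the age bin from segmenter features.
    return F.cross_entropy(disc(features.detach()), age_bins)

def confusion_loss(disc, features):
    # Updates the segmenter only: push the discriminator's predictive
    # distribution toward uniform, penalizing confidence about any single age.
    log_p = F.log_softmax(disc(features), dim=1)
    return -log_p.mean()  # minimized when predictions are uniform over age bins
```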
The biological age of a person represents their cellular-level health, which may be affected by extrinsic factors indicating socioeconomic disadvantage. Biological age (BA) can provide better estimates of age-related comorbidities than chronological age, but it requires well-established laboratory tests for estimation. As an alternative, we designed an image-processing model for estimating biological age from computed tomography scans of the head. We analyzed the relation between the gap between biological and chronological age and socioeconomic status, or social determinants of health, as estimated by the social deprivation index (SDI). Our model for BA estimation achieved a mean absolute error (MAE) of approximately 9 years between estimated biological and chronological age, with a correlation coefficient of -0.11 with SDI. By fusing imaging and SDI in the age-estimation process, the MAE was reduced by 11%.
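One simple way to realize such a fusion is late concatenation of image features with the scalar SDI value before a regression head. The sketch below is only illustrative; the paper's actual fusion design, backbone, and layer sizes are not given here, and the stand-in encoder is an assumption.

```python
# Late-fusion age regressor: image features + scalar SDI -> predicted age.
import torch
import torch.nn as nn

class FusionAgeRegressor(nn.Module):
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone  # any image encoder producing (B, feat_dim)
        self.head = nn.Sequential(nn.Linear(feat_dim + 1, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, image, sdi):
        feats = self.backbone(image)
        return self.head(torch.cat([feats, sdi.unsqueeze(1)], dim=1)).squeeze(1)

backbone = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())  # stand-in encoder
model = FusionAgeRegressor(backbone, feat_dim=16)
pred = model(torch.randn(4, 1, 64, 64), torch.rand(4))  # head CT batch + SDI scores
loss = nn.L1Loss()(pred, torch.rand(4) * 90)            # L1 matches the reported MAE metric
```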
Self-supervised pre-training can reduce the amount of labeled training data needed by pre-learning fundamental visual characteristics of the imaging data. We developed a foundation model for chest computed tomography exams using the self-supervised training strategy of masked image region prediction on 1M chest CT slices. The model was evaluated on two downstream tasks: pulmonary embolism (PE) detection (classification) and lung nodule segmentation. Use of the foundation model as a backbone improved performance and reduced the computational effort needed for downstream tasks compared with task-specific state-of-the-art (SOTA) models. PE detection was improved for training dataset sizes as large as 380K, with a maximum gain of 5% over SOTA. The segmentation model initialized with foundation model weights learned twice as fast as the randomly initialized model. The foundation model can be fine-tuned with limited task-specific annotated data for a variety of downstream imaging tasks, thus accelerating research in biomedical imaging informatics.
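A minimal sketch of the fine-tuning pattern described, i.e., loading a pretrained backbone, attaching a task head, and using a smaller learning rate for pretrained layers. The checkpoint filename and the small encoder are hypothetical.

```python
# Fine-tuning a (hypothetical) foundation encoder for PE classification.
import torch
import torch.nn as nn

encoder = nn.Sequential(  # must mirror the pretraining encoder architecture
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
# encoder.load_state_dict(torch.load("foundation_encoder.pt"))  # hypothetical file

head = nn.Linear(64, 2)  # PE vs. no PE
model = nn.Sequential(encoder, head)
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 1e-5},  # gentle updates to pretrained layers
    {"params": head.parameters(), "lr": 1e-3},     # faster learning for the new head
])
logits = model(torch.randn(2, 1, 256, 256))        # stand-in CT slice batch
```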
Purpose: The inherent characteristics of transthoracic echocardiography (TTE) images, such as low signal-to-noise ratio and acquisition variations, can limit the direct use of TTE images in the development and generalization of deep learning models. We therefore propose an automated framework that addresses common challenges in generalizing echocardiography deep learning models, applied to the challenging task of differentiating constrictive pericarditis (CP) from cardiac amyloidosis (CA).

Approach: Patients with a confirmed diagnosis of CP or CA, along with normal cases, from Mayo Clinic Rochester and Arizona were identified to extract baseline demographics and the apical 4-chamber view from TTE studies. We propose an innovative preprocessing and image generalization framework to process the images for training ResNet50, ResNeXt101, and EfficientNetB2 models. Ablation studies were conducted to justify the effect of each proposed processing step on the final classification performance.

Results: The models were initially trained and validated on 720 unique TTE studies from Mayo Rochester and further validated on 225 studies from Mayo Arizona. With our proposed generalization framework, EfficientNetB2 generalized best, with average areas under the curve (AUC) of 0.96 (±0.01) and 0.83 (±0.03) on the Rochester and Arizona test sets, respectively.

Conclusions: Leveraging the proposed generalization techniques, we developed an echocardiography-based deep learning model that accurately differentiates CP from CA and normal cases and applied it to images from two sites. The proposed framework can be further extended for the development of other echocardiography-based deep learning models.
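For reference, adapting one of the named architectures to this three-class problem is straightforward; a sketch using torchvision's EfficientNet-B2 follows. The preprocessing framework itself is the paper's contribution and is not reproduced here.

```python
# Replace EfficientNet-B2's classification head for the CP / CA / normal task.
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b2(weights=models.EfficientNet_B2_Weights.IMAGENET1K_V1)
in_feats = model.classifier[1].in_features   # final Linear layer of the head
model.classifier[1] = nn.Linear(in_feats, 3) # CP, CA, normal
```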
The use of artificial intelligence (AI) in healthcare has become a very active research area in the last few years. While significant progress has been made in image classification tasks, only a few AI methods are actually deployed in clinical settings. A major hurdle to actively using clinical AI models is their trustworthiness. Often, these complex models are used as black boxes that generate promising results; when scrutinized, however, they reveal implicit biases in their decision-making, such as unintended bias against particular ethnic groups and sub-populations. In this study, we develop a two-step adversarial debiasing approach with partial learning that reduces disparity while preserving performance on the targeted diagnosis/classification task. The methodology was evaluated on two independent medical imaging case studies, chest X-rays and mammograms, and showed promise in reducing bias while preserving target-task performance on both internal and external datasets.
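The abstract does not spell out the two-step procedure, so the following is only a generic adversarial-debiasing sketch using gradient reversal: an adversary predicts the protected attribute from shared features, and the reversed gradient discourages the encoder from encoding that attribute.

```python
# Generic adversarial debiasing via a gradient reversal layer (GRL).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None  # reverse (and scale) gradients to the encoder

def debias_losses(encoder, task_head, adv_head, x, y_task, y_attr, lam=1.0):
    z = encoder(x)
    task_loss = F.cross_entropy(task_head(z), y_task)  # diagnosis objective
    adv_loss = F.cross_entropy(adv_head(GradReverse.apply(z, lam)), y_attr)
    return task_loss + adv_loss  # one backward trains the task and unlearns the attribute
```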
Purpose: In recent years, the development and exploration of deeper and more complex deep learning models have been on the rise. However, large heterogeneous datasets to support efficient training of deep learning models are often lacking. While linear image transformations have traditionally been used for augmentation, the recent development of generative adversarial networks (GANs) could in principle allow us to generate unlimited data from the real distribution to support deep learning model training. Recently, the Radiological Society of North America (RSNA) curated a multiclass hemorrhage detection challenge dataset that includes over 800,000 images, but all high-performing models were trained using traditional data augmentation techniques. Given the wide variety of options, augmentation for image classification often follows a trial-and-error policy.
Approach: We designed a conditional DCGAN (cDCGAN) and, in parallel, trained multiple popular GAN models to use as online augmentation, comparing them to traditional augmentation methods on the hemorrhage case study.
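The core of a cDCGAN is a generator conditioned on the class label; as a rough sketch, the label can be embedded and concatenated with the noise vector before upsampling, as shown below. Layer sizes and the five-class setup (matching the RSNA hemorrhage subtypes) are illustrative, not the paper's configuration.

```python
# Minimal conditional-DCGAN generator: label embedding concatenated with noise.
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, n_classes=5, z_dim=100, emb_dim=16):
        super().__init__()
        self.embed = nn.Embedding(n_classes, emb_dim)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + emb_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z, labels):
        zc = torch.cat([z, self.embed(labels)], dim=1)
        return self.net(zc.unsqueeze(-1).unsqueeze(-1))  # (B, z+emb, 1, 1) -> (B, 1, 16, 16)

g = CondGenerator()
fake = g(torch.randn(4, 100), torch.tensor([0, 1, 2, 3]))  # 4 class-conditioned samples
```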
Results: Our experiments show that for the super-minority class, epidural hemorrhages, cDCGAN augmentation delivered at least a 2× performance improvement over the traditionally augmented model using the same classifier configuration.
Conclusion: This shows that for complex and imbalanced datasets, traditional class-imbalance solutions may not be sufficient, and more complex and diverse data augmentation methods, such as GANs, may be required.
Purpose: To differentiate oncocytoma and chromophobe renal cell carcinoma (RCC) using radiomics features computed from spherical samples of image regions of interest, “radiomic biopsies” (RBs).
Approach: In a retrospective cohort study of 102 CT cases [68 males (67%), 34 females (33%); mean age ± SD, 63 ± 12 years], 42 oncocytomas (41%) and 60 chromophobes (59%) were pathology-confirmed. A board-certified radiologist performed two RB rounds. From each RB round, we computed radiomics features and compared the performance of random forest and AdaBoost binary classifiers trained on the features. To control for overfitting, we performed 10 rounds of 70%/30% train-test splits with feature selection, cross-validation, and hyperparameter optimization on each split. We evaluated performance with test ROC AUC. We also tested each model on data from the other RB round and compared it with same-round testing using the DeLong test. We clustered the important features from each round and measured agreement with a bootstrapped adjusted Rand index.
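The evaluation protocol just described (repeated 70/30 splits with feature selection, cross-validated hyperparameter search, and test ROC AUC) can be sketched in scikit-learn as below; the feature count, selector, and grid are placeholders rather than the study's choices.

```python
# Repeated stratified 70/30 splits with nested model selection and test AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import Pipeline

X, y = np.random.rand(102, 90), np.random.randint(0, 2, 102)  # stand-in radiomics table

aucs = []
for seed in range(10):  # 10 rounds of 70/30 splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    pipe = Pipeline([("select", SelectKBest(f_classif, k=20)),
                     ("rf", RandomForestClassifier(random_state=seed))])
    grid = GridSearchCV(pipe, {"rf__n_estimators": [100, 300]}, cv=5, scoring="roc_auc")
    grid.fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_te, grid.predict_proba(X_te)[:, 1]))
print(f"test AUC: {np.mean(aucs):.2f} ± {np.std(aucs):.3f}")
```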
Results: Our best classifiers achieved an average AUC of 0.71 ± 0.024. We found no evidence of an effect for RB round (p = 1). We also found no evidence for a decrease in model performance when tested on the other RB round (p = 0.85). Feature clustering produced seven clusters in each RB round with high agreement (Rand index = 0.981 ± 0.002, p < 0.00001).
Conclusions: A consistent radiomic signature can be derived from RBs and could help distinguish oncocytoma and chromophobe RCC.
By taking advantage of both mammography and breast MRI, contrast-enhanced digital mammography (CEDM) has emerged as a promising new imaging modality to improve the efficacy of breast cancer screening and diagnosis. The primary objective of this study is to develop and evaluate a new computer-aided detection and diagnosis (CAD) scheme for CEDM images to classify between malignant and benign breast masses. A CEDM dataset of 111 patients (33 benign and 78 malignant) was retrospectively assembled. Each case includes two types of images, namely low-energy (LE) and dual-energy subtracted (DES) images. First, the CAD scheme applied a hybrid segmentation method to automatically segment masses depicted on LE and DES images separately. Optimal segmentation results from DES images were also mapped to LE images and vice versa. Next, a set of 109 quantitative image features related to mass shape and density heterogeneity was computed. Last, four multilayer perceptron-based machine learning classifiers, integrated with a correlation-based feature subset evaluator and a leave-one-case-out cross-validation method, were built to classify mass regions depicted on LE and DES images, respectively. When the CAD scheme was applied to the original segmentations of DES and LE images, the areas under the ROC curves were 0.7585 ± 0.0526 and 0.7534 ± 0.0470, respectively. After optimal segmentation mapping from DES to LE images, the AUC value of the CAD scheme significantly increased to 0.8477 ± 0.0376 (p < 0.01). Because DES images eliminate the overlapping effect of dense breast tissue on lesions, segmentation accuracy was significantly improved compared with regular mammograms. The study demonstrated that computer-aided classification of breast masses using CEDM images yields higher performance.
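A sketch of the leave-one-case-out evaluation with an MLP classifier follows. The correlation-based feature subset evaluator (a WEKA component) is approximated here with a simple univariate filter, so the selector and all hyperparameters are assumptions.

```python
# Leave-one-case-out cross-validation of an MLP classifier on mass features.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

X, y = np.random.rand(111, 109), np.random.randint(0, 2, 111)  # 109 features/case (stand-in)

scores = np.zeros(len(y))
for train_idx, test_idx in LeaveOneOut().split(X):
    pipe = Pipeline([("scale", StandardScaler()),
                     ("select", SelectKBest(f_classif, k=15)),
                     ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500))])
    pipe.fit(X[train_idx], y[train_idx])
    scores[test_idx] = pipe.predict_proba(X[test_idx])[:, 1]
print("AUC:", roc_auc_score(y, scores))
```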