KEYWORDS: Image segmentation, Medical imaging, Kidney, Ultrasonography, Monte Carlo methods, Performance modeling, Image enhancement, Data modeling, Reliability, Uncertainty analysis
The Segment Anything Model (SAM) is a recently developed general-purpose foundation model for image segmentation. It can use sparse manual prompts such as bounding boxes to generate pixel-level segmentation of natural images, but it struggles on medical images such as low-contrast, noisy ultrasound images. We propose a refined test-phase prompt augmentation technique designed to improve SAM’s performance in medical image segmentation. The method couples multi-box prompt augmentation with an aleatoric uncertainty-based false-negative (FN) and false-positive (FP) correction (FNPC) strategy. We evaluate the method on two ultrasound datasets and show improvement in SAM’s performance and robustness to inaccurate prompts, without the need for further training or fine-tuning. Moreover, we present the Single-Slice-to-Volume (SS2V) method, enabling 3D pixel-level segmentation using only the bounding box annotation from a single 2D slice. Our results enable efficient use of SAM even in noisy, low-contrast medical images. The source code has been released at: https://github.com/MedICL-VU/FNPC-SAM
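The core prompt-augmentation step can be sketched with the official segment-anything package: one user box is jittered into several boxes, SAM is run once per box, and the per-pixel agreement across the resulting masks serves as a simple proxy for the uncertainty map that drives the FN/FP correction. The jitter scale, number of boxes, and checkpoint path below are illustrative assumptions, not the released FNPC-SAM settings.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

N_BOXES, JITTER = 8, 0.05          # assumed augmentation settings
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # local checkpoint path
predictor = SamPredictor(sam)

def multibox_masks(image, box):
    """Run SAM on jittered copies of one bounding box; return a consensus mask
    and the per-pixel mean mask, a rough proxy for aleatoric uncertainty."""
    predictor.set_image(image)                      # HxWx3 uint8 RGB ultrasound frame
    x0, y0, x1, y1 = box
    scale = np.array([x1 - x0, y1 - y0, x1 - x0, y1 - y0], dtype=np.float32)
    masks = []
    for _ in range(N_BOXES):
        aug_box = np.asarray(box, np.float32) + np.random.uniform(-JITTER, JITTER, 4) * scale
        m, _, _ = predictor.predict(box=aug_box, multimask_output=False)
        masks.append(m[0].astype(np.float32))
    prob = np.mean(masks, axis=0)                   # soft agreement map in [0, 1]
    return prob > 0.5, prob                         # consensus mask, uncertainty proxy
```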
KEYWORDS: Computed tomography, Image segmentation, Transformers, Windows, Medical imaging, 3D image processing, 3D modeling, Prostate, Magnetic resonance imaging, Data modeling
Convolutional Neural Networks (CNNs) exhibit strong performance in medical image segmentation tasks by capturing fine-grained local information, such as edges and textures. However, due to the limited field of view of convolution kernels, it is hard for CNNs to fully represent global information. Recently, transformers have shown good performance for medical image segmentation due to their ability to better model long-range dependencies. Nevertheless, transformers struggle to capture fine-grained spatial features as effectively as CNNs. A good segmentation model should learn a better representation from local and global features to be both precise and semantically accurate. In our previous work, we proposed CATS, a U-shaped segmentation network augmented with a transformer encoder. In this work, we further extend this model and propose CATS v2 with hybrid encoders. Specifically, the hybrid encoders consist of a CNN-based encoder path paralleled by a transformer path with a shifted window, which better leverages both local and global information to produce robust 3D medical image segmentation. We fuse the information from the convolutional encoder and the transformer at the skip connections of different resolutions to form the final segmentation. The proposed method is evaluated on three public challenge datasets: Beyond the Cranial Vault (BTCV), Cross-Modality Domain Adaptation (CrossMoDA), and Task 5 of the Medical Segmentation Decathlon (MSD-5), to segment abdominal organs, vestibular schwannoma (VS), and the prostate, respectively. Compared with state-of-the-art methods, our approach achieves superior performance in terms of higher Dice scores. Our code is publicly available at https://github.com/MedICL-VU/CATS.
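A minimal PyTorch sketch of the hybrid-encoder idea follows: a CNN path and a transformer path run in parallel, and their feature maps are fused (here by simple addition) at each skip-connection resolution before the decoder. This is an illustration of the described design, not the released CATS v2 code; the transformer path is left as a pluggable module.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3D convolutions with instance norm, one encoder stage of the CNN path."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(c_in, c_out, 3, padding=1), nn.InstanceNorm3d(c_out), nn.ReLU(inplace=True),
            nn.Conv3d(c_out, c_out, 3, padding=1), nn.InstanceNorm3d(c_out), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class HybridEncoder(nn.Module):
    """CNN encoder path in parallel with a (placeholder) transformer encoder path."""
    def __init__(self, channels=(16, 32, 64), transformer_encoder=None):
        super().__init__()
        self.cnn_stages = nn.ModuleList()
        c_prev = 1
        for c in channels:
            self.cnn_stages.append(ConvBlock(c_prev, c))
            c_prev = c
        self.pool = nn.MaxPool3d(2)
        # In CATS v2 this is a shifted-window (Swin) transformer; here it is any
        # module that returns one feature map per resolution with matching shapes.
        self.transformer = transformer_encoder

    def forward(self, x):
        cnn_feats, f = [], x
        for i, stage in enumerate(self.cnn_stages):
            f = stage(f if i == 0 else self.pool(f))
            cnn_feats.append(f)
        if self.transformer is None:
            return cnn_feats                                     # CNN-only fallback
        trans_feats = self.transformer(x)
        return [c + t for c, t in zip(cnn_feats, trans_feats)]   # fuse at skip connections
```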
Automated segmentation of multiple sclerosis (MS) lesions from MRI scans is important for quantifying disease progression. In recent years, convolutional neural networks (CNNs) have shown top performance for this task when a large amount of labeled data is available. However, the accuracy of CNNs suffers when dealing with few and/or sparsely labeled datasets. A potential solution is to leverage the information available in large public datasets in conjunction with a target dataset that has only limited labeled data. In this paper, we propose a training framework, SSL2 (self-supervised-semi-supervised), for multi-modality MS lesion segmentation with limited supervision. We adopt self-supervised learning to leverage the knowledge from large public 3T datasets to tackle the limitations of a small 7T target dataset. To leverage the information from unlabeled 7T data, we also evaluate state-of-the-art semi-supervised methods for other limited-annotation settings, such as small labeled training size and sparse annotations. We use the shifted-window (Swin) transformer as our backbone network. The effectiveness of the self-supervised and semi-supervised training strategies is evaluated on our in-house 7T MRI dataset. The results indicate that each strategy improves lesion segmentation for both limited training data size and sparse labeling scenarios. The combined overall framework further improves the performance substantially compared to either of its components alone. Our proposed framework thus provides a promising solution for future data/label-hungry 7T MS studies.
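As an illustration of the semi-supervised ingredient, the sketch below shows a mean-teacher-style consistency objective on unlabeled 7T patches alongside a supervised loss on labeled ones. The loss choices, EMA rate, and weighting are assumptions for exposition; the paper evaluates several state-of-the-art semi-supervised methods rather than this exact recipe.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Exponential moving average of student weights into the teacher."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1 - alpha)

def semi_supervised_step(student, teacher, labeled, unlabeled, lam=0.1):
    x_l, y_l = labeled                      # labeled 7T patches (NxCxDxHxW) + lesion labels
    x_u, x_u_aug = unlabeled                # two augmented views of unlabeled 7T patches
    sup = F.cross_entropy(student(x_l), y_l)
    with torch.no_grad():
        pseudo = torch.softmax(teacher(x_u), dim=1)
    cons = F.mse_loss(torch.softmax(student(x_u_aug), dim=1), pseudo)
    return sup + lam * cons                 # combined supervised + consistency loss
```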
KEYWORDS: Image segmentation, Medical imaging, Education and training, Content addressable memory, Control systems, Magnetic resonance imaging, Machine learning, Data modeling, White matter, Biomedical applications
Medical image harmonization aims to transform the image ‘style’ among heterogeneous datasets while preserving the anatomical content. It enables data-sensitive learning-based approaches to fully leverage the data power of large multi-site datasets with different image acquisitions. Recently, the attention mechanism has achieved excellent performance on the image-to-image (I2I) translation of natural images. In this work, we further explore the potential of the attention mechanism to improve medical image harmonization. For the first time, we apply two attention-based frameworks with outstanding performance in the natural I2I scenario to cross-scanner MRI harmonization. We compare them with existing, commonly used harmonization frameworks by evaluating their ability to enhance the performance of the downstream subcortical segmentation task on T1-weighted (T1w) MRI datasets from 1.5T vs. 3T scanners. Both qualitative and quantitative results show that the attention mechanism contributes a noticeable improvement in harmonization ability.
KEYWORDS: Image segmentation, Angiography, Education and training, Principal component analysis, Photography, Data modeling, Performance modeling, Visualization, Feature extraction, Deep learning
Among the research efforts to segment the retinal vasculature from fundus images, deep learning models consistently achieve superior performance. However, this data-driven approach is very sensitive to domain shifts. For fundus images, such data distribution changes can easily be caused by variations in illumination conditions as well as the presence of disease-related features such as hemorrhages and drusen. Since the source domain may not include all possible types of pathological cases, a model that can robustly recognize vessels on unseen domains is desirable but remains elusive, despite many proposed segmentation networks of ever-increasing complexity. In this work, we propose a contrastive variational auto-encoder that can filter out irrelevant features and synthesize a latent image, named deep angiogram, representing only the retinal vessels. Then segmentation can be readily accomplished by thresholding the deep angiogram. The generalizability of the synthetic network is improved by the contrastive loss that makes the model less sensitive to variations of image contrast and noisy features. Compared to baseline deep segmentation networks, our model achieves higher segmentation performance via simple thresholding. Our experiments show that the model can generate stable angiograms on different target domains, providing excellent visualization of vessels and a non-invasive, safe alternative to fluorescein angiography.
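The inference path described above reduces to two steps, sketched below: synthesize a deep angiogram with the trained network, then threshold it to obtain the vessel mask. The model interface and the use of Otsu's method for the global threshold are assumptions for illustration.

```python
import numpy as np
import torch
from skimage.filters import threshold_otsu

def segment_vessels(model, fundus_rgb):
    """fundus_rgb: HxWx3 float image in [0, 1]; model: trained angiogram generator."""
    x = torch.from_numpy(fundus_rgb).permute(2, 0, 1).unsqueeze(0).float()
    with torch.no_grad():
        angiogram = model(x).squeeze().cpu().numpy()   # latent vessel-only image
    t = threshold_otsu(angiogram)                      # any global threshold works here
    return angiogram, angiogram > t                    # deep angiogram, binary vessel mask
```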
Brain extraction, also known as skull stripping, from magnetic resonance images (MRIs) is an essential preprocessing step for many medical image analysis tasks and is also useful as a stand-alone task for estimating the total brain volume. Currently, many proposed methods have excellent performance on T1-weighted images, especially for healthy adults. However, such methods do not always generalize well to more challenging datasets such as pediatric, severely pathological, or heterogeneous data. In this paper, we propose an automatic deep learning framework for brain extraction on T1-weighted MRIs of adult healthy controls, Huntington’s disease patients, and pediatric Aicardi-Goutières syndrome (AGS) patients. We evaluate our method on the PREDICT-HD and AGS datasets, which are multi-site datasets with different protocols/scanners. Compared to current state-of-the-art methods, our method produced the best segmentations, with the highest Dice score, lowest average surface distance, and lowest 95th-percentile Hausdorff distance on both datasets. These results indicate that our method has better accuracy and generalizability for heterogeneous T1-weighted MRI datasets.
Optical coherence tomography (OCT) is a prevalent non-invasive imaging method that provides high-resolution volumetric visualization of the retina. However, its inherent speckle noise can seriously degrade tissue visibility in OCT. Deep-learning-based approaches have been widely used for image restoration, but most of them require a noise-free reference image for supervision. In this study, we present a fully unsupervised diffusion probabilistic model that learns from noise instead of signal. A diffusion process is defined by adding a sequence of Gaussian noise to self-fused OCT B-scans. The reverse process of diffusion, modeled by a Markov chain, then provides an adjustable level of denoising. Our experimental results demonstrate that our method can significantly improve image quality with a simple working pipeline and a small amount of training data. The implementation is available at https://github.com/DeweiHu/OCT_DDPM.
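The forward (noising) process mentioned above follows the standard DDPM formulation, sketched below in PyTorch; the noise schedule and number of timesteps are illustrative, not the paper's settings.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)          # illustrative linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)
    for a batch of self-fused B-scans x0 (Nx1xHxW) and timesteps t (N,)."""
    noise = torch.randn_like(x0) if noise is None else noise
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# At inference, the reverse Markov chain is started from a noisy B-scan at an
# intermediate step t* rather than from pure noise; a smaller t* gives lighter
# denoising, which is what makes the denoising level adjustable.
```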
We present a deep-learning-based approach for automated quantitative assessment of lesion volumes in OCT images, enabling real-time assessment of injury severity and longitudinal tracking of tissue response to photodamage. The network has been trained to accurately quantify photodamage between the outer plexiform layer (OPL) and the retinal pigment epithelium (RPE) without the need for extensive image pre- and post-processing. Manually annotated OCT cross-sections were used as ground truth to train a U-Net convolutional neural network. The network was designed and implemented in PyTorch based on the multi-scale U-Net architecture.
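A minimal sketch of the implied supervised setup: a U-Net predicts a per-pixel lesion probability on each B-scan and is trained against the manual annotations with a soft Dice loss. The loss choice and training-step structure are assumptions for illustration; the U-Net definition itself is omitted.

```python
import torch

def soft_dice_loss(logits, target, eps=1e-6):
    """logits: NxHxW raw network outputs; target: NxHxW binary lesion masks."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2))
    denom = prob.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

def train_step(unet, optimizer, bscan, mask):
    """One optimization step on a batch of B-scans (Nx1xHxW) and masks (NxHxW)."""
    optimizer.zero_grad()
    loss = soft_dice_loss(unet(bscan).squeeze(1), mask)
    loss.backward()
    optimizer.step()
    return loss.item()
```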
Ophthalmic OCT image quality is highly variable and directly impacts clinical diagnosis of disease. Computational methods such as frame averaging, filtering, and deep-learning approaches are generally constrained by extended imaging times when acquiring repeated frames, over-smoothing and loss of features, or the need for extensive training sets. Self-fusion is a robust OCT image-enhancement method that overcomes these limitations by averaging serial OCT frames weighted by their respective similarity. Here, we demonstrate video-rate self-fusion using a convolutional neural network. Our experimental results show a near doubling of OCT contrast-to-noise ratio at a frame rate of ~22 fps when integrated with custom OCT acquisition software.
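The self-fusion principle stated above, similarity-weighted averaging of registered serial B-scans, can be written in a few lines of NumPy; the CNN in this work is trained to reproduce such output at video rate. The similarity measure and kernel width below are illustrative assumptions.

```python
import numpy as np

def self_fuse(frames, ref_idx, sigma=0.1):
    """frames: sequence of registered B-scans (each HxW); ref_idx: reference frame index."""
    ref = frames[ref_idx].astype(np.float32)
    weights, acc = [], np.zeros_like(ref)
    for f in frames:
        f = f.astype(np.float32)
        # Similarity to the reference: mean squared difference through a Gaussian kernel.
        w = np.exp(-np.mean((f - ref) ** 2) / (2.0 * sigma ** 2))
        acc += w * f
        weights.append(w)
    return acc / np.sum(weights)   # similarity-weighted average (self-fused frame)
```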
Intraoperative optical coherence tomography (iOCT) has enabled depth-resolved intraoperative imaging of retinal microstructures. Despite recent advancements, iOCT of surgical maneuvers remains challenging because the imaging field-of-view requires manual adjustment and tracking. To overcome this limitation, we previously demonstrated spectrally encoded coherence tomography and reflectometry (SECTR), which provides OCT imaging and a complementary en face view for visualization of surgical instruments and tool-tracking. Here, we demonstrate ophthalmic imaging with an intraoperative SECTR (iSECTR) system integrated with a surgical microscope. We believe that iSECTR will allow for real-time feedback on the location and depth of surgical instruments to better guide ophthalmic surgery.