KEYWORDS: Transformers, Cancer detection, Windows, Prostate cancer, Prostate, Magnetic resonance imaging, Education and training, Principal component analysis, Object detection, Deep learning
Prostate multiparametric magnetic resonance imaging (mpMRI) has demonstrated promising results in prostate cancer (PCa) detection with deep learning models based on convolutional neural networks (CNNs). Recently, transformers have achieved performance competitive with CNNs in computer vision. Large-scale transformers benefit from training on large annotated datasets, which are expensive and labor-intensive to obtain in medical imaging. Self-supervised learning can effectively leverage unlabeled data to extract useful semantic representations at no additional annotation cost. This can improve model performance on downstream tasks with limited labeled data and increase model robustness to external data. We present a novel end-to-end cross-shaped transformer model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bi-parametric MR imaging (bpMRI). On a large prostate bpMRI dataset of 1500 patients, our CSwin UNet achieves 0.880±0.013 AUC and 0.790±0.033 pAUC, significantly outperforming state-of-the-art CNN and transformer models.
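As a concrete illustration of the reported metrics (not code from the paper), ROC AUC can be computed by sweeping a threshold over the model's csPCa scores; pAUC simply restricts the integration to a clinically relevant low-FPR range. A minimal sketch:

```python
def roc_points(scores, labels):
    """ROC curve points (FPR, TPR), sweeping a threshold down the sorted scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    pts = [(0.0, 0.0)]
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y:
            tp += 1
        else:
            fp += 1
        pts.append((fp / neg, tp / pos))
    return pts

def auc(pts):
    """Trapezoidal area under the ROC curve; pAUC would stop at an FPR cap."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

A perfectly separating scorer yields AUC = 1.0, while a scorer no better than chance sits near 0.5.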
This study aims to simplify radiation therapy treatment planning by proposing an MRI-to-CT transformer-based denoising diffusion probabilistic model (CT-DDPM) to generate high-quality synthetic computed tomography (sCT) from magnetic resonance imaging (MRI). The goal is to reduce patient radiation dose and setup uncertainty by eliminating the need for CT simulation and image registration during treatment planning. The CT-DDPM utilizes a diffusion process with a shifted-window transformer network to transform MRI into sCT. The model comprises two processes: a forward process, which adds Gaussian noise to real CT scans to create noisy images, and a reverse process, which denoises the noisy CT scans using a V-shaped network (Vnet) conditioned on the corresponding MRI. With an optimally trained Swin-Vnet, the reverse process generates sCT scans matching the MRI anatomy. The method is evaluated using the mean absolute error (MAE) in Hounsfield units (HU), peak signal-to-noise ratio (PSNR), multi-scale structural similarity index (MS-SSIM), and normalized cross-correlation (NCC) between ground-truth CTs and sCTs. For the brain dataset, CT-DDPM demonstrated state-of-the-art quantitative results, exhibiting an MAE of 45.210±3.807 HU, a PSNR of 26.753±0.861 dB, an MS-SSIM of 0.964±0.005, and an NCC of 0.981±0.004. On the prostate dataset, the model also showed impressive performance, with an MAE of 55.492±8.281 HU, a PSNR of 28.912±2.591 dB, an MS-SSIM of 0.894±0.092, and an NCC of 0.945±0.054. Across both datasets, CT-DDPM significantly outperformed competing networks in most metrics, a finding corroborated by Student's paired t-test. The source code is available at https://github.com/shaoyanpan/Synthetic-CT-generation-from-MRI-using-3D-transformer-based-denoising-diffusion-model
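The forward process described above has a standard closed form: the noisy image x_t is a weighted mix of the clean CT and Gaussian noise under a cumulative schedule ᾱ_t. A minimal sketch (the cosine schedule here is a common DDPM choice, not necessarily the paper's exact schedule):

```python
import math
import random

def cosine_alpha_bar(t, T):
    """Cumulative noise schedule ᾱ_t (cosine schedule): 1 at t=0, ~0 at t=T."""
    f = lambda s: math.cos((s / T + 0.008) / 1.008 * math.pi / 2) ** 2
    return f(t) / f(0)

def forward_diffuse(x0, t, T, rng=random):
    """q(x_t | x_0): x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε, with ε ~ N(0, I).

    x0 is a flat list of CT voxel values; the reverse process would train a
    conditional network to undo this corruption given the paired MRI."""
    ab = cosine_alpha_bar(t, T)
    return [math.sqrt(ab) * v + math.sqrt(1 - ab) * rng.gauss(0, 1) for v in x0]
```

At t = 0 the image is returned unchanged; at t = T it is essentially pure noise, which is where reverse sampling starts.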
The purpose of this study is to reduce radiation exposure in PET imaging while preserving high-quality clinical PET images. We propose the PET Consistency Model (PET-CM), an efficient diffusion-model-based approach, to estimate full-dose PET images from low-dose PET scans. PET-CM delivers synthetic images of comparable quality to state-of-the-art diffusion-based methods but with significantly higher efficiency. The process involves adding Gaussian noise to full-dose PET images through a forward diffusion process and then using a PET U-shaped network (PET-Unet) for denoising in a reverse diffusion process, conditioned on the corresponding low-dose PET images. In experiments denoising one-eighth-dose images to full dose, PET-CM achieved an MAE of 1.321±0.134%, a PSNR of 33.587±0.674 dB, an SSIM of 0.960±0.008, and an NCC of 0.967±0.011. When reducing from one-quarter dose to full dose, PET-CM further showcased its capability with an MAE of 1.123±0.112%, a PSNR of 35.851±0.871 dB, an SSIM of 0.975±0.003, and an NCC of 0.990±0.003.
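The MAE and PSNR figures quoted in these abstracts reduce to a few lines of arithmetic. An illustrative sketch, assuming flat pixel lists and a known data range (neither is specified in the abstract):

```python
import math

def mae(a, b):
    """Mean absolute error between two equally sized images (flat lists)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def psnr(a, b, data_range):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(data_range ** 2 / mse)
```

For example, an error equal to the full data range on every pixel gives 0 dB, and identical images give infinite PSNR.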
In this work, we propose MLP-Vnet, a token-based U-shaped multilayer perceptron-mixer (MLP-Mixer) network incorporating a convolutional neural network for multi-structure segmentation on cardiac magnetic resonance imaging (MRI). The proposed MLP-Vnet is composed of an encoder and a decoder. Taking an MRI scan as input, semantic features are extracted by the encoder with one early convolutional block followed by four consecutive MLP-Mixer blocks. The extracted features are then passed to the decoder, which mirrors the architecture of the encoder, to form an N-class segmentation map. We evaluated the proposed network on the Automated Cardiac Diagnosis Challenge (ACDC) dataset. Performance was assessed in terms of volume- and surface-based similarities between the predicted contours and the manually delineated ground-truth contours, as well as computational efficiency. Volume-based similarities were measured by the Dice similarity coefficient (DSC), sensitivity, and precision. Surface-based similarities were measured by Hausdorff distance (HD), mean surface distance (MSD), and residual mean square distance (RMSD). The MLP-Vnet was compared with four state-of-the-art networks. The proposed network demonstrated statistically superior DSC, and superior sensitivity or precision, on all three structures relative to the competing networks (p-value < 0.05): average DSC of 0.904, sensitivity of 0.908, and precision of 0.902 across all structures. The best surface-based similarities were also achieved by the MLP-Vnet: average HD = 3.266 mm, MSD = 0.684 mm, and RMSD = 1.487 mm. Compared to the competing networks, the MLP-Vnet showed the shortest training time (7.32 hours) and inference time per patient (3.12 seconds). The proposed MLP-Vnet uses a reasonable number of trainable parameters to solve the segmentation task on cardiac MRI scans more quickly and accurately than the state-of-the-art networks.
This novel network could be a promising tool for accurate and efficient cardiac MRI segmentation to assist cardiac diagnosis and treatment decision making.
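The volume-based metrics reported above (DSC, sensitivity, precision) reduce to simple overlap counts on binary masks. An illustrative sketch, assuming flattened 0/1 masks per structure:

```python
def dice(pred, truth):
    """Dice similarity coefficient: 2·|P∩T| / (|P| + |T|) on binary masks."""
    inter = sum(p * t for p, t in zip(pred, truth))
    denom = sum(pred) + sum(truth)
    return 2 * inter / denom if denom else 1.0  # both empty: perfect agreement

def sensitivity(pred, truth):
    """True-positive rate: fraction of ground-truth voxels recovered."""
    tp = sum(p * t for p, t in zip(pred, truth))
    return tp / sum(truth)

def precision(pred, truth):
    """Fraction of predicted voxels that are correct."""
    tp = sum(p * t for p, t in zip(pred, truth))
    return tp / sum(pred)
```

A multi-class map would be binarized one structure at a time before applying these.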
Delivering radiation dose to entire vertebral bodies is current standard practice in pediatric proton craniospinal irradiation (CSI), because these patients are growing children. This practice prevents radiation-induced growth impairment, but it causes hematopoietic marrow suppression. We aim to develop a noninvasive method to verify radiation damage to the marrow in spine vertebrae during fractionated treatment using multiple magnetic resonance imaging (MRI) scans. We identified five pediatric patients who received proton CSI treatment with a prescription relative biological effectiveness dose of 36 Gy to the spine. Each patient underwent multiple MRI scans during treatment using T1-weighted sequences. Sagittal MR images were analyzed, focusing on the lumbar spine region. Multi-Gaussian models were fitted to histograms from the different MR images to quantify radiation-induced damage to the bone marrow. MR images acquired before treatment served as the damage-free reference. After treatment started, radiation-induced fatty marrow infiltration appeared in the vertebral bodies. We defined radiation-induced damage as the ratio of fatty marrow pixels to total spine marrow pixels at the L1-L5 level. Damage fractions increased rapidly when the vertebral bodies received doses between 14 Gy and 34 Gy. The maximum damage occurred approximately 40 days after the treatment start. After that, bone marrow regeneration was observed, and the damage fractions decreased. The proposed method could potentially enable adaptive proton plan modification on the fly.
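The damage fraction defined above is a pixel-count ratio. A minimal sketch, with a fixed intensity threshold standing in for the class boundary that the study obtains from its multi-Gaussian histogram fits (the threshold here is a hypothetical stand-in, not the paper's method):

```python
def damage_fraction(marrow_pixels, fatty_threshold):
    """Fraction of marrow pixels classified as fatty.

    marrow_pixels: T1-weighted intensities inside the L1-L5 marrow mask.
    fatty_threshold: intensity above which a pixel counts as fatty marrow
    (a stand-in for the boundary between the fitted Gaussian components)."""
    fatty = sum(1 for p in marrow_pixels if p >= fatty_threshold)
    return fatty / len(marrow_pixels)
```

Tracking this fraction across the serial MRI scans would give the per-fraction damage curve described in the abstract.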
Retinopathy refers to pathologies of the retina that can ultimately result in vision impairment and blindness. Optical coherence tomography (OCT) is a technique for imaging these diseases, aiding in the early detection of retinal damage, which may mitigate the risk of vision loss. In this work, we propose an end-to-end graph neural network (GNN) pipeline that, for the first time, extracts deep graph-based features for multi-class retinopathy classification. To our knowledge, this is also the first work applying Vision-GNN to OCT image analysis. We trained and tested the proposed GNN on a public OCT retina dataset divided into four categories: Normal, Choroidal Neovascularization (CNV), Diabetic Macular Edema (DME), and Drusen. Our method achieves an average accuracy of 99.07% over the four classes, demonstrating the effectiveness of a deep learning classifier with graph-based features for OCT images. This work lays the foundation for applying GNNs to OCT imaging to aid the early detection of retinal damage.
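Vision-GNN pipelines typically treat image patches as graph nodes and connect each node to its k nearest neighbors in feature space before applying graph convolutions. The abstract gives no construction details, so the following is an assumed sketch of that standard step:

```python
def knn_graph(patch_feats, k):
    """Directed edges (i, j) linking each patch i to its k nearest neighbours
    by squared Euclidean distance over the patch feature vectors."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    edges = []
    for i, fi in enumerate(patch_feats):
        nbrs = sorted((j for j in range(len(patch_feats)) if j != i),
                      key=lambda j: d2(fi, patch_feats[j]))
        edges.extend((i, j) for j in nbrs[:k])
    return edges
```

Graph layers then aggregate each node's neighbor features along these edges, which is what lets the network capture long-range relations between retinal regions.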
This work proposes a novel U-shaped neural network, Shifted-window MLP (Swin-MLP), that incorporates a convolutional neural network (CNN) and a Multilayer Perceptron-Mixer (MLP-Mixer) for automatic CT multi-organ segmentation. The network has a V-net-like structure: 1) a Shifted-window MLP-Mixer encoder learns semantic features from the input CT scans, and 2) a decoder, which mirrors the architecture of the encoder, reconstructs segmentation maps from the encoder's features. Novel to the proposed network, we apply a Shifted-window MLP-Mixer rather than convolutional layers to better model both global and local representations of the input scans. We evaluate the proposed network using an institutional pelvic dataset comprising 120 CT scans and a public abdomen dataset containing 30 scans. Segmentation accuracy is evaluated in two domains: 1) volume-based accuracy, measured by the Dice Similarity Coefficient (DSC), segmentation sensitivity, and precision; and 2) surface-based accuracy, measured by Hausdorff Distance (HD), Mean Surface Distance (MSD), and Residual Mean Square distance (RMS). The average DSC achieved by Swin-MLP on the pelvic dataset is 0.866; sensitivity is 0.883, precision is 0.856, HD is 11.523 millimeters (mm), MSD is 3.926 mm, and RMS is 6.262 mm. The average DSC on the public abdomen dataset is 0.903, and HD is 5.275 mm. The proposed Swin-MLP demonstrates significant improvement over CNN-based networks. The automatic multi-organ segmentation tool may potentially facilitate the current radiotherapy treatment planning workflow.
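The surface-based metrics above compare sampled contour points; the Hausdorff distance, for instance, is the largest nearest-neighbor gap between two point sets. A minimal sketch on small 2D point sets (real evaluations run on dense 3D surface samples):

```python
def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets (surface samples)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def directed(X, Y):
        # worst-case distance from a point of X to its nearest point of Y
        return max(min(dist(x, y) for y in Y) for x in X)

    return max(directed(A, B), directed(B, A))
```

MSD replaces the outer max with a mean, and RMSD averages squared nearest-neighbor distances before taking a square root.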
In this work, we propose an adversarial-attack-based data augmentation method to improve deep-learning-based segmentation of Organs-At-Risk (OAR) in abdominal Computed Tomography (CT) to facilitate radiation therapy. We introduce Adversarial Feature Attack for Medical Image (AFA-MI) augmentation, which forces the segmentation network to learn out-of-distribution statistics, improving generalization and robustness to noise. AFA-MI augmentation consists of three steps: 1) generate adversarial noise with the Fast Gradient Sign Method (FGSM) on the intermediate features of the segmentation network's encoder; 2) inject the generated adversarial noise into the network, intentionally compromising performance; and 3) optimize the network with both clean and adversarial features. The effectiveness of AFA-MI augmentation was validated on nnUnet. Experiments segment the heart, left and right kidneys, liver, left and right lungs, spinal cord, and stomach in an institutional dataset collected from 60 patients. We first evaluate AFA-MI augmentation using nnUnet and the Token-based Transformer Vnet (TT-Vnet) on test data from a public abdominal dataset and the institutional dataset. In addition, we validate how AFA-MI affects the networks' robustness to noisy data by evaluating the networks on the institutional dataset with added Gaussian noise of varying magnitudes. Network performance is quantitatively evaluated using the Dice Similarity Coefficient (DSC) for volume-based accuracy and the Hausdorff Distance (HD) for surface-based accuracy. On the public dataset, nnUnet with AFA-MI achieves DSC = 0.85 and HD = 6.16 millimeters (mm), and TT-Vnet achieves DSC = 0.86 and HD = 5.62 mm. In the robustness experiment on the institutional data, AFA-MI improves the segmentation DSC by 0.010 to 0.055 across all organs relative to clean inputs. AFA-MI augmentation further improves contour accuracy by up to 0.527 in DSC when tested on images with Gaussian noise. AFA-MI augmentation is therefore demonstrated to improve segmentation performance and robustness in CT multi-organ segmentation.
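Step 1 of AFA-MI uses FGSM, which perturbs each value by a fixed step in the direction of the sign of the loss gradient. A minimal sketch on flat feature lists (in practice the gradients come from backpropagation through the encoder):

```python
def fgsm_perturb(features, grads, eps):
    """FGSM step: x' = x + eps * sign(dL/dx), maximizing the loss locally.

    features: intermediate encoder activations (flattened).
    grads: gradient of the segmentation loss w.r.t. those activations.
    eps: attack magnitude controlling how far features are pushed."""
    sign = lambda g: (g > 0) - (g < 0)  # -1, 0, or +1
    return [x + eps * sign(g) for x, g in zip(features, grads)]
```

The perturbed features are then fed forward (step 2), and the network is trained on both clean and attacked features (step 3), which is what yields the robustness gains reported above.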
Automatic multi-organ segmentation is a cost-effective tool for generating organ contours from computed tomography (CT) images. This work proposes a deep-learning algorithm for multi-organ (bladder, prostate, rectum, left and right femoral heads) segmentation in pelvic CT images for prostate radiation treatment planning. We propose an encoder-decoder network with a V-net backbone for local feature extraction and contour reconstruction. Novel to our network, we utilize a token-based transformer, which encourages long-range dependency, to forward more informative high-resolution feature maps from the encoder to the decoder. In addition, a knowledge distillation strategy is applied to improve the network's generalization. We evaluate the network using a dataset collected from 50 patients with prostate cancer. A quantitative evaluation of the proposed network's performance was performed on each organ based on 1) volume similarity between the segmented contours and the ground truth, measured by Dice score, segmentation sensitivity, precision, and absolute percentage volume difference (AVD); and 2) surface similarity, evaluated by Hausdorff distance (HD), mean surface distance (MSD), and residual mean square distance (RMSD). Performance was then compared against other state-of-the-art methods. The average volume similarities achieved by the network over all organs were Dice score = 0.83, sensitivity = 0.84, and precision = 0.83; the average surface similarities were HD = 5.77 mm, MSD = 0.93 mm, and RMSD = 2.77 mm, with AVD = 12.85%. The proposed method performed significantly better than competing methods on most evaluation metrics. The proposed network may be a promising segmentation approach for routine prostate radiation treatment planning.
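The knowledge distillation strategy mentioned above typically matches the student's temperature-softened class distribution to a teacher's via a KL-divergence term. The abstract gives no specifics, so the temperature and loss form below are assumptions illustrating the standard soft-label term:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by
    T^2 as is conventional so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when student and teacher agree exactly and grows as their per-voxel class distributions diverge; it is usually added to the ordinary segmentation loss with a weighting factor.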