Precise segmentation of rectal cancer tumors on routine MRI is critical for accurate clinical staging and downstream computational analyses. While deep learning-based segmentation algorithms have shown much promise in automating the otherwise tedious, subjective, and costly process of manual segmentation, they require significant amounts of manually annotated data for training. To address this limitation of deep learning-based segmentation models, we present a novel deep learning framework that incorporates human-in-the-loop (HITL) refinement to automatically delineate rectal tumors on multi-plane pre-treatment MR imaging. When evaluated on multiple holdout validation cohorts including a clinical trial dataset, the post-HITL segmentation model significantly outperformed the pre-HITL model, with a median Dice similarity coefficient of 0.763 and Hausdorff distance of 28.4 mm in comparison to 0.601 and 31.8 mm, respectively. HITL refinement learning also significantly accelerated the manual annotation process by 20 minutes. HITL learning represents a feasible, effective, and efficient solution to semi-automated tumor segmentation on routine rectal cancer MRI scans.
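The two segmentation metrics reported above can be illustrated with a minimal sketch. It assumes each 2D mask is given as a set of (row, column) foreground pixel coordinates and that the Hausdorff distance is taken over all foreground pixels rather than over extracted boundaries; the abstract does not state which variant the authors used. The function names and the `spacing` parameter are hypothetical.

```python
import math


def dice(a, b):
    """Dice similarity coefficient between two sets of foreground pixels."""
    if not a and not b:
        return 1.0  # two empty masks are treated as a perfect match
    return 2.0 * len(a & b) / (len(a) + len(b))


def _directed(a, b, spacing):
    """Largest distance from any pixel in a to its nearest pixel in b."""
    sr, sc = spacing  # physical size of one pixel along rows and columns
    return max(
        min(math.hypot((p[0] - q[0]) * sr, (p[1] - q[1]) * sc) for q in b)
        for p in a
    )


def hausdorff(a, b, spacing=(1.0, 1.0)):
    """Symmetric Hausdorff distance; raises ValueError if a mask is empty."""
    return max(_directed(a, b, spacing), _directed(b, a, spacing))


# Hypothetical example: two 4x4 squares offset by two columns.
reference = {(r, c) for r in range(4) for c in range(4)}
predicted = {(r, c) for r in range(4) for c in range(2, 6)}
print(dice(reference, predicted))       # 0.5
print(hausdorff(reference, predicted))  # 2.0 (in pixel units)
```

Passing the in-plane pixel spacing, for example `spacing=(0.5, 0.5)`, converts the distance into physical units such as millimetres. The brute-force nearest-pixel search is quadratic in mask size, so it is suitable only for small illustrative masks.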
Radiomic analysis has shown significant potential for predicting treatment response to neoadjuvant therapy in rectal cancers via routine MRI, though primarily based on a single acquisition plane or single region of interest. To exploit intuitive clinical and biological aspects of tumor extent on MRI, we present a novel multi-plane, multi-region radiomics framework to more comprehensively characterize and interrogate treatment response on MRI. Our framework was evaluated on a cohort of 71 T2-weighted axial and coronal MRIs from patients who were diagnosed with rectal cancer and underwent chemoradiation. 2D radiomic features were extracted from three regions of interest (tumor, fat proximal to tumor, and perirectal fat) across axial and coronal planes, with a two-stage feature selection scheme designed to identify descriptors associated with pathologic complete response. When evaluated via a quadratic discriminant analysis classifier, our multi-plane, multi-region radiomics model outperformed single-plane or single-region feature sets with an area under the ROC curve (AUC) of 0.80 ± 0.03 in discovery and an AUC of 0.65 in hold-out validation. Uniquely, the optimal feature set comprised descriptors from across multiple planes (axial, coronal) as well as multiple regions (tumor, proximal fat, perirectal fat). Our multi-plane, multi-region radiomics framework may thus enable more comprehensive phenotyping of treatment response on MRI, potentially finding application for improved personalization of therapeutic and surgical interventions in rectal cancers.
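For readers less familiar with the AUC figures quoted above, the following sketch shows one standard way to compute the area under the ROC curve from classifier scores, using its equivalence to the Mann-Whitney statistic. It is not the authors' evaluation code; the function name and the example labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier outputs; higher means more likely positive.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for three responders (1) and three non-responders (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the gap between the discovery and hold-out values.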
With the increasing promise of radiomics and deep learning approaches in capturing subtle patterns associated with disease response on routine MRI, there is an opportunity to more closely combine components from both approaches within a single architecture. We present a novel approach to integrating multi-scale, multi-oriented wavelet networks (WN) into a convolutional neural network (CNN) architecture, termed a deep hybrid convolutional wavelet network (DHCWN). The proposed model uses wavelet neurons (wavelons), which are defined by the shift and scale parameters of a mother wavelet function, as its building units. Whereas the activation functions in a typical CNN are fixed and monotonic (e.g., ReLU), the activation functions of the proposed DHCWN are wavelet functions that are flexible and significantly more stable during optimization. The proposed DHCWN was evaluated using a multi-institutional cohort of 153 pre-treatment rectal cancer MRI scans to predict pathologic response to neoadjuvant chemoradiation. When compared to typical CNN and multilayer wavelet perceptron (DWN-MLP) architectures in 2D and 3D, our novel DHCWN yielded significantly better performance in predicting pathologic complete response (achieving a maximum accuracy of 91.23% and a maximum AUC of 0.79) across multi-institutional discovery and hold-out validation cohorts. Interpretability evaluation of all three architectures via Grad-CAM and Shapley visualizations revealed that DHCWNs best captured complex texture patterns within tumor regions on MRI associated with pathologic complete response classification. The proposed DHCWN thus offers a significantly more extensible, interpretable, and integrated solution for characterizing predictive signatures via routine imaging data.
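The contrast between a fixed, monotonic activation and a wavelet activation can be sketched as follows. The abstract does not name the mother wavelet, so the Mexican hat (Ricker) function used here is an assumption, as are the function names; in a network of this kind the shift and scale would presumably be trainable parameters rather than constants.

```python
import math


def relu(x):
    """Fixed, monotonic activation used in typical CNNs."""
    return max(0.0, x)


def mexican_hat(t):
    """Assumed mother wavelet: psi(t) = (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def wavelon(x, shift=0.0, scale=1.0):
    """Wavelet neuron activation: the mother wavelet shifted and scaled."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return mexican_hat((x - shift) / scale)


# The wavelet response is localized and non-monotonic: it peaks at the
# shift, crosses zero one scale unit away, and turns negative beyond that.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, relu(x), round(wavelon(x), 4))
```

Changing `shift` moves the peak of the response and changing `scale` widens or narrows it, which is the sense in which the activation is flexible.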
Deep learning-based convolutional neural networks (CNNs) for prostate cancer (PCa) risk stratification employ radiologist-delineated regions of interest (ROIs) on MRI. These ROIs contain the reader’s interpretation of the region of PCa. Variations in reader annotations change the features that are extracted from the ROIs, which may in turn affect classification performance of CNNs. In this study, we sought to analyze the effect of variations in inter-reader delineations of PCa ROIs on training of CNNs with regard to distinguishing clinically significant (csPCa) and insignificant PCa (ciPCa). We employed 180 patient studies (n=274 lesions) from 3 cohorts who underwent 3T multi-parametric MRI followed by MRI-targeted biopsy and/or radical prostatectomy. ISUP Gleason grade groups (GGG) obtained from pathology were used to determine csPCa (GGG≥2) and ciPCa (GGG=1). Five experienced radiologists, with over 5 years of experience in prostate imaging, delineated PCa ROIs on bi-parametric MRI (bpMRI including T2 weighted (T2W) and diffusion weighted (DWI) sequences) within the training set (n1=160 lesions) using targeted biopsy locations. Patches were extracted using the ROIs, which were then used to train individual CNNs (N1-N5) using the SqueezeNet architecture. The average volume of the reader-delineated ROIs used for training varied greatly, ranging between 1106 and 2107 mm³ across all readers. The resulting networks showed no significant difference in classification performance (AUC = 0.82 ± 0.02), indicating that they were relatively robust to inter-reader variations in ROI. These models were evaluated on independent test sets (D2 with n2=85 lesions, D3 with n3=29 lesions) where ROIs were obtained by co-registration of MRI with post-surgical pathology, unaffected by inter-reader variations in ROIs. Network performance across D2 and D3 was 0.80 ± 0.02 and 0.62 ± 0.03, respectively.
The CNN predictions were moderately consistent, with ICC(2,1) scores across D2 and D3 being 0.74 and 0.54, respectively. Higher agreement in ROI overlap produced higher correlation in predictions on external test sets (R = 0.89, p < 0.05). Furthermore, higher average ROI volume produced greater AUC scores on D3, indicating that comprehensive ROIs may provide more features for DL networks to use in classification. Inter-reader variations in ROIs on MRI may influence the reliability and generalizability of CNNs trained for PCa risk stratification.
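The ICC(2,1) agreement score quoted above can be computed from a two-way layout in which rows are targets (here, lesions) and columns are raters (here, the networks N1-N5). The sketch below follows the standard two-way random-effects, absolute-agreement, single-rater formula; it is not the authors' code, and the function name and the small ratings table are illustrative.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: ratings[i][j] is rater j's score for target i."""
    n = len(ratings)     # number of targets (rows)
    k = len(ratings[0])  # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between-target mean square
    msc = ss_cols / (k - 1)             # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative table: 6 targets scored by 4 raters.
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(table), 2))  # 0.29
```

Because ICC(2,1) measures absolute agreement, systematic offsets between raters lower the score even when their rankings of the targets are similar, as in the illustrative table.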
Bi-parametric MRI (bpMRI: T2W MRI and Apparent Diffusion Coefficient maps (ADC) derived from diffusion weighted imaging) is increasingly being used to characterize prostate cancer (PCa). However, inter- and intra-reader variability hinders interpretation of MRI. Deep learning networks may aid in PCa characterization and may allow for non-invasively distinguishing clinically significant (csPCa: GGG>1) and insignificant (ciPCa: GGG=1) PCa. Recent studies have shown that signatures from the peri-tumoral (PT) region on imaging add significant value to those from the intra-tumoral (IT) region for disease detection and characterization. In this work, we present a multi-sequence multi-instance learning convolutional neural network trained using 2D patches extracted from PCa regions of interest (ROIs) on prostate bpMRI to distinguish csPCa and ciPCa. The trained classifier is used to extract pooled features from both the IT and PT ROIs, which are then used to train a random forest classifier to distinguish csPCa and ciPCa. We train and test our models using patient studies from two different institutions (n=298) with GGG obtained either from post-surgical specimens or biopsies. Models built using IT (DIT) and PT (DPT) deep features alone resulted in an area under the curve (AUC) of 0.83 and 0.73, respectively, while models computed from IT (RIT) and PT (RPT) radiomic features resulted in an AUC of 0.77 and 0.75, respectively. The models DIP and RIP, trained on a combination of IT and PT deep features and radiomic features, resulted in an AUC of 0.86 and 0.80, respectively. In both cases, we observe that combining IT and PT features improves the overall classifier performance in distinguishing csPCa and ciPCa.
Tumor downstaging after neoadjuvant chemoradiation (CRT) in rectal cancer patients is typically assessed via Magnetic Resonance Imaging (MRI) in order to determine follow-up surgical interventions, but is associated with marked inter-reader variability and limited performance. While radiomic features have shown promise for evaluating chemoradiation response and tumor stage in rectal cancers, there is a need to determine how reproducible these features are across different MRI scanners and acquisitions. In this study, we evaluated radiomic feature reproducibility in terms of feature instability within a uniquely curated "true healthy" rectum cohort in order to construct a stability-informed radiomic classifier for differentiating poorly from markedly down-staged rectal tumors after chemoradiation in a multi-site setting. We utilized a cohort of 156 patients, with (a) 74 MRIs visualizing the healthy rectum, (b) 52 post-CRT MRI scans in the discovery cohort, and (c) 30 post-CRT MRI scans in a second-site validation cohort; the latter two being from rectal cancer patients. 764 radiomic features were extracted from within the entire rectal wall on each MRI scan. Feature instability was used to quantify how reproducible each radiomic feature was between the discovery cohort and the healthy rectum cohort, using locations along the rectum that were spatially distinct from the treated tumor region. From the resulting "stability-informed" feature set, the most relevant features were identified to distinguish pathologic tumor stage groups in the discovery cohort via a QDA classifier with cross-validation to ensure robustness. The top 4 radiomic features were then evaluated in hold-out fashion on scans from the validation cohort.
We found that a stability-informed radiomic model (which comprised features that were reproducible in 100% of all comparisons) was significantly more accurate in identifying pathological tumor stage regression in both discovery (AUC = 0.66 ± 0.09) and validation (AUC = 0.73) cohorts, compared to a basic radiomic model that used all extracted features (AUC = 0.60 ± 0.07 in discovery, AUC = 0.62 in validation). Evaluating feature instability with respect to healthy rectal tissue may thus enhance the performance of radiomic models in characterizing pathologic downstaging in rectal cancers via MRI.
Recent advances in the field of radiomics have enabled the development of a number of prognostic and predictive imaging-based tools for a variety of diseases. However, wider clinical adoption of these tools is contingent on their generalizability across multiple sites and scanners. This may be particularly relevant in the context of radiomic features derived from T1- or T2-weighted magnetic resonance images (MRIs), where signal intensity values are known to lack tissue-specific meaning and vary based on differing acquisition protocols between institutions. We present the first empirical study benchmarking five different radiomic feature families in terms of both reproducibility and discriminability in a multisite setting, specifically, for identifying prostate tumors in the peripheral zone on MRI. Our cohort comprised 147 patient T2-weighted MRI datasets from four different sites, all of which were first preprocessed to correct for acquisition-related artifacts such as bias field, differing voxel resolutions, and intensity drift (nonstandardness). About 406 three-dimensional voxel-wise radiomic features from five different families (gray, Haralick, gradient, Laws, and Gabor) were evaluated in a cross-site setting to determine (a) how reproducible they are within a relatively homogeneous nontumor tissue region and (b) how well they could discriminate tumor regions from nontumor regions. Our results demonstrate that a majority of the popular Haralick features are reproducible in over 99% of all cross-site comparisons and achieve excellent cross-site discriminability (classification accuracy of ≈0.8). By contrast, a majority of Laws features are highly variable across sites (reproducible in <75% of all cross-site comparisons) and result in low cross-site classifier accuracies (<0.6), likely due to the large number of noisy filter responses that can be extracted.
These trends suggest that only a subset of radiomic features and associated parameters may be both reproducible and discriminable enough for use within machine learning classifier schemes.
Decipher, a genomic test, is used to predict the likelihood of metastasis and prostate cancer (PCa) specific mortality based on expression patterns of 22 RNA markers from radical prostatectomy (RP) specimens. It has been shown to be strongly correlated with metastasis-free prognosis and has been integrated with the National Comprehensive Cancer Network (NCCN) guidelines. However, Decipher is expensive and tissue destructive. Radiomic features refer to the high-throughput computational texture or shape features extracted from radiographic scans. Radiomic features derived from multi-parametric magnetic resonance imaging (mpMRI) of prostate cancer have been shown to be associated with clinically significant PCa. In this study, we sought to evaluate whether radiomic features derived from T2-weighted MRI (T2WI) and apparent diffusion coefficient (ADC) maps of the prostate could distinguish different Decipher risk groups (low, intermediate and high). We also explored correlations between Decipher risk associated radiomic features and features relating to gland morphology on corresponding digitized surgical specimens. A retrospectively acquired, de-identified cohort of 70 PCa patients (N = 74 lesions) who underwent 3T mpMRI prior to RP and Decipher tests after RP was used in this study. The Decipher risk score, ranging from 0 to 1, was used to categorize patients into low/intermediate (D1) and high (D2) risk groups. A multivariate logistic regression model was trained (N = 37 lesions) using radiomic features selected via elastic-net regularization to predict the Decipher risk groups. The model was evaluated on a hold-out test set (N = 37 lesions) and resulted in an area under the receiver operating characteristic curve (AUC) of 0.80. Our model outperformed the prediction using PIRADS v2 (AUC = 0.67), but showed comparable performance with Gleason Grade Group (GGG) (AUC = 0.80).
We observed that the best discriminating radiomic features were correlated with gland morphology and gland packing on corresponding histopathology (R = 0.43, p < 0.05).
Evaluating tumor regression of rectal cancers via MRI after standard-of-care chemoradiation therapy (CRT) remains highly challenging for radiologists. While the tumor region-of-interest (ROI) on post-CRT rectal MRI is difficult to localize, an underexplored region is the perirectal fat (surrounding tumor and rectum) where residual cancer cells and positive lymph nodes are known to be present. Recent studies have shown that physiologic environments surrounding tumor regions may provide complementary information that is predictive of response to CRT and patient survival. We present initial results of characterizing perirectal fat regions on MRI via radiomics, towards capturing sub-visual details related to rectal tumor or nodal response to CRT. A total of 37 rectal cancer patients for whom MRIs as well as pathologic tumor staging were available post-CRT were included in this study. Region-wise radiomic features were extracted from expert annotated perirectal fat regions and a 2-stage feature selection was employed to identify the most relevant features. Radiomic entropy of perirectal fat was found to be over-expressed in patients with poor tumor or nodal response post-CRT, albeit with different spatial distributions. In a leave-one-patient-out cross validation setting, a quadratic discriminant analysis (QDA) classifier trained on top radiomic features from the perirectal fat achieved AUCs of 0.77 (for differentiating incomplete vs marked tumor regression) and 0.75 (for differentiating lymph node positive from negative patients). By comparison, perirectal fat intensities achieved significantly poorer AUCs in both tasks. Our results indicate perirectal fat on post-CRT MRI may be highly relevant for evaluating CRT response and informing follow-on interventions in rectal cancers.
The recent advent of radiomics has enabled the development of prognostic and predictive tools which use routine imaging, but a key question that still remains is how reproducible these features may be across multiple sites and scanners. This is especially relevant in the context of MRI data, where signal intensity values lack tissue-specific, quantitative meaning and depend on acquisition parameters (magnetic field strength, image resolution, type of receiver coil). In this paper we present the first empirical study of the reproducibility of 5 different radiomic feature families in a multi-site setting, specifically for characterizing prostate MRI appearance. Our cohort comprised 147 patient T2w MRI datasets from 4 different sites, all of which were first pre-processed to correct for acquisition-related artifacts such as bias field, differing voxel resolutions, and intensity drift (non-standardness). 406 3D voxel-wise radiomic features were extracted and evaluated in a cross-site setting to determine how reproducible they were within a relatively homogeneous non-tumor tissue region, using 2 different measures of reproducibility: Multivariate Coefficient of Variation and Instability Score. Our results demonstrated that Haralick features were most reproducible between all 4 sites. By comparison, Laws features were among the least reproducible between sites and performed highly variably across their entire parameter space. Similarly, the Gabor feature family demonstrated good cross-site reproducibility, but for certain parameter combinations alone. These trends indicate that despite extensive pre-processing, only a subset of radiomic features and associated parameters may be reproducible enough for use within radiomics-based machine learning classifier schemes.