We are developing five-year survival prediction models for bladder cancer patients who underwent neoadjuvant chemotherapy and radical cystectomy. This study investigated the feasibility of using large language models (LLMs), Vicuna and Dolly, to extract clinical descriptors from reports for survival prediction with a nomogram model, either alone or combined with radiomics and deep-learning descriptors from CT urography (CTU) images using BPNNs. The models were developed and validated using data from 163 patients collected with IRB approval. The developed models included C (based on clinical descriptors and nomogram), R (radiomics descriptors), D (deep-learning descriptor), CR (clinical and radiomics descriptors), CD (clinical and deep-learning descriptors), and CRD (clinical, radiomics, and deep-learning descriptors). The individual models achieved the following AUCs on the test set: 0.82±0.06 (C: manually labeled reference), 0.73±0.07 (R), 0.71±0.07 (D), 0.80±0.06 (C: User1 Vicuna-C2 labeled), 0.83±0.05 (C: User1 Dolly labeled), 0.78±0.06 (C: User2 Vicuna-C2 labeled), and 0.85±0.05 (C: User2 Dolly-C2 labeled). For the combined models, the AUCs were (1) manually labeled reference: 0.86±0.05 (CR), 0.86±0.05 (CD), and 0.87±0.05 (CRD); (2) CRD performance on Vicuna-C2 labeled: 0.86±0.05 (User1) and 0.84±0.05 (User2); (3) CRD performance on Dolly-C2 labeled: 0.88±0.05 (User1) and 0.89±0.04 (User2). The results showed that the LLMs extracted three clinical descriptors with accuracy ranging from 77% to 100% relative to manual extraction, and that the LLMs run by two users performed similarly. The combined models outperformed the individual models, and models using LLM-extracted clinical descriptors performed similarly to those using manually extracted descriptors.
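The accuracy of LLM extraction relative to manual extraction, as reported above, can be read as a per-descriptor agreement rate. The following is a minimal sketch under that assumption; the function name `extraction_accuracy` and the example labels are hypothetical and are not taken from the study.

```python
from typing import Hashable, Sequence


def extraction_accuracy(llm_values: Sequence[Hashable],
                        manual_values: Sequence[Hashable]) -> float:
    """Fraction of patients for whom the LLM-extracted value of one
    descriptor matches the manually extracted reference value."""
    if len(llm_values) != len(manual_values):
        raise ValueError("both label lists must cover the same patients")
    if not manual_values:
        raise ValueError("at least one patient is required")
    matches = sum(1 for a, b in zip(llm_values, manual_values) if a == b)
    return matches / len(manual_values)


# Hypothetical labels for one descriptor across four patients.
llm = ["T2", "T3", "T0", "T2"]
manual = ["T2", "T3", "T1", "T2"]
accuracy = extraction_accuracy(llm, manual)  # 3 of 4 values agree
```

Computing this once per descriptor and per user would give a range of agreement rates comparable in form to the one quoted in the abstract.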
We are developing deep-learning convolutional neural network (DL-CNN) and radiomics models to assist physicians in assessing the response of bladder cancer to neoadjuvant chemotherapy (NAC) in CT urography. We collected a total of 264 pre- and post-treatment lesion pairs from 227 patients at the University of Michigan hospital with IRB approval. The data were split into 3 sets by case: a training set including 35 complete responders (CRs) (T0 stage after treatment) and 113 non-complete responders (NCRs) (> T0 stage after treatment), a validation set including 5 CRs and 5 NCRs, and an independent test set including 19 CRs and 87 NCRs. The training set was used to train the models to classify CRs and NCRs, and the selection of optimal models was guided by the validation set. The selected models were deployed on the test set to generate the likelihood score of CR for each pair. Classification performance was evaluated by the area under the ROC curve (AUC). Hybrid ROIs extracted from the lesions in the pre- and post-treatment scan pairs were used as input to the DL-CNN model. The optimal DL-CNN model achieved an AUC of 0.75 ± 0.06 on the test set. For the radiomics model, a random forest classifier was applied to the features extracted from the pre- and post-treatment lesions. The optimal radiomics model achieved an AUC of 0.76 ± 0.05. A combined DL-CNN and radiomics model increased the AUC to 0.77 ± 0.06. The results indicated the feasibility of using the DL-CNN and radiomics models for assessing treatment response of bladder cancer.
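To make the evaluation metric concrete, the AUC of a set of CR likelihood scores can be computed from pairwise comparisons of responder and non-responder scores (the Mann-Whitney formulation). This is an illustrative sketch only; the abstract does not state which ROC estimation method or software the authors used, and the function name and example scores are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC: probability that a randomly chosen positive case
    (label 1, e.g. a complete responder) receives a higher score than a
    randomly chosen negative case (label 0). Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical likelihood scores for four lesion pairs (1 = CR, 0 = NCR).
example_scores = [0.9, 0.4, 0.5, 0.1]
example_labels = [1, 1, 0, 0]
example_auc = auc_mann_whitney(example_scores, example_labels)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a test set of about a hundred lesion pairs; a rank-based implementation would be preferable for much larger sets.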
We have previously developed a computerized decision support system for bladder cancer treatment response assessment (CDSS-T) in CT urography (CTU). In this work, we conducted an observer study to evaluate diagnostic accuracy and intra-observer variability with and without the CDSS-T system. One hundred fifty-seven pre- and post-treatment lesion pairs were identified in pre- and post-chemotherapy CTU scans of 123 patients. Forty lesion pairs had T0 stage (complete response) after chemotherapy. Multi-disciplinary observers from 4 different institutions participated in reading the lesion pairs, including 5 abdominal radiologists, 4 radiology residents, 5 oncologists, 1 urologist, and 1 medical student. Each observer provided estimates of the T0 likelihood after treatment without and then with the CDSS-T aid for each lesion. To assess intra-observer variability, 51 cases were evaluated twice: an original and a repeated evaluation. The average area under the curve (AUC) of the 16 observers for estimation of T0 disease after treatment increased from 0.73 without CDSS-T to 0.77 with CDSS-T (p = 0.003). For the evaluation with CDSS-T, the average AUC was similar across institutions. Performance with CDSS-T improved significantly, and the AUC standard deviations were slightly smaller, suggesting a trend toward more accurate and uniform performance with CDSS-T. There was no significant difference between the original and repeated evaluations. This study demonstrated that our CDSS-T system has the potential to improve treatment response assessment by physicians from different specialties and institutions, and to reduce the inter- and intra-observer variability of the assessments.