Global mammographic radiomic signature can predict radiologists’ difficult-to-interpret normal cases
Eye tracking data obtained from 8 radiologists (of varying experience levels in reading mammograms) reviewing 120 two-view digital mammography cases (59 cancers) have been used to train the model, which was pre-trained with the ImageNet dataset for transfer learning. Areas of the mammogram that received direct (foveally fixated), indirect (peripherally fixated) or no (never fixated) visual attention were extracted from radiologists’ visual search maps (obtained by a head mounted eye tracking device). These areas, along with the radiologists’ assessment (including confidence of the assessment) of suspected malignancy were used to model: 1) Radiologists’ decision; 2) Radiologists’ confidence on such decision; and 3) The attentional level (i.e. foveal, peripheral or none) obtained by an area of the mammogram. Our results indicate high accuracy and low misclassification in modelling such behaviours.
Materials and Methods: Fifteen breast readers were asked to interpret a test set of 29 normal screening mammogram cases and classify them by rating the difficulty of each case on a five-point Likert scale, identifying the salient features and assessing their suitability for single reading. Using the False Positive Fractions from a previous study, the 29 cases were classified into 10 "low", 10 "medium" and 9 "high" difficulty cases. Data were analyzed with descriptive statistics. Spearman's correlation was used to test the strength of association between the difficulty of the cases and the readers’ recommendation for a single reading strategy.
Results: The ratings from readers in this study corresponded to the known difficulty level of cases for the 'low' and 'high' difficulty cases. Uniform ductal pattern and density, symmetrical mammographic features and the absence of micro-calcifications were the main reasons associated with 'low' difficulty cases. The 'high' difficulty cases were described as having ‘dense breasts’. There was a statistically significant negative correlation between the difficulty of the cases and readers’ recommendation for single reading (r = -0.475, P = 0.009).
Conclusion: The findings demonstrated potential relationships between certain mammographic features and the difficulty for readers to classify mammograms as 'normal'. The standard Australian practice of double reading was deemed more suitable for most cases. There was an inverse moderate association between the difficulty of the cases and the recommendations for single reading.
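The Spearman correlation used in the abstract above can be illustrated with a minimal sketch. It ranks both variables (tied values share the mean of their ranks) and then takes the Pearson correlation of the ranks. The function names and the sample values are hypothetical and are not taken from the study; no p-value is computed here.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical data: case difficulty ratings (1-5) and the number of
# readers recommending single reading for each case.
difficulty = [1, 2, 2, 3, 4, 5]
single_reading_votes = [14, 12, 13, 9, 6, 3]
rho = spearman_rho(difficulty, single_reading_votes)
```

A negative `rho` in this illustration corresponds to the direction reported in the abstract: harder cases attract fewer recommendations for single reading.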
Methods: A total of 60 cases were presented to the readers, of which 20 contained cancers and 40 showed no abnormality. Each case comprised four images, and 129 breast readers participated in the study. Each reader was asked to identify and locate any malignancies using a 1-5 confidence scale. All images were displayed using 5MP monitors, supported by radiology workstations with full image manipulation capabilities. A jackknife free-response receiver operating characteristic (JAFROC) figure-of-merit (FOM) methodology was employed to assess reader performance. Details were obtained from each reader regarding their experience, qualifications and breast reading activities. Spearman and Mann-Whitney U techniques were used for statistical analysis.
Results: Higher performance was positively related to number of years professionally qualified (r = 0.18; P<0.05), number of years reading breast images (r = 0.24; P<0.01), number of mammography images read per year (r = 0.28; P<0.001) and number of hours reading mammographic images per week (r = 0.19; P<0.04). Unexpectedly, higher performance was inversely linked to previous experience with digital images (r = -0.17; P<0.05), and further analysis demonstrated that this finding was due to changes in specificity.
Conclusion: The suggestion that readers with experience in reporting digital images may exhibit a reduced ability to correctly identify normal appearances requires further investigation. Higher performance is linked to number of cases read per year.
To compare radiologists’ confidence in assessing breast cancer using combined digital mammography (DM) and digital breast tomosynthesis (DBT) with their confidence using DM alone, as a function of previous experience with DBT.
Materials and Methods
Institutional ethics approval was obtained. Twenty-three experienced breast radiologists reviewed 50 cases in two modes, DM alone and DM+DBT. Twenty-seven cases presented with breast cancer. Each radiologist was asked to detect breast lesions and give a confidence score of 1-5 (1- Normal, 2- Benign, 3- Equivocal, 4- Suspicious, 5- Malignant). Radiologists were divided into three sub-groups according to their prior experience with DBT (none, workshop experience, and clinical experience). Confidence scores using DM+DBT were compared with DM alone for all readers combined and for each DBT experience subgroup. Statistical analyses, using GraphPad Prism 5, were carried out using the Wilcoxon signed-rank test with statistical significance set at p< 0.05.
Results
Confidence scores were higher for true positive cancer cases using DM+DBT compared with DM alone for all readers (p < 0.0001). Confidence scores for normal cases were lower (indicating greater confidence in the non-cancer diagnosis) with DM+DBT compared with DM alone for all readers (p= 0.018) and readers with no prior DBT experience (p= 0.035).
Conclusion
Addition of DBT to DM increases the confidence level of radiologists in scoring cancer and normal/benign cases. This finding appears to apply across radiologists with varying levels of DBT experience. However, further work involving greater numbers of radiologists is required.
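The paired comparison described above (confidence scores for the same cases under DM alone and under DM+DBT) can be sketched as a Wilcoxon signed-rank test. The version below is a simplified illustration: it drops zero differences, ranks the absolute differences with average ranks for ties, and uses a normal approximation without tie or continuity correction, which is only rough for small samples. It is not a reproduction of the GraphPad Prism procedure used in the study, and the scores shown are hypothetical.

```python
from math import sqrt, erfc

def wilcoxon_signed_rank(first, second):
    """Return (W+, W-, z, two-sided p) for paired samples.

    Normal approximation only; no tie or continuity correction.
    """
    # Paired differences, discarding pairs with no change.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Tied absolute differences share the mean of their ranks.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = erfc(abs(z) / sqrt(2))  # two-sided p from the standard normal
    return w_plus, w_minus, z, p

# Hypothetical confidence scores (1-5) for five cancer cases.
dm_alone = [3, 3, 4, 2, 3]
dm_plus_dbt = [4, 5, 4, 3, 2]
w_plus, w_minus, z, p = wilcoxon_signed_rank(dm_alone, dm_plus_dbt)
```

With so few pairs the approximation is not reliable; an exact test or a statistics package would be the usual choice in practice.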
Materials and Methods: An observer performance and eye position analysis study was performed. Four expert breast radiologists were asked to interpret two sets of 40 screening mammograms. The Control Set contained 36 normal and 4 malignant cases (located at positions #9, 14, 25 and 37). The Primed Set contained 34 of the same normal cases and the same 4 malignant cases (in the same positions) plus 2 “primer” malignant cases replacing 2 normal cases (located at positions #20 and 34). Primer cases were defined as lower difficulty cases containing salient malignant features inserted before cases of greater difficulty.
Results: The Wilcoxon signed-rank test indicated no significant differences in sensitivity or specificity between the two sets (P > 0.05). The fixation count in the malignant cases (#25, 37) in the Primed Set after viewing the primer cases (#20, 34) decreased significantly (Z = -2.330, P = 0.020). False-negative errors were mostly due to sampling in the Primed Set (75%), in contrast to the Control Set (25%).
Conclusion: The overall performance of radiologists is not affected by the inclusion of obvious cancer cases. However, changes in visual search behavior, as measured by eye-position recording, suggest visual disturbance by the inclusion of priming cases in screening mammography.
Materials and Methods: Twenty-six experienced radiologists who specialized in breast imaging read 50 cases (27 cancers and 23 non-cancer cases) of patients who underwent DM and DBT. Both exams included the craniocaudal (CC) and mediolateral oblique (MLO) views. Histopathologic examination established truth in all lesions. Each case was interpreted in two modes, once with DM alone followed by DM+DBT, and the observers were asked to mark the location of any lesions, if present, and give each a score based on a five-category assessment by the Royal Australian and New Zealand College of Radiologists (RANZCR). The diagnostic performance of DM compared with that of DM+DBT was evaluated in terms of the difference between areas under receiver operating characteristic curves (AUCs), jackknife free-response receiver operating characteristic (JAFROC) figure-of-merit, sensitivity, location sensitivity and specificity.
Results: Average AUC and JAFROC figure-of-merit for DM versus DM+DBT were significantly different (AUC 0.690 vs. 0.781, p < 0.0001; JAFROC 0.618 vs. 0.732, p < 0.0001). In addition, the use of DM+DBT resulted in an improvement in sensitivity (0.629 vs. 0.701, p = 0.0011), location sensitivity (0.548 vs. 0.690, p < 0.0001) and specificity (0.656 vs. 0.758, p = 0.0015) when compared to DM alone.
Conclusion: Adding DBT to the standard DM significantly improved radiologists’ performance in terms of AUCs, JAFROC figure of merit, sensitivity, location sensitivity and specificity values.
Background: Although the UK and Australia national breast screening programs have regarded PERFORMS and BREAST test-set strategies as possible methods of estimating readers' clinical efficacy, the relationship between test-set and real life performance results has never been satisfactorily understood.
Methods: Forty-one radiologists from BreastScreen New South Wales participated in this study. Each reader interpreted a BREAST test-set which comprised sixty de-identified mammographic examinations sourced from the BreastScreen Digital Imaging Library. Spearman's rank correlation coefficient was used to compare the sensitivity measured from the BREAST test-set with screen readers' clinical audit data.
Results: Results showed statistically significant positive moderate correlations between test-set sensitivity and each of the following metrics: rate of invasive cancer per 10 000 reads (r=0.495; p < 0.01); rate of small invasive cancer per 10 000 reads (r=0.546; p < 0.001); detection rate of all invasive cancers and DCIS per 10 000 reads (r=0.444; p < 0.01).
Conclusion: Comparison between sensitivity measured from the BREAST test-set and real life detection rate demonstrated statistically significant positive moderate correlations which validated that such test-set strategies can reflect readers' clinical performance and be used as a quality assurance tool. The strength of correlation demonstrated in this study was higher than previously found by others.
Methods and materials: A total of 129 readers independently reviewed 60 mammographic cases, 20 of which were biopsy proven cases (abnormal) and 40 were normal. Each case consisted of the four standard cranio-caudal (CC) and medio-lateral oblique (MLO) projections. Readers were asked to interpret and locate any presence of cancer, and levels of confidence were scored on a scale of 1-5. Radiology workstations supporting 5MP diagnostic monitors and with full image manipulation tools were used to display all images. JAFROC and ROC methodologies were used and figures of merit and Az values respectively were correlated against key reader characteristics such as experience, qualifications, breast reading practices and physical characteristics using Spearman techniques.
Results: Correlation analysis between reader characteristics and JAFROC analysis demonstrated that four key characteristics were linked to performance: years of qualification as a radiologist (p=0.05, r=0.18), years reading mammograms (p=0.01, r=0.24), number of mammograms read per year (p=0.001, r=0.24), and hours reading mammograms per week (p=0.04, r=0.19). The ROC method indicated that determinants of performance were confined to years reading mammograms (p=0.02, r=0.2), and number of mammograms read per year (p=0.04, r=0.23).
Conclusion: This work demonstrates the practical impact on study conclusions when different methodologies are used. The location sensitivity approach employed and the statistical power of JAFROC suggest that the findings from this approach should be prioritized.
Background: The performance of screen readers in detecting breast cancer is being assessed in some countries by using mammographic test sets. However, previous studies have provided little evidence that performance assessed by test sets strongly correlates with performance in clinical reading.
Methods: Five clinicians from BreastScreen New South Wales participated in this study. Each clinician was asked to read 200 de-identified mammographic examinations gathered from their own case history within the BreastScreen NSW Digital Imaging Library. All test sets were designed with specific proportions of true positive, true negative, false positive and false negative examinations from the previous actual clinical reads of each reader. A prior mammogram examination for comparison (when available) was also provided for each case.
Results: Preliminary analyses have shown that there is a moderate level of agreement (Kappa 0.42−0.56, p < 0.001) between laboratory test sets and actual clinical reading. In addition, a mean increase of 38% in sensitivity in the laboratory test sets as compared to their actual clinical readings was demonstrated. Specificity is similar between the laboratory test sets and actual clinical reading.
Conclusion: This study demonstrated a moderate level of agreement between actual clinical reading and test set reading, which suggests that test sets have a role in reflecting clinical performance.
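The agreement statistic reported above (Kappa) can be illustrated with a minimal sketch of Cohen's kappa, which compares the observed agreement between two sets of readings with the agreement expected by chance from each set's label frequencies. The function name, the label names and the readings below are hypothetical and are not data from the study.

```python
def cohens_kappa(readings_a, readings_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(readings_a)
    labels = set(readings_a) | set(readings_b)
    # Observed agreement: fraction of cases given the same label twice.
    p_observed = sum(a == b for a, b in zip(readings_a, readings_b)) / n
    # Chance agreement: product of the marginal label frequencies.
    p_expected = sum(
        (readings_a.count(label) / n) * (readings_b.count(label) / n)
        for label in labels
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical decisions by one reader on the same four cases,
# once in clinical practice and once in a laboratory test set.
clinical = ["recall", "recall", "normal", "normal"]
test_set = ["recall", "normal", "normal", "normal"]
kappa = cohens_kappa(clinical, test_set)
```

In this toy example the readings agree on three of four cases, chance agreement is 0.5, and kappa is 0.5, which falls in the range conventionally described as moderate agreement.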
Materials and Methods: Twenty-four radiologists viewed 40 frontal chest radiographs and gave their opinion as to the position of a central venous catheter. One-to-three days later they again viewed 40 frontal chest radiographs and again gave their opinion as to the position of the central venous catheter. Half of the radiographs in the second set were repeated images from the first set and half were new. The radiologists were asked of each image whether it had been included in the first set. For this study, we are evaluating only the 20 repeated images. We used the Kruskal-Wallis test and Fisher's exact test to determine the relationship between conscious recognition of a previously interpreted image and consistency in interpretation of the image.
Results: There was no significant correlation between recognition of the image and consistency in response regarding the position of the central venous catheter. In fact, there was a trend in the opposite direction, with radiologists being slightly more likely to give a consistent response with respect to images they did not recognize than with respect to those they did recognize.
Conclusion: Radiologists' recognition of previously-encountered images in an observer-performance study does not noticeably color their interpretation on the second encounter.