Our goal was to ascertain how fatigue affects performance in reading computed tomography (CT) examinations of patients with multiple injuries. CT images with multiple fractures from a previous study of satisfaction of search (SOS) were read by radiologists after a day of clinical work. Performance in this study with fatigued readers was compared to a previous study in which readers were not fatigued. Detection accuracy for obvious injuries was not affected by fatigue, but accuracy for subtle fractures was reduced (P=0.016). An SOS effect on decision thresholds was evident, mirroring recent studies. Without fatigue, readers spent more time interpreting and reporting findings as the number of injuries increased. When fatigued, readers did not increase reading time as fracture number increased. Without fractures, reading time for not-fatigued and fatigued readers did not differ (P=0.493), but the difference was significant (P=0.016) with an added subtle fracture. The difference increased with a major injury (P=0.003) and increased further with both a major injury and subtle fracture (P=0.0007). Fatigue and multiple abnormalities have independent effects on detection performance but do interact in determining search time.
Previous studies have demonstrated that fatigue impacts diagnostic accuracy, especially for those in training. We continued this line of investigation to determine if fatigue has any impact on a common source of errors: satisfaction of search (SOS). SOS requires subjects to participate in 2 sessions (SOS and non-SOS), and so does fatigue (fatigued and not fatigued), so we ran subjects in only the fatigued condition and used a previous non-fatigued study as the comparison. Sixty-four chest computed radiographs, half demonstrating various "test" abnormalities, were read twice by 20 radiologists, once with and once without the addition of a simulated pulmonary nodule. Receiver-operating characteristic detection accuracy and decision thresholds were analyzed to study the effects of adding the nodule on detecting the test abnormalities. Adding nodules did not influence detection accuracy (ROC AUC SOS = 0.667; non-SOS = 0.679), but did induce a reluctance to report them. Adding nodules did not affect inspection time, so the reluctance to report was not associated with reduced search. Fatigue did not appear to exacerbate the SOS effect. A second study with fractures revealed the same shift in performance, but viewing times were reduced when readers were fatigued. The results of these two studies suggest that the impact of fatigue on SOS is more complicated than expected and thus may require more investigation to fully understand its impact in the clinic.
KEYWORDS: Data modeling, Statistical analysis, Interference (communication), Statistical modeling, Contamination, Medical imaging, Information visualization, Error analysis, Electronic filtering, Space operations
Introduction. Perception experiments collecting rating-method ROC data sometimes result in operating points at only relatively high specificities for some treatment-reader combinations. In the extreme, no operating points are internal to the feasible space of many parametric models (i.e., for all points, FP = 0). Dorfman & Berbaum [1] developed a contaminated binormal model (CBM) to account for ROC data that have few false-positive reports even though many healthy subjects are sampled. Unfortunately, CBM can give very different ROC curve shapes for similar ROC points, and when there are no internal operating points, the ROC curve shape will often differ substantially from that obtained when there are internal operating points. Materials and Methods. We eliminate the CBM limiting case by adding a small constant to each cell of the rating data matrix [2,3] and by setting μ, the difference between the visible signal and noise distributions, to the same high value for all conditions [1]. Results. We illustrate the resulting ROC curves using an example dataset from Schartz et al. [4]. All observed ROC points become internal. The fitted ROC curves are similar to those of the limiting CBM and empirical ROC, but all curves using the same μ have the same shape and never cross. ROC accuracy parameters such as area, partial area, and sensitivity at any fixed specificity correspond perfectly. Conclusions. Constraining the CBM to a fixed large μ provides a more effective way to apply it to difficult-to-fit data.
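The effect of adding a small constant to each cell of the rating data matrix can be illustrated with a minimal sketch. The function name `operating_points`, the example counts, and the constant value 0.5 are illustrative assumptions; the abstract does not state the constant the authors used.

```python
def operating_points(noise_counts, signal_counts, constant=0.0):
    """Return (FPF, TPF) pairs from rating counts.

    Counts are ordered from the least to the most suspicious rating
    category. `constant` is added to every cell before cumulating
    (its value here is a hypothetical choice, not the authors').
    """
    noise = [c + constant for c in noise_counts]
    signal = [c + constant for c in signal_counts]
    n_total, s_total = sum(noise), sum(signal)
    points = []
    fp = tp = 0.0
    # Walk from the most suspicious category down; the lowest category
    # is excluded because its threshold gives the trivial point (1, 1).
    for n, s in zip(reversed(noise[1:]), reversed(signal[1:])):
        fp += n
        tp += s
        points.append((fp / n_total, tp / s_total))
    return points


# Made-up five-category data in which no healthy case received a
# rating above the lowest category, so every raw point has FPF = 0.
noise_counts = [50, 0, 0, 0, 0]
signal_counts = [20, 5, 5, 10, 10]

raw_points = operating_points(noise_counts, signal_counts)
adjusted_points = operating_points(noise_counts, signal_counts, constant=0.5)
```

With the raw counts every point lies on the FPF = 0 edge; after the adjustment every point has 0 < FPF < 1, which is the sense in which the observed points "become internal".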
Satisfaction of search (SOS) occurs when an abnormality is missed because another abnormality has been detected
in radiology examinations. This research includes our study of whether the severity of a detected fracture determines
whether subsequent fractures are overlooked. Each of 70 simulated multitrauma patients presented radiographs of three anatomic areas. Readers evaluated each patient under two experimental conditions: when the images of the first anatomic
area included a severe fracture (the SOS condition), and when it did not (the control condition). The SOS effect was
measured on detection accuracy for subtle test fractures presented on examinations of the second or third anatomic areas.
SOS reduction in ROC area for detecting subtle test fractures with the addition of a major fracture to the first radiograph
was not observed. The same absence of SOS that had been observed when high-morbidity added fractures were
presented on CT was replicated with the high-morbidity added fractures presented on radiographs. This finding rules out
the possibility that there was no SOS in the prior study with CT because SOS effects do not extend from one imaging
modality to another. Taken together, the evidence rejects the hypothesis that the severity of a detected fracture determines the SOS for subsequently viewed fractures.
Radiologists are reading more cases with more images, especially in CT and MRI, and are thus working longer hours than
ever before. There have been concerns raised regarding fatigue and whether it impacts diagnostic accuracy. This study
measured the impact of reader visual fatigue by assessing symptoms, visual strain via dark focus of accommodation, and
diagnostic accuracy. Twenty radiologists and 20 radiology residents were given two diagnostic performance tests
searching CT chest sequences for a solitary pulmonary nodule before (rested) and after (tired) a day of clinical reading.
Ten cases used free search and navigation, and the other 100 cases used preset scrolling speed and duration. Subjects filled
out the Swedish Occupational Fatigue Inventory (SOFI) and the oculomotor strain subscale of the Simulator Sickness
Questionnaire (SSQ) before each session. Accuracy was measured using ROC techniques. Using Swensson's technique
yields an ROC area = 0.86 rested vs. 0.83 tired, p (one-tailed) = 0.09. Using Swensson's LROC technique yields an area
= 0.73 rested vs. 0.66 tired, p (one-tailed) = 0.09. Using Swensson's Loc Accuracy technique yields an area = 0.77 rested
vs. 0.72 tired, p (one-tailed) = 0.13. Subjective measures of fatigue increased significantly from early to late reading. To
date, the results support our findings with static images and detection of bone fractures. Radiologists at the end of a long
work day experience greater levels of measurable visual fatigue or strain, contributing to a decrease in diagnostic
accuracy. The decrease in accuracy was not as great, however, as with static images.
The objective of our research is to understand the perception of multiple abnormalities in an imaging examination
and to develop strategies for improved diagnosis. We are one of the few laboratories in the world pursuing the goal of
reducing detection errors through a better understanding of the underlying perceptual processes involved. Failure to
detect an abnormality is the most common class of error in diagnostic imaging and generally is considered the most
serious by the medical community. Many of these errors have been attributed to "satisfaction of search," which occurs
when a lesion is not reported because discovery of another abnormality has "satisfied" the goal of the search. We have
gained some understanding of the mechanisms of satisfaction of search (SOS) in traditional radiographic modalities.
Currently, there are few interventions to remedy SOS error. For example, patient history that prompts specific
abnormalities protects the radiologist from missing them even when other abnormalities are present. The knowledge
gained from this programmatic research will lead to reduction of observer error.
Collecting clinical cases for medical imaging perception studies is often challenging. We have developed a suite of
software tools for manipulating medical tomographic image sets that overcome these difficulties. In our initial
development, abnormalities were removed or inserted on a slice-by-slice basis. To circumvent the problem with potential
artifacts in orthogonal views, we have redesigned the tools so that they operate in 3 dimensions. An operator-controlled
ellipsoid mask region is used to select the removal and the replacement areas. This new approach has been validated on
PET data sets and has also been implemented for CT studies.
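A minimal sketch of the 3-dimensional ellipsoid mask idea follows. It is not the authors' tool: the function names, the nested-list volume layout `volume[z][y][x]`, and the use of a same-sized donor volume as the replacement source are all simplifying assumptions made for illustration.

```python
def ellipsoid_mask(shape, center, radii):
    """Return mask[z][y][x], True for voxels inside an axis-aligned ellipsoid."""
    nz, ny, nx = shape
    cz, cy, cx = center
    rz, ry, rx = radii
    return [
        [
            [
                ((z - cz) / rz) ** 2
                + ((y - cy) / ry) ** 2
                + ((x - cx) / rx) ** 2
                <= 1.0
                for x in range(nx)
            ]
            for y in range(ny)
        ]
        for z in range(nz)
    ]


def replace_region(volume, donor, mask):
    """Return a copy of `volume` with masked voxels taken from `donor`."""
    return [
        [
            [d if m else v for v, d, m in zip(v_row, d_row, m_row)]
            for v_row, d_row, m_row in zip(v_slice, d_slice, m_slice)
        ]
        for v_slice, d_slice, m_slice in zip(volume, donor, mask)
    ]


# Tiny synthetic example: a 5x5x5 volume of zeros receives ones
# inside an ellipsoid that is thinner along the slice (z) axis.
shape = (5, 5, 5)
volume = [[[0] * 5 for _ in range(5)] for _ in range(5)]
donor = [[[1] * 5 for _ in range(5)] for _ in range(5)]
mask = ellipsoid_mask(shape, center=(2, 2, 2), radii=(1, 2, 2))
patched = replace_region(volume, donor, mask)
```

Because the mask is defined over all three axes at once, the replaced region stays consistent across slices, which is the property that a slice-by-slice edit does not guarantee in orthogonal views.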
We measured the impact of reader visual fatigue by assessing symptoms, the ability to keep the eye focused on
the display and diagnostic accuracy. Twenty radiology residents and 20 radiologists were given a diagnostic performance
test containing 60 skeletal radiographic studies, half with fractures, before and after a day of clinical reading. Diagnostic
accuracy was measured using area under the proper binormal curve (AUC). Error in visual accommodation was
measured before and after each test session and subjects completed the Swedish Occupational Fatigue Inventory (SOFI)
and the oculomotor strain subscale of the Simulator Sickness Questionnaire (SSQ) before each session. Average AUC
was 0.89 for the before-work test and 0.85 for the after-work test (F(1,36) = 4.15, p = 0.049 < 0.05). There was significantly
greater error in accommodation after the clinical workday (F(1,14829) = 7.81, p = 0.005 < 0.01), and after the reading
test (F(1,14829) = 839.33, p < 0.0001). SOFI measures of lack of energy, physical discomfort and sleepiness were higher
after a day of clinical reading (p < 0.05). The SSQ measure of oculomotor symptoms (i.e., difficulty focusing, blurred
vision) was significantly higher after a day of clinical reading (F(1,75) = 20.38, p < 0.0001). Radiologists are visually
fatigued by their clinical reading workday. This reduces their ability to focus on diagnostic images and to accurately
interpret them.
Our overall hypothesis is that current radiology practice produces oculomotor fatigue reducing diagnostic accuracy. The
goal of this study is to determine whether accommodative stability and diagnostic accuracy are reduced following digital
radiology interpretation. We are collecting data at two points in time - once in the morning prior to diagnostic reading
and once in the afternoon after reading. Subjects are completing surveys about their current physical status and number
of hours spent reading that day, along with the type of images read. We are measuring accommodation using the
WAM-5500 Auto Refkeratometer. Subjects view bone images with subtle fractures and dislocations to determine if a fracture is
present, locate it, and provide a rating of their decision confidence to be used in an ROC analysis of the data. Preliminary
results confirm our previous findings that we can measure visual fatigue. Radiologists are less able to focus on a distinct
point, especially at near distances, after a day of reading images on digital displays as opposed to before any reading
takes place. The SOFI and SSQ measures also indicate that radiologists are more fatigued at the end of a day's reading as
compared to before. The confidence ratings are being evaluated using ROC techniques. The results so far suggest a
reduction in diagnostic accuracy with tired eyes. Preliminary data from measuring visual accommodation and observer
performance support our hypothesis that radiologists suffer visual fatigue after a day reading diagnostic images from
digital displays reducing interpretation accuracy.
KEYWORDS: Medical imaging, Tomography, Lung, Radiology, Visualization, Chest, Computed tomography, Software development, Medical research, Current controlled current source
The ability to insert abnormalities in clinical tomographic images makes image perception studies with medical images
practical. We describe a new insertion technique and its experimental validation that uses complementary image masks
to select an abnormality from a library and place it at a desired location. The method was validated using a 4-alternative
forced-choice experiment. For each case, four quadrants were simultaneously displayed consisting of 5 consecutive
frames of a chest CT with a pulmonary nodule. One quadrant was unaltered, while the other 3 had the nodule from the
unaltered quadrant artificially inserted. Twenty-six different sets were generated and repeated with order scrambling for a total of
52 cases. The cases were viewed by radiology staff and residents who ranked each quadrant by realistic appearance. On
average, the observers were able to correctly identify the unaltered quadrant in 42% of cases, and identify the unaltered
quadrant both times it appeared in 25% of cases. Consensus, defined by a majority of readers, correctly identified the
unaltered quadrant in only 29% of 52 cases. For repeats, the consensus observer successfully identified the unaltered
quadrant only once. We conclude that the insertion method can be used to reliably place abnormalities in perception
experiments.
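For context on the 4-alternative forced-choice figures, the chance levels can be worked out with a short sketch. It assumes pure guessing with equal probability for each quadrant and independence between the two presentations of a set; these assumptions and the helper name `binomial_tail` are illustrative and are not taken from the study.

```python
from math import comb


def binomial_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Chance of picking the unaltered quadrant once, and on both
# presentations of the same set, under independent guessing.
chance_single = 1 / 4
chance_both = chance_single**2

# Expected number of correct identifications in 52 cases by guessing.
expected_correct = 52 * chance_single


def p_at_least(correct, trials=52):
    """Probability of `correct` or more hits in `trials` cases by guessing."""
    return binomial_tail(correct, trials, chance_single)
```

Under these assumptions a guessing observer would be correct in about 13 of 52 cases, which gives a reference point for reading the reported identification rates.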
We hypothesized that the current practice of radiology produces oculomotor fatigue that reduces diagnostic accuracy. The initial step in testing this hypothesis is to measure visual strain. We are approaching this by measuring visual accommodation of radiologists before and after diagnostic viewing work. We measure accommodation using the WAM-5500 Auto Refkeratometer from Grand Seiko, which collects refractive measurements and pupil diameter measurements.
The radiologists focus on a simple target while accommodation is measured. The target distances are varied from near to
far starting at 20 cm target distance from the eye to 183 cm. The data are compared for prior to and after long-term diagnostic viewing. Results indicate that we are successfully measuring visual accommodation. Accommodation at long distances does not seem to differ before and after diagnostic reading. Accommodation at near distances however does differ, with decreased ability to accommodate after many hours of diagnostic reading. Since near distances are crucial during diagnostic reading, this could have a substantial impact on diagnostic accuracy (the next phase of the project).
KEYWORDS: Medical imaging, Software development, Computed tomography, Inspection, Radiography, Control systems, Medical research, Java, Data transmission, Bone
We developed image presentation software that mimics the functionality available in the clinic, but also records time-stamped, observer-display interactions and is readily deployable on diverse workstations making it possible to collect comparable observer data at multiple sites. Commercial image presentation software for clinical use has limited application for research on image perception, ergonomics, computer-aids and informatics because it does not collect observer responses, or other information on observer-display interactions, in real time. It is also very difficult to collect observer data from multiple institutions unless the same commercial software is available at different sites. Our software not only records observer reports of abnormalities and their locations, but also inspection time until report, inspection time for each computed radiograph and for each slice of tomographic studies, window/level, and magnification settings used by the observer. The software is a modified version of the open source ImageJ software available from the National Institutes of Health. Our software involves changes to the base code and extensive new plugin code. Our free software is currently capable of displaying computed tomography and computed radiography images. The software is packaged as Java class files and can be used on Windows, Linux, or Mac systems. By deploying our software together with experiment-specific script files that administer experimental procedures and image file handling, multi-institutional studies can be conducted that increase reader and/or case sample sizes or add experimental conditions.
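The kind of time-stamped interaction record described above can be sketched as follows. This is a hypothetical illustration in Python, not the authors' Java/ImageJ code; the class names, event kinds, and field names are all assumptions.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class InteractionEvent:
    timestamp: float  # seconds from an arbitrary monotonic origin
    kind: str         # e.g. "slice", "window_level", "report", "end"
    details: dict


@dataclass
class InteractionLog:
    events: list = field(default_factory=list)

    def record(self, kind, clock=time.monotonic, **details):
        """Append a time-stamped event and return it."""
        event = InteractionEvent(clock(), kind, details)
        self.events.append(event)
        return event

    def time_per_slice(self):
        """Total display time per slice index, from consecutive events."""
        totals = {}
        shown = [e for e in self.events if e.kind in ("slice", "end")]
        for current, following in zip(shown, shown[1:]):
            if current.kind == "slice":
                index = current.details["index"]
                elapsed = following.timestamp - current.timestamp
                totals[index] = totals.get(index, 0.0) + elapsed
        return totals

    def to_json(self):
        """Serialize the log so it can be pooled across sites."""
        return json.dumps([asdict(e) for e in self.events])


# Example with a fake clock so the timings are reproducible.
ticks = iter([0.0, 2.0, 5.0, 6.0])
fake_clock = lambda: next(ticks)

log = InteractionLog()
log.record("slice", clock=fake_clock, index=1)
log.record("slice", clock=fake_clock, index=2)
log.record("slice", clock=fake_clock, index=1)
log.record("end", clock=fake_clock)
slice_times = log.time_per_slice()
```

Storing raw events rather than precomputed summaries leaves derived measures, such as inspection time until report, to be computed the same way at every site.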
Image perception studies of medical images provide important information about how radiologists interpret images and insights for reducing reading errors. In the past, perception studies have been difficult to perform using clinical imaging studies because of the problems associated with obtaining images demonstrating proven abnormalities and appropriate normal control images. We developed and evaluated interactive software that allows the seamless removal of abnormal areas from CT lung image sets. We have also developed interactive software for capturing lung lesions in a database where they can be added to lung CT studies. The efficacy of the software to remove abnormal areas of lung CT studies was evaluated psychophysically by having radiologists select the one altered image from a display of four. The software for adding lesions was evaluated by having radiologists classify displayed CT slices with lesions as real or artificial scaled to 3 levels of confidence. The results of these experiments demonstrated that the radiologist had difficulty in distinguishing the raw clinical images from those that had been altered. We conclude that this software can be used to create experimental normal control and "proven" lesion data sets for volumetric CT of the lung fields. We also note that this software can be easily adapted to work with other tissue besides lung and that it can be adapted to other digital imaging modalities.
Although ROC analysis is the accepted methodology for evaluation of diagnostic imaging systems, it has some serious shortcomings. By contrast, FROC methodology allows the observer to report multiple abnormalities per case, and uses the location of reported abnormalities to improve the measurement. Because ROC methodology has no way to allow multiple responses or use the location information, its statistical power will suffer. The FROC method has not enjoyed widespread acceptance because of concern about whether responses made to the same case can be treated as independent. We propose a new jackknife FROC method (JAFROC) that does not make the independence assumption. The new method combines elements of FROC and the Dorfman-Berbaum-Metz (DBM) multi-reader ROC methods. To compare the JAFROC method to an earlier free-response method (alternative free-response or AFROC method), and to the DBM method, which uses conventional ROC scoring, we developed a model for generating simulated FROC detection and location data. The simulation model is quite general and can be used to evaluate any method for analysis of multiple-response detection-and-localization data. It allowed us to examine null hypothesis (NH) behavior and statistical power of analytic methods. We found that AFROC analysis did not pass the NH test, being unduly conservative. Both the JAFROC method and the DBM passed the NH test, but JAFROC had more statistical power than the DBM method. The results of this comparison suggest that future studies of diagnostic performance may enjoy improved statistical power or reduced sample size requirements through the use of the JAFROC method.
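The jackknife step shared by this family of methods can be sketched for a single reader and modality. The figure of merit used here is the empirical (Wilcoxon) ROC area as a stand-in; the actual JAFROC figure of merit is computed from free-response data and is not reproduced here. The function names and example ratings are assumptions.

```python
def wilcoxon_auc(normal_ratings, abnormal_ratings):
    """Empirical ROC area: P(abnormal > normal) + 0.5 * P(tie)."""
    wins = 0.0
    for a in abnormal_ratings:
        for n in normal_ratings:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_ratings) * len(abnormal_ratings))


def jackknife_pseudovalues(normal_ratings, abnormal_ratings, fom=wilcoxon_auc):
    """One pseudovalue per case: N * theta - (N - 1) * theta_without_case."""
    n_cases = len(normal_ratings) + len(abnormal_ratings)
    full = fom(normal_ratings, abnormal_ratings)
    pseudovalues = []
    # Leave out each normal case in turn.
    for i in range(len(normal_ratings)):
        reduced = fom(normal_ratings[:i] + normal_ratings[i + 1:], abnormal_ratings)
        pseudovalues.append(n_cases * full - (n_cases - 1) * reduced)
    # Leave out each abnormal case in turn.
    for j in range(len(abnormal_ratings)):
        reduced = fom(normal_ratings, abnormal_ratings[:j] + abnormal_ratings[j + 1:])
        pseudovalues.append(n_cases * full - (n_cases - 1) * reduced)
    return pseudovalues


# Made-up confidence ratings for four normal and five abnormal cases.
normal = [1, 2, 2, 3]
abnormal = [2, 3, 4, 5, 5]
full_auc = wilcoxon_auc(normal, abnormal)
pseudo = jackknife_pseudovalues(normal, abnormal)
mean_pseudo = sum(pseudo) / len(pseudo)
```

Each pseudovalue is attached to a whole case, so the case becomes the unit of analysis in the subsequent analysis of variance; this is how a jackknife approach can avoid treating multiple responses on one case as independent.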
Since an image data compression technique is usually associated with a low-pass filter, the unsharpness of calcifications and edges is of clinical concern in mammography. The same effect may turn film defects into calcification-like spots and could produce false-positive detections by the radiologist. In this study, we employed a highly sensitive calcification detection system to guide an S+P integer wavelet compression, so that the data fidelity of calcifications or unknown spots is fully preserved. The prediction component of the S+P decomposition is based on Daubechies' D8. Our results indicated that the modified CAD program detected an average of 1,193 potential calcifications on CC view mammograms and an average of 948 potential calcifications on MLO view mammograms. Compressed data rates from 0.1 to 0.43 bit/pixel were studied. The compressed images were evaluated by subjective comparison studies. The results indicated that no difference could be observed between the original and the 0.43 bit rate decompressed images. The radiologist identified 20% of the compressed images at 0.1 bit rate as suffering from minor blurry artifacts and 6% of the compressed images as possessing greater edge sharpness. Without a lossless compression for microcalcifications, the radiologist identified 20% of the microcalcifications on the compressed mammograms at 0.1 bit rate as suffering from minor compression artifacts.
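The integer-to-integer property that makes lossless preservation of selected regions possible can be illustrated with the S (sequential) transform, the reversible averaging/differencing step on which S+P builds. This one-dimensional, single-level sketch omits the prediction ("P") stage and the CAD-guided region selection described above; the function names are assumptions.

```python
def s_transform_forward(samples):
    """Split an even-length integer sequence into low-pass and high-pass parts.

    low[i]  = floor((a + b) / 2)   (integer average)
    high[i] = a - b                (integer difference)
    """
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    low, high = [], []
    for a, b in zip(samples[0::2], samples[1::2]):
        low.append((a + b) >> 1)  # >> floors, also for negative sums
        high.append(a - b)
    return low, high


def s_transform_inverse(low, high):
    """Exactly recover the original integers from the two bands."""
    samples = []
    for l, h in zip(low, high):
        a = l + ((h + 1) >> 1)
        b = a - h
        samples.extend([a, b])
    return samples


# Round trip on made-up pixel values, including a sharp spot.
pixels = [100, 101, 99, 250, 98, 97, 0, 255]
low_band, high_band = s_transform_forward(pixels)
restored = s_transform_inverse(low_band, high_band)
```

Because both bands hold integers and the inverse is exact, coefficients belonging to a flagged spot can be kept unquantized and reconstructed bit-for-bit, while coefficients elsewhere are compressed lossily.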
A contaminated binormal receiver operating characteristic (ROC) model is proposed to account for ROC data with very few false positive reports even though many normal patients are sampled. The model assumes that for a proportion of abnormalities, no signal information is captured and that those abnormalities have the same distribution as noise along the latent decision axis. The new model can fit ROC data in which some or all of the ROC points have false positive fractions of zero and true positive fractions less than one without concluding perfect performance. The resulting ROC curves never exhibit inappropriate chance line crossings. The model holds that, for expert decision makers, there are situations in which the prevalence and utility matrix preclude operating points in some ROC regions. The model has a straightforward extension to the joint detection and localization ROC curve. Fits of the contaminated binormal ROC model to non-degenerate data from exemplary experiments in radiology were evaluated. For several studies, the contaminated binormal model fit the data better than conventional ROC models suggesting that contamination may not be limited to degenerate ROC data. This research has been published for a different audience.
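The mixture idea in this model can be made concrete with a minimal sketch. It assumes unit-variance normal distributions on the latent decision axis, with `alpha` the proportion of abnormalities whose signal is captured and `mu` the separation of the visible-signal distribution from noise; the function names and parameter values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution


def cbm_operating_point(threshold, mu, alpha):
    """Return (FPF, TPF) at a decision threshold under the mixture model."""
    fpf = 1.0 - _PHI(threshold)
    # Abnormal cases: a fraction (1 - alpha) behaves like noise,
    # a fraction alpha is shifted by mu.
    tpf = (1.0 - alpha) * (1.0 - _PHI(threshold)) + alpha * (1.0 - _PHI(threshold - mu))
    return fpf, tpf


def cbm_auc(mu, alpha):
    """Area under the mixture ROC curve."""
    return (1.0 - alpha) * 0.5 + alpha * _PHI(mu / sqrt(2.0))


# With a large mu and a strict threshold, FPF is essentially zero
# while TPF approaches alpha rather than one.
fpf, tpf = cbm_operating_point(threshold=5.0, mu=10.0, alpha=0.6)
area = cbm_auc(mu=10.0, alpha=0.6)
```

Under these assumptions a point with a false positive fraction near zero and a true positive fraction near `alpha` is fitted without implying perfect performance, and for non-negative `mu` the true positive fraction is never below the false positive fraction, so the curve does not cross the chance line.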
Receiver operating characteristic (ROC) data with false positive fractions of zero are often difficult to fit with standard ROC methodology, and are sometimes discarded. Some extreme examples of such data were analyzed. A new ROC model is proposed that assumes that for a proportion of abnormalities, no signal information is captured and that those abnormalities have the same distribution as noise along the latent decision axis. Rating reports of fracture for single view ankle radiographs were also analyzed with the binormal ROC model and two proper ROC models. The conventional models gave ROC area close to one, implying a true positive fraction close to one. The data contained no such fractions. When all false positive fractions were zero, conventional ROC areas gave little or no hint of unmistakable differences in true positive fractions. In contrast, the new model can fit ROC data in which some or all of the ROC points have false positive fractions of zero and true positive fractions less than one without concluding perfect performance. These data challenge the validity and robustness of conventional ROC models, but the contaminated binormal model accounts for these data. This research has been published for a different audience.
The major purpose of this paper was to evaluate the Dorfman/Berbaum/Metz (DBM) method for analyzing multireader receiver operating characteristic (ROC) discrete rating data on reader split-plot and case split-plot designs. It is not always appropriate or practical for readers to interpret imaging studies of the same patients in all modalities. In split-plot designs, either a different sample of readers is assigned to each modality or a different sample of cases is assigned to each modality. For each type of split-plot design, a series of null-case Monte Carlo simulations was conducted. The results suggest that the DBM method provides trustworthy alpha levels with discrete ratings when ROC area is not too large, and case and reader sample sizes are not too small. In other situations, the test tends to be somewhat conservative. Our Monte Carlo simulations show that the DBM multireader method can be validly extended to the reader split-plot and case split-plot designs.
Our strategy in studying PACS is to evaluate its clinical implementation working
with equipment supplied by an established manufacturer. Fiscal and personnel
resources required to design and integrate the hardware components and operational
software to develop a functional PACS precluded a bottom up development approach at
our institution. Imaging equipment vendors possess more abundant design development
resources for this task and therefore can support a more rapid development of the
initial components of PACS.
For this reason we have chosen to serve as a beta test site to study the viability
of the basic PACS components in a clinical setting. Our efforts primarily focus
on: (1) image quality; (2) cost effectiveness; (3) PACS/HIS/RIS integration;
(4) equipment and software reliability; and (5) overall system performance. The
results of our studies are shared with the vendor for future PACS development and
refinement.
To attain our investigational goals we have formed an interdisciplinary team of
Radiologists, Perceptual Psychologist, Economist, Electrical and Industrial
Engineers, Hospital Information System personnel and key departmental
administrative staff.
For several reasons Pediatric Radiology was targeted as the initial area for our
PACS study: a small area representative of the overall operation, tight operational
controls and willingness of physicians. We used a step-wise approach, the first step
being the installation of PACS exclusively within the physical confines of Pediatric
Radiology.