Radiomics can be used to generate a large number of quantitative features from medical images, which can be applied to various predictive tasks and treatment decisions. To ensure the generalizability of such methods, radiomic features need to be robust to variations in patient positioning and in the segmentation of regions of interest. Feature robustness is often determined through test-retest imaging, in which the agreement of feature values from images acquired over a brief time interval is quantified. However, the scarcity of test-retest data is a significant limitation of such approaches, and there is no consensus on alternative methods for determining feature robustness from single scans. Hence, this study evaluates the effectiveness of assessing feature robustness by using various metrics to quantify the agreement of feature values before and after image perturbation. A total of 1002 features were extracted from thoracic computed tomography scans of patients with pleural mesothelioma before and after perturbations, including rotation, erosion, dilation, and contour randomization, and five distinct metrics were used to assess feature agreement. Estimated feature robustness varied widely across combinations of perturbations and agreement metrics. The greatest stability in subsequent steps of the predictive pipeline, including feature selection and classification, was achieved using the concordance and intraclass correlation coefficients with chained perturbations of image rotation, contour erosion or dilation, and contour randomization. These findings suggest that the choice of image perturbations and agreement metrics has non-negligible consequences for feature robustness estimates and the success of downstream predictive tasks, warranting careful consideration in experimental design.
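
As an illustration of how agreement between feature values before and after perturbation might be quantified, the following is a minimal sketch of Lin's concordance correlation coefficient, one of the metrics named above. The function name, the use of population (biased) variance, and the example feature values are assumptions made for illustration only; they are not taken from the study's implementation or data.

```python
import numpy as np


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two 1-D arrays.

    x and y hold one feature's values across patients, e.g. before and
    after a perturbation. Population (ddof=0) moments are used here,
    which is an assumption; some implementations use sample moments.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")

    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()

    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0.0:
        # Both arrays are constant and equal; the coefficient is undefined.
        return float("nan")
    return float(2.0 * cov_xy / denominator)


# Hypothetical feature values for five patients, before and after a perturbation.
original = [0.82, 1.10, 0.95, 1.40, 1.22]
perturbed = [0.80, 1.15, 0.97, 1.35, 1.25]
print(concordance_correlation(original, perturbed))
```

Unlike the Pearson correlation, this coefficient penalizes a systematic shift between the two sets of values, so a feature whose values move uniformly after perturbation scores below 1 even when the two sets are perfectly correlated.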