A comparison of feature selection methods for the development of a prognostic radiogenomic biomarker in non-small cell lung cancer patients
4 April 2022
Abstract
This study compares methods for selecting optimal radiomic and gene expression features to develop a radiogenomic phenotype that will be used to predict overall survival in non-small cell lung cancer (NSCLC) patients. Baseline CT images of 85 NSCLC patients (male/female: 58/27; event: death; adenocarcinoma/squamous cell carcinoma/unspecified: 41/32/12; stages I/II/III/unspecified: 39/25/12/9) with gene expression profiles (microarray data) of 33 genes were taken from the NSCLC-Radiomics-Genomics dataset, publicly available from the National Cancer Institute’s Cancer Imaging Archive (TCIA). The 33 genes were selected because they represent three major co-expression patterns (“signatures”) in the dataset: the histology, neuroendocrine (NE) and pulmonary surfactant systems (PSS) signature genes. ITK-SNAP was used for 3D tumor volume segmentation from the CT scans. Radiomic features (n=102) were extracted from the 3D tumor volume using the CaPTk software. The first approach performs feature selection in two steps: intra-modal feature selection (within each of the radiomic and genomic modalities, select features that are not highly correlated with each other, do not have a skewed distribution, have a positive Mean Decrease in Accuracy (MDA) value and maximize the AUC in the prediction of overall survival) and inter-modal feature selection (select features that are not highly correlated with features from the other modality). The second approach builds on the widely used Principal Component Analysis (PCA) but tries to improve on its poor performance for survival analysis by using consensus clustering to determine the optimal number of feature clusters within the radiomic and genomic modalities. For each cluster, the first principal component is calculated and used as the representative feature for that highly correlated cluster.
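The correlation-based filtering used in the first approach could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the greedy keep-or-drop rule, the function names `pearson` and `drop_correlated`, the 0.8 threshold, and the toy feature values are all assumptions, since the abstract does not state a cutoff or an ordering rule.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(features, threshold=0.8):
    """Greedily keep features whose |r| with every kept feature is below threshold.

    features: dict mapping feature name -> list of per-patient values.
    threshold: hypothetical cutoff (assumption; not given in the abstract).
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy data (made up for illustration): two radiomic features and one gene.
toy = {
    "radiomic_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "radiomic_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly a multiple of radiomic_a
    "gene_x": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(toy)
print(selected)
```

Because the rule is greedy, the result depends on the order in which features are visited; a real pipeline would likely rank features first (for example by MDA or AUC, as the abstract describes) before filtering.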
The third approach provides a supervised take on feature selection by fitting a Cox regression with lasso regularization on the radiomic and genomic features to obtain a measure of association between each feature and the overall survival outcome. The features most strongly associated with the outcome are selected. Consensus clustering with a 10% cutoff for the minimum change in the cumulative distribution function is used to derive the optimal multi-modal phenotypes from the optimal multi-modal features determined by each of these three approaches. The multi-modal phenotypes were combined with the clinical factors of histology, stage and sex in five-fold cross-validated multivariate Cox proportional hazards models (200 iterations) of overall survival. In addition to the cross-validated c-statistics, we also built a model on the complete dataset for each approach to evaluate the Kaplan-Meier performance in separating participants above versus below the median prognostic score. The first approach gives a survival prediction performance (0.61, [0.55,0.63]) that is comparable to that of the third approach (0.61, [0.56,0.65]). The second approach results in a model with comparatively lower prognostic performance (0.54, [0.48,0.60]). All three approaches result in models that improve on the prognostic performance of the model built using only clinical covariates (0.53, [0.50,0.59]). This preliminary study draws comparisons between the various methods used to select optimal features from multi-modal descriptors of tumor regions.
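For readers unfamiliar with the evaluation metrics, the c-statistic and the median split of the prognostic score can be illustrated with a short sketch. It assumes Harrell's definition of the c-statistic (the abstract does not say which estimator was used), and the function names and toy survival data below are hypothetical.

```python
from statistics import median

def concordance_index(times, events, scores):
    """Harrell's c-statistic: a higher score should mean shorter survival.

    times: follow-up times; events: 1 if death was observed, 0 if censored;
    scores: prognostic (risk) scores from any model.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable if comparable else float("nan")

def median_split(scores):
    """Label each participant as above ('high') or not above ('low') the median score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

# Toy data (made up for illustration only).
times = [5, 8, 12, 20, 30, 41]
events = [1, 1, 0, 1, 0, 0]
scores = [2.3, 0.5, 0.4, 1.1, 0.2, 0.7]

c_stat = concordance_index(times, events, scores)
groups = median_split(scores)
print(round(c_stat, 3), groups)
```

A c-statistic of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values between 0.53 and 0.61. The two groups from `median_split` are what a Kaplan-Meier comparison would then contrast.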
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Apurva Singh, Florian A. Hölzl, Sharyn Katz, and Despina Kontos "A comparison of feature selection methods for the development of a prognostic radiogenomic biomarker in non-small cell lung cancer patients", Proc. SPIE 12033, Medical Imaging 2022: Computer-Aided Diagnosis, 1203328 (4 April 2022); https://doi.org/10.1117/12.2611489
KEYWORDS
Genomics

Feature selection

Performance modeling

Tumors

Cancer

Lung cancer

Data modeling
