Human epidermal growth factor receptor 2 (HER2) serves as a prognostic and predictive biomarker for breast cancer. Recently, a growing number of studies have evaluated the feasibility of determining HER2 status from hematoxylin and eosin (H&E) whole-slide images (WSIs) using data-driven deep learning methods, taking advantage of the ubiquitous availability of H&E WSIs. A main challenge for these data-driven methods is their need for large-scale datasets with high-quality annotations, which are expensive to curate. In this study, we therefore explored both a region-of-interest (ROI)-based supervised method and attention-based multiple-instance-learning (MIL) weakly supervised methods for predicting HER2 status on H&E WSIs, to evaluate whether avoiding labor-intensive tumor annotation compromises prediction performance. The ROI-based method used an Inception-v3 network followed by an aggregation step that combines patch-level predictions into a WSI-level prediction. The attention-based MIL methods, in contrast, explored an ImageNet-pretrained ResNet, an H&E-image-pretrained ResNet, and an H&E-image-pretrained vision transformer (ViT) as encoders for WSI-level HER2 prediction. Experiments were carried out on N = 355 publicly available WSIs with HER2 status determined by IHC and ISH and with annotations of breast invasive carcinoma. The dataset was split into training, validation, and test sets with an 80/10/10 ratio. Our results demonstrate that the attention-based ViT MIL method reaches accuracy similar to that of the ROI-based method on the independent test set (AUC of 0.79 (95% CI: 0.63-0.95) versus 0.88 (95% CI: 0.63-0.9), respectively), thus reducing the burden of labor-intensive annotations. Furthermore, the attention mechanism enhances the interpretability of the results and offers insights into the reliability of the predictions.
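To make the weakly supervised setup concrete, the following PyTorch sketch illustrates attention-based MIL pooling in the spirit of Ilse et al.'s attention-based deep MIL: patch embeddings from a frozen encoder are combined by learned attention weights into a single slide-level prediction. The module name, dimensions, and two-class head are illustrative assumptions, not the authors' released code.

    # Minimal attention-based MIL pooling sketch (assumes PyTorch).
    # Names and dimensions are hypothetical; feat_dim=768 matches a ViT-Base encoder.
    import torch
    import torch.nn as nn

    class AttentionMILHead(nn.Module):
        def __init__(self, feat_dim=768, attn_dim=128, n_classes=2):
            super().__init__()
            # Attention scorer: one scalar relevance score per patch embedding.
            self.attention = nn.Sequential(
                nn.Linear(feat_dim, attn_dim),
                nn.Tanh(),
                nn.Linear(attn_dim, 1),
            )
            self.classifier = nn.Linear(feat_dim, n_classes)

        def forward(self, patch_feats):
            # patch_feats: (n_patches, feat_dim) embeddings from a frozen encoder
            scores = self.attention(patch_feats)             # (n_patches, 1)
            weights = torch.softmax(scores, dim=0)           # normalize over the bag
            slide_feat = (weights * patch_feats).sum(dim=0)  # attention-weighted average
            # Returns slide-level logits plus per-patch weights, which can be
            # visualized as a heatmap for interpretability.
            return self.classifier(slide_feat), weights

Because the attention weights are computed per patch, they can be overlaid on the WSI to highlight which regions drove the HER2 prediction, which is the source of the interpretability noted above.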