KEYWORDS: Transformers, Convolution, Data modeling, Image segmentation, Medical imaging, Education and training, Computed tomography, Prostate, 3D modeling, Surgery
Literature on medical imaging segmentation claims that hybrid UNet models containing both Transformer and convolutional blocks perform better than purely convolutional UNet models. This recently touted success of hybrid Transformers warrants an investigation into which of its components contribute to its performance. Also, previous work has a limitation of analysis only at fixed dataset scales as well as unfair comparisons with other models where parameter counts are not equivalent. Here, we investigate the performance of a hybrid Transformer network i.e. the nnFormer for organ segmentation in prostate CT scans. We do this in context of replacing its various components and by constructing learning curves by plotting model performance at different dataset scales. To compare with literature, the first experiment replaces all the shifted-window(swin) Transformer blocks of the nnFormer with convolutions. Results show that the convolution prevails as the data scale increases. In the second experiment, to reduce complexity, the self-attention mechanism within the swin-Transformer block is replaced with an similar albeit simpler spatial mixing operation i.e. max-pooling. We observe improved performance for max-pooling in smaller dataset scales, indicating that the window-based Transformer may not be the best choice in both small and larger dataset scales. Finally, since convolution has an inherent local inductive bias of positional information, we conduct a third experiment to imbibe such a property to the Transformer by exploring two kinds of positional encodings. The results show that there are insignificant improvements after adding positional encoding, indicating the hybrid swin-Transformers deficiency in capturing positional information given our dataset at its various scales. Through this work, we hope to motivate the community to use learning curves under fair experimental settings to evaluate the efficacy of newer architectures like Transformers for their medical imaging tasks. Code is available on https://github.com/prerakmody/ window-transformer-prostate-segmentation.
Deep learning models for organ contouring in radiotherapy are poised for clinical usage, but currently, there exist few tools for automated quality assessment (QA) of the predicted contours. Bayesian models and their associated uncertainty, can potentially automate the process of detecting inaccurate predictions. We investigate two Bayesian models for auto-contouring, DropOut and FlipOut, using a quantitative measure – expected calibration error (ECE) and a qualitative measure – region-based accuracy-vs-uncertainty (R-AvU) graphs. It is well understood that a model should have low ECE to be considered trustworthy. However, in a QA context, a model should also have high uncertainty in inaccurate regions and low uncertainty in accurate regions. Such behaviour could direct visual attention of expert users to potentially inaccurate regions, leading to a speed-up in the QA process. Using R-AvU graphs, we qualitatively compare the behaviour of different models in accurate and inaccurate regions. Experiments are conducted on the MICCAI2015 Head and Neck Segmentation Challenge and on the DeepMindTCIA CT dataset using three models: DropOut-DICE, Dropout-CE (Cross Entropy) and FlipOut-CE. Quantitative results show that DropOut-DICE has the highest ECE, while Dropout-CE and FlipOut-CE have the lowest ECE. To better understand the difference between DropOut-CE and FlipOut-CE, we use the R-AvU graph which shows that FlipOut-CE has better uncertainty coverage in inaccurate regions than DropOut-CE. Such a combination of quantitative and qualitative metrics explores a new approach that helps to select which model can be deployed as a QA tool in clinical settings.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.