Video and synthetic MRI pre-training of 3D vision architectures for neuroimage analysis

Nikhil J. Dhinagar; Amit Singh; Saket Ozarkar; Ketaki Buwa; Sophia I. Thomopoulos; Conor Owens-Walton; Emily Laltoo; Yao-Liang Chen; Philip Cook; Corey McMillan; Chih-Chien Tsai; J-J Wang; Yih-Ru Wu; Paul M. Thompson

doi:10.1117/12.3008837

3 April 2024

Proceedings Volume 12927, Medical Imaging 2024: Computer-Aided Diagnosis; 129272B (2024) https://doi.org/10.1117/12.3008837
Event: SPIE Medical Imaging, 2024, San Diego, California, United States

Conference Poster

Abstract

Transfer learning represents a recent paradigm shift in the way we build artificial intelligence (AI) systems. In contrast to training task-specific models, transfer learning involves pre-training deep learning models on a large corpus of data and minimally fine-tuning them for adaptation to specific tasks. Even so, for 3D medical imaging tasks, we do not know if it is best to pre-train models on natural images, medical images, or even synthetically generated MRI scans or video data. To evaluate these alternatives, here we benchmarked vision transformers (ViTs) and convolutional neural networks (CNNs), initialized with varied upstream pre-training approaches. These models were then adapted to three downstream neuroimaging tasks of varying difficulty: Alzheimer’s disease (AD) classification, Parkinson’s disease (PD) classification, and “brain age” prediction. Experimental tests led to the following key observations: (1) pre-training improved performance across all tasks, including a boost of 7.5% for AD classification and 4.5% for PD classification for the ViT, and a boost of 19.1% for PD classification and a reduction in brain age prediction error of 1.26 years for CNNs; (2) pre-training on large-scale video or synthetic MRI data boosted the performance of ViTs; (3) CNNs were robust in limited-data settings, and in-domain pre-training enhanced their performance; (4) pre-training improved generalization to out-of-distribution datasets and sites. Overall, we benchmarked different vision architectures, revealing the impact of pre-training them with emerging datasets for model initialization. The resulting pre-trained models can be adapted to a range of downstream neuroimaging tasks, especially when training data for a domain-specific target task is limited.
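
As a rough illustration of the pre-train-then-fine-tune idea described in the abstract, the following toy sketch keeps a "pre-trained" feature extractor frozen and trains only a small classification head on a limited labeled set. It is not the authors' pipeline: the backbone here is a fixed random linear map with a ReLU standing in for a pre-trained ViT or CNN, the data are synthetic, and all names (`backbone`, `fine_tune_head`, `W_pretrained`) are hypothetical.

```python
import math
import random


def backbone(x, W):
    """Frozen feature extractor: fixed linear map followed by ReLU (toy stand-in)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(head, bias, feats, labels):
    """Mean binary cross-entropy of the linear head on precomputed features."""
    eps = 1e-12
    total = 0.0
    for f, y in zip(feats, labels):
        p = sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


def fine_tune_head(W, xs, labels, epochs=200, lr=0.05):
    """Train only the head with gradient descent; the backbone weights W are never updated."""
    feats = [backbone(x, W) for x in xs]
    head = [0.0] * len(W)
    bias = 0.0
    history = []
    n = len(labels)
    for _ in range(epochs):
        grad_h = [0.0] * len(head)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias) - y
            for j, fj in enumerate(f):
                grad_h[j] += err * fj
            grad_b += err
        head = [h - lr * g / n for h, g in zip(head, grad_h)]
        bias -= lr * grad_b / n
        history.append(mean_loss(head, bias, feats, labels))
    return head, bias, history


random.seed(0)
# Hypothetical "pre-trained" weights: 8 features computed from 4 inputs.
W_pretrained = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
# Small synthetic downstream dataset (40 samples, binary labels).
xs = [[random.gauss(0, 1) for _ in range(4)] for _ in range(40)]
labels = [1 if x[0] + x[1] > 0 else 0 for x in xs]

W_before = [row[:] for row in W_pretrained]
head, bias, history = fine_tune_head(W_pretrained, xs, labels)
```

In this sketch, `history` records the training loss of the head after each epoch, and `W_pretrained` is left untouched, which is the sense in which the model is "minimally" adapted. Fine-tuning some or all backbone layers is an equally common variant; the abstract does not say which was used.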

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Nikhil J. Dhinagar, Amit Singh, Saket Ozarkar, Ketaki Buwa, Sophia I. Thomopoulos, Conor Owens-Walton, Emily Laltoo, Yao-Liang Chen, Philip Cook, Corey McMillan, Chih-Chien Tsai, J-J Wang, Yih-Ru Wu, and Paul M. Thompson "Video and synthetic MRI pre-training of 3D vision architectures for neuroimage analysis", Proc. SPIE 12927, Medical Imaging 2024: Computer-Aided Diagnosis, 129272B (3 April 2024); https://doi.org/10.1117/12.3008837
