28 April 2023 Comparison of large-scale pre-trained models based ViT, swin transformer and ConvNeXt
Jiapeng Yu
Proceedings Volume 12610, Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022); 126104S (2023) https://doi.org/10.1117/12.2671201
Event: Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022), 2022, Wuhan, China
Abstract
In computer vision, deep learning has developed tremendously, and large-scale pre-training has received increasing attention from researchers. Different models often show large gaps in training speed and accuracy under large-scale pre-training, so choosing an appropriate model is particularly important. This experiment uses the same image data set and the same hardware conditions to build an image classification model with each of three mainstream large-scale pre-trained image recognition models, Vision Transformer (ViT), Swin Transformer and ConvNeXt, and analyzes the advantages and disadvantages of each model from the experimental results. Vision Transformer is observed to have the fastest running speed in the computer vision classification experiments, but its accuracy is lower than that of the other two models; Swin Transformer is the slowest and has intermediate accuracy; ConvNeXt has the highest accuracy, but its speed is mediocre. These results offer some reference for future model selection in large-scale pre-training tasks in computer vision, which can reduce training time and improve training accuracy to some extent.
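The comparison protocol described in the abstract (same data, same hardware, one speed figure and one accuracy figure per model) can be sketched as a small harness. This is a minimal illustration, not the author's code: the names `benchmark`, `rank`, and the `Classifier` type are hypothetical, and each model is represented by a plain callable so that the sketch does not assume any particular deep learning library. In practice the callable would wrap a ViT, Swin Transformer, or ConvNeXt run.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a model run maps the shared inputs to predicted labels.
Classifier = Callable[[Sequence[int]], List[int]]


def benchmark(
    models: Dict[str, Classifier],
    inputs: Sequence[int],
    labels: Sequence[int],
) -> Dict[str, Tuple[float, float]]:
    """Run every model on the same data; return {name: (accuracy, seconds)}."""
    results = {}
    for name, predict in models.items():
        start = time.perf_counter()
        preds = predict(inputs)  # the timed region is whatever the callable does
        elapsed = time.perf_counter() - start
        correct = sum(int(p == y) for p, y in zip(preds, labels))
        results[name] = (correct / len(labels), elapsed)
    return results


def rank(results: Dict[str, Tuple[float, float]]) -> Tuple[List[str], List[str]]:
    """Return model names ordered by accuracy (best first) and by time (fastest first)."""
    by_accuracy = sorted(results, key=lambda n: results[n][0], reverse=True)
    by_speed = sorted(results, key=lambda n: results[n][1])
    return by_accuracy, by_speed


# Stand-in "models" for illustration only; real runs would replace these.
demo_inputs = [0, 1, 1, 0]
demo_labels = [0, 1, 1, 0]
demo_models = {
    "perfect": lambda xs: list(xs),
    "always_zero": lambda xs: [0] * len(xs),
}
demo_results = benchmark(demo_models, demo_inputs, demo_labels)
demo_by_accuracy, demo_by_speed = rank(demo_results)
```

Keeping accuracy and elapsed time as separate rankings mirrors the abstract's finding that the fastest model and the most accurate model need not be the same one.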
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jiapeng Yu "Comparison of large-scale pre-trained models based ViT, swin transformer and ConvNeXt", Proc. SPIE 12610, Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022), 126104S (28 April 2023); https://doi.org/10.1117/12.2671201
KEYWORDS
Transformers
Visual process modeling
Education and training
Data modeling
Data processing
Computer vision technology
Deep learning
