Presentation + Paper
Visual prompt tuning and ensemble undersampling for one-shot vehicle classification
13 November 2024
Jan Erik van Woerden, Gertjan J. Burghouts, Sabina B. van Rooij, Frank Ruis, Judith Dijk, Hugo J. Kuijf
Abstract
Vision-language foundation models for image classification, such as CLIP, perform poorly on images of objects that are dissimilar to their training data. Military vehicles are a relevant example of such a mismatch. In this work, we investigate techniques to extend the capabilities of CLIP for this application. Our contribution is twofold: (a) we study various techniques to extend CLIP with knowledge of military vehicles and (b) we propose a two-stage approach to classify novel vehicles based on only one example image.

Our dataset consists of 13 military vehicle classes, with 50 images per class. We studied several techniques to extend CLIP with knowledge of military vehicles, including context optimization (CoOp), vision-language prompting (VLP), and visual prompt tuning (VPT), and selected VPT. Next, we studied one-shot learning approaches that let the extended CLIP classify novel vehicle classes based on only one image. The resulting two-stage ensemble approach was evaluated in a number of leave-one-group-out experiments.
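
As an illustration of the leave-one-group-out protocol mentioned above, the following minimal sketch enumerates the splits: in each round one group of classes is held out as novel and the remaining classes are used for adaptation. The group names, class names, and the grouping itself are placeholders; the abstract does not state how the 13 classes were grouped.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_classes, novel_classes) for every group."""
    for held_out in groups:
        # Classes in the held-out group play the role of novel classes.
        novel = list(groups[held_out])
        # All classes from the other groups are available for adaptation.
        train = [
            cls
            for name, members in groups.items()
            if name != held_out
            for cls in members
        ]
        yield held_out, train, novel


# Hypothetical grouping of 13 placeholder classes (assumption, not from the paper).
groups = {
    "group_a": ["class_01", "class_02", "class_03"],
    "group_b": ["class_04", "class_05", "class_06"],
    "group_c": ["class_07", "class_08", "class_09"],
    "group_d": ["class_10", "class_11", "class_12", "class_13"],
}

splits = list(leave_one_group_out(groups))
for held_out, train, novel in splits:
    print(held_out, len(train), "known classes,", len(novel), "novel classes")
```

Holding out whole groups rather than single classes keeps similar classes from appearing on both sides of a split, which is the usual motivation for this protocol.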

Results show that, by default, CLIP has a zero-shot classification performance of 48% for military vehicles. This can be improved to >80% by fine-tuning with example data, at the cost of losing the ability to classify novel (previously unseen) military vehicle types. A naive one-shot approach results in a classification performance of 19%, whereas our proposed one-shot approach achieves 70% for novel military vehicle classes.
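
For readers unfamiliar with zero-shot classification in CLIP-style models, the sketch below shows the general mechanism: an image embedding is compared with one text embedding per class name, and the class with the highest cosine similarity is predicted. The three-dimensional vectors and the class names are made-up stand-ins for real encoder outputs; this is not the evaluation code behind the reported numbers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, text_embs):
    """Return the class whose text embedding is most similar to the image."""
    return max(text_embs, key=lambda name: cosine(image_emb, text_embs[name]))


# Toy embeddings (assumption): real ones would come from the CLIP encoders.
text_embs = {
    "tank": [0.9, 0.1, 0.0],
    "truck": [0.1, 0.9, 0.1],
    "helicopter": [0.0, 0.2, 0.9],
}
image_emb = [0.8, 0.3, 0.1]

predicted = zero_shot_classify(image_emb, text_embs)
print(predicted)
```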

In conclusion, our proposed two-stage approach can extend CLIP for military vehicle classification. In the first stage, CLIP is provided with knowledge of military vehicles using domain adaptation with VPT. In the second stage, this knowledge can be leveraged for previously unseen military vehicle classes in a one-shot setting.
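
The abstract does not describe how the ensemble in the second stage is built. The sketch below shows one generic form of ensemble undersampling for a one-shot setting, offered purely as an assumption: each ensemble member sees the single novel example plus a small random subsample of each known class, classifies by nearest neighbour in embedding space, and the members vote. All names, the two-dimensional embeddings, and the nearest-neighbour member are hypothetical choices.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_label(x, samples):
    """Label of the (embedding, label) pair closest to x."""
    return min(samples, key=lambda s: sq_dist(x, s[0]))[1]


def ensemble_predict(x, known, novel_example, n_members=5, per_class=1, seed=0):
    """Majority vote over members trained on undersampled known classes."""
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_members):
        # Undersample every known class so the single novel example
        # is not outnumbered within this member's reference set.
        subset = [novel_example]
        for label, embs in known.items():
            subset += [(e, label) for e in rng.sample(embs, per_class)]
        label = nearest_label(x, subset)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Toy embeddings (assumption): real ones would come from the adapted image encoder.
known = {
    "known_a": [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]],
    "known_b": [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]],
}
novel_example = ([-1.0, -1.0], "novel")

print(ensemble_predict([-0.8, -0.9], known, novel_example))
print(ensemble_predict([0.95, 0.05], known, novel_example))
```

Drawing a different subsample for each member lets the ensemble use all known-class images across members while keeping every individual member balanced.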
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Jan Erik van Woerden, Gertjan J. Burghouts, Sabina B. van Rooij, Frank Ruis, Judith Dijk, and Hugo J. Kuijf "Visual prompt tuning and ensemble undersampling for one-shot vehicle classification", Proc. SPIE 13206, Artificial Intelligence for Security and Defence Applications II, 132060G (13 November 2024); https://doi.org/10.1117/12.3029517
KEYWORDS
Image classification, Visualization, Visual process modeling, Defense and security, Machine learning, Data modeling, Performance modeling