Paper
Improving visual question answering with pre-trained language modeling
24 June 2020
Yue Wu, Huiyi Gao, Lei Chen
Proceedings Volume 11526, Fifth International Workshop on Pattern Recognition; 115260D (2020) https://doi.org/10.1117/12.2574575
Event: Fifth International Workshop on Pattern Recognition, 2020, Chengdu, China
Abstract
Visual question answering (VQA) is a task of significant importance for research in artificial intelligence. However, most studies use simple gated recurrent units (GRUs) to extract high-level question or image features, which is insufficient for strong performance. In this paper, two improvements are proposed to a general VQA model based on the dynamic memory network (DMN). First, we initialize the question module of our model with a pre-trained language model. Second, we replace the GRU in the input fusion layer of the input module with a new module. Experimental results demonstrate the effectiveness of our method, with an improvement of 1.52% over the baseline on the Visual Question Answering V2 dataset.
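The abstract does not specify the module that replaces the GRU in the input fusion layer, but the keyword list mentions Transformers, so a plausible reading is a self-attention fusion step over the image-region "facts". Below is a minimal NumPy sketch of that assumption: single-head scaled dot-product self-attention replacing a bidirectional GRU for fusing region features. The function name and shapes are illustrative, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fusion(facts):
    """Fuse image-region features with scaled dot-product self-attention.

    facts: (num_regions, dim) array of region feature vectors.
    Returns an array of the same shape where each row is an
    attention-weighted mixture of all regions, letting every region
    attend to every other in one step (unlike a sequential GRU).
    """
    d_k = facts.shape[-1]
    scores = facts @ facts.T / np.sqrt(d_k)   # (regions, regions) similarities
    weights = softmax(scores, axis=-1)        # rows sum to 1
    return weights @ facts                    # convex mixtures of region features
```

In a full model, this block would sit in the input module between the CNN region extractor and the episodic memory, in place of the bidirectional GRU used by the original DMN.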
© (2020) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yue Wu, Huiyi Gao, and Lei Chen "Improving visual question answering with pre-trained language modeling", Proc. SPIE 11526, Fifth International Workshop on Pattern Recognition, 115260D (24 June 2020); https://doi.org/10.1117/12.2574575
KEYWORDS: Computer programming, Transformers, Visualization, Image fusion, Data modeling, Feature extraction, Visual process modeling
