Envisioning Answers: Unleashing Deep Learning for Visual Question Answering in Artistic Images

Document Type: Research Article

Authors

1 M.Sc. Graduate, Deep Learning Research Lab, Department of Computer Engineering, Faculty of Engineering, College of Farabi, University of Tehran, Iran

2 Faculty Member, Department of Computer Engineering, Faculty of Engineering, College of Farabi, University of Tehran, Iran

3 B.Sc. Student, Deep Learning Research Lab, Department of Computer Engineering, Faculty of Engineering, College of Farabi, University of Tehran, Iran

Abstract

In specialized domains, accurately answering visual questions is crucial for practical applications. This study improves a visual question answering (VQA) model for artistic images by using a dataset that contains both visual and knowledge-based questions. The approach employs a pre-trained BERT model to classify the nature of each question, routes visual questions to an iQAN model with MLB and MUTAN fusion mechanisms, and handles knowledge-based questions with an XLNet-based model. The system achieves 78.92% accuracy on visual questions, 47.71% on knowledge-based questions, and 55.88% overall when the two branches are combined. The study also examines how parameters such as the number of glances and the choice of activation function affect model performance.
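
To make the two-branch design concrete, the following is a minimal sketch, not the paper's exact implementation, of the routing and fusion steps described above. It assumes pre-extracted image features of dimension 2048 (e.g., from a ResNet backbone) and 768-dimensional BERT question embeddings; the hidden size, answer-vocabulary size, and the linear stand-ins for the BERT question-type classifier and the XLNet knowledge branch are illustrative placeholders.

    import torch
    import torch.nn as nn

    class MLBFusion(nn.Module):
        # MLB-style fusion: both modalities are projected into a shared space
        # and combined with an element-wise product, a low-rank approximation
        # of full bilinear pooling.
        def __init__(self, img_dim=2048, q_dim=768, hidden_dim=1200, num_answers=1000):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, hidden_dim)
            self.q_proj = nn.Linear(q_dim, hidden_dim)
            self.classifier = nn.Linear(hidden_dim, num_answers)
            self.act = nn.Tanh()

        def forward(self, img_feat, q_feat):
            fused = self.act(self.img_proj(img_feat)) * self.act(self.q_proj(q_feat))
            return self.classifier(fused)  # logits over the answer vocabulary

    class TwoBranchVQA(nn.Module):
        # Routes each question to a visual branch (MLB-style fusion) or a
        # knowledge branch, based on a binary question-type classifier.
        def __init__(self, q_dim=768, num_answers=1000):
            super().__init__()
            self.type_classifier = nn.Linear(q_dim, 2)  # 0 = visual, 1 = knowledge
            self.visual_branch = MLBFusion(num_answers=num_answers)
            self.knowledge_branch = nn.Linear(q_dim, num_answers)  # stand-in for the XLNet branch

        def forward(self, img_feat, q_feat):
            is_knowledge = self.type_classifier(q_feat).argmax(dim=-1, keepdim=True)
            visual_logits = self.visual_branch(img_feat, q_feat)
            knowledge_logits = self.knowledge_branch(q_feat)
            # Select, per sample, the logits from the branch chosen by the classifier.
            return torch.where(is_knowledge.bool(), knowledge_logits, visual_logits)

    # Usage with random placeholder features:
    model = TwoBranchVQA()
    img = torch.randn(4, 2048)   # pre-extracted image features
    q = torch.randn(4, 768)      # BERT [CLS] embeddings of the questions
    logits = model(img, q)       # shape (4, 1000)

In the actual system, the question-type classifier is a fine-tuned BERT model and the knowledge branch is XLNet-based; both are replaced here by simple linear layers so the routing logic stays self-contained.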

