VQAR: Review on Information Retrieval Techniques based on Computer Vision and Natural Language Processing
Shivangi Modi, Dhatri Pandya
- Year: 2019
- Citations: 2
Abstract
Computer vision and natural language processing have each seen enormous research progress in recent years. Despite this progress, it remains challenging for machines to extract the semantics of an image and communicate that information to users. Visual Question Answering (VQA) systems address this problem by connecting the two fields. A VQA system is presented with an image and a textual question about that image, and it generates an answer by processing both the image features and the textual features. The answer may be a single word, a phrase, or a sentence. Various datasets are available for training and evaluating VQA systems; they contain real or abstract images together with question-answer pairs about the semantics of each image. VQA is used in many areas, such as assistance for blind and visually impaired users, robotics, and art galleries. This paper discusses VQA techniques and VQA datasets, and highlights the parametric evaluation of these techniques along with generic issues in VQA systems.