Deep Learning for Image-to-Text Generation: A Technical Overview

Xiaodong He, Li Deng

Publication year: 2017
Citations: 118

Abstract

Generating a natural language description from an image is an emerging interdisciplinary problem at the intersection of computer vision, natural language processing, and artificial intelligence (AI). This task, often referred to as image or visual captioning, forms the technical foundation of many important applications, such as semantic visual search, visual intelligence in chatbots, photo and video sharing on social media, and aids that help visually impaired people perceive the visual content around them. Thanks to recent advances in deep learning, the AI research community has made tremendous progress in visual captioning. In this article, we first summarize this emerging area of visual captioning. We then analyze the key developments and major progress the community has made, their impact on both research and industry deployment, and what lies ahead for future breakthroughs.

Keywords

Closed captioning, Computer science, Deep learning, Natural language, Artificial intelligence, Task (project management), Intersection (aeronautics), Software deployment, Natural (archaeology), Multimedia
