The Long-Short Story of Movie Description
Anna Rohrbach, Marcus Rohrbach, Bernt Schiele
- Publication year: 2015
- Access: Open access
Abstract
Generating descriptions for videos has many applications, including assisting blind people and human-robot interaction. Recent advances in image captioning, together with the release of large-scale movie description datasets such as MPII Movie Description (MPII-MD), make it possible to study this task in more depth. Many of the methods proposed for image captioning rely on CNNs pre-trained as object classifiers and on Long Short-Term Memory recurrent networks (LSTMs) to generate descriptions. While image description focuses on objects, we argue that in the challenging setting of movie description it is important to distinguish verbs, objects, and places. In this work, we show how to learn robust visual classifiers from the weak annotations provided by the sentence descriptions. Based on these visual classifiers, we learn to generate a description using an LSTM. We explore different design choices for building and training the LSTM and achieve the best performance to date on the challenging MPII-MD dataset. We compare and analyze our approach and prior work along various dimensions to better understand the key challenges of the movie description task.
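The abstract does not specify how the weak labels are extracted from the sentence descriptions. As a minimal sketch, assuming small hypothetical vocabularies (`VOCAB`) and plain exact token matching (no lemmatization or parsing, which the actual method may well use), weak per-group labels for verbs, objects, and places could be mined from a description and turned into multi-hot targets for training separate classifiers per group:

```python
import re

# Hypothetical, hand-picked vocabularies for illustration only; they are NOT
# the label sets used in the paper.
VOCAB = {
    "verb": {"walks", "opens", "looks", "sits"},
    "object": {"door", "car", "phone", "table"},
    "place": {"kitchen", "street", "office"},
}


def weak_labels(sentence: str) -> dict[str, set[str]]:
    """Return, per label group, the vocabulary words that appear in the sentence.

    Matching is exact on lowercase alphabetic tokens, so inflected forms
    (e.g. "opened" vs. "opens") are not merged.
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return {group: words & tokens for group, words in VOCAB.items()}


def to_multi_hot(found: set[str], vocab: set[str]) -> list[int]:
    """Build a 0/1 target vector over the vocabulary in alphabetical order."""
    return [1 if word in found else 0 for word in sorted(vocab)]


if __name__ == "__main__":
    labels = weak_labels("Someone opens the door and walks into the kitchen.")
    for group, vocab in VOCAB.items():
        print(group, sorted(labels[group]), to_multi_hot(labels[group], vocab))
```

Keeping verbs, objects, and places as separate label groups mirrors the abstract's argument that these should be distinguished; because the labels come only from the sentence text, they are weak and noisy targets rather than verified annotations.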