
Full-GRU Natural Language Video Description for Service Robotics Applications

Silvia Cascianelli, Gabriele Costante, Thomas A. Ciarfuglia, Paolo Valigi, Mario Luca Fravolini

Year: 2018
Citations: 35

Abstract

Enabling effective human-robot interaction is crucial for any service robotics application. In this context, a fundamental aspect is the development of a user-friendly human-robot interface, such as a natural language interface. In this letter, we investigate the robot side of the interface, in particular the robot's ability to generate natural language descriptions of the scene it observes. We achieve this capability via a deep recurrent neural network architecture based entirely on the gated recurrent unit paradigm. The robot is able to generate complete sentences describing the scene, dealing with the hierarchical nature of the temporal information contained in image sequences. The proposed approach has fewer parameters than previous state-of-the-art architectures, so it is faster to train and occupies less memory. These benefits do not come at the cost of prediction performance. In fact, we show that our method outperforms or is comparable to previous approaches in terms of quantitative metrics and qualitative evaluation when tested on publicly available benchmark datasets and on a new dataset we introduce in this letter.

Keywords

Computer science, Artificial intelligence, Robotics, Robot, Benchmark (surveying), Interface (matter), Natural language, Context (archaeology), Service (business), Service robot
