Learning multi-modal grounded linguistic semantics by playing I Spy
Jesse Thomason, Jivko Sinapov, Maxwell Svetlik, Peter Stone, Raymond J. Mooney
- Year
- 2016
- Citations
- 70
Abstract
Grounded language learning bridges words like 'red' and 'square' with robot perception. The vast majority of existing work in this space limits robot perception to vision. In this paper, we build perceptual models that use haptic, auditory, and proprioceptive data acquired through robot exploratory behaviors to go beyond vision. Our system learns to ground natural language words describing objects using supervision from an interactive humanrobot I Spy game. In this game, the human and robot take turns describing one object among several, then trying to guess which object the other has described. All supervision labels were gathered from human participants physically present to play this game with a robot. We demonstrate that our multi-modal system for grounding natural language outperforms a traditional, vision-only grounding framework by comparing the two on the I Spy task. We also provide a qualitative analysis of the groundings learned in the game, visualizing what words are understood better with multimodal sensory information as well as identifying learned word meanings that correlate with physical object properties (e.g. 'small' negatively correlates with object weight).
Keywords
Related papers
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002