PERCEPTION

Jointly Learning Grounded Task Structures from Language Instruction and Visual Demonstration

Changsong Liu, Shaohua Yang, Sari Saba-Sadiya, Nishant Shukla, Yunzhong He, Song‐Chun Zhu, Joyce Chai

Year: 2016
Citations: 42
Access: Open access

Abstract

To enable language-based communication and collaboration with cognitive robots, this paper presents an approach where an agent can learn task models jointly from language instruction and visual demonstration using an And-Or Graph (AoG) representation. The learned AoG captures a hierarchical task structure where linguistic labels (for language communication) are grounded to corresponding state changes from the physical environment (for perception and action). Our empirical results on a cloth-folding domain have shown that, although state detection through visual processing is full of uncertainties and error-prone, through tight integration with language the agent is able to learn an effective AoG for task representation. The learned AoG can be further applied to infer and interpret ongoing actions from new visual demonstrations using linguistic labels at different levels of granularity.
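The And-Or Graph idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation and learns nothing. Instead, it hand-builds a toy AoG and assumes visual processing has already produced clean, symbolic state changes, whereas the real system must cope with noisy detections. The node kinds, the shirt-folding labels, states such as `left_in`, and the helpers `terminal`, `parses` and `interpret` are all hypothetical. And-nodes require their children in order, Or-nodes offer alternatives (here, two sleeve orderings), and each terminal pairs a linguistic label with the state change it is grounded to. Parsing a demonstration then yields labels at several levels of granularity.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

StateChange = Tuple[str, str]   # (state before, state after); hypothetical symbolic states
Span = Tuple[str, int, int]     # (linguistic label, first step, one past last step)


@dataclass
class Node:
    label: str                               # linguistic label, e.g. "fold in half"
    kind: str                                # "and", "or" or "terminal"
    children: List["Node"] = field(default_factory=list)
    change: Optional[StateChange] = None     # grounded state change, terminals only


def terminal(label: str, before: str, after: str) -> Node:
    return Node(label, "terminal", change=(before, after))


def parses(node: Node, observed: List[StateChange], i: int = 0) -> Iterator[Tuple[int, List[Span]]]:
    """Yield (end, spans) for every way `node` can explain observed[i:end]."""
    if node.kind == "terminal":
        if i < len(observed) and observed[i] == node.change:
            yield i + 1, [(node.label, i, i + 1)]
    elif node.kind == "or":
        # An Or-node is explained by any one of its alternatives.
        for child in node.children:
            yield from parses(child, observed, i)
    else:
        # An And-node needs all children, in order, on consecutive steps.
        # Nested generators enumerate every combination, so a failed
        # alternative is simply skipped (backtracking).
        def sequence(k: int, j: int) -> Iterator[Tuple[int, List[Span]]]:
            if k == len(node.children):
                yield j, []
                return
            for mid, first in parses(node.children[k], observed, j):
                for end, rest in sequence(k + 1, mid):
                    yield end, first + rest

        for end, spans in sequence(0, i):
            yield end, [(node.label, i, end)] + spans


def interpret(root: Node, observed: List[StateChange]) -> Optional[List[Span]]:
    """Return the spans of the first parse that explains the whole demonstration."""
    for end, spans in parses(root, observed):
        if end == len(observed):
            return spans
    return None


# Toy cloth-folding task: fold the sleeves in either order, then fold the shirt in half.
sleeves = Node("fold the sleeves", "or", [
    Node("fold the sleeves", "and", [terminal("fold left sleeve", "flat", "left_in"),
                                     terminal("fold right sleeve", "left_in", "both_in")]),
    Node("fold the sleeves", "and", [terminal("fold right sleeve", "flat", "right_in"),
                                     terminal("fold left sleeve", "right_in", "both_in")]),
])
root = Node("fold the shirt", "and", [sleeves, terminal("fold in half", "both_in", "halved")])

demo = [("flat", "right_in"), ("right_in", "both_in"), ("both_in", "halved")]
for label, start, end in interpret(root, demo):
    print(f"{label}: steps {start}-{end}")
```

Running it prints the whole task, the "fold the sleeves" subtask and each primitive fold together with the demonstration steps each one covers. Exhaustive generator-based parsing is chosen here for readability; on large graphs it can blow up combinatorially, and it has no notion of detection confidence, which the noisy setting described above would require.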

Keywords

Computer science, Task (project management), Grounded theory, Human–computer interaction, Natural language processing, Visual language, Artificial intelligence, Linguistics, Qualitative research, Engineering
