PERCEPTION

Jointly Learning Grounded Task Structures from Language Instruction and Visual Demonstration

Changsong Liu, Shaohua Yang, Sari Saba-Sadiya, Nishant Shukla, Yunzhong He, Song‐Chun Zhu, Joyce Chai

Year: 2016
Citations: 42
Access: Open access

Abstract

To enable language-based communication and collaboration with cognitive robots, this paper presents an approach in which an agent learns task models jointly from language instruction and visual demonstration using an And-Or Graph (AoG) representation. The learned AoG captures a hierarchical task structure in which linguistic labels (for language communication) are grounded to corresponding state changes in the physical environment (for perception and action). Our empirical results on a cloth-folding domain show that, although state detection through visual processing is uncertain and error-prone, tight integration with language enables the agent to learn an effective AoG for task representation. The learned AoG can further be applied to infer and interpret ongoing actions from new visual demonstrations using linguistic labels at different levels of granularity.
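To make the AoG idea concrete, the following is a minimal illustrative sketch, not the paper's learning algorithm or data format. It assumes a hand-built graph in which And-nodes are ordered sequences of sub-tasks, Or-nodes are alternatives, and leaves carry a symbolic state change. The class `Node`, the functions `expansions` and `describe`, and every label and state string are hypothetical names chosen for this example. Matching here is exact, so it ignores the visual uncertainty the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                           # linguistic label for this (sub-)task
    kind: str                            # "and", "or", or "leaf"
    children: List["Node"] = field(default_factory=list)
    state_change: Optional[str] = None   # grounded state change (leaves only)


def expansions(node: Node) -> List[List[str]]:
    """Enumerate every state-change sequence this node can produce."""
    if node.kind == "leaf":
        return [[node.state_change]]
    if node.kind == "or":
        # An Or-node is realized by exactly one of its children.
        return [seq for child in node.children for seq in expansions(child)]
    # An And-node is realized by all of its children, in order.
    seqs: List[List[str]] = [[]]
    for child in node.children:
        seqs = [s + t for s in seqs for t in expansions(child)]
    return seqs


def contains(seq: List[str], sub: List[str]) -> bool:
    """True if `sub` occurs as a contiguous run inside `seq`."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def describe(node: Node, observed: List[str], depth: int = 0) -> List[Tuple[int, str]]:
    """Collect (depth, label) for each node with an expansion found in `observed`."""
    found = []
    if any(contains(observed, e) for e in expansions(node)):
        found.append((depth, node.label))
    for child in node.children:
        found.extend(describe(child, observed, depth + 1))
    return found


# Hypothetical cloth-folding task with made-up labels and state names.
left = Node("fold the left sleeve", "leaf", state_change="left_sleeve_folded")
right = Node("fold the right sleeve", "leaf", state_change="right_sleeve_folded")
bottom = Node("fold the bottom up", "leaf", state_change="bottom_folded_up")
top = Node("fold the top down", "leaf", state_change="top_folded_down")
sleeves = Node("fold the sleeves", "and", [left, right])
half = Node("fold in half", "or", [bottom, top])
root = Node("fold the t-shirt", "and", [sleeves, half])

# A partial demonstration: only the two sleeve folds have been observed so far.
observed = ["left_sleeve_folded", "right_sleeve_folded"]
for depth, label in describe(root, observed):
    print("  " * depth + label)
```

In this toy setting, the partial demonstration is described both at a coarse level ("fold the sleeves") and at a fine level (each individual sleeve fold), while the root task is not yet reported because it has not been completed. This mirrors, in a very simplified way, the idea of interpreting actions with labels at different levels of granularity.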

Keywords

Computer science, Task (project management), Grounded theory, Human–computer interaction, Natural language processing, Visual language, Artificial intelligence, Linguistics, Qualitative research, Engineering
