PERCEPTION

Jointly Learning Grounded Task Structures from Language Instruction and Visual Demonstration

Changsong Liu, Shaohua Yang, Sari Saba-Sadiya, Nishant Shukla, Yunzhong He, Song‐Chun Zhu, Joyce Chai

Year of publication
2016
Citations
42
Access
Open access

Abstract

To enable language-based communication and collaboration with cognitive robots, this paper presents an approach where an agent can learn task models jointly from language instruction and visual demonstration using an And-Or Graph (AoG) representation. The learned AoG captures a hierarchical task structure where linguistic labels (for language communication) are grounded to corresponding state changes from the physical environment (for perception and action). Our empirical results on a cloth-folding domain have shown that, although state detection through visual processing is full of uncertainties and error-prone, through tight integration with language the agent is able to learn an effective AoG for task representation. The learned AoG can be further applied to infer and interpret ongoing actions from new visual demonstration using linguistic labels at different levels of granularity.
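As a rough illustration of the representation the abstract describes, the sketch below builds a tiny And-Or Graph for a cloth-folding task. It is not the paper's learning algorithm or data format. The class `Node`, the functions `expansions` and `explains`, the linguistic labels, and the state-change strings are all hypothetical, and plain strings stand in for state changes that the paper detects through visual processing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One AoG node (hypothetical structure, for illustration only)."""
    label: str                     # linguistic label, e.g. "fold the sleeves"
    kind: str = "terminal"         # "and", "or", or "terminal"
    children: List["Node"] = field(default_factory=list)
    state_change: str = ""         # terminals only: symbolic stand-in for a detected state change


def expansions(node: Node) -> List[List[str]]:
    """Enumerate every sequence of state changes the node can generate."""
    if node.kind == "terminal":
        return [[node.state_change]]
    if node.kind == "or":
        # An Or-node selects exactly one of its alternatives.
        return [seq for child in node.children for seq in expansions(child)]
    # An And-node concatenates one expansion of each child, in order.
    results: List[List[str]] = [[]]
    for child in node.children:
        results = [prefix + seq for prefix in results for seq in expansions(child)]
    return results


def explains(node: Node, observed: List[str]) -> bool:
    """Check whether an observed sequence of state changes fits the task model."""
    return observed in expansions(node)


# Assumed toy task: the sleeves may be folded in either order, then the body.
left = Node("fold the left sleeve", state_change="left_sleeve_folded")
right = Node("fold the right sleeve", state_change="right_sleeve_folded")
sleeves = Node("fold the sleeves", "or", [
    Node("left sleeve first", "and", [left, right]),
    Node("right sleeve first", "and", [right, left]),
])
body = Node("fold the body in half", state_change="body_folded")
task = Node("fold the t-shirt", "and", [sleeves, body])
```

Under these assumptions, `explains(task, [...])` accepts either sleeve order followed by the body fold. Each internal node carries its own label, so the same observation can be described at a coarse level ("fold the sleeves") or a fine one ("fold the left sleeve").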

Keywords

Computer science, Task (project management), Grounded theory, Human–computer interaction, Natural language processing, Visual language, Artificial intelligence, Linguistics, Qualitative research, Engineering
