LEARNING

Techniques for vision-based human-computer interaction

Gregory D. Hager, Jason J. Corso

Year of publication
2006
Citations
4

Abstract

With the ubiquity of powerful, mobile computers and rapid advances in sensing and robot technologies, there exists a great potential for creating advanced, intelligent computing environments. We investigate techniques for integrating passive, vision-based sensing into such environments, which include both conventional interfaces and large-scale environments. We propose a new methodology for vision-based human-computer interaction called the Visual Interaction Cues (VICs) paradigm. VICs fundamentally relies on a shared perceptual space between the user and computer using monocular and stereoscopic video. In this space, we represent each interface component as a localized region in the image(s). Because each component provides a clearly defined interaction locale, it is not necessary to visually track the user. Rather, we model interaction as an expected stream of visual cues corresponding to a gesture. Example interaction cues include motion, as when a finger moves to press a push-button, and 3D hand posture for a communicative gesture such as a letter in sign language. We explore both procedurally defined parsers of the low-level visual cues and machine learning techniques (e.g., neural networks) for the cue parsing. Individual gestures are analogous to a language with only words and no grammar. We have constructed a high-level language model that integrates a set of low-level gestures into a single, coherent probabilistic framework. In the language model, every low-level gesture is called a gesture word. We build a probabilistic graphical model with each node being a gesture word, and use an unsupervised learning technique to train the gesture-language model. A complete action is then a sequence of these words through the graph and is called a gesture sentence. We are especially interested in building mobile interactive systems in large-scale, unknown environments. We study the associated "where am I" problem: the mobile system must be able to map the environment and localize itself in the environment using the video imagery. Under the VICs paradigm, we can solve the interaction problem using local geometry without requiring a complete metric map of the environment. (Abstract shortened by UMI.)
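
To make the gesture-language idea concrete, the following is a minimal sketch of a graph over gesture words in which a gesture sentence is scored as a path through the graph. The abstract does not specify the form of the graphical model or the unsupervised training procedure, so this sketch assumes a first-order (bigram) model estimated by simple transition counting with add-alpha smoothing, which is a stand-in and not the authors' method. The gesture words (`press`, `grasp`, `drag`, `release`), the function names, and the example sentences are all hypothetical.

```python
from collections import defaultdict
import math

# Hypothetical boundary markers for the start and end of a gesture sentence.
START, END = "<s>", "</s>"


def train_bigram(sentences, vocab, alpha=1.0):
    """Estimate P(next word | previous word) from example gesture sentences.

    Assumption: transition counting with add-alpha smoothing stands in for
    the unspecified unsupervised training technique named in the abstract.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        path = [START, *sentence, END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1.0

    targets = list(vocab) + [END]  # every node may also end the sentence
    model = {}
    for prev in [START, *vocab]:
        total = sum(counts[prev].values()) + alpha * len(targets)
        model[prev] = {t: (counts[prev][t] + alpha) / total for t in targets}
    return model


def sentence_log_prob(model, sentence):
    """Log-probability of a gesture sentence as a path through the graph."""
    path = [START, *sentence, END]
    return sum(math.log(model[p][n]) for p, n in zip(path, path[1:]))


# Hypothetical gesture words and example sentences, for illustration only.
VOCAB = ["press", "grasp", "drag", "release"]
EXAMPLES = [
    ["grasp", "drag", "release"],
    ["press"],
    ["grasp", "drag", "drag", "release"],
]

if __name__ == "__main__":
    model = train_bigram(EXAMPLES, VOCAB)
    for candidate in (["grasp", "drag", "release"], ["release", "drag", "grasp"]):
        print(candidate, sentence_log_prob(model, candidate))
```

Under these assumptions, a sentence that follows transitions seen in the examples (grasp, drag, release) scores higher than the same words in reverse order, which is the sense in which a language model adds grammar to otherwise independent gesture words.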

Keywords

Computer science, Gesture, Artificial intelligence, Human–computer interaction, Gesture recognition, Set (abstract data type), Parsing, Computer vision, Programming language
