HRI

From the Lab to the Real World: Affect Recognition Using Multiple Cues and Modalities

Hatice Güneş, Massimo Piccardi, Maja Pantić

Year: 2008
Citations: 89
Access: Open access

Abstract

The fact that humans perceive the world using rather complex multimodal systems does not necessarily imply that machines should also possess all of the aforementioned functionalities. Humans need to operate in all possible situations and develop adaptive behavior; machines, instead, can be highly profiled for a specific purpose, scenario, user, etc. For example, the computer inside an automatic teller machine probably does not need to recognize the affective states of a human. However, in other applications (e.g., computer agents, effective tutoring systems, clinical settings, monitoring a user's stress level) where computers take on a social role such as an instructor or helper, recognizing users' affective states may enhance the computers' functionality.

A number of survey papers exist within the affect sensing and recognition literature (e.g., […]). For instance, the shift from monomodal to multimodal affect recognition is discussed in […], together with systems that use vision as one of the input modalities and analyze affective face and body movement, either as a purely monomodal system or as part of a multimodal affective framework. An exhaustive survey of past efforts in audiovisual affect sensing and recognition, together with various visual, audio and audio-visual databases, is presented in […]. However, no effort so far has attempted to compile and discuss visual (i.e., facial and bodily expression), audio, tactile (e.g., heart rate, skin conductivity, thermal signals) and thought (i.e., brain and scalp signals) modalities together. Accordingly, this chapter sets out to explore recent advances in affect sensing and recognition by explicitly focusing on systems that are based on multiple input modalities and alternative channels. It is organized as follows.
The first part is concerned with the challenges faced when moving from affect recognition systems designed in and for laboratory settings (i.e., analyzing posed data) to systems that are able to analyze spontaneous data in a multimodal framework. It discusses the problem domain of multimodal affect sensing when moving from posed to spontaneous settings. The chapter initially focuses on background research, reviewing theories of emotion, monomodal expression and perception of emotions, temporal information, posed vs. spontaneous expressions, and multimodal expression and perception of emotions. It then explores further issues in data acquisition, data annotation, feature extraction, and multimodal affective state recognition.

As affect recognition systems using multiple cues and modalities have only recently emerged, the next part of the chapter presents representative systems introduced during the period 2004-2007. These systems are based on multiple visual cues (i.e., affective head, face and/or body movement), haptic cues (physiological sensing), or combinations of modalities (e.g., visual and physiological channels), and are capable of handling data acquired either in the laboratory or in real-world settings. Some studies analyze spontaneous facial expression data in the context of cognitive-science or medical applications (e.g., […]). However, the focus of this chapter is on multimodal or multicue affective data; accordingly, systems analyzing spontaneous data are presented in the context of human-computer interaction (HCI) and human-robot interaction (HRI).

The last part of the chapter discusses issues to be explored in order to advance the state of the art in multimodal and multicue affect sensing and recognition.
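
The abstract refers to systems that combine modalities (e.g., visual and physiological channels) for affective state recognition. One common way to do this is decision-level (late) fusion, where each modality has its own classifier and their outputs are merged. The sketch below is a minimal illustration, assuming each modality's classifier already produces class probabilities over a shared label set. The labels, weights, scores and function names (`fuse_decisions`, `predict`) are hypothetical, and this is not presented as the method of any system surveyed in the chapter.

```python
from typing import Dict, List

# Hypothetical affect labels, chosen only for illustration.
LABELS: List[str] = ["neutral", "happy", "stressed"]


def fuse_decisions(
    modality_probs: Dict[str, List[float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted-average late fusion of per-modality class probabilities.

    Assumes every modality scores the same LABELS in the same order.
    """
    total_weight = sum(weights[m] for m in modality_probs)
    fused = [0.0] * len(LABELS)
    for modality, probs in modality_probs.items():
        if len(probs) != len(LABELS):
            raise ValueError(f"{modality}: expected {len(LABELS)} scores")
        # Normalize weights so the fused scores still sum to 1.
        w = weights[modality] / total_weight
        for i, p in enumerate(probs):
            fused[i] += w * p
    return dict(zip(LABELS, fused))


def predict(modality_probs: Dict[str, List[float]], weights: Dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    fused = fuse_decisions(modality_probs, weights)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Made-up classifier outputs for a single observation window.
    outputs = {
        "face": [0.2, 0.7, 0.1],
        "body": [0.3, 0.4, 0.3],
        "physiology": [0.1, 0.2, 0.7],
    }
    weights = {"face": 0.5, "body": 0.2, "physiology": 0.3}
    print(predict(outputs, weights))
```

With these made-up inputs the face channel dominates and the fused decision is "happy"; shifting weight toward the physiological channel would favor "stressed". Fixed weights are the simplest choice; a real system might instead learn them or use feature-level fusion.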

Keywords

Gesture, Facial expression, Modalities, Affect (linguistics), Modality (human–computer interaction), Psychology, Speech recognition, Cognitive psychology, Communication, Computer science
