Conversational Agents and Dialogue




In 1994, I was part of an ambitious effort, led by Justine Cassell during a visiting professorship at Penn, to build a prototype conversational agent. We wrote a program that could form hypothetical plans for multiple agents to act in a domain, and could enlist the help of another agent to confirm the hypotheses underlying the plan and carry out the actions it described. The programs were realized as embodied agents whose communicative behavior included not just text but spoken intonation, hand gesture, head movement, and facial displays. We ran two copies of this program as agents with different knowledge and abilities, and showed that together these agents could achieve goals that neither had the knowledge or capability to accomplish independently. We called the result animated conversation.

Since then, much of my research has been motivated by the shortcomings of that first prototype. One big problem was the computational cost of running it. My generation research on SPUD helps address this by providing an NLG module that makes fewer search demands and fits more naturally into a conversational agent, eliminating expensive, redundant stages of processing. My logic and knowledge representation research helps by providing more efficient ways to reason about actions in the domain and about the states of knowledge of participants in the conversation. Our animation work helps by providing reusable components for synchronized facial animation and speech synthesis whose design and implementation aim at a natural fit with the other modules in conversational agents.

Cassell et al. 94a Justine Cassell, Catherine Pelachaud, Norm Badler, Mark Steedman, Brett Achorn, Tripp Becket, Brett Douville, Scott Prevost, and Matthew Stone. Animated Conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. SIGGRAPH 1994, pages 413-420.
This paper describes the animated conversation project from a graphics point of view.
Cassell et al. 94b Justine Cassell, Matthew Stone, Brett Douville, Scott Prevost, Brett Achorn, Mark Steedman, Norm Badler and Catherine Pelachaud. Modeling the interaction between speech and gesture. Cognitive Science Society 1994.
This paper describes the animated conversation project from a cognitive science point of view.
Cassell et al. 00 Justine Cassell, Matthew Stone and Hao Yan. Coordination and context-dependence in the generation of embodied conversation. First International Conference on Natural Language Generation, 2000, pages 171-178.

This paper summarizes the methodology Hao used to gather sample real-estate descriptions from people and formalize the communicative functions and behaviors people used in these descriptions; it then shows how we used this data and a psycholinguistic theory of communicative action to get REA to output similar descriptions using SPUD.

A preliminary version, including an extended discussion of the psycholinguistic and computational principles behind our approach, appeared in 1999 in the AAAI Fall Symposium on Psychological Models of Communication in Collaborative Systems.
DeCarlo et al. 04 Doug DeCarlo, Corey Revilla, Matthew Stone and Jennifer J. Venditti. Specifying and animating facial signals for discourse in embodied conversational agents.

In this paper, we describe a freely available, cross-platform, real-time facial animation system, RUTH, that animates the high-level nonverbal signals typical of face-to-face discourse in synchrony with speech and lip movements. RUTH adopts an open, layered architecture in which fine-grained features of the animation can be derived by rule from inferred linguistic structure.

Stone et al. 04 Matthew Stone, Doug DeCarlo, Insuk Oh, Christian Rodriguez, Adrian Stere, Alyssa Lees, and Chris Bregler. Speaking with hands: Creating animated conversational characters from recordings of human performance.

People's utterances in conversation are composed of short, clearly delimited phrases; in each phrase, gesture and speech go together meaningfully and synchronize at a common point of maximum emphasis. This paper shows how to exploit this structure in methods for creating animated conversational characters from databases of recorded speech and captured motion. By framing the problems of utterance generation and synthesis so that they can draw closely on a talented performance, our techniques support the rapid construction of animated characters with rich and appropriate expression.

Cassell, Stone and Traum 05 Justine Cassell, Matthew Stone and David Traum. Selected published research on modeling face-to-face conversation.

This annotated bibliography, prepared to accompany our course on formal pragmatics for face-to-face conversation at ESSLLI 2005 in Edinburgh, gives an overview of key research up to 2005 that can inform current computational models of face-to-face conversation.

Oh and Stone 07 Insuk Oh and Matthew Stone. Understanding RUTH: Creating Believable Behaviors for a Virtual Human under Uncertainty.

In this paper, we present an evaluation of RUTH and demonstrate its usefulness in uncovering new insights into how people use their faces in face-to-face conversation. We focus on the case of uncertainty. Our results show that people can pick up on different levels of uncertainty both from videos of people and from corresponding simulations on RUTH. In addition, we used RUTH animations containing different subsets of facial signals to understand in more detail how nonverbal behavior conveys uncertainty.

Coauthors: Norm Badler, Justine Cassell, Beth Ann Hockey, Stephen Isard, Scott Prevost, Beverly Spejewski, Mark Steedman, Hao Yan

Organizations: ACL SIGDIAL

May 8, 2007