Natural language generation (NLG) research aims at constructing
systems that output speech or text to describe the content of internal
computer states and representations.
The task brings two kinds of obstacles.
I began by studing algorithmic aspects of NLG. This research
culminated in the SPUD system described in the
Specifications. NLG systems require large and complex
resources that describe how to use language. The research
challenge is to construct such resources quickly, correctly and
Algorithms. Once you develop suitable communicative
resources, the research challenge is to use them soundly and
efficiently. Formulating the needed procedures can be difficult
Because SPUD constructs sentences by reasoning from a declarative
grammar, SPUD brings distinctive opportunities for exploring
specifications in NLG. I explore these questions, which occupy me
presently, in a later subsection.
Algorithms for NLG
NLG is typically broken down into stages of discourse planning (to
select information and organize it into coherent paragraphs), sentence
planning (to choose words and structures to fit information into
sentence-sized units), and realization (to determine surface form of
output, including word order, morphology and final formatting or
intonation). The SPUD system combines the generation steps of
sentence planning and surface realization by using a lexicalized
grammar to construct the syntax and semantics of a sentence
The basic input to SPUD consists of a set of communicative goals to
achieve. SPUD seeks to satisfy three kinds of constraint in mapping
these goals onto a sentence that accomplishes them. Any fact to be
communicated must be contributed directly or by inference from an
element in an abstract grammatical structure, including lexical items.
Any reference to a domain entity must be elaborated into a description
that distinguishes the entities from its distractors (the salient
alternatives to it in context). And all this conceptual material must
be organized into a grammatical surface form.
SPUD builds its sentence step-by-step. Each step adds to the ongoing
sentence a lexicalized entry encoded as an elementary tree in
Lexicalized Tree-Adjoining Grammar (LTAG). By using these LTAG
structures, SPUD always works with concrete grammatical surface forms.
Each tree is paired with logical formulae that, by referring to a rich
discourse model, characterize the semantic and pragmatic contribution
that the tree makes to the sentence. By keeping track of these
contributions as it builds the sentence, SPUD is able to assess the
shared information that the sentence uses to identify discourse
entities and the new information that the sentence adds about them.
The advantages of this scheme include these:
These advantages make the SPUD algorithm a natural one to use to
formulate natural and concise output for dialogue interfaces to
- Syntactic constraints are handled early and naturally,
ensuring the expressibility of selected concepts.
- The order of adding content is flexible, allowing varied
sentences without complicated search.
- Grammatical knowledge is stated once only,
inside the generator.
- It enables efficient descriptions that refer implicitly to
parameters from the context.
- It enables efficient descriptions that exploit the hearer's
ability to recognize inferential links to material elsewhere
in the sentence.
Specifications for NLG
In NLG, the ideal way to specify linguistic resources is by example.
In most NLG applications, you can readily find or construct
gold-standard texts that you would like your system to generate.
These texts themselves should determine NLG resources with minimal
further effort. The research problem is to articulate methodologies
for preparing texts, and procedures for automatic text analysis, that
suffice to bootstrap a generator from data.
Now, clearly we need to annotate the syntactic structure behind the
text - a tree or some similar formal description. We also need to
know something about the speaker's intentions in producing the text -
for example, the domain elements that the speaker intended to refer
to. But what else?
We explored this problem manually, by building a SPUD grammar for a
small set of sentences. We were surprised to discover a methodology
that allows specifications to be developed systematically, by applying
simple heuristics connected with clear corpus evidence. We have since
been working to automate this methodology. For example, a slightly
simpler syntactic formalism allows us to go directly from a treebank
to syntactic grammar entries. Meanwhile we hypothesize that we will
then be able to go automatically from an annotated utterance to a
schema for the context the utterance presupposes and the communicative
goals that the utterance addresses. These two steps amount to an
algorithm that compiles a potential output sentence into the abstract
input and the grammatical resources that would be required for SPUD to
create the sentence.
Stone and Doran 97
Matthew Stone and Christine Doran.
Sentence Planning as Description Using Tree-Adjoining Grammar.
Proceedings of ACL 1997, pages 198--205.
This paper outlines SPUD from first principles, with an emphasis
on SPUD's linguistic
representations and SPUD's choice of appropriate syntactic and
in discourse context.
Stone and Doran 96
Matthew Stone and Christine Doran.
Paying Heed to Collocations.
Proceedings of INLG 1996, pages 91--100.
This is a survey of computational semantics in SPUD. By combining
lexical semantics that refers to a wide variety of entities, a
discourse model that assigns those entities differing levels of
salience, and a generation process that models reference to those
entities, SPUD can generate many kinds of collocations
incrementally and compositionally.
Stone and Webber 98
Matthew Stone and Bonnie Webber.
Textual Economy through Close Coupling of Syntax and Semantics.
Proceedings of INLG 1998, pages 178--187.
This paper investigates efficient descriptions of objects that exploit
inferential links to information elsewhere in a sentence. Such
descriptions are possible only if, like SPUD, a generator can
consider syntax and semantics simultaneously, and can assess
quickly and reliably how the hearer will interpret the current
(possibly incomplete) sentence.
Matthew Stone. On Identifying Sets. INLG 2000, pages 116-123.
This paper aims at developing a computational account of the
interpretation of plural noun phrases which would allow SPUD to
generate descriptions of sets of entities. The challenge is to avoid
listing all sets of salient entities in representing the interpretation,
and to avoid searching over sets of salient entities in constructing
interpretations using the grammar.
A preliminary version appeared in the
1999 ESSLLI workshop on the generation of nominal expressions.
Stone et al 00
Matthew Stone, Tonia Bleam, Christine Doran and Martha Palmer.
Lexicalized Grammar and the Description of Motion Events. Fifth
Workshop on Tree-Adjoining Grammar and Related Formalisms (TAG+)
This paper works out a methodology for designing
grammatical resources for verbs that allow SPUD to match uses of verbs
in a corpus. In particular, we show how SPUD allows the syntax, the
syntax-semantics interface, and the semantics of entries to be
formulated separately, relying in each case on specific distributional
Cassell et al 00
Justine Cassell, Matthew Stone and Hao Yan. Coordination and
context-dependence in the generation of embodied
This paper applies the SPUD system and the methodology for
constructing SPUD specifications to the problem of generating
communicative behaviors, including speech, intonation and gesture, for
an embodied conversational agent.
Stone et al 01
Matthew Stone, Christine Doran, Bonnie Webber, Tonia Bleam and
Martha Palmer. Microplanning with communicative intentions: the
This paper provides a systematic description of SPUD, including
motivations, algorithms, examples and methodology for specification.
Distributed on arxiv.org, and
TR 65, 2001. Comments welcome and appreciated.
Matthew Stone. Lexicalized Grammar 101. ACL Workshop on Tools and
Methodologies for Teaching Natural Language Processing, 2002, pages
This paper advocates a lightweight version of TAG
for teaching and exploratory research, which allows the
straightforward lexicalization of treebank parses and a natural
implementation of the SPUD algorithm.
Matthew Stone. Specifying Generation of Referring Expressions by
Example. AAAI Spring Symposium on Natural Language Generation in
Spoken and Written Dialogue, 2003, pages 133--140.
paper starts to look at how to construct knowledge bases
automatically from annotated data, which allow SPUD to generate
A new lightweight implementation of SPUD in Prolog, from June 2002.
Includes a parser/interpreter for the same formalism and model of
interpretation, and code to infer context and communicative goals from
The old, clunky
This version is provided to evaluate SPUD for
research purposes only (Feb 2, 1999).
SPUD is written in SML
of New Jersey, a free fast functional programming language.
SPUD data files
Many people find that the best way to understand SPUD concretely is
to take a concrete look at how you write a grammar for SPUD and
how you specify to SPUD the
knowledge base and communicative goals underlying particular sentences and
what output it produce in response.