Memory-based learning for article generation

Bibliographic Details
Main Authors: Minnen, Guido; Bond, Francis; Copestake, Ann
Other Authors: School of Humanities and Social Sciences
Format: Conference or Workshop Item
Language: English
Published: 2011
Online Access: https://hdl.handle.net/10356/83964
http://hdl.handle.net/10220/7246
Institution: Nanyang Technological University
Description
Summary: Article choice can pose difficult problems in applications such as machine translation and automated summarization. In this paper, we investigate the use of corpus data to collect statistical generalizations about article use in English in order to be able to generate articles automatically to supplement a symbolic generator. We use data from the Penn Treebank as input to a memory-based learner (TiMBL 3.0; Daelemans et al., 2000) which predicts whether to generate an article with respect to an English base noun phrase. We discuss competitive results obtained using a variety of lexical, syntactic and semantic features that play an important role in automated article generation.
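
Illustration: the general idea behind memory-based learning, as mentioned in the summary, can be sketched as a nearest-neighbour lookup over stored examples. The snippet below is a minimal sketch only and does not use TiMBL or reproduce the authors' setup. The feature tuple (head noun, part-of-speech tag, grammatical role), the label set ("the", "a", "none"), the toy training instances, and the function names are all assumptions made up for illustration; none of them come from the paper or the Penn Treebank.

```python
from collections import Counter

# Hypothetical toy instances: (head noun, POS tag, grammatical role) -> article class.
# Invented for illustration; NOT taken from the Penn Treebank or the paper.
TRAIN = [
    (("dog", "NN", "subj"), "the"),
    (("dog", "NN", "obj"), "a"),
    (("water", "NN", "obj"), "none"),
    (("dogs", "NNS", "subj"), "none"),
    (("idea", "NN", "obj"), "a"),
    (("government", "NN", "subj"), "the"),
]


def overlap_distance(x, y):
    """Count the feature positions where two instances disagree."""
    return sum(1 for a, b in zip(x, y) if a != b)


def predict(instance, train=TRAIN, k=1):
    """Return the majority label among the k stored instances closest to `instance`.

    Simplification: plain k-nearest-examples voting with an unweighted
    overlap metric; ties are broken by training order.
    """
    ranked = sorted(train, key=lambda ex: overlap_distance(instance, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Under these assumptions, calling `predict(("cats", "NNS", "subj"))` retrieves the stored instance that shares the most feature values with the query and returns its label. A real memory-based learner would typically also weight features by how informative they are, which this sketch omits.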