Cross-lingual identification of ambiguous discourse connectives for resource-poor language

Bibliographic Details
Main Authors: ZHOU, Lanjun; GAO, Wei; LI, Binyang; WEI, Zhongyu; WONG, Kam-Fai
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2012
Subjects: Discourse; Explicit Connectives; Ambiguity of Connectives; Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/4588
https://ink.library.smu.edu.sg/context/sis_research/article/5591/viewcontent/C12_2138.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5591
record_format dspace
datestamp 2019-12-26T07:52:44Z
date 2012-12-01T08:00:00Z
mimetype application/pdf
license http://creativecommons.org/licenses/by-nc-nd/4.0/
series Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Discourse
Explicit Connectives
Ambiguity of Connectives
Databases and Information Systems
description The lack of annotated corpora limits research on discourse classification for many languages. In this paper, we present the first effort towards recognizing the ambiguities of discourse connectives, a task fundamental to discourse classification, for resource-poor languages such as Chinese. We propose a language-independent framework that uses bilingual dictionaries, the Penn Discourse Treebank, and English-Chinese parallel data. We start by translating English connectives into Chinese using a bilingual dictionary. Then, the sense ambiguity of each Chinese connective, that is, the set of senses it may signal, is estimated from the sense ambiguity of the corresponding English connectives and from word alignment information. Finally, the ambiguity between discourse and non-discourse usage is resolved using the co-training algorithm. Experimental results show that the proposed method not only builds a high-quality connective lexicon for Chinese but also achieves high performance in recognizing these ambiguities. We also present a Chinese discourse corpus, which will soon become the first publicly available Chinese discourse corpus.
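The first two steps of the framework (dictionary translation, then sense estimation from English connectives and word alignment) can be illustrated with a minimal Python sketch. The abstract does not state the exact estimation formula, so this sketch assumes one plausible reading: a Chinese connective's sense distribution is an alignment-weighted mixture of the sense distributions of the English connectives it translates, P(sense | zh) = sum over en of P(en | zh) * P(sense | en). The function names `build_lexicon`, `estimate_senses`, and `is_ambiguous`, the 0.1 ambiguity threshold, and all dictionary entries, alignment pairs, and probabilities are hypothetical toy values, not real Penn Discourse Treebank or corpus statistics. The co-training step for discourse versus non-discourse usage is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical toy inputs, invented for illustration only (NOT real PDTB statistics).
# English connective -> distribution over PDTB top-level sense classes.
EN_SENSES = {
    "but": {"Comparison": 0.9, "Expansion": 0.1},
    "while": {"Comparison": 0.6, "Temporal": 0.4},
    "because": {"Contingency": 1.0},
}

# Hypothetical bilingual dictionary: English connective -> Chinese candidates.
BILINGUAL_DICT = {
    "but": ["但是", "而"],
    "while": ["而", "当"],
    "because": ["因为"],
}

# Hypothetical word-aligned (English, Chinese) pairs from parallel sentences.
ALIGNED_PAIRS = [
    ("but", "但是"), ("but", "但是"), ("but", "而"),
    ("while", "而"), ("while", "当"), ("because", "因为"),
]


def build_lexicon(bilingual_dict):
    """Step 1: map each Chinese candidate to the English connectives it translates."""
    lexicon = defaultdict(set)
    for en, zh_list in bilingual_dict.items():
        for zh in zh_list:
            lexicon[zh].add(en)
    return dict(lexicon)


def estimate_senses(lexicon, aligned_pairs, en_senses):
    """Step 2 (assumed mixture rule): P(sense|zh) = sum_en P(en|zh) * P(sense|en).

    P(en|zh) is estimated from alignment counts, keeping only pairs that the
    dictionary also links, so noisy alignments outside the lexicon are ignored.
    """
    counts = Counter(
        (en, zh) for en, zh in aligned_pairs if zh in lexicon and en in lexicon[zh]
    )
    result = {}
    for zh, ens in lexicon.items():
        total = sum(counts[(en, zh)] for en in ens)
        if total == 0:
            continue  # no alignment evidence for this candidate
        dist = defaultdict(float)
        for en in ens:
            p_en = counts[(en, zh)] / total
            for sense, p_sense in en_senses[en].items():
                dist[sense] += p_en * p_sense
        result[zh] = dict(dist)
    return result


def is_ambiguous(dist, threshold=0.1):
    """Flag a connective whose distribution gives >= threshold mass to 2+ senses."""
    return sum(1 for p in dist.values() if p >= threshold) > 1


if __name__ == "__main__":
    lexicon = build_lexicon(BILINGUAL_DICT)
    for zh, dist in estimate_senses(lexicon, ALIGNED_PAIRS, EN_SENSES).items():
        label = "ambiguous" if is_ambiguous(dist) else "unambiguous"
        print(zh, {s: round(p, 2) for s, p in dist.items()}, label)
```

With these toy inputs, 而 receives alignment mass from both "but" and "while" and ends up with roughly Comparison 0.75, Temporal 0.2, and Expansion 0.05, so it is flagged as sense-ambiguous, while 因为 maps entirely to Contingency.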
format text
author ZHOU, Lanjun
GAO, Wei
LI, Binyang
WEI, Zhongyu
WONG, Kam-Fai
title Cross-lingual identification of ambiguous discourse connectives for resource-poor language
publisher Institutional Knowledge at Singapore Management University
publishDate 2012
url https://ink.library.smu.edu.sg/sis_research/4588
https://ink.library.smu.edu.sg/context/sis_research/article/5591/viewcontent/C12_2138.pdf