Cross-lingual identification of ambiguous discourse connectives for resource-poor language
The lack of annotated corpora limits research on discourse classification for many languages. In this paper, we present the first effort towards recognizing ambiguities of discourse connectives, which is fundamental to discourse classification for resource-poor languages such as Chines...
Main Authors: | ZHOU, Lanjun; GAO, Wei; LI, Binyang; WEI, Zhongyu; WONG, Kam-Fai |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2012 |
Subjects: | Discourse Explicit Connectives Ambiguity of Connectives Databases and Information Systems |
Online Access: | https://ink.library.smu.edu.sg/sis_research/4588 https://ink.library.smu.edu.sg/context/sis_research/article/5591/viewcontent/C12_2138.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-5591 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-5591 2019-12-26T07:52:44Z Cross-lingual identification of ambiguous discourse connectives for resource-poor language ZHOU, Lanjun GAO, Wei LI, Binyang WEI, Zhongyu WONG, Kam-Fai The lack of annotated corpora limits research on discourse classification for many languages. In this paper, we present the first effort towards recognizing ambiguities of discourse connectives, which is fundamental to discourse classification for resource-poor languages such as Chinese. A language-independent framework is proposed that utilizes bilingual dictionaries, the Penn Discourse Treebank, and parallel data between English and Chinese. We start by translating the English connectives into Chinese using a bilingual dictionary. Then, the ambiguities in terms of the senses a connective may signal are estimated based on the ambiguities of English connectives and word alignment information. Finally, the ambiguity between discourse usage and non-discourse usage is resolved using the co-training algorithm. Experimental results show that the proposed method not only builds a high-quality connective lexicon for Chinese but also achieves high performance in recognizing the ambiguities. We also present a discourse corpus for Chinese, which will soon become the first publicly available Chinese discourse corpus. 2012-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4588 https://ink.library.smu.edu.sg/context/sis_research/article/5591/viewcontent/C12_2138.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Discourse Explicit Connectives Ambiguity of Connectives Databases and Information Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Discourse Explicit Connectives Ambiguity of Connectives Databases and Information Systems |
description |
The lack of annotated corpora limits research on discourse classification for many languages. In this paper, we present the first effort towards recognizing ambiguities of discourse connectives, which is fundamental to discourse classification for resource-poor languages such as Chinese. A language-independent framework is proposed that utilizes bilingual dictionaries, the Penn Discourse Treebank, and parallel data between English and Chinese. We start by translating the English connectives into Chinese using a bilingual dictionary. Then, the ambiguities in terms of the senses a connective may signal are estimated based on the ambiguities of English connectives and word alignment information. Finally, the ambiguity between discourse usage and non-discourse usage is resolved using the co-training algorithm. Experimental results show that the proposed method not only builds a high-quality connective lexicon for Chinese but also achieves high performance in recognizing the ambiguities. We also present a discourse corpus for Chinese, which will soon become the first publicly available Chinese discourse corpus. |
format |
text |
author |
ZHOU, Lanjun GAO, Wei LI, Binyang WEI, Zhongyu WONG, Kam-Fai |
author_sort |
ZHOU, Lanjun |
title |
Cross-lingual identification of ambiguous discourse connectives for resource-poor language |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2012 |
url |
https://ink.library.smu.edu.sg/sis_research/4588 https://ink.library.smu.edu.sg/context/sis_research/article/5591/viewcontent/C12_2138.pdf |
_version_ |
1770574923006214144 |