Examining crosslingual word sense disambiguation.
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2013 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/54652 |
Institution: | Nanyang Technological University |
Summary: | Understanding human language computationally remains a challenge at several levels: phonological,
syntactic and semantic. This thesis attempts to understand the ambiguity of human language through
the Word Sense Disambiguation (WSD) task. WSD is the task of determining the correct sense of a word
given a context sentence. Topic models are statistical models of human language that can discover
abstract topics in a collection of documents.
This thesis examines the WSD task in a crosslingual manner, using topic models and a parallel
corpus. It defines a topical crosslingual WSD (Topical CLWSD) task as two subtasks: (i) Match
and Translate: using topic models to find a match for the query sentence in a parallel corpus, where the
match provides the appropriate translation of the target polysemous word; and (ii) Map: mapping the
word-translation pair to the corresponding concept in the Open Multilingual WordNet. The XLING WSD
system was built to attempt the Topical CLWSD task. Although the XLING system underperforms in this
task, it serves as a pilot approach to crosslingual WSD in a knowledge-lean manner.
Beyond the WSD task, the thesis briefly presents updates on the ongoing work to compile multilingual
data for the Nanyang Technological University-Multilingual Corpus (NTU-MC). The NTU-MC project and
the XLING system are related in that both aim to build crosslingual language technologies. |
---|