Incorporation of WordNet features to n-gram features in a language modeler

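n-gram language modeling is a popular technique used to improve performance of various NLP applications. However, it still faces the "curse of dimensionality" issue wherein word sequences on which the model will be tested are likely to be different from those seen during training (Bengio et al., 2003). An approach that incorporates WordNet to a trigram language modeler has been developed to address this issue. WordNet was used to generate proxy trigrams that may be used to reinforce the fluency of the given trigrams. Evaluation results reported a significant decrease in model perplexity showing that the new method, evaluated using the English language in the business news domain, is capable of addressing the issue. The modeler was also used as a tool to rank parallel translations produced by multiple Machine Translation systems. Results showed a 6-7% improvement over the base approach (Callison-Burch and Flournoy, 2001) in correctly ranking parallel translations. © 2008 by Kathleen L. Go and Solomon L. See.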
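
As a rough illustration of the proxy-trigram idea named in this record's description, the sketch below swaps synonyms into a candidate sentence's trigrams and keeps the best-supported variant when scoring fluency. It is not the authors' implementation: the record does not say how proxies are generated or combined, so the toy `SYNONYMS` table (a stand-in for WordNet lookups), the add-one smoothing, the max-over-proxies rule, and all function names are assumptions made for this example. The resulting score is only meant for ranking candidates and is not a normalized probability.

```python
import math
from collections import Counter
from itertools import product

# Toy synonym table standing in for WordNet lookups (hypothetical data).
SYNONYMS = {
    "firm": ["company"],
    "earnings": ["profits"],
}

def trigrams(tokens):
    """Return all consecutive word triples in a token list."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

def train(sentences):
    """Count trigrams and their two-word histories in a training corpus."""
    tri, bi = Counter(), Counter()
    for sentence in sentences:
        for t in trigrams(sentence.split()):
            tri[t] += 1
            bi[t[:2]] += 1
    return tri, bi

def proxies(trigram):
    """Return the trigram itself plus every variant with synonyms swapped in."""
    options = [[w] + SYNONYMS.get(w, []) for w in trigram]
    return list(product(*options))

def trigram_score(trigram, tri, bi, vocab_size):
    """Best add-one-smoothed estimate over the trigram and its proxies (assumed rule)."""
    return max(
        (tri[p] + 1) / (bi[p[:2]] + vocab_size) for p in proxies(trigram)
    )

def fluency(sentence, tri, bi, vocab_size):
    """Average log score per trigram; higher means the sentence looks more fluent."""
    ts = trigrams(sentence.split())
    return sum(math.log(trigram_score(t, tri, bi, vocab_size)) for t in ts) / len(ts)

corpus = [
    "the company reported higher profits",
    "the company reported lower profits",
]
tri, bi = train(corpus)
vocab_size = len({w for s in corpus for w in s.split()})

# Two candidate "translations": one fluent but using unseen synonyms, one scrambled.
candidates = [
    "the firm reported higher earnings",
    "reported the earnings firm higher",
]
ranked = sorted(candidates, key=lambda s: fluency(s, tri, bi, vocab_size), reverse=True)
```

In this toy setup, none of the first candidate's trigrams occur in the training corpus, but each has a proxy that does, so the candidate ranks above the scrambled one.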

Bibliographic Details
Main Authors: Go, Kathleen L., See, Solomon L.
Format: text
Published: Animo Repository 2008
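Subjects: Natural language processing (Computer science); Computational linguistics; Translating and interpreting; Computer Sciences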
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/513
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:faculty_research-1512
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:faculty_research-1512
2021-11-24T01:09:59Z
2008-12-01T08:00:00Z
Faculty Research Work
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
topic Natural language processing (Computer science)
Computational linguistics
Translating and interpreting
Computer Sciences
description n-gram language modeling is a popular technique used to improve performance of various NLP applications. However, it still faces the "curse of dimensionality" issue wherein word sequences on which the model will be tested are likely to be different from those seen during training (Bengio et al., 2003). An approach that incorporates WordNet to a trigram language modeler has been developed to address this issue. WordNet was used to generate proxy trigrams that may be used to reinforce the fluency of the given trigrams. Evaluation results reported a significant decrease in model perplexity showing that the new method, evaluated using the English language in the business news domain, is capable of addressing the issue. The modeler was also used as a tool to rank parallel translations produced by multiple Machine Translation systems. Results showed a 6-7% improvement over the base approach (Callison-Burch and Flournoy, 2001) in correctly ranking parallel translations. © 2008 by Kathleen L. Go and Solomon L. See.