Automatic bilingual lexicon extraction for a minority target language

An automated approach to extracting a bilingual lexicon from comparable, nonparallel corpora was developed for a target language with limited linguistic resources. We combined approaches from previous research which only concentrated on context extraction, clustering techniques, or usage of part of...

Bibliographic Details
Main Authors:	Tiua, Eileen Pamela; Roxas, Rachel Edita O.
Format:	text
Published:	Animo Repository 2008
Subjects:	Lexicography—Data processing; Computational linguistics; Computer Sciences
Online Access:	https://animorepository.dlsu.edu.ph/faculty_research/4040

id	oai:animorepository.dlsu.edu.ph:faculty_research-4945
record_format	eprints
spelling	oai:animorepository.dlsu.edu.ph:faculty_research-4945 2021-08-13T00:29:28Z 2008-12-01T08:00:00Z text https://animorepository.dlsu.edu.ph/faculty_research/4040 Faculty Research Work Animo Repository
institution	De La Salle University
building	De La Salle University Library
continent	Asia
country	Philippines
content_provider	De La Salle University Library
collection	DLSU Institutional Repository
topic	Lexicography—Data processing; Computational linguistics; Computer Sciences
description	An automated approach to extracting a bilingual lexicon from comparable, nonparallel corpora was developed for a target language with limited linguistic resources. We combined approaches from previous research, each of which concentrated on only one of context extraction, clustering techniques, or the use of part-of-speech tags to define the different senses of a word. The domain-specific corpus for the source language contains 381,553 English words, while the corpus for the target language, which has minimal language resources, contains 92,610 Tagalog words, with 4,817 and 3,421 distinct root words, respectively. Despite the limited size of the corpora (400k words vs. Sadat's (2003) 39M-word corpora) and of the seed lexicon (9,026 entries vs. Rapp's (1999) 16,380 entries), the evaluation yielded promising results. The 50 high-frequency and 50 low-frequency words yielded recall values of 50.29% and 31.37% and precision values of 56.12% and 21.98%, respectively, which are within the range of values reported in previous studies, 39% to 84.45% (Koehn et al., 2002; Zhou et al., 2001). Ranking improved the overall F-measure from 7.32% to 10.65%. © 2007 by Eileen Pamela Tiu and Rachel Edita O. Roxas.
format	text
author	Tiua, Eileen Pamela; Roxas, Rachel Edita O.
title	Automatic bilingual lexicon extraction for a minority target language
publisher	Animo Repository
publishDate	2008
url	https://animorepository.dlsu.edu.ph/faculty_research/4040
