Automatic bilingual lexicon extraction for a minority target language

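An automated approach to extracting a bilingual lexicon from comparable, nonparallel corpora was developed for a target language with limited linguistic resources. We combined approaches from previous studies, each of which concentrated only on context extraction, clustering techniques, or the use of part-of-speech tags for defining the different senses of a word. The domain-specific source-language corpus contains 381,553 English words, while the corpus for the target language, which has minimal language resources, contains 92,610 Tagalog words; these comprise 4,817 and 3,421 distinct root words, respectively. Despite the limited size of the corpora (400k words vs. Sadat's (2003) 39M-word corpora) and of the seed lexicon (9,026 entries vs. Rapp's (1999) 16,380 entries), the evaluation yielded promising results. The 50 high-frequency and 50 low-frequency words yielded recall values of 50.29% and 31.37% and precision values of 56.12% and 21.98%, respectively, which are within the range of values from previous studies, 39% to 84.45% (Koehn et al., 2002; Zhou et al., 2001). Ranking improved the overall F-measure from 7.32% to 10.65%. © 2007 by Eileen Pamela Tiu and Rachel Edita O. Roxas.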
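The record does not spell out the extraction algorithm. As a rough illustration of the general context-vector technique associated with Rapp (1999), which the abstract cites for its seed-lexicon comparison, the Python sketch below builds co-occurrence vectors in each language, maps source-language context words into the target language through a seed lexicon, and ranks target words by cosine similarity. It is not the authors' method: it omits their clustering and part-of-speech components, uses raw counts and cosine similarity for simplicity, and the function names, window size, and toy English/Tagalog sentences are all hypothetical.

```python
from collections import Counter, defaultdict
import math


def build_context_vectors(sentences, window=2):
    """Count words co-occurring within +/- `window` positions, per sentence."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vector, seed_lexicon):
    """Map source context words to target words; words missing from the seed lexicon are dropped."""
    translated = Counter()
    for word, count in vector.items():
        if word in seed_lexicon:
            translated[seed_lexicon[word]] += count
    return translated


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(source_word, src_vectors, tgt_vectors, seed_lexicon, top_n=3):
    """Return the top_n target words whose contexts best match the source word's translated context."""
    query = translate_vector(src_vectors.get(source_word, Counter()), seed_lexicon)
    scored = [(cosine(query, vec), word) for word, vec in tgt_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties alphabetical
    return [word for _, word in scored[:top_n]]


# Hypothetical toy "comparable corpora" (pre-tokenized) and seed lexicon.
english = [["doctor", "treats", "patient"],
           ["nurse", "helps", "patient"],
           ["doctor", "visits", "hospital"]]
tagalog = [["doktor", "gumagamot", "pasyente"],
           ["nars", "tumutulong", "pasyente"],
           ["doktor", "bumibisita", "ospital"]]
seed = {"treats": "gumagamot", "helps": "tumutulong", "patient": "pasyente",
        "visits": "bumibisita", "hospital": "ospital"}

src = build_context_vectors(english)
tgt = build_context_vectors(tagalog)
print(rank_candidates("doctor", src, tgt, seed))
print(rank_candidates("nurse", src, tgt, seed))
```

On this toy data, "doktor" ranks first for "doctor" and "nars" ranks first for "nurse", because their translated contexts match exactly. Realistic corpora would need an association weighting (Rapp used log-likelihood) and a much larger seed lexicon, which is the resource constraint the abstract highlights.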

Bibliographic Details
Main Authors: Tiua, Eileen Pamela; Roxas, Rachel Edita O.
Format: text
Published: Animo Repository 2008
Subjects: Lexicography—Data processing; Computational linguistics; Computer Sciences
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/4040
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:faculty_research-4945
record_format eprints
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
topic Lexicography—Data processing
Computational linguistics
Computer Sciences
description An automated approach to extracting a bilingual lexicon from comparable, nonparallel corpora was developed for a target language with limited linguistic resources. We combined approaches from previous studies, each of which concentrated only on context extraction, clustering techniques, or the use of part-of-speech tags for defining the different senses of a word. The domain-specific source-language corpus contains 381,553 English words, while the corpus for the target language, which has minimal language resources, contains 92,610 Tagalog words; these comprise 4,817 and 3,421 distinct root words, respectively. Despite the limited size of the corpora (400k words vs. Sadat's (2003) 39M-word corpora) and of the seed lexicon (9,026 entries vs. Rapp's (1999) 16,380 entries), the evaluation yielded promising results. The 50 high-frequency and 50 low-frequency words yielded recall values of 50.29% and 31.37% and precision values of 56.12% and 21.98%, respectively, which are within the range of values from previous studies, 39% to 84.45% (Koehn et al., 2002; Zhou et al., 2001). Ranking improved the overall F-measure from 7.32% to 10.65%. © 2007 by Eileen Pamela Tiu and Rachel Edita O. Roxas.