Development, implementation and testing of language identification system for seven Philippine languages

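Three Language Identification (LID) approaches, namely acoustic, phonotactic, and prosodic, are explored for Philippine languages. Gaussian Mixture Models (GMMs) are used for the acoustic and prosodic approaches. The acoustic features used were Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), Shifted Delta Cepstra (SDC), and Linear Prediction Cepstral Coefficients (LPCC). Pitch, rhythm, and energy are used as prosodic features. Phone Recognition followed by Language Modelling (PRLM) and Parallel Phone Recognition followed by Language Modelling (PPRLM) are used for the phonotactic approach. After establishing that the acoustic approach using a 32nd-order PLP GMM-EM (a GMM trained with expectation-maximization) achieved the best performance among the combinations of approach and feature, three LID systems were built: a 7-language LID, a pair-wise LID, and a hierarchical LID, with average accuracies of 48.07%, 72.64%, and 53.99%, respectively. Among the pair-wise LID systems, the highest accuracy is 92.23% for Tagalog and Hiligaynon and the lowest is 52.21% for Bicolano and Tausug. In the hierarchical LID system, the accuracies for Tagalog, Cebuano, Bicolano, and Hiligaynon reached 80.56%, 80.26%, 78.26%, and 60.87%, respectively. The LID systems that were designed, implemented, and tested are best suited for language verification, or for language identification with a small number of closely related target languages such as the Philippine languages. © 2015, Science and Technology Information Institute. All rights reserved.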
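The record's abstract reports GMM-based language models, with the acoustic PLP GMM-EM combination performing best. A minimal sketch of how GMM-based LID scoring commonly decides among languages, assuming one diagonal-covariance GMM per language and a decision by highest average per-frame log-likelihood. The function names, model parameters, and toy 2-D "features" are hypothetical illustrations, not the authors' implementation; EM training and PLP feature extraction are omitted.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical model format: a diagonal-covariance GMM is a list of
# (weight, means, variances) components. Real models would be trained
# with EM on acoustic frames (e.g. PLP) from each language.
Component = Tuple[float, List[float], List[float]]
GMM = List[Component]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def gmm_frame_loglik(x: Sequence[float], gmm: GMM) -> float:
    """log sum_k w_k N(x; mu_k, Sigma_k), using log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in gmm]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def utterance_score(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Average per-frame log-likelihood (frames treated as independent)."""
    return sum(gmm_frame_loglik(f, gmm) for f in frames) / len(frames)


def identify(frames: Sequence[Sequence[float]], models: Dict[str, GMM],
             candidates: Optional[Sequence[str]] = None) -> str:
    """Return the candidate language whose GMM scores the utterance highest.

    candidates=None compares every model (closed-set N-language LID);
    passing exactly two names gives a pair-wise decision.
    """
    names = list(models) if candidates is None else list(candidates)
    return max(names, key=lambda name: utterance_score(frames, models[name]))


if __name__ == "__main__":
    # Toy, made-up models in a 2-D feature space.
    models = {
        "Tagalog": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
        "Cebuano": [(1.0, [5.0, 5.0], [1.0, 1.0])],
        "Bicolano": [(1.0, [-5.0, -5.0], [1.0, 1.0])],
    }
    frames = [[0.2, 0.1], [0.9, 1.1], [0.5, 0.4]]
    print(identify(frames, models))                            # Tagalog
    print(identify(frames, models, ["Cebuano", "Bicolano"]))   # Cebuano
```

Restricting `candidates` to two names mirrors the pair-wise LID setting, where only two closely related languages compete; a hierarchical system could chain such decisions, though the record does not describe how the authors grouped the languages.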


Bibliographic Details
Main Authors: Laguna, Ann Franchesca B., Guevara, Rowena Cristina L.
Format: text
Published: Animo Repository 2015
Subjects: Computational linguistics; Automatic speech recognition; Computer Sciences
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/3346
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:faculty_research-4348
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:faculty_research-4348 2021-09-06T02:56:27Z Development, implementation and testing of language identification system for seven Philippine languages Laguna, Ann Franchesca B. Guevara, Rowena Cristina L. Three Language Identification (LID) approaches, namely, acoustic, phonotactic, and prosodic approaches are explored for Philippine Languages. Gaussian Mixture Models (GMM) is used for acoustic and prosodic approaches. The acoustic features used were Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), Shifted Delta Cepstra (SDC) and Linear Prediction Cepstral Coefficients (LPCC). Pitch, rhythm, and energy are used as prosodic features. A Phone Recognition followed by Language Modelling (PRLM) and Parallel Phone Recognition followed by Language Modelling (PPRLM) are used for the phonotactic approach. After establishing that acoustic approach using a 32nd order PLP GMM-EM achieved the best performance among the combinations of approach and feature, three LID systems were built: 7-language LID, pair-wise LID and hierarchical LID; with average accuracy of 48.07%, 72.64% and 53.99%, respectively. Among the pair-wise LID systems the highest accuracy is 92.23% for Tagalog and Hiligaynon and the lowest accuracy is 52.21% for Bicolano and Tausug. In the hierarchical LID system, the accuracy for Tagalog, Cebuano, Bicolano, and Hiligaynon reached 80.56%, 80.26%, 78.26%, and 60.87% respectively. The LID systems that were designed, implemented and tested, are best suited for language verification or for language identification systems with small number of target languages that are closely related such as Philippine languages. © 2015, Science and Technology Information Institute. All rights reserved. 2015-06-01T07:00:00Z text https://animorepository.dlsu.edu.ph/faculty_research/3346 Faculty Research Work Animo Repository Computational linguistics Automatic speech recognition Computer Sciences
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
topic Computational linguistics
Automatic speech recognition
Computer Sciences
format text
author Laguna, Ann Franchesca B.
Guevara, Rowena Cristina L.
title Development, implementation and testing of language identification system for seven Philippine languages
publisher Animo Repository
publishDate 2015
url https://animorepository.dlsu.edu.ph/faculty_research/3346