Refining learning models in grammatical inference
Grammatical inference is a branch of computational learning theory that tackles the problem of learning grammatical models from string samples. In other words, grammatical inference tries to identify the computational models that generate the sample strings. In recent decades, due to the explosion of...
Saved in:
Main Author: | Wang, Xiangrui |
---|---|
Other Authors: | Narendra Shivaji Chaudhari |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2008 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; DRNTU::Engineering::Computer science and engineering::Theory of computation::Mathematical logic and formal languages |
Online Access: | https://hdl.handle.net/10356/13589 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-13589 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-13589 2023-03-04T00:42:35Z Refining learning models in grammatical inference Wang, Xiangrui Narendra Shivaji Chaudhari School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence DRNTU::Engineering::Computer science and engineering::Theory of computation::Mathematical logic and formal languages Grammatical inference is a branch of computational learning theory that tackles the problem of learning grammatical models from string samples. In other words, grammatical inference tries to identify the computational models that generate the sample strings. In recent decades, due to the explosion of information, efficient ways of searching and organizing data have become highly important, so using grammatical inference methods to process information computationally is of considerable interest. In real applications, the space cost plays an important role in performance. In this thesis, we address the problem of refining learning models in grammatical inference. For regular language learning, we formally introduce the notion of a “classification automaton”, which reduces model size by identifying one automaton for multiple string classes. The classification automaton is shown to reduce model size by 30% compared with a straightforward multiple-automata approach, on house-rent data obtained from the public folder in the Microsoft Exchange Server of Nanyang Technological University. In real-world applications, there is always a maximum possible length for the strings. Based on this observation, we further introduce cover automata, which simplify a learning model under a maximum length limit, for grammatical inference. Test results on the Splice-junction Gene Sequence database demonstrate that the method reduces model size by 32% compared with the widely used deterministic finite automaton model. With mixed k-th order Markov chains, stochastic information for all possible substrings of length up to k+1 is captured. However, the space cost is exponential. We introduce the use of recurrent neural networks (RNNs) and present a pruning learning method that avoids the exponential space cost. There is a tradeoff between accuracy and model size; we propose a balancing method and present test results on the Splice-junction Gene Sequence database, which demonstrate that the method reduces the model size by 105 times while accuracy drops from 80% to 76%. Finally, we introduce the profile-based alignment learning (PBAL) framework to refine the model for context-free grammar learning. PBAL extends the existing Alignment-Based Learning (ABL) framework by using statistical information to refine alignments and, further, the learned grammatical rules. Experiments show that the converged rules improve alignment precision from 50% to 90%, with model size reduced to 47% of the original. DOCTOR OF PHILOSOPHY (SCE) 2008-10-20T09:57:40Z 2008-10-20T09:57:40Z 2008 2008 Thesis Wang, X. (2008). Refining learning models in grammatical inference. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/13589 10.32657/10356/13589 en 187 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence DRNTU::Engineering::Computer science and engineering::Theory of computation::Mathematical logic and formal languages |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence DRNTU::Engineering::Computer science and engineering::Theory of computation::Mathematical logic and formal languages Wang, Xiangrui Refining learning models in grammatical inference |
description |
Grammatical inference is a branch of computational learning theory that tackles the problem of learning grammatical models from string samples. In other words, grammatical inference tries to identify the computational models that generate the sample strings. In recent decades, due to the explosion of information, efficient ways of searching and organizing data have become highly important, so using grammatical inference methods to process information computationally is of considerable interest. In real applications, the space cost plays an important role in performance. In this thesis, we address the problem of refining learning models in grammatical inference. For regular language learning, we formally introduce the notion of a “classification automaton”, which reduces model size by identifying one automaton for multiple string classes. The classification automaton is shown to reduce model size by 30% compared with a straightforward multiple-automata approach, on house-rent data obtained from the public folder in the Microsoft Exchange Server of Nanyang Technological University. In real-world applications, there is always a maximum possible length for the strings. Based on this observation, we further introduce cover automata, which simplify a learning model under a maximum length limit, for grammatical inference. Test results on the Splice-junction Gene Sequence database demonstrate that the method reduces model size by 32% compared with the widely used deterministic finite automaton model. With mixed k-th order Markov chains, stochastic information for all possible substrings of length up to k+1 is captured. However, the space cost is exponential. We introduce the use of recurrent neural networks (RNNs) and present a pruning learning method that avoids the exponential space cost. There is a tradeoff between accuracy and model size; we propose a balancing method and present test results on the Splice-junction Gene Sequence database, which demonstrate that the method reduces the model size by 105 times while accuracy drops from 80% to 76%. Finally, we introduce the profile-based alignment learning (PBAL) framework to refine the model for context-free grammar learning. PBAL extends the existing Alignment-Based Learning (ABL) framework by using statistical information to refine alignments and, further, the learned grammatical rules. Experiments show that the converged rules improve alignment precision from 50% to 90%, with model size reduced to 47% of the original. |
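The exponential space cost of k-th order Markov chains mentioned in the abstract can be illustrated with a minimal sketch (not the thesis's implementation, and the function names here are hypothetical): a k-th order chain over an alphabet Σ stores transition statistics for every length-k context followed by one symbol, so its full table has |Σ|^(k+1) entries.

```python
# Minimal sketch of a k-th order Markov chain's substring statistics.
# Assumption: counts are kept per (context, next-symbol) pair, which is
# equivalent to counting all (k+1)-grams of the training string.
from collections import defaultdict

def kth_order_counts(seq, k):
    """Count (k+1)-grams: a context of length k followed by one symbol."""
    counts = defaultdict(int)
    for i in range(len(seq) - k):
        context, nxt = seq[i:i + k], seq[i + k]
        counts[(context, nxt)] += 1
    return counts

def table_size(alphabet_size, k):
    """Worst-case number of entries in the full transition table."""
    return alphabet_size ** (k + 1)

counts = kth_order_counts("ACGTACGTAA", k=2)
# The 3-gram "ACG" occurs twice: context "AC" followed by "G".
assert counts[("AC", "G")] == 2
# For a DNA alphabet (|Sigma| = 4), the table grows exponentially in k:
assert [table_size(4, k) for k in (1, 2, 3)] == [16, 64, 256]
```

This exponential growth in k is what motivates the abstract's move to RNNs with a pruning method, which trade exact substring tables for a compact learned model.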
author2 |
Narendra Shivaji Chaudhari |
author_facet |
Narendra Shivaji Chaudhari Wang, Xiangrui |
format |
Theses and Dissertations |
author |
Wang, Xiangrui |
author_sort |
Wang, Xiangrui |
title |
Refining learning models in grammatical inference |
title_short |
Refining learning models in grammatical inference |
title_full |
Refining learning models in grammatical inference |
title_fullStr |
Refining learning models in grammatical inference |
title_full_unstemmed |
Refining learning models in grammatical inference |
title_sort |
refining learning models in grammatical inference |
publishDate |
2008 |
url |
https://hdl.handle.net/10356/13589 |
_version_ |
1759857605206867968 |