Open-set pattern recognition and its application in information extraction from text
In traditional supervised learning, the training set contains the same classes that appear in the test set. However, a classifier deployed in the real world may encounter previously unseen classes, and a closed-set classifier is likely to make errors by assigning such samples to one of the original categories...
Main Author: | Ke, Yizhen |
---|---|
Other Authors: | Mao Kezhi |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
Online Access: | https://hdl.handle.net/10356/164148 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-164148 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-164148 2023-07-04T17:51:07Z Ke, Yizhen; Mao Kezhi; School of Electrical and Electronic Engineering; EKZMao@ntu.edu.sg; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Master of Science (Computer Control and Automation); 2023-01-06T07:16:40Z; 2022; Thesis-Master by Coursework; Ke, Y. (2022). Open-set pattern recognition and its application in information extraction from text. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164148; en; ISM-DISS-03094; application/pdf; Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
description |
In traditional supervised learning, the training set contains the same classes that appear in the test set. However, a classifier deployed in the real world may encounter previously unseen classes, and a closed-set classifier is likely to make errors by assigning such samples to one of the original categories. An open-set classifier is designed to classify known samples accurately and to reject unrelated samples. However, open-set recognition has seen few applications in text classification. The goal of this thesis is to apply open-set recognition to text classification tasks.
This thesis first reviews work related to text classification and open-set recognition. It then adopts GloVe embeddings to map documents to a vector space. Because CNN and LSTM models both perform well in text classification, a preliminary experiment was conducted, and the better-performing CNN was selected as the base model. On this basis, the SVDD and OpenMax methods are applied to the 10-domain and 20-domain settings of the data set, respectively, and compared with existing text classifiers. SVDD yields training results similar to those of the existing SVM-based open-set classifier. The performance of OpenMax as a text classifier does not fluctuate greatly with openness, and its accuracy is good. |
author2 |
Mao Kezhi |
author_facet |
Mao Kezhi Ke, Yizhen |
format |
Thesis-Master by Coursework |
author |
Ke, Yizhen |
title |
Open-set pattern recognition and its application in information extraction from text |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/164148 |