Open-set pattern recognition and its application in information extraction from text

In traditional supervised learning, the training set contains the same classes that appear in the test set. In the real world, however, a classifier may encounter previously unseen classes, which is likely to cause errors if a closed-set classifier assigns these samples to one of the original categories...

Bibliographic Details
Main Author: Ke, Yizhen
Other Authors: Mao Kezhi
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/164148
Institution: Nanyang Technological University
id sg-ntu-dr.10356-164148
record_format dspace
spelling sg-ntu-dr.10356-164148 2023-07-04T17:51:07Z
School of Electrical and Electronic Engineering
EKZMao@ntu.edu.sg
Master of Science (Computer Control and Automation)
2023-01-06T07:16:40Z
2022
Ke, Y. (2022). Open-set pattern recognition and its application in information extraction from text. Master's thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/164148
en
ISM-DISS-03094
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description In traditional supervised learning, the training set contains the same classes that appear in the test set. In the real world, however, a classifier may encounter previously unseen classes, and a closed-set classifier is likely to make errors because it assigns such samples to one of the original categories. An open-set classifier is designed to classify samples of known classes accurately and to reject samples of unknown classes. So far, open-set recognition has seen few applications in text classification. The goal of this thesis is to apply open-set recognition to text classification tasks. The thesis first reviews work related to text classification and open-set recognition. It then adopts GloVe embeddings to map documents into a vector space. Because CNNs and LSTMs both perform well in text classification, a preliminary experiment was conducted, and the CNN, which performed better, was selected as the base model. On this basis, the SVDD and OpenMax methods are applied to the 10-domain and 20-domain settings of the data set, respectively, and compared with existing text classifiers. SVDD achieves training results similar to those of the existing SVM-based open-set classifier. The performance of OpenMax as a text classifier does not fluctuate greatly with openness, and its accuracy is good.
author2 Mao Kezhi
author_facet Mao Kezhi
Ke, Yizhen
format Thesis-Master by Coursework
author Ke, Yizhen
author_sort Ke, Yizhen
title Open-set pattern recognition and its application in information extraction from text
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/164148
_version_ 1772826649131220992