Open-set pattern recognition and its application in information extraction from text
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/164148 |
Institution: | Nanyang Technological University |
Summary: | In traditional supervised learning, the training set contains the same classes that appear in the test set. In real-world deployment, however, a classifier may encounter previously unseen classes. A closed-set classifier is likely to make errors in that case, because it forces such samples into one of the original categories. An open-set classifier is designed to classify samples from known classes accurately while rejecting samples from unrelated classes. However, open-set recognition has so far seen relatively few applications in text classification. The goal of this thesis is to apply open-set recognition to text classification tasks.
This thesis first reviews related work on text classification and open-set recognition. It then adopts GloVe word embeddings to map documents into a vector space. CNNs and LSTMs both perform well in text classification, so a preliminary experiment compares the two, and the better-performing CNN is selected as the base model. Building on this model, SVDD and OpenMax are applied to the 10-domain and 20-domain settings of the dataset, respectively, and compared with existing text classifiers. SVDD achieves results similar to those of the existing SVM-based open-set classifier. OpenMax achieves good accuracy with the text classifier, and its performance does not vary greatly as openness increases. |
---|
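The Summary frames open-set classification around two ideas: accepting samples from known classes while rejecting unseen ones, and measuring robustness against openness. The Python sketch below illustrates both under stated assumptions. `openness` follows the definition of Scheirer et al. (2013), "Toward Open Set Recognition." The thesis may use a variant of that definition. `open_set_predict` is a simple softmax-thresholding baseline, not the thesis's OpenMax or SVDD method. The names `UNKNOWN`, `open_set_predict`, and the threshold value are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

UNKNOWN = -1  # hypothetical label assigned to rejected (unknown-class) samples


def openness(n_train: int, n_test: int, n_target: int) -> float:
    """Openness as defined by Scheirer et al. (2013).

    n_train:  number of classes seen during training
    n_test:   number of classes that appear at test time (known + unknown)
    n_target: number of classes the classifier must recognize
    Returns 0.0 for a closed-set problem; larger values mean more unseen classes.
    """
    return 1.0 - math.sqrt(2.0 * n_train / (n_test + n_target))


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits: Sequence[float], threshold: float) -> int:
    """Return the arg-max known class, or UNKNOWN if its probability is below threshold.

    Hypothetical confidence-thresholding baseline; OpenMax instead recalibrates
    activations with per-class Weibull models, and SVDD uses a learned hypersphere.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


if __name__ == "__main__":
    print(openness(10, 10, 10))                       # closed set: 0.0
    print(round(openness(10, 20, 10), 4))             # 10 extra unseen test classes
    print(open_set_predict([2.0, 0.1, 0.1], 0.5))     # confident, so class 0
    print(open_set_predict([0.2, 0.1, 0.0], 0.5))     # uncertain, so UNKNOWN (-1)
```

A fixed softmax threshold is the simplest way to reject unknowns. It is shown here only as a point of reference: a closed-set classifier corresponds to a threshold of 0, which never rejects anything.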