Context based patent classification and search : part A

Bibliographic Details
Main Author:	Yoong, Jia Hui
Other Authors:	Lihui CHEN
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Online Access:	https://hdl.handle.net/10356/138634
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-138634
record_format	dspace
school	School of Electrical and Electronic Engineering
contact	ELHCHEN@ntu.edu.sg
degree	Bachelor of Engineering (Electrical and Electronic Engineering)
date_accessioned	2020-05-11T05:38:36Z
last_modified	2023-07-07T18:10:27Z
project_id	A3049-191
file_format	application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description	This research project aims to develop a Transformer-based multi-label classifier that uses Natural Language Processing (NLP) to assign patents to their categories. However, the language of patents is far more complex than everyday writing or speech, so state-of-the-art Transformer-based models such as BERT must be adapted before they can be applied effectively within the classification framework. The project therefore covers different methods of developing the classifier with NLP and evaluates which segments of a patent work best for training the model. For this project, a multi-label classification model is developed to predict the categories under which a patent falls. This report summarises the fine-tuning of the pre-trained XLNet and ALBERT models on different components of a custom dataset built from patent text, and presents a comparative analysis of the pre-processing methods and models tested. The first approach measured the model's accuracy when fine-tuned on different segments of a patent; the results show that the description segment holds the most promise when the model and dataset are scaled up. The second approach used both the abstract and claims segments of a patent. Although this brought no significant improvement, the model could handle a wider variety of inputs, giving a more reliable classification output. The last approach fine-tuned the model on the merged last hidden states of the model outputs for the abstract and claims segments; this method proved ineffective and produced no significant gains. In summary, the best results in the empirical study were 95.3% one-in-top-three accuracy (at least one correct category among the three highest-ranked predictions) at the main-group level and 58.1% at the sub-group level of the International Patent Classification (IPC).
author2	Lihui CHEN
format	Final Year Project
author	Yoong, Jia Hui
title	Context based patent classification and search : part A
publisher	Nanyang Technological University
publishDate	2020
url	https://hdl.handle.net/10356/138634
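
The record's description field reports a one-in-top-three accuracy at the IPC main-group level. The sketch below shows one way such a metric could be computed for a multi-label classifier, assuming it means that a patent counts as correctly classified when at least one of its true IPC categories is among the model's three highest-scoring categories. The function name `one_in_top_k_accuracy`, the per-label score format and the toy data are hypothetical illustrations, not code or data from the report.

```python
from typing import List, Sequence, Set


def one_in_top_k_accuracy(
    scores: Sequence[Sequence[float]],
    true_labels: Sequence[Set[int]],
    k: int = 3,
) -> float:
    """Hypothetical metric: share of samples whose top-k labels include a true label.

    Assumes scores[i][j] is the classifier's score for label j on sample i
    (for example a per-label sigmoid output) and true_labels[i] is the set of
    label indices assigned to sample i.
    """
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, gold in zip(scores, true_labels):
        # Indices of the k highest-scoring labels (ties keep the lower index first).
        top_k: List[int] = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        # Count the sample as correct if any of those labels is a true label.
        if any(j in gold for j in top_k):
            hits += 1
    return hits / len(scores)


if __name__ == "__main__":
    # Toy scores for 3 patents over 5 hypothetical main-group labels (0-4).
    scores = [
        [0.10, 0.80, 0.30, 0.05, 0.60],  # top 3: labels 1, 4, 2
        [0.90, 0.20, 0.10, 0.40, 0.30],  # top 3: labels 0, 3, 4
        [0.20, 0.10, 0.70, 0.60, 0.50],  # top 3: labels 2, 3, 4
    ]
    true_labels = [{4}, {1}, {0, 3}]
    print(one_in_top_k_accuracy(scores, true_labels, k=3))
```

On the toy data, the first and third patents have a true category among their top three predictions and the second does not, so the function returns 2/3. A full sort per sample is simple but costs O(L log L) for L labels; with a large label set such as IPC sub-groups, `heapq.nlargest(k, range(len(row)), key=row.__getitem__)` would avoid sorting every score.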
