Context based patent classification and search : part A

This research project aims to develop a Transformer-based multi-label classifier for the classification of patent categories, where Natural Language Processing (NLP) will be used. However, the way language is primarily used in patents is extremely complex as compared to everyday writing or speech, w...

Bibliographic Details
Main Author: Yoong, Jia Hui
Other Authors: Lihui CHEN
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
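Subjects: Engineering::Computer science and engineering::Computing methodologies::Document and text processing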
Online Access: https://hdl.handle.net/10356/138634
Institution: Nanyang Technological University
id sg-ntu-dr.10356-138634
record_format dspace
spelling sg-ntu-dr.10356-138634 2023-07-07T18:10:27Z Context based patent classification and search : part A Yoong, Jia Hui Lihui CHEN School of Electrical and Electronic Engineering ELHCHEN@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Document and text processing Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-11T05:38:36Z 2020-05-11T05:38:36Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/138634 en A3049-191 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description This research project aims to develop a Transformer-based multi-label classifier for patent categories using Natural Language Processing (NLP). The language used in patents is far more complex than everyday writing or speech, so state-of-the-art Transformer-based models such as BERT need to be modified before they can be applied effectively to the classification framework. The project therefore covers different methods of developing the classifier with NLP and evaluates which segments of a patent work best for training the model. A multi-label classification model is developed to predict the categories that a patent would fall under. This report summarises the fine-tuning of the pre-trained XLNet and ALBERT models on different components of a custom dataset built from patent text, together with a comparative analysis of the pre-processing methods and models tested. The first approach tested the model’s accuracy when fine-tuned on different segments of a patent. From the results obtained, it can be concluded that the description segment holds the most promise when scaling up the model and dataset. The second approach used both the abstract and claims segments of a patent. While there were no significant improvements, it is worth noting that the model could handle a larger variety of inputs for a more reliable classification output. The last approach attempted to fine-tune the model by merging the last hidden states of the model output from the abstract and claims segments of a patent. However, this method proved ineffective and did not yield any significant results. In summary, the best results in the empirical study were 95.3% accuracy for one-in-top-three prediction at the main group level, and 58.1% at the sub-group level, of the patent IPC classification.
author2 Lihui CHEN
author_facet Lihui CHEN
Yoong, Jia Hui
format Final Year Project
author Yoong, Jia Hui
author_sort Yoong, Jia Hui
title Context based patent classification and search : part A
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/138634
_version_ 1772825203012796416
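
The description reports accuracy for "one in top three" prediction but the record does not define the metric. The sketch below shows one common reading, under stated assumptions: a prediction counts as correct when at least one of a patent's true category labels appears among the three highest-scoring classes. The function name `top_k_hit_rate`, the integer class indices, and the score values are hypothetical and are not taken from the report.

```python
from typing import Sequence, Set


def top_k_hit_rate(scores: Sequence[Sequence[float]],
                   true_labels: Sequence[Set[int]],
                   k: int = 3) -> float:
    """Fraction of documents with at least one true label among
    the k highest-scoring classes (an assumed reading of
    "one in top three" accuracy, not the report's definition)."""
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not scores:
        raise ValueError("at least one document is required")

    hits = 0
    for row, labels in zip(scores, true_labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        # Count a hit if any true label is among the top k classes.
        if labels.intersection(ranked[:k]):
            hits += 1
    return hits / len(scores)


# Made-up scores for three documents over five hypothetical classes.
SCORES = [
    [0.10, 0.70, 0.05, 0.60, 0.20],
    [0.90, 0.05, 0.30, 0.10, 0.20],
    [0.20, 0.10, 0.80, 0.40, 0.30],
]
# Each document may carry more than one true label (multi-label setting).
LABELS = [{3}, {1}, {0, 2}]

if __name__ == "__main__":
    print(f"top-3 hit rate: {top_k_hit_rate(SCORES, LABELS, k=3):.3f}")
    print(f"top-1 hit rate: {top_k_hit_rate(SCORES, LABELS, k=1):.3f}")
```

With these made-up inputs, two of the three documents have a true label in their top three classes, and only one has a true label as its single top class. A stricter variant could require every true label to appear in the top k; which variant the report used is not stated in this record.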