Context based patent classification and search : part A
This research project aims to develop a Transformer-based multi-label classifier for the classification of patent categories, where Natural Language Processing (NLP) will be used. However, the way language is primarily used in patents is extremely complex as compared to everyday writing or speech, w...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: |
Nanyang Technological University
2020
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/138634 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-138634 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1386342023-07-07T18:10:27Z Context based patent classification and search : part A Yoong, Jia Hui Lihui CHEN School of Electrical and Electronic Engineering ELHCHEN@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Document and text processing This research project aims to develop a Transformer-based multi-label classifier for the classification of patent categories, where Natural Language Processing (NLP) will be used. However, the way language is primarily used in patents is extremely complex as compared to everyday writing or speech, which calls for a need to modify state-ofthe-art Transformer-based models such as BERT before they can be applied effectively to the classification framework. As such, this project will cover different methods of developing the classifier model using NLP and evaluate which segments of a patent work best in training the model. For this project, a multi-label classification model is developed to predict the categories that a patent would fall under. This report is a summary of the usage of finetuning the XLNet and ALBERT pre-trained models using different components from a custom dataset obtained from patent text, and a comparative analysis of the different pre-processing methods and models tested. The first approach was to test out the model’s accuracy when fine-tuned on different segments of a patent. From the results obtained, it can be concluded that the description segment holds the most promise when upscaling the model and dataset. The second approach was using both the abstract and claims segment of a patent. While there were no significant improvements, it is worth noting that the model could handle a larger variety of inputs for a more reliable classification output. The last approach attempted to fine-tune the model by merging the last hidden states of the model output from both abstract and claims segments of a patent. However, this method proved ineffective and did not have any significant results. In summary, the best scores in the empirical study achieved a score of 95.3% accuracy for one in top three prediction in the main group level, and 58.1% for the sub-group level of Patent IPC classifications. Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-11T05:38:36Z 2020-05-11T05:38:36Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/138634 en A3049-191 application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Document and text processing |
spellingShingle |
Engineering::Computer science and engineering::Computing methodologies::Document and text processing Yoong, Jia Hui Context based patent classification and search : part A |
description |
This research project aims to develop a Transformer-based multi-label classifier for the classification of patent categories, where Natural Language Processing (NLP) will be used. However, the way language is primarily used in patents is extremely complex as compared to everyday writing or speech, which calls for a need to modify state-ofthe-art Transformer-based models such as BERT before they can be applied effectively to the classification framework. As such, this project will cover different methods of developing the classifier model using NLP and evaluate which segments of a patent work best in training the model.
For this project, a multi-label classification model is developed to predict the categories that a patent would fall under. This report is a summary of the usage of finetuning the XLNet and ALBERT pre-trained models using different components from a custom dataset obtained from patent text, and a comparative analysis of the different pre-processing methods and models tested.
The first approach was to test out the model’s accuracy when fine-tuned on different segments of a patent. From the results obtained, it can be concluded that the description segment holds the most promise when upscaling the model and dataset. The second approach was using both the abstract and claims segment of a patent. While there were no significant improvements, it is worth noting that the model could handle a larger variety of inputs for a more reliable classification output. The last approach attempted to fine-tune the model by merging the last hidden states of the model output from both abstract and claims segments of a patent. However, this method proved ineffective and did not have any significant results.
In summary, the best scores in the empirical study achieved a score of 95.3% accuracy for one in top three prediction in the main group level, and 58.1% for the sub-group level of Patent IPC classifications. |
author2 |
Lihui CHEN |
author_facet |
Lihui CHEN Yoong, Jia Hui |
format |
Final Year Project |
author |
Yoong, Jia Hui |
author_sort |
Yoong, Jia Hui |
title |
Context based patent classification and search : part A |
title_short |
Context based patent classification and search : part A |
title_full |
Context based patent classification and search : part A |
title_fullStr |
Context based patent classification and search : part A |
title_full_unstemmed |
Context based patent classification and search : part A |
title_sort |
context based patent classification and search : part a |
publisher |
Nanyang Technological University |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/138634 |
_version_ |
1772825203012796416 |