Context based patent classification and search : part A

Bibliographic Details
Main Author:	Yoong, Jia Hui
Other Authors:	Lihui CHEN
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Online Access:	https://hdl.handle.net/10356/138634
Description
Summary:	This research project aims to develop a Transformer-based multi-label classifier for patent categories using Natural Language Processing (NLP). The language used in patents is far more complex than everyday writing or speech, so state-of-the-art Transformer-based models such as BERT need to be modified before they can be applied effectively to the classification framework. This project therefore covers different methods of developing the classifier model using NLP and evaluates which segments of a patent work best for training the model. A multi-label classification model is developed to predict the categories that a patent falls under. This report summarises the fine-tuning of the XLNet and ALBERT pre-trained models on different components of a custom dataset obtained from patent text, and presents a comparative analysis of the pre-processing methods and models tested. The first approach tested the model's accuracy when fine-tuned on different segments of a patent. From the results obtained, it can be concluded that the description segment holds the most promise when scaling up the model and dataset. The second approach used both the abstract and claims segments of a patent. While there were no significant improvements, it is worth noting that the model could handle a larger variety of inputs for a more reliable classification output. The last approach attempted to fine-tune the model by merging the last hidden states of the model output from both the abstract and claims segments of a patent. However, this method proved ineffective and did not produce any significant results. In summary, the best results in the empirical study were 95.3% accuracy for one-in-top-three prediction at the main group level, and 58.1% at the sub-group level, of patent IPC classifications.
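
The summary reports accuracy for "one in top three prediction" without defining it. One plausible reading, which is an assumption here and not taken from the report, is that a patent counts as correctly classified when at least one of the three highest-scoring predicted categories is among its true categories. A minimal sketch of that metric in plain Python follows; the function name `top_k_hit_rate` and the toy scores are hypothetical and do not come from the project.

```python
def top_k_hit_rate(scores, true_labels, k=3):
    """Fraction of samples for which at least one of the k highest-scoring
    labels appears in that sample's set of true labels.

    scores:      list of per-sample score lists, one score per label index
    true_labels: list of per-sample collections of true label indices
    """
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not scores:
        return 0.0

    hits = 0
    for row, truth in zip(scores, true_labels):
        # Rank label indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # A hit is any overlap between the top-k labels and the true labels.
        if set(ranked[:k]) & set(truth):
            hits += 1
    return hits / len(scores)


# Toy example (made-up numbers): 3 patents, 4 candidate categories.
example_scores = [
    [0.90, 0.10, 0.30, 0.20],
    [0.10, 0.20, 0.30, 0.90],
    [0.05, 0.60, 0.70, 0.80],
]
example_truth = [{0}, {0, 1}, {0}]
example_rate = top_k_hit_rate(example_scores, example_truth, k=3)
```

With this reading, a multi-label patent needs only one of its true categories in the top three to count as a hit, which is why the figure can be much higher than a strict exact-match accuracy.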
