Exploring language model for better semantic matching of text paragraphs

Natural Language Processing (NLP) has come a long way, and modern NLP research has driven the widespread use of NLP in everyday life for short texts. However, the same cannot be said for long text documents. Some current NLP models work well for short texts but suf...

Bibliographic Details
Main Author:	Ng, Kwang Sheng
Other Authors:	Lihui Chen
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Document and text processing; Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/157772
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-157772
record_format	dspace
spelling	sg-ntu-dr.10356-157772 2023-07-07T19:04:02Z Exploring language model for better semantic matching of text paragraphs Ng, Kwang Sheng Lihui Chen School of Electrical and Electronic Engineering ELHCHEN@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Document and text processing; Engineering::Electrical and electronic engineering Bachelor of Engineering (Information Engineering and Media) 2022-05-23T05:40:35Z 2022-05-23T05:40:35Z 2022 Final Year Project (FYP) Ng, K. S. (2022). Exploring language model for better semantic matching of text paragraphs. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157772 en A3049-211 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Document and text processing; Engineering::Electrical and electronic engineering
description	Natural Language Processing (NLP) has come a long way, and modern NLP research has driven the widespread use of NLP in everyday life for short texts. However, the same cannot be said for long text documents. Some current NLP models work well for short texts but suffer as the text grows longer: processing time grows steeply and results are poor. In recent years, the state-of-the-art (SOTA) BERT model has significantly advanced existing work. Newer methods built on BERT, such as Sentence-BERT (SBERT) and Simple Contrastive Learning of Sentence Embeddings (SimCSE), have achieved outcomes similar to BERT's. This report aims to learn how effective these two models are. In this project, the two models are tested on 'PatentMatch', a publicly available dataset of patent claims on which the PatentMatch team achieved only 54% accuracy with the SOTA BERT model. Using pretrained SBERT and SimCSE models, the balanced PatentMatch test dataset was evaluated both with and without training to learn how the average cosine similarity score changes and how the models perform. The experiment was repeated several times with different parameter settings. The output of the two models varies with the pretrained model used; some reach an accuracy around that of the BERT model but in much less time. F1 scores for both models look promising, with some fine-tuned pretrained models scoring around 66% with fairly high precision and recall. Both models have the potential to perform even better, but a better and more complex pretrained model will be needed for them to shine.
author2	Lihui Chen
author_facet	Lihui Chen Ng, Kwang Sheng
format	Final Year Project
author	Ng, Kwang Sheng
author_sort	Ng, Kwang Sheng
title	Exploring language model for better semantic matching of text paragraphs
publisher	Nanyang Technological University
publishDate	2022
url	https://hdl.handle.net/10356/157772
_version_	1772825491832569856
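
The description field mentions scoring text pairs by cosine similarity and reporting precision, recall and F1. The record does not include the project's code, so the following is only a minimal sketch of that kind of evaluation, assuming sentence embeddings are already available as plain lists of floats. The toy vectors, the labels, the 0.5 threshold and all function names are hypothetical and are not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def classify_pairs(pairs, threshold):
    """Label a pair 1 (match) when its similarity reaches the threshold."""
    return [1 if cosine_similarity(a, b) >= threshold else 0 for a, b in pairs]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical 2-dimensional "embeddings" standing in for model output.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [1.0, 0.0]),
    ([1.0, 0.0], [-1.0, 0.0]),
]
labels = [1, 0, 0, 1]  # hypothetical gold labels

predictions = classify_pairs(pairs, threshold=0.5)
precision, recall, f1 = precision_recall_f1(labels, predictions)
print(predictions, precision, recall, f1)
```

In practice the vectors would come from a pretrained SBERT or SimCSE encoder, and the threshold would be tuned on held-out data rather than fixed in advance.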
