Composition distillation for semantic sentence embeddings

The increasing demand for Natural Language Processing (NLP) solutions is driven by exponential growth in digital content, communication platforms, and the need for sophisticated language understanding. This surge also reflects the critical role of NLP in enabling machines to comprehend, interpret, and generate human-like text, making it a crucial technology in modern AI applications. Semantics, the study of meaning in language, plays a pivotal role in NLP, encompassing the understanding of context, relationships, and nuances within textual data. In recent years, there has been remarkable progress in utilizing pre-trained language models such as BERT (Bidirectional Encoder Representations from Transformers) and GPT-3/4 (Generative Pre-trained Transformer) for semantic embeddings in NLP tasks. This project identifies and addresses a critical but commonly overlooked challenge within NLP: the intricate composition of semantics within sentences often gets lost during model training, resulting in a lack of depth and precision in understanding the input language and leading to potential misinterpretations of textual data. This gap is addressed by enhancing existing methods to distil semantic information from texts into smaller, more efficient models. By building on the foundation laid by previous models, this project aims to improve the performance and accuracy of NLP systems by enhancing the quality and depth of semantic embeddings.
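The abstract describes distilling semantic information from larger models into smaller, more efficient ones. As an illustrative sketch only (not the project's actual method), embedding distillation is commonly framed as regressing a small student encoder onto a frozen teacher's sentence embeddings. The toy example below uses a hypothetical linear "teacher" and "student" over random features, trained with a mean-squared-error objective; all names and dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 100 "sentences" represented as 32-d feature vectors, and a
# frozen "teacher" mapping them to 8-d embeddings (stand-in for a large model).
X = rng.normal(size=(100, 32))
W_teacher = rng.normal(size=(32, 8))
teacher_emb = X @ W_teacher  # targets the student must reproduce

# Student: a smaller linear map, fitted by gradient descent on the
# mean-squared error between student and teacher embeddings.
W_student = np.zeros((32, 8))
lr = 0.05
for _ in range(500):
    student_emb = X @ W_student
    grad = 2.0 * X.T @ (student_emb - teacher_emb) / len(X)
    W_student -= lr * grad

mse = np.mean((X @ W_student - teacher_emb) ** 2)
print(f"distillation MSE after training: {mse:.6f}")
```

In practice the student would be a compact neural encoder and the teacher a large pre-trained model, but the objective — matching the teacher's embedding space — takes the same shape.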

Bibliographic Details
Main Author: Vaanavan, Sezhiyan
Other Authors: Lihui Chen
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
Engineering
NLP
LLM
Sentence embedding
Online Access: https://hdl.handle.net/10356/177524
Institution: Nanyang Technological University
School: School of Electrical and Electronic Engineering
Supervisor: Lihui Chen (ELHCHEN@ntu.edu.sg)
Degree: Bachelor's degree
Collection: DR-NTU, NTU Library
Citation: Vaanavan, S. (2024). Composition distillation for semantic sentence embeddings. Final Year Project (FYP), Nanyang Technological University, Singapore.