Implementing semantic search for textual data in web applications

Bibliographic Details
Main Author: Toh, Jeremy Gen Yang
Other Authors: Andy Khong W H
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/177123
Summary: Semantic search, also known as vector search, retrieves data based on its semantic similarity to the query. It is enabled by sentence embeddings, which are high-dimensional vectors that encode the meaning of sentences. Unlike traditional keyword search, which matches literal terms, semantic search can capture the intent behind a user query. This paper explores the implementation of semantic search in web applications. Sentences are processed with Sentence-BERT (SBERT), a pre-trained deep learning language model that generates sentence embeddings. These embeddings are then stored in a PostgreSQL database with the vector extension, which enables efficient similarity comparisons during search queries. This work details the findings from developing a vector search system, integrating it with a web application, and deploying it online.
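
The pipeline described in the summary can be illustrated with a minimal sketch. It is not the author's implementation: the table name, column names, and the 384-dimension embedding size are assumptions (384 is the output size of some compact SBERT models such as all-MiniLM-L6-v2; the record does not state which model was used). The SQL strings assume the PostgreSQL vector extension (pgvector), whose `<=>` operator computes cosine distance. The Python function reproduces that ranking in memory on toy 3-dimensional vectors, so the idea can be tried without a database or a model.

```python
import math

# Hypothetical schema for PostgreSQL with the vector extension (pgvector).
# Table name, column names, and the dimension 384 are illustrative assumptions.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(384)
);
"""

# <=> is pgvector's cosine distance operator; a smaller value means more similar.
# The placeholders follow the psycopg style and are shown for illustration only.
SEARCH_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the quantity the <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query_embedding, rows, limit=2):
    """In-memory counterpart of SEARCH_SQL: rank rows by cosine distance."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query_embedding, row[1]))
    return [content for content, _ in ranked[:limit]]


# Toy 3-dimensional vectors standing in for real SBERT embeddings.
ROWS = [
    ("How do I reset my password?", [0.9, 0.1, 0.0]),
    ("Store opening hours", [0.0, 0.2, 0.9]),
    ("Recovering a locked account", [0.8, 0.3, 0.1]),
]

# Pretend embedding of a query such as "forgot my login".
QUERY = [1.0, 0.1, 0.0]

if __name__ == "__main__":
    for content in search(QUERY, ROWS):
        print(content)
```

In a real deployment the toy vectors would be replaced by model output, and the sorting would be delegated to the database through a query like SEARCH_SQL, so that the application never loads every embedding into memory.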