SEMANTIC TEXTUAL SIMILARITY (STS) FOR INDONESIAN SENTENCE USING SIAMESE NEURAL NETWORK
Semantic Textual Similarity (STS) is a task in natural language processing that deals with determining how similar two sentences are. STS is a very important component in solving other natural language processing tasks such as semantic search, summarization, question answering, plagiarism detecti...
Main Author: | Baptiso Sorlawan, Agung |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/54226 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:54226 |
---|---|
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Semantic Textual Similarity (STS) is a natural language processing task that
determines how similar two sentences are. STS is an important component of
other natural language processing tasks such as semantic search, summarization,
question answering, plagiarism detection, and information extraction. One
architecture that can be used to model STS, and the focus of this research, is
the Siamese Neural Network (SNN).
One of the most important components of an STS model is the encoder, which maps
sentences to numerical vectors. This research experiments with several kinds of
SNN encoder, as well as with the other components of the SNN: the pooling layer
and the objective function.
The dataset used in these experiments was obtained from Prosa.ai and contains
frequently asked question (FAQ) sentences. In the experiments, the best STS
model achieves an F1 score of 0.9723, which is better than the baseline. That
model is an SNN with an IndoBERT encoder, MEAN + CLS pooling, and a regression
objective function.
|
format |
Final Project |
author |
Baptiso Sorlawan, Agung |
title |
SEMANTIC TEXTUAL SIMILARITY (STS) FOR INDONESIAN SENTENCE USING SIAMESE NEURAL NETWORK |
url |
https://digilib.itb.ac.id/gdl/view/54226 |
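
The description names three SNN components: an encoder, a pooling layer, and an objective function. The following is a minimal sketch of how the last two might fit together, not the author's implementation. It assumes that "MEAN + CLS pooling" means concatenating the mean of the token vectors with the first ([CLS]) token vector, and that the regression objective is the mean squared error between the cosine similarity of the two sentence vectors and a gold similarity score. The record confirms neither detail. The function names and the toy token vectors are hypothetical; in the thesis the token vectors would come from an encoder such as IndoBERT.

```python
import math


def mean_cls_pool(token_vectors):
    """Pool token vectors into one sentence vector.

    Assumption: MEAN + CLS is the concatenation of the mean token vector
    and the first token vector (the [CLS] position).
    """
    count = len(token_vectors)
    dim = len(token_vectors[0])
    mean = [sum(tok[i] for tok in token_vectors) / count for i in range(dim)]
    cls = list(token_vectors[0])
    return mean + cls


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def regression_loss(pairs, gold_scores):
    """Mean squared error between predicted cosine similarity and gold score.

    Assumption: this is what "regression objective function" refers to.
    """
    errors = [
        (cosine_similarity(u, v) - gold) ** 2
        for (u, v), gold in zip(pairs, gold_scores)
    ]
    return sum(errors) / len(errors)


# Toy token vectors standing in for encoder output (hypothetical values).
sent_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sent_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sent_c = [[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]]

# Both sentences of a pair go through the same pooling function,
# which is what makes the network "Siamese".
u = mean_cls_pool(sent_a)
v = mean_cls_pool(sent_b)
w = mean_cls_pool(sent_c)

loss = regression_loss([(u, v), (u, w)], [1.0, 0.0])
```

In this sketch the pooled vector is twice as wide as a single token vector, because the mean and [CLS] parts are concatenated rather than averaged. At prediction time, a similarity threshold (not given in the record) would turn the cosine score into the similar/not-similar decision that an F1 score evaluates.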