#TITLE_ALTERNATIVE#
The semantic similarity between two pieces of text (STS: Semantic Textual Similarity) can be represented by a number. STS aims to measure the degree of semantic similarity between a pair of text pieces. However, the number representation raises assump...
Main Author: | CHANDRA RAJAGUKGUK (NIM : 13514082), RIO |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/30513 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:30513 |
---|---|
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
The semantic similarity between two pieces of text (STS: Semantic Textual Similarity) can be represented by a number. STS aims to measure the degree of semantic similarity between a pair of text pieces. However, a single number leaves the reader to assume what the sentences have in common; it gives no more specific explanation of the similarity. This creates a need to explain why two sentences are similar or not. Interpretable Semantic Textual Similarity (iSTS) is a task that addresses this need by explaining the semantic resemblance of two sentences. The output of iSTS is pairs of chunks with their relation scores and labels. The score ranges from 0 to 5, and the labels are EQUI, SPE1, SPE2, OPPO, REL, SIMI, and NOALI. However, iSTS corpora currently exist for only a few languages, which do not include Indonesian. The purpose of this final project is to build an iSTS model based on an Indonesian-language corpus; the corpus itself is also built in this final project.
This final project adapts VRep and UWB, the two best current iSTS techniques for English. The VRep technique uses WordNet to represent word semantics, while UWB uses word embeddings. Both techniques follow the same general steps of preprocessing, feature extraction, and classification; they differ in slightly different preprocessing and in their own feature extraction methods. VRep and UWB are adapted in the preprocessing, feature extraction, and classification stages, with four machine learning techniques used for classification: decision tree, SVM, random forest, and multilayer perceptron.
Based on F1 evaluation of the type, score, and type + score aspects, the best iSTS models for the VRep technique are SVM for the type aspect with a test F1 of 0.7037, decision tree for the score aspect with a test F1 of 0.8770, and SVM for the type + score aspect with a test F1 of 0.6821. For UWB, the best iSTS models are decision tree for the type aspect with a test F1 of 0.6869, decision tree for the score aspect with a test F1 of 0.8886, and SVM for the type + score aspect with a test F1 of 0.6821. In this final project, VRep is the best model for the type and score aspects, and UWB for the type + score aspect. |
format |
Final Project |
author |
CHANDRA RAJAGUKGUK (NIM : 13514082), RIO |
title |
#TITLE_ALTERNATIVE# |
url |
https://digilib.itb.ac.id/gdl/view/30513 |
_version_ |
1822267477095088128 |
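
The abstract describes iSTS output as chunk pairs carrying a label (EQUI, SPE1, SPE2, OPPO, REL, SIMI, NOALI) and a score from 0 to 5, evaluated with F1 on the type, score, and type + score aspects. The following is a minimal Python sketch of that data shape and of a simplified exact-match F1. The names `AlignmentType`, `ChunkAlignment`, and `aspect_f1`, the integer score, and the Indonesian example chunks are all assumptions made for illustration; the thesis's actual evaluation procedure is not given in this record and may differ.

```python
from dataclasses import dataclass
from enum import Enum


class AlignmentType(Enum):
    """The seven iSTS labels listed in the abstract."""
    EQUI = "EQUI"
    SPE1 = "SPE1"
    SPE2 = "SPE2"
    OPPO = "OPPO"
    REL = "REL"
    SIMI = "SIMI"
    NOALI = "NOALI"


@dataclass(frozen=True)
class ChunkAlignment:
    """One aligned chunk pair (hypothetical structure, not from the thesis)."""
    chunk1: str
    chunk2: str
    label: AlignmentType
    score: int  # assumed to be an integer in the range 0..5

    def __post_init__(self):
        if not 0 <= self.score <= 5:
            raise ValueError(f"score must be in 0..5, got {self.score}")


ASPECTS = ("type", "score", "type+score")


def aspect_f1(gold, pred, aspect):
    """Simplified exact-match F1 for one aspect (an assumption, not the official scorer)."""
    if aspect not in ASPECTS:
        raise ValueError(f"unknown aspect: {aspect}")
    gold_by_pair = {(a.chunk1, a.chunk2): a for a in gold}

    correct = 0
    for p in pred:
        g = gold_by_pair.get((p.chunk1, p.chunk2))
        if g is None:
            continue  # predicted pair is not in the gold alignment
        same_type = p.label == g.label
        same_score = p.score == g.score
        if aspect == "type":
            correct += same_type
        elif aspect == "score":
            correct += same_score
        else:
            correct += same_type and same_score

    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted alignments for one sentence pair.
gold = [
    ChunkAlignment("kucing hitam", "seekor kucing", AlignmentType.SPE1, 4),
    ChunkAlignment("tidur", "tertidur", AlignmentType.EQUI, 5),
]
pred = [
    ChunkAlignment("kucing hitam", "seekor kucing", AlignmentType.SPE1, 3),
    ChunkAlignment("tidur", "tertidur", AlignmentType.EQUI, 5),
]

for aspect in ASPECTS:
    print(aspect, aspect_f1(gold, pred, aspect))
```

In this toy case both predicted labels match the gold labels but only one score does, so the type aspect scores higher than the score and type + score aspects. That mirrors why the abstract reports the three aspects separately.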