#TITLE_ALTERNATIVE#

Bibliographic Details
Main Author: CHANDRA RAJAGUKGUK (NIM : 13514082), RIO
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/30513
Institution: Institut Teknologi Bandung
id id-itb.:30513
spelling id-itb.:30513 2018-07-03T15:33:42Z
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description The semantic similarity between two pieces of text (STS: Semantic Textual Similarity) can be represented by a number. STS aims to measure the degree of semantic similarity of a pair of text pieces. However, a single number leaves the semantic meaning of the sentences to assumption and offers no more specific explanation of the similarity. This creates the need for an explanation of why two sentences are similar or dissimilar. Interpretable Semantic Textual Similarity (iSTS) is a task that addresses this need by explaining the semantic resemblance of two sentences. The output of iSTS is a set of aligned chunk pairs, each with a relation score and a label. Scores range from 0 to 5, while the labels are EQUI, SPE1, SPE2, OPPO, REL, SIMI, and NOALI. However, iSTS corpora are currently available for only a few languages, and Indonesian is not among them. The purpose of this final project is to build an iSTS model based on an Indonesian corpus, which is also built in this final project. This final project adapts VRep and UWB, currently the two best iSTS techniques for English. VRep uses WordNet to represent word semantics, while UWB uses word embeddings. Both techniques follow the same general steps: preprocessing, feature extraction, and classification. They differ in slightly different preprocessing and in their distinct feature extraction methods. In this final project, VRep and UWB are adapted at the preprocessing, feature extraction, and classification stages, with classification done by four machine learning techniques: decision tree, SVM, random forest, and multilayer perceptron.
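The iSTS output and a UWB-style embedding feature, as described above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the VRep or UWB implementation: the `Alignment` record, the `chunk_similarity` helper, and the toy two-dimensional `TOY_EMBEDDINGS` vectors are hypothetical, and chunks are assumed to be whitespace-tokenized. A real system would load pretrained Indonesian word embeddings and pass features like this one to the classifiers.

```python
import math
from dataclasses import dataclass

# Alignment labels listed in the abstract.
LABELS = {"EQUI", "SPE1", "SPE2", "OPPO", "REL", "SIMI", "NOALI"}

# Hypothetical toy embeddings (2-dimensional) for illustration only.
TOY_EMBEDDINGS = {
    "kucing": [1.0, 0.0],  # "cat"
    "hitam": [0.0, 1.0],   # "black"
    "anjing": [0.8, 0.2],  # "dog"
}


@dataclass(frozen=True)
class Alignment:
    """One aligned chunk pair (hypothetical record layout)."""
    chunk1: str   # chunk from sentence 1
    chunk2: str   # chunk from sentence 2
    label: str    # one of LABELS
    score: float  # relation score, assumed to lie in [0, 5]

    def __post_init__(self):
        # Reject labels and scores outside the ranges given in the abstract.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")
        if not 0 <= self.score <= 5:
            raise ValueError(f"score out of range: {self.score}")


def mean_vector(tokens, embeddings):
    """Average the vectors of tokens that have an embedding; None if none do."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def chunk_similarity(chunk1, chunk2, embeddings):
    """Embedding-based similarity feature for a chunk pair.

    Returns 0.0 when either chunk has no word with a known embedding.
    """
    v1 = mean_vector(chunk1.lower().split(), embeddings)
    v2 = mean_vector(chunk2.lower().split(), embeddings)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


if __name__ == "__main__":
    pair = Alignment("kucing hitam", "kucing hitam", "EQUI", 5)
    print(pair, chunk_similarity(pair.chunk1, pair.chunk2, TOY_EMBEDDINGS))
```

Averaging word vectors is only one simple way to turn a multi-word chunk into a single vector; it is used here because it keeps the sketch short.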
Based on F1 evaluation of the type, score, and type + score aspects, the best VRep models are SVM for type (test F1 0.7037), decision tree for score (test F1 0.8770), and SVM for type + score (test F1 0.6821). For UWB, the best models are decision tree for type (test F1 0.6869), decision tree for score (test F1 0.8886), and SVM for type + score (test F1 0.6821). In this final project, VRep gives the best model for the type and score aspects, and UWB for the type + score aspect.
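The three evaluation aspects can be illustrated with a simplified, chunk-level F1. This sketch is an assumption for illustration only and is not the official SemEval iSTS scorer: a system alignment earns credit only if the exact same chunk pair appears in the gold standard; for `type` the labels must match; for `score` the credit is 1 - |gold - system| / 5; and for `type+score` the score credit counts only when the labels also match. The function names `f1` and `ists_f1` are hypothetical.

```python
def f1(credit, n_system, n_gold):
    """Harmonic mean of precision (credit / system) and recall (credit / gold)."""
    if n_system == 0 or n_gold == 0:
        return 0.0
    precision = credit / n_system
    recall = credit / n_gold
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ists_f1(gold, system):
    """Simplified F1 for the type, score, and type+score aspects.

    gold and system map (chunk1, chunk2) -> (label, score), score in [0, 5].
    """
    credit = {"type": 0.0, "score": 0.0, "type+score": 0.0}
    for pair, (sys_label, sys_score) in system.items():
        if pair not in gold:
            continue  # alignment absent from gold: no credit
        gold_label, gold_score = gold[pair]
        # Partial credit that shrinks as the score difference grows.
        score_credit = 1 - abs(gold_score - sys_score) / 5
        label_match = sys_label == gold_label
        credit["type"] += 1.0 if label_match else 0.0
        credit["score"] += score_credit
        credit["type+score"] += score_credit if label_match else 0.0
    return {k: f1(v, len(system), len(gold)) for k, v in credit.items()}


if __name__ == "__main__":
    gold = {
        ("kucing hitam", "kucing hitam"): ("EQUI", 5),
        ("di taman", "di kebun"): ("SIMI", 3),
    }
    system = {
        ("kucing hitam", "kucing hitam"): ("EQUI", 5),
        ("di taman", "di kebun"): ("REL", 2),
    }
    print(ists_f1(gold, system))
```

In the example, the second pair has the wrong label and a score one point off, so `type` and `type+score` drop to 0.5 while `score` stays at about 0.9. This shows why type + score is the strictest of the three aspects.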