#TITLE_ALTERNATIVE#

The semantic similarity between two pieces of text (STS: Semantic Textual Similarity) can be represented by a number. STS aims to measure the degree of semantic similarity between a pair of text pieces. However, the number representation raises assump...

Bibliographic Details
Main Author:	CHANDRA RAJAGUKGUK (NIM : 13514082), RIO
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/30513
Institution:	Institut Teknologi Bandung

id	id-itb.:30513
spelling	id-itb.:30513 2018-07-03T15:33:42Z #TITLE_ALTERNATIVE# CHANDRA RAJAGUKGUK (NIM : 13514082), RIO Indonesia Final Project INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/30513
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	The semantic similarity between two pieces of text (STS: Semantic Textual Similarity) can be represented by a number. STS aims to measure the degree of semantic similarity between a pair of text pieces. However, a numeric representation alone leaves the semantic meaning of the sentences to assumption, because the similarity number carries no more specific explanation. This problem calls for an explanation of why two sentences are similar or not. Interpretable Semantic Textual Similarity (iSTS) is a task that meets this need by explaining the semantic resemblance of two sentences. The output of iSTS is a set of chunk pairs with their relation scores and labels. The score ranges from 0 to 5, and the labels are EQUI, SPE1, SPE2, OPPO, REL, SIMI, and NOALI. However, iSTS corpora are currently limited to only a few languages, excluding Indonesian. The purpose of this final project is to build an iSTS model based on an Indonesian-language corpus, which is also built as part of this final project.
This final project adapts the two best current iSTS techniques for English, VRep and UWB. The VRep technique uses WordNet to represent word semantics, while UWB uses word embeddings. Both techniques follow the same general steps of preprocessing, feature extraction, and classification; they differ in slightly different preprocessing and in their own feature extraction methods. VRep and UWB are adapted in this final project at the preprocessing, feature extraction, and classification stages, with four machine learning techniques used for classification: decision tree, SVM, random forest, and multilayer perceptron.
Based on F1 evaluation on the type, score, and type + score aspects, the best iSTS models for the VRep technique are SVM for the type aspect with a test F1 of 0.7037, decision tree for the score aspect with a test F1 of 0.8770, and SVM for the type + score aspect with a test F1 of 0.6821. For UWB, the best iSTS models are decision tree for the type aspect with a test F1 of 0.6869, decision tree for the score aspect with a test F1 of 0.8886, and SVM for the type + score aspect with a test F1 of 0.6821. In this final project, VRep is the best model for the type and score aspects, and UWB for the type + score aspect.
format	Final Project
author	CHANDRA RAJAGUKGUK (NIM : 13514082), RIO
spellingShingle	CHANDRA RAJAGUKGUK (NIM : 13514082), RIO #TITLE_ALTERNATIVE#
author_facet	CHANDRA RAJAGUKGUK (NIM : 13514082), RIO
author_sort	CHANDRA RAJAGUKGUK (NIM : 13514082), RIO
title	#TITLE_ALTERNATIVE#
title_short	#TITLE_ALTERNATIVE#
title_full	#TITLE_ALTERNATIVE#
title_fullStr	#TITLE_ALTERNATIVE#
title_full_unstemmed	#TITLE_ALTERNATIVE#
title_sort	#title_alternative#
url	https://digilib.itb.ac.id/gdl/view/30513
_version_	1823636631091937280
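
The abstract describes iSTS output as chunk pairs that each carry a label and a score from 0 to 5, evaluated by F1 on the type, score, and type + score aspects. A minimal Python sketch of that structure is given here for illustration. The class name `ChunkAlignment`, the function `f1`, and the example chunks are hypothetical, and the exact-match F1 is a simplifying assumption rather than the scorer used in the final project.

```python
from dataclasses import dataclass

# Label set as listed in the abstract.
LABELS = {"EQUI", "SPE1", "SPE2", "OPPO", "REL", "SIMI", "NOALI"}


@dataclass(frozen=True)
class ChunkAlignment:
    """One aligned chunk pair (hypothetical structure, for illustration)."""
    chunk1: str
    chunk2: str
    label: str
    score: int

    def __post_init__(self):
        # Reject labels and scores outside the ranges named in the abstract.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")
        if not 0 <= self.score <= 5:
            raise ValueError(f"score out of range: {self.score}")


def f1(gold, pred, aspect):
    """Exact-match F1 over alignments for one aspect.

    Assumption: an alignment counts as correct only if the chunk pair and
    the compared aspect match exactly. Real iSTS scorers may differ.
    """
    keys = {
        "type": lambda a: (a.chunk1, a.chunk2, a.label),
        "score": lambda a: (a.chunk1, a.chunk2, a.score),
        "type+score": lambda a: (a.chunk1, a.chunk2, a.label, a.score),
    }
    key = keys[aspect]
    gold_set = {key(a) for a in gold}
    pred_set = {key(a) for a in pred}
    true_pos = len(gold_set & pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Invented example data, not taken from the Indonesian corpus.
gold = [
    ChunkAlignment("the cat", "a cat", "EQUI", 5),
    ChunkAlignment("is sleeping", "is resting", "SIMI", 3),
]
pred = [
    ChunkAlignment("the cat", "a cat", "EQUI", 5),
    ChunkAlignment("is sleeping", "is resting", "SIMI", 4),
]

for aspect in ("type", "score", "type+score"):
    print(aspect, f1(gold, pred, aspect))
```

In this toy case both labels are predicted correctly but one score is off by one, so the type aspect scores higher than the score and type + score aspects.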