Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/30513 |
Institution: | Institut Teknologi Bandung |

Summary:

The semantic similarity between two pieces of text (STS: Semantic Textual Similarity) can be represented by a number. STS aims to measure the degree of semantic similarity between a pair of text pieces. However, a bare number forces the reader to assume what the semantic relationship is; it gives no more specific explanation of where the similarity lies. This raises the need to explain why two sentences are judged similar or not. Interpretable Semantic Textual Similarity (iSTS) is a task that addresses this need by explaining the semantic resemblance of two sentences. The output of iSTS is a set of aligned chunk pairs, each with a relation score and label: the score ranges from 0 to 5, while the labels are EQUI, SPE1, SPE2, OPPO, REL, SIMI, and NOALI. However, iSTS corpora currently exist for only a few languages, and Indonesian is not among them. The purpose of this final project is to build an iSTS model based on an Indonesian-language corpus, which is also constructed as part of this final project.
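To make the output format concrete, the following is a minimal Python sketch of how one aligned chunk pair could be represented. The class name, field names, and the example Indonesian chunks are illustrative assumptions, not part of the corpus built in this project.

```python
from dataclasses import dataclass

# Relation labels used in iSTS, as listed in the summary above.
ISTS_LABELS = {"EQUI", "SPE1", "SPE2", "OPPO", "REL", "SIMI", "NOALI"}

@dataclass
class ChunkAlignment:
    """One aligned chunk pair with its relation label and similarity score (illustrative)."""
    chunk_1: str   # chunk taken from the first sentence
    chunk_2: str   # chunk taken from the second sentence
    label: str     # one of ISTS_LABELS
    score: int     # similarity score in the range 0..5

    def __post_init__(self):
        if self.label not in ISTS_LABELS:
            raise ValueError(f"unknown iSTS label: {self.label}")
        if not 0 <= self.score <= 5:
            raise ValueError("score must be between 0 and 5")

# Hypothetical example: two chunks judged equivalent in meaning.
print(ChunkAlignment("anak kecil", "seorang anak", label="EQUI", score=5))
```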
In this final project, the two current best iSTS techniques for English, VRep and UWB, are adapted. The VRep technique uses WordNet to represent word semantics, while UWB uses word embeddings. Both techniques follow the same general steps of preprocessing, feature extraction, and classification; they differ in slightly different preprocessing and in their own feature extraction methods. The adaptation of VRep and UWB in this final project covers the preprocessing stage, feature extraction, and classification with four machine learning techniques: decision tree, SVM, random forest, and multilayer perceptron.
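As a rough illustration of the shared pipeline (preprocessing, feature extraction, classification) and of the four classifiers named above, here is a small scikit-learn sketch. The toy token-overlap feature and the tiny training set are assumptions made for demonstration only; the actual WordNet-based (VRep) and word-embedding-based (UWB) feature sets are not reproduced here.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

def extract_features(chunk_1: str, chunk_2: str) -> list[float]:
    """Toy feature extraction: token overlap ratio and length difference (illustrative only)."""
    t1, t2 = set(chunk_1.lower().split()), set(chunk_2.lower().split())
    overlap = len(t1 & t2) / max(len(t1 | t2), 1)
    length_diff = abs(len(t1) - len(t2))
    return [overlap, float(length_diff)]

# Hypothetical chunk pairs with gold alignment labels for training.
pairs = [
    ("anak kecil", "seorang anak", "EQUI"),
    ("berlari cepat", "berjalan pelan", "OPPO"),
    ("di taman", "makan siang", "NOALI"),
    ("mobil merah", "mobil", "SPE1"),
]
X = [extract_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

# The four classifiers compared in the final project.
classifiers = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(random_state=0),
    "multilayer perceptron": MLPClassifier(max_iter=2000, random_state=0),
}
for name, clf in classifiers.items():
    clf.fit(X, y)
    prediction = clf.predict([extract_features("mobil biru", "mobil")])
    print(f"{name}: {prediction[0]}")
```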
Based on the F1 evaluation on the type, score, and type + score aspects, the best iSTS models with the VRep technique are SVM for the type aspect with a test F1 of 0.7037, decision tree for the score aspect with a test F1 of 0.8770, and SVM for the type + score aspect with a test F1 of 0.6821. With UWB, the best models are a decision tree for the type aspect with a test F1 of 0.6869, a decision tree for the score aspect with a test F1 of 0.8886, and SVM for the type + score aspect with a test F1 of 0.6821. In this final project, VRep gives the best models for the type and score aspects, and UWB for the type + score aspect.
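For reference, the F1 score reported above is the harmonic mean of precision and recall. Below is a simplified sketch of scoring the type aspect with scikit-learn; the gold and predicted labels are made up, and the official iSTS evaluation scores chunk alignments rather than isolated labels, so this is only an approximation of the metric.

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted relation types for a handful of chunk pairs.
gold_types = ["EQUI", "SPE1", "NOALI", "OPPO", "EQUI", "SIMI"]
pred_types = ["EQUI", "SPE1", "NOALI", "SIMI", "OPPO", "SIMI"]

# Macro-averaged F1 over the label set, one plausible way to score the "type" aspect.
print("type F1:", f1_score(gold_types, pred_types, average="macro"))
```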