Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/30513 |
Institution: | Institut Teknologi Bandung |

Summary:

The semantic similarity between two pieces of text (STS: Semantic Textual Similarity) can be represented by a number. STS aims to measure the degree of semantic similarity between a pair of text pieces. However, a bare number forces the reader to assume what the semantic relationship is; it gives no more specific explanation of where the similarity lies. This raises the need to explain why two sentences are judged similar or not. Interpretable Semantic Textual Similarity (iSTS) is a task that addresses this need by explaining the semantic resemblance of two sentences. The output of iSTS is a set of aligned chunk pairs, each with a relation score and label: the score ranges from 0 to 5, while the labels are EQUI, SPE1, SPE2, OPPO, REL, SIMI, and NOALI. However, iSTS corpora currently exist for only a few languages, and Indonesian is not among them. The purpose of this final project is to build an iSTS model based on an Indonesian-language corpus, which is also constructed as part of this final project.
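To make the output format concrete, the following is a minimal Python sketch of how one aligned chunk pair could be represented. The class name, field names, and the example Indonesian chunks are illustrative assumptions, not part of the corpus built in this project.

```python
from dataclasses import dataclass

# Relation labels used in iSTS, as listed in the summary above.
ISTS_LABELS = {"EQUI", "SPE1", "SPE2", "OPPO", "REL", "SIMI", "NOALI"}

@dataclass
class ChunkAlignment:
    """One aligned chunk pair with its relation label and similarity score (illustrative)."""
    chunk_1: str   # chunk taken from the first sentence
    chunk_2: str   # chunk taken from the second sentence
    label: str     # one of ISTS_LABELS
    score: int     # similarity score in the range 0..5

    def __post_init__(self):
        if self.label not in ISTS_LABELS:
            raise ValueError(f"unknown iSTS label: {self.label}")
        if not 0 <= self.score <= 5:
            raise ValueError("score must be between 0 and 5")

# Hypothetical example: two chunks judged equivalent in meaning.
print(ChunkAlignment("anak kecil", "seorang anak", label="EQUI", score=5))
```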
In this final project, the two current best iSTS techniques for English, VRep and UWB, are adapted. The VRep technique uses WordNet to represent word semantics, while UWB uses word embeddings. Both techniques follow the same general steps of preprocessing, feature extraction, and classification; they differ in slightly different preprocessing and in their own feature extraction methods. The adaptation of VRep and UWB in this final project covers the preprocessing stage, feature extraction, and classification with four machine learning techniques: decision tree, SVM, random forest, and multilayer perceptron.
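As a rough illustration of the shared pipeline (preprocessing, feature extraction, classification) and of the four classifiers named above, here is a small scikit-learn sketch. The toy token-overlap feature and the tiny training set are assumptions made for demonstration only; the actual WordNet-based (VRep) and word-embedding-based (UWB) feature sets are not reproduced here.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

def extract_features(chunk_1: str, chunk_2: str) -> list[float]:
    """Toy feature extraction: token overlap ratio and length difference (illustrative only)."""
    t1, t2 = set(chunk_1.lower().split()), set(chunk_2.lower().split())
    overlap = len(t1 & t2) / max(len(t1 | t2), 1)
    length_diff = abs(len(t1) - len(t2))
    return [overlap, float(length_diff)]

# Hypothetical chunk pairs with gold alignment labels for training.
pairs = [
    ("anak kecil", "seorang anak", "EQUI"),
    ("berlari cepat", "berjalan pelan", "OPPO"),
    ("di taman", "makan siang", "NOALI"),
    ("mobil merah", "mobil", "SPE1"),
]
X = [extract_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

# The four classifiers compared in the final project.
classifiers = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(random_state=0),
    "multilayer perceptron": MLPClassifier(max_iter=2000, random_state=0),
}
for name, clf in classifiers.items():
    clf.fit(X, y)
    prediction = clf.predict([extract_features("mobil biru", "mobil")])
    print(f"{name}: {prediction[0]}")
```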
Based on the F1 evaluation on the type, score, and type + score aspects, the best iSTS models with the VRep technique are SVM for the type aspect with a test F1 of 0.7037, decision tree for the score aspect with a test F1 of 0.8770, and SVM for the type + score aspect with a test F1 of 0.6821. With UWB, the best models are a decision tree for the type aspect with a test F1 of 0.6869, a decision tree for the score aspect with a test F1 of 0.8886, and SVM for the type + score aspect with a test F1 of 0.6821. In this final project, VRep gives the best models for the type and score aspects, and UWB for the type + score aspect.
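For reference, the F1 score reported above is the harmonic mean of precision and recall. Below is a simplified sketch of scoring the type aspect with scikit-learn; the gold and predicted labels are made up, and the official iSTS evaluation scores chunk alignments rather than isolated labels, so this is only an approximation of the metric.

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted relation types for a handful of chunk pairs.
gold_types = ["EQUI", "SPE1", "NOALI", "OPPO", "EQUI", "SIMI"]
pred_types = ["EQUI", "SPE1", "NOALI", "SIMI", "OPPO", "SIMI"]

# Macro-averaged F1 over the label set, one plausible way to score the "type" aspect.
print("type F1:", f1_score(gold_types, pred_types, average="macro"))
```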