SISTEM PENDETEKSI DINI TRANSLATED PLAGIARISM PADA DOKUMEN DIGITAL (Early Detection System for Translated Plagiarism in Digital Documents)
Format: Theses and Dissertations, NonPeerReviewed
Published: [Yogyakarta] : Universitas Gadjah Mada, 2011
Online Access: https://repository.ugm.ac.id/90304/ and http://etd.ugm.ac.id/index.php?mod=penelitian_detail&sub=PenelitianDetail&act=view&typ=html&buku_id=52485
Institution: Universitas Gadjah Mada
Summary: Internet applications that cross language borders have given rise to serious problems such as translated plagiarism. In academic institutions, translated plagiarism has been found in various kinds of work, such as theses, final projects, and papers. In this thesis, we propose an early detection system for Indonesian-English translated plagiarism in digital documents. The system is based on a revised (modified) version of the sentence-based detection algorithm.

The proposed system works in five steps:
(i) It translates the input document using the Google Translate API component.
(ii) It uses the Google AJAX Search API component to search the Web for PDF documents similar to the translated document.
(iii) If any are found, it downloads these documents.
(iv) It applies several preprocessing steps: removing punctuation, numbers, stopwords, and repeated words, and lemmatizing the remaining words.
(v) It compares the content of the translated document against the downloaded documents.

To compare detection accuracy, we built two systems: one based on the original sentence-based detection algorithm and one based on its revised version. We tested both on the same 25 datasets. We evaluated the accuracy of both systems using the RMSE metric, with a t-test as the basis for comparison. The results showed a significant difference in accuracy between the two systems. The system based on the revised sentence-based detection algorithm (RMSE = 24.95%) is more accurate than the system based on the original algorithm (RMSE = 38.54%).
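Steps (iv) and (v) can be illustrated with a minimal Python sketch. The summary does not describe how the sentence-based detection algorithm or its revision actually scores sentences. The comparison below therefore substitutes a generic technique: per-sentence Jaccard overlap with a threshold. It should not be read as the thesis's method. The following are all hypothetical illustrations, not part of the original system:

- the tiny stopword list;
- the lookup table standing in for a real lemmatizer;
- the naive sentence splitter;
- the 0.5 threshold;
- the names `preprocess`, `split_sentences`, `jaccard`, and `suspicious_fraction`.

```python
import re

# Hypothetical, deliberately tiny English stopword list (a real system would use a full list).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "in", "on",
             "to", "and", "or", "for", "by", "with", "this", "that", "it"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"documents": "document", "systems": "system", "detected": "detect",
          "detects": "detect", "translated": "translate", "papers": "paper"}


def preprocess(sentence: str) -> list[str]:
    """Apply the preprocessing steps named in the summary to one sentence."""
    text = sentence.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    text = re.sub(r"\d+", " ", text)      # remove numbers
    tokens: list[str] = []
    seen: set[str] = set()
    for word in text.split():
        if word in STOPWORDS:              # remove stopwords
            continue
        lemma = LEMMAS.get(word, word)     # lemmatize (toy lookup, assumption)
        if lemma in seen:                  # remove repeated words (checked on the lemma)
            continue
        seen.add(lemma)
        tokens.append(lemma)
    return tokens


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace (assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suspicious_fraction(suspect: str, source: str, threshold: float = 0.5) -> float:
    """Fraction of suspect sentences whose best match in source reaches the threshold.

    Generic stand-in for a sentence-based comparison; NOT the thesis's algorithm.
    """
    suspect_sents = [set(preprocess(s)) for s in split_sentences(suspect)]
    source_sents = [set(preprocess(s)) for s in split_sentences(source)]
    if not suspect_sents:
        return 0.0
    flagged = 0
    for sent in suspect_sents:
        best = max((jaccard(sent, other) for other in source_sents), default=0.0)
        if best >= threshold:
            flagged += 1
    return flagged / len(suspect_sents)


if __name__ == "__main__":
    suspect = "The systems detected translated papers. This is new."
    source = "Systems detect translated papers! Unrelated sentence here."
    print(suspicious_fraction(suspect, source))  # prints 0.5
```

In this sketch, the suspect text plays the role of the Google-translated input, and the source text stands in for the text extracted from a downloaded PDF. Repeated words are dropped after lemmatization, not before it as the summary's ordering might suggest. This is a design choice so that inflected variants such as "paper" and "papers" collapse into one token.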
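The evaluation can be sketched in the same spirit. This sketch assumes, since the summary does not say so, that each system outputs a plagiarism percentage per dataset and that this is compared against a reference percentage. Under that assumption, RMSE is computed over those values. The t-test is taken to be a paired test on the per-dataset absolute errors of the two systems, which is also an assumption. The function names `rmse`, `abs_errors`, and `paired_t` are hypothetical.

```python
import math
from statistics import mean, stdev


def rmse(predicted: list[float], actual: list[float]) -> float:
    """RMSE = sqrt((1/n) * sum((p_i - a_i)^2)) over n datasets."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(mean((p - a) ** 2 for p, a in zip(predicted, actual)))


def abs_errors(predicted: list[float], actual: list[float]) -> list[float]:
    """Per-dataset absolute error |p_i - a_i| (assumed input to the t-test)."""
    return [abs(p - a) for p, a in zip(predicted, actual)]


def paired_t(errors_a: list[float], errors_b: list[float]) -> float:
    """Paired t statistic on differences d_i = a_i - b_i:
    t = mean(d) / (stdev(d) / sqrt(n)), with n - 1 degrees of freedom."""
    if len(errors_a) != len(errors_b):
        raise ValueError("inputs must have equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired observations")
    sd = stdev(diffs)  # sample standard deviation
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

With 25 datasets, the value returned by `paired_t` would be compared against the critical value of Student's t distribution with 24 degrees of freedom at the chosen significance level.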