AUTOMATIC PARAPHRASING FOR INDONESIAN LANGUAGE USING SIMULATED ANNEALING
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/56232 |
Institution: | Institut Teknologi Bandung |
Summary: | Paraphrasing is a technique for processing information by changing the form of a
text without changing its meaning. The existing automatic paraphrase generation
system for Indonesian uses a rule-based approach, but its use is still limited to
sentences covered by its predefined rules. This study uses an unsupervised approach
based on the Simulated Annealing algorithm, adapted from the Unsupervised
Paraphrasing by Simulated Annealing (UPSA) system. Paraphrase candidates are
generated through local edits. The acceptance probability of a candidate depends
on the value of the objective function, which is a linear combination of a
semantic preservation score, an expression diversity score, and a fluency
score.
Adaptation to Indonesian is done by replacing the language-specific resources:
a language model for fluency scoring, a dictionary, word embeddings, and a
stopword list used to extract keywords. In addition, this study modifies the
algorithm in two ways: by changing the Hill Climbing implementation that selects
word candidates during candidate generation, and by using an Indonesian
thesaurus to obtain synonyms for word replacement.
Based on the experimental results, the modified algorithm using the Indonesian
thesaurus obtained the best results, both in the number of sentences
successfully paraphrased and in similarity to the original sentence, compared
with the original UPSA adaptation and the algorithm with the modified Hill
Climbing implementation. |
---|
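The search procedure described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation: the weights, the cooling schedule, the toy thesaurus, and the scoring function are all hypothetical stand-ins, and the acceptance rule shown is the standard simulated annealing rule rather than one confirmed by the record. The only parts taken from the summary are the linear combination of three scores and the use of thesaurus synonyms as a local edit.

```python
import math
import random

# Hypothetical toy thesaurus; a real system would load an Indonesian thesaurus.
TOY_THESAURUS = {
    "rumah": ["kediaman", "griya"],
    "besar": ["raya", "agung"],
}


def objective(semantic, diversity, fluency, weights=(1.0, 1.0, 1.0)):
    """Linear combination of the three scores named in the summary.

    The weights are assumptions; the record does not state their values.
    """
    w_sem, w_div, w_flu = weights
    return w_sem * semantic + w_div * diversity + w_flu * fluency


def acceptance_probability(current_score, candidate_score, temperature):
    """Standard annealing rule: always accept a candidate that is not worse,
    otherwise accept with probability exp(delta / T)."""
    if candidate_score >= current_score:
        return 1.0
    if temperature <= 0:
        return 0.0
    return math.exp((candidate_score - current_score) / temperature)


def replace_with_synonym(words, thesaurus, rng):
    """Local edit: swap one randomly chosen word for one of its synonyms."""
    positions = [i for i, word in enumerate(words) if word in thesaurus]
    if not positions:
        return list(words)  # nothing left to replace
    i = rng.choice(positions)
    edited = list(words)
    edited[i] = rng.choice(thesaurus[words[i]])
    return edited


def anneal(words, score, thesaurus, steps=50, t_start=1.0, cooling=0.95, seed=0):
    """Propose local edits and accept them according to the annealing rule."""
    rng = random.Random(seed)
    current, current_score = list(words), score(words)
    temperature = t_start
    for _ in range(steps):
        candidate = replace_with_synonym(current, thesaurus, rng)
        candidate_score = score(candidate)
        p = acceptance_probability(current_score, candidate_score, temperature)
        if rng.random() < p:
            current, current_score = candidate, candidate_score
        temperature *= cooling  # geometric cooling, an assumed schedule
    return current


def toy_score(original):
    """Placeholder scorer: only diversity varies; semantic and fluency are
    fixed at 1.0 because no embedding or language model is loaded here."""
    def score(candidate):
        changed = sum(a != b for a, b in zip(original, candidate))
        return objective(1.0, changed / len(original), 1.0)
    return score


original = "rumah itu besar".split()
print(" ".join(anneal(original, toy_score(original), TOY_THESAURUS)))
```

In a real adaptation, `toy_score` would be replaced by scores computed from the Indonesian word embeddings and language model that the summary lists as language-specific resources.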