CLASSIFICATION AND CLUSTERING TO IDENTIFY SPOKEN DIALECTS IN INDONESIAN

Bibliographic Details
Main Author: IBRAHIM, JACQUELINE
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/22664
Institution: Institut Teknologi Bandung
Description
Summary: This thesis uses Support Vector Machines (SVMs) to identify eight spoken dialects of Indonesian. The eight dialects, chosen on the basis of previous research, are Aceh, Bali, Batak, Betawi, Jawa, Minangkabau, Sulawesi, and Sunda.

The spoken data come from speakers living in Bandung. Because the speakers' environment may have influenced their speech, the dialect heard in a recording is not always clear. The recordings are segmented into 4-second pieces, and MFCC, spectral flux, and spectral centroid features are extracted from each segment. The resulting data, in ARFF format, are then given an additional dialect attribute that labels each instance with its dialect.

Experiments and tests are carried out with two techniques, all-at-once and one-against-one, using a linear kernel. The highest average result, 55%, is obtained with the one-against-one technique using MFCC, spectral flux, and spectral centroid features together. With MFCC features only, the result is lower, at 53.5%. Using all three features therefore performs better than using MFCC alone.
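The segmentation step and two of the spectral features named in the summary can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the extraction tool used in the thesis: the function names are hypothetical, a trailing piece shorter than 4 seconds is assumed to be discarded, the magnitude spectrum is computed with a naive DFT over a whole frame without windowing, and spectral flux uses one common definition (the sum of squared differences between consecutive magnitude spectra). The thesis does not state which variants it used. MFCC extraction is left out because it needs a mel filter bank and a DCT, which go beyond a short example.

```python
import cmath
import math
from typing import List, Sequence


def segment(samples: Sequence[float], sample_rate: int, seconds: float = 4.0) -> List[List[float]]:
    """Split a mono signal into consecutive, non-overlapping segments.

    Assumption: a trailing piece shorter than `seconds` is discarded.
    """
    n = int(sample_rate * seconds)  # samples per segment
    return [list(samples[i:i + n]) for i in range(0, len(samples) - n + 1, n)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """|X[k]| for k = 0..N/2 via a naive O(N^2) DFT (adequate for a demo)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags: Sequence[float], sample_rate: int, frame_len: int) -> float:
    """Magnitude-weighted mean frequency in Hz: sum(f_k * |X_k|) / sum(|X_k|)."""
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: define the centroid as 0 Hz
    bin_hz = sample_rate / frame_len  # frequency spacing between DFT bins
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def spectral_flux(prev_mags: Sequence[float], mags: Sequence[float]) -> float:
    """Sum of squared differences between two consecutive magnitude spectra."""
    return sum((m - p) ** 2 for p, m in zip(prev_mags, mags))
```

In a full pipeline each 4-second segment would be split further into short frames, and the per-frame values would be summarised (for example by their mean) before being written out as features.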
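The labelling step, in which a dialect attribute is added to the extracted ARFF data, might look like the sketch below. It assumes a dense ARFF file with one instance per line after `@DATA`, and one dialect label per file (for example, one file per speaker recording). The function name `add_dialect_attribute` and the feature attribute names used when trying it out are hypothetical; only the eight dialect names come from the summary.

```python
from typing import List

# The eight dialect classes named in the summary.
DIALECTS = ["Aceh", "Bali", "Batak", "Betawi", "Jawa", "Minangkabau", "Sulawesi", "Sunda"]


def add_dialect_attribute(arff_text: str, label: str) -> str:
    """Append a nominal `dialect` attribute, set to `label`, to every instance.

    Assumes a dense ARFF file with one instance per line after @DATA.
    The attribute declaration is inserted just before the @DATA line.
    """
    if label not in DIALECTS:
        raise ValueError(f"unknown dialect label: {label}")
    out: List[str] = []
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.strip()
        if not in_data and stripped.upper() == "@DATA":
            # Declare the class attribute as the last attribute, then start data.
            out.append("@ATTRIBUTE dialect {" + ",".join(DIALECTS) + "}")
            out.append(line)
            in_data = True
        elif in_data and stripped and not stripped.startswith("%"):
            # A data row: append the label as the final column.
            out.append(line.rstrip() + "," + label)
        else:
            # Header lines, blank lines, and % comments pass through unchanged.
            out.append(line)
    return "\n".join(out) + "\n"
```

Declaring the nominal attribute with all eight values, even in a file that contains only one of them, keeps the headers of all labelled files identical so they can be concatenated into one training set.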
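One-against-one classification trains one binary classifier for every pair of dialects (8 × 7 / 2 = 28 classifiers for eight dialects) and predicts the dialect that wins the most pairwise votes. The sketch below illustrates that scheme with linear classifiers. The training routine is a stand-in assumed for illustration: a Pegasos-style subgradient solver for the linear SVM objective (L2-regularised hinge loss), not necessarily the SVM implementation used in the thesis. Breaking vote ties in favour of the dialect listed first is also an assumption. The all-at-once technique is not sketched here.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

DIALECTS = ["Aceh", "Bali", "Batak", "Betawi", "Jawa", "Minangkabau", "Sulawesi", "Sunda"]

Vector = List[float]


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.1, epochs: int = 50) -> Vector:
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 or -1. A constant 1.0 is appended to each sample so the
    last weight acts as a (regularised) bias. Samples are visited in a fixed
    cyclic order instead of being drawn at random.
    """
    aug = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(aug[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(aug, ys):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink from the regulariser
            if margin < 1.0:  # hinge loss is active: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def decision(w: Vector, x: Sequence[float]) -> float:
    """Signed score of a linear model on one (non-augmented) sample."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_one_against_one(xs, labels, classes) -> Dict[Tuple[str, str], Vector]:
    """Train one binary classifier per pair (a, b); class a is the +1 side."""
    models = {}
    for a, b in combinations(classes, 2):
        pair = [(x, 1 if lab == a else -1) for x, lab in zip(xs, labels) if lab in (a, b)]
        if pair:  # skip pairs with no training data at all
            models[(a, b)] = train_linear_svm([p[0] for p in pair], [p[1] for p in pair])
    return models


def predict_one_against_one(models, x, classes) -> str:
    """Each pairwise model votes for one class; the most-voted class wins."""
    votes = {c: 0 for c in classes}
    for (a, b), w in models.items():
        votes[a if decision(w, x) > 0 else b] += 1
    return max(classes, key=lambda c: votes[c])  # ties: first class in `classes`
```

In this setting each input vector would hold the MFCC, spectral flux, and spectral centroid values of one 4-second segment. Because each pairwise model only sees two dialects, the per-pair training sets stay small, which is one practical reason the one-against-one scheme is popular for multiclass SVMs.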