INSTAGRAM ONLINE SHOP’S COMMENT CLASSIFICATION USING STATISTICAL APPROACH

Bibliographic Details
Main Author: Prabowo (NIM : 13513094), Faisal
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/22052
Institution: Institut Teknologi Bandung
Description
Summary: Instagram is one of the most popular social media platforms today. Some online store owners display their goods on Instagram and use its comment section to interact with buyers. A large number of comments and a high frequency of posts can become a problem for these owners. An automated natural language classification system can help them respond to incoming comments. This final project therefore seeks the best method and algorithm for classifying Instagram comments, specifically for online stores, using a statistical approach.

In general, this final project is divided into four parts: pre-processing, feature extraction, feature selection, and classification. The pre-processing steps are basic pre-processing, URL conversion, hashtag conversion, mention conversion, punctuation conversion, emoticon conversion, number conversion, regional name conversion, word formalization, and stop-word removal. Feature extraction uses unigrams and word embeddings as feature representations. Feature selection uses the information gain algorithm. Classification uses Naïve Bayes, Decision Tree, Support Vector Machine (SVM), and Convolutional Neural Network (CNN) as learning algorithms.

The experiments carried out in this final project are the baseline, pre-processing, feature selection, word embedding, and CNN experiments. The data consisted of 2810 Instagram comments, each labeled with the response it received: "Answered", "Read", or "Ignored". The word embedding model was built from 5504 unlabeled comments. The baseline experiment, with unigram features and the SVM learning algorithm, reached 73.24% accuracy. Adding the pre-processing steps in the pre-processing experiment raised the accuracy to 82.42%. Adding feature selection in the feature selection experiment raised it further to 84.09%. Using word embeddings with the SVM learning algorithm in the word embedding experiment yielded 81.67% accuracy. The best result, 84.24% accuracy, came from the CNN experiment, which combined word embedding features with the CNN learning algorithm. The model can still be improved by solving the reply-response comment problem, improving the quality of the experimental data, handling product names, and optimizing the pre-processing.
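
The pre-processing conversions and the information gain feature selection described in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the placeholder tokens (`_url_`, `_mention_`, `_hashtag_`, `_number_`), the regular expressions, the function names, and the toy comments are all assumptions made for illustration. Only a subset of the listed steps is shown; emoticon conversion, regional name conversion, word formalization, and stop-word removal are left out.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder tokens and patterns; the record does not list the ones used.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")
PUNCT_RE = re.compile(r"[^\w\s]")  # also drops emoji, which this sketch does not convert


def preprocess(comment):
    """Lowercase a comment, replace URLs, mentions, hashtags and numbers
    with placeholder tokens, strip punctuation, and return unigram tokens."""
    text = comment.lower()
    text = URL_RE.sub(" _url_ ", text)
    text = MENTION_RE.sub(" _mention_ ", text)
    text = HASHTAG_RE.sub(" _hashtag_ ", text)
    text = NUMBER_RE.sub(" _number_ ", text)
    text = PUNCT_RE.sub(" ", text)
    return text.split()


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the comment':
    H(labels) minus the weighted entropy of the two resulting partitions."""
    with_term = [y for tokens, y in zip(docs, labels) if term in tokens]
    without_term = [y for tokens, y in zip(docs, labels) if term not in tokens]
    total = len(labels)
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Toy data, invented for illustration; only the label names come from the summary.
comments = [
    "Harga berapa kak?",
    "harga 150.000 ya @tokobaju",
    "Bagus!",
    "mantap #dress",
]
labels = ["Answered", "Answered", "Read", "Read"]
docs = [preprocess(c) for c in comments]

# Rank the vocabulary by information gain, highest first.
vocabulary = sorted({t for tokens in docs for t in tokens})
ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

In a pipeline like the one summarized above, only the top-ranked unigrams would be kept as features before a classifier such as an SVM is trained. How many features to keep is a tuning choice that the summary does not specify.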