SINGLE PASS FUZZY MEANS CLUSTERING FOR TOPIC DETECTION

Information is now available from many sources, social media among them. Social media provides information at massive scale that can be used for many purposes. The main challenge is how to transform this information into knowledge. In this research, information was processed to detect current to...

Bibliographic Details
Main Author: Veronica Claudia Muljana, Maria
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/47981
Institution: Institut Teknologi Bandung
id id-itb.:47981
Keywords: Single Pass Fuzzy Means, clustering, bag of words
Date: 2020-06-25T01:38:32Z
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Information is now available from many sources, social media among them. Social media provides information at massive scale that can be used for many purposes. The main challenge is how to transform this information into knowledge. In this research, information was processed to detect current topics in social media. Topic detection can be done using several methods, one of which is clustering. This research focuses on increasing the efficiency of clustering algorithms for text data. Text data is retrieved from Twitter and processed using a clustering algorithm to produce bags of words, where each bag of words is a cluster center and represents a particular topic. The proposed algorithm, Single Pass Fuzzy Means, is based on Fuzzy C-Means clustering and applies a similarity threshold when building clusters. Text data were transformed using the Vector Space Model with TF-IDF weighting, then clustered using Single Pass Fuzzy Means. Experiments showed that Single Pass Fuzzy Means produces clusters of similar quality to Fuzzy C-Means, with a shorter processing time.
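The pipeline in the description (TF-IDF vectors, a similarity threshold for opening new clusters, Fuzzy C-Means style memberships) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual update rules, so everything below is an assumption made for illustration: cosine similarity as the similarity measure, the threshold value 0.3, the fuzzifier m = 2, the incremental center update, and all function names are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenised documents into L2-normalised TF-IDF vectors (sparse dicts)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is all-zero)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def memberships(sims, m=2.0):
    """Fuzzy C-Means style memberships, using distance d = 1 - cosine similarity."""
    dists = [max(1.0 - s, 0.0) for s in sims]
    zeros = [d <= 1e-12 for d in dists]
    if any(zeros):  # document coincides with one or more centers
        k = sum(zeros)
        return [1.0 / k if z else 0.0 for z in zeros]
    inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

def single_pass_fuzzy(vectors, threshold=0.3, m=2.0):
    """Read each vector once: open a new cluster when no center is similar
    enough, otherwise fold the vector into every center, weighted by u**m.
    (Assumed update rule, not the thesis's documented one.)"""
    centers, weights = [], []
    for x in vectors:
        sims = [cosine(x, c) for c in centers]
        if not sims or max(sims) < threshold:
            centers.append(dict(x))
            weights.append(1.0)
            continue
        for i, u in enumerate(memberships(sims, m)):
            w = u ** m
            if w == 0.0:
                continue
            total = weights[i] + w
            terms = set(centers[i]) | set(x)
            centers[i] = {t: (weights[i] * centers[i].get(t, 0.0)
                              + w * x.get(t, 0.0)) / total for t in terms}
            weights[i] = total
    return centers

def top_terms(center, k=3):
    """The k highest-weighted terms of a center: its 'bag of words'."""
    ranked = sorted(center.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

# Toy input standing in for tokenised tweets (invented for illustration).
docs = [
    ["flood", "jakarta", "rain"],
    ["flood", "rain", "jakarta", "traffic"],
    ["football", "match", "goal"],
    ["goal", "football", "league"],
]
centers = single_pass_fuzzy(tfidf_vectors(docs), threshold=0.3)
for center in centers:
    print(top_terms(center))
```

Because each document is read once and the number of clusters grows only when the threshold is not met, no cluster count has to be fixed in advance and no repeated passes over the data are needed, which is one plausible source of the shorter processing time the description reports.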