THE DATA STREAM CLUSTERING APPLICATION FRAMEWORK FOR TEXT ANALYSIS

Bibliographic Details
Main Author: Daffa Dinaya, Muhammad
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/72011
Institution: Institut Teknologi Bandung
Description
Summary: Social media is now widely used, so the data its users generate holds great potential. One use is to group data that carries similar information. Data stream techniques suit this task: they process data in small pieces while reacting immediately to changes in the data. Previous research used the data stream processing engine Apache Flink, which was difficult to work with because it required submitting compiled programs to a distributed system and writing them in Java. That makes it hard to process text data, for which tooling is more mature in Python. To address this problem, a framework was developed to ease the development of data stream clustering applications on Apache Flink Statefun and FastAPI. The framework reads a user configuration and then runs the processes assembled from it to handle the data stream. Users who want their own logic can also implement a custom process and use it alongside the provided ones. The framework speeds up development by reducing the source code that users would otherwise have to write themselves. This speed-up applies whenever users build on processes, whether those are provided by the framework or customized.
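The configuration-driven design described in the summary can be illustrated with a minimal sketch in plain Python. It is not the framework's actual code and does not use Apache Flink Statefun or FastAPI; every name in it (PROCESSES, register, OnlineClusterer, build_pipeline, the "steps" and "threshold" configuration keys) is hypothetical, and the Jaccard-similarity clustering is an assumed stand-in for whatever algorithm the thesis uses. It shows only the general idea: a user configuration names the processes to run, custom processes are registered next to the provided ones, and each incoming message is assigned to a cluster as it arrives.

```python
from typing import Callable, Dict, List, Set

# Hypothetical registry of named processes (an assumption, not the thesis's API).
PROCESSES: Dict[str, Callable] = {}


def register(name: str):
    """Decorator that makes a function selectable by name in the configuration."""
    def wrap(fn: Callable) -> Callable:
        PROCESSES[name] = fn
        return fn
    return wrap


@register("lowercase")
def lowercase(text: str) -> str:
    return text.lower()


@register("tokenize")
def tokenize(text: str) -> Set[str]:
    return set(text.split())


class OnlineClusterer:
    """Assigns each token set to the most similar cluster, or opens a new one."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.clusters: List[Set[str]] = []

    def add(self, tokens: Set[str]) -> int:
        best, best_sim = -1, 0.0
        for i, vocab in enumerate(self.clusters):
            union = tokens | vocab
            # Jaccard similarity between the message and the cluster vocabulary.
            sim = len(tokens & vocab) / len(union) if union else 0.0
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            self.clusters[best] |= tokens  # update the cluster in place
            return best
        self.clusters.append(set(tokens))
        return len(self.clusters) - 1


def build_pipeline(config: dict) -> Callable[[str], int]:
    """Reads the configuration and returns a handler for one stream message."""
    steps = [PROCESSES[name] for name in config["steps"]]
    clusterer = OnlineClusterer(config["threshold"])

    def handle(message: str) -> int:
        value = message
        for step in steps:
            value = step(value)
        return clusterer.add(value)

    return handle


# A user-defined process, registered the same way as the provided ones.
@register("strip_hashtags")
def strip_hashtags(text: str) -> str:
    return text.replace("#", "")


config = {"steps": ["lowercase", "strip_hashtags", "tokenize"], "threshold": 0.3}
handle = build_pipeline(config)
messages = ["Flood in Bandung #flood", "bandung flood today", "New phone launch"]
labels = [handle(m) for m in messages]
print(labels)  # [0, 0, 1]
```

In this sketch the user writes only the configuration and, optionally, one custom process; the pipeline wiring and the clustering state are shared code. That is the kind of saving the summary attributes to the framework.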