THE DATA STREAM CLUSTERING APPLICATION FRAMEWORK FOR TEXT ANALYSIS
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/72011 |
Institution: | Institut Teknologi Bandung |
Summary: | Social media is now widely used, so the data that users generate holds
great potential. One use of this data is clustering: grouping items that contain
similar information. Data stream techniques suit this task because they process data
in small pieces while reacting immediately to changes in the data. Previous research
used a data stream processing engine, Apache Flink, which proved difficult to work
with because it required submitting compiled programs to a distributed system and
writing them in Java. This makes text processing harder, since text-processing
tooling is more developed in Python.
To address this problem, a framework was developed to facilitate the development of
clustering applications for data streams on Apache Flink Statefun and FastAPI.
The framework reads the user's configuration and then runs the resulting processes
on the data stream. It also supports customization: users can implement their own
processes and plug them in as processing steps.
The framework speeds up development by reducing the source code that users would
otherwise have to write. This speed-up applies whenever users work with processes,
whether those provided by the framework or customized ones. |
---|
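
The configuration-driven design described in the summary can be illustrated with a minimal sketch. It is not the thesis framework's actual API: the registry, the `register` decorator, the `processes` configuration key, and the step names are all hypothetical, and the sketch uses plain Python rather than Apache Flink Statefun or FastAPI. It only shows the general idea of reading a user configuration, assembling built-in and custom processes into a pipeline, and applying that pipeline to each record of a stream.

```python
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical type and registry for named processing steps.
# These names are illustrative assumptions, not taken from the thesis.
Process = Callable[[str], str]
REGISTRY: Dict[str, Process] = {}


def register(name: str) -> Callable[[Process], Process]:
    """Register a processing step under a name usable in the configuration."""
    def decorator(func: Process) -> Process:
        REGISTRY[name] = func
        return func
    return decorator


# Steps that a framework might ship with (assumed for illustration).
@register("lowercase")
def lowercase(text: str) -> str:
    return text.lower()


@register("strip_urls")
def strip_urls(text: str) -> str:
    # Drop whitespace-separated tokens that look like URLs.
    return " ".join(tok for tok in text.split() if not tok.startswith("http"))


# A custom step written by the user and plugged in the same way.
@register("strip_mentions")
def strip_mentions(text: str) -> str:
    return " ".join(tok for tok in text.split() if not tok.startswith("@"))


def build_pipeline(config: dict) -> Process:
    """Turn the ordered list of step names in the config into one callable."""
    steps = [REGISTRY[name] for name in config["processes"]]

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


def process_stream(stream: Iterable[str], config: dict) -> Iterator[str]:
    """Apply the configured pipeline to each record as it arrives."""
    pipeline = build_pipeline(config)
    for record in stream:
        yield pipeline(record)


# Hypothetical user configuration mixing built-in and custom steps.
config = {"processes": ["lowercase", "strip_urls", "strip_mentions"]}
results = list(process_stream(["Hello @bob see http://x.y NOW"], config))
```

In this sketch the user writes only the configuration and any custom step, which mirrors the summary's claim that the framework reduces the amount of source code users must produce themselves.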