PERFORMANCE COMPARISON OF SERIAL, RANDOM PARALLEL, AND LSH PARALLEL SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) IN FLIGHT DELAY PREDICTION

When working with big data, it is sometimes difficult to ensure that all of the data is immediately usable. Imbalanced data is one of the obstacles that must be resolved to obtain a more representative dataset. In the context of the aviation industry,...

Bibliographic Details
Main Author: Rizal Alifio, Ahmad
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/58043
Institution: Institut Teknologi Bandung
id id-itb.:58043
spelling id-itb.:58043 2021-08-30T12:17:26Z
SMOTE, parallel, classification, oversampling, imbalanced data
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description When working with big data, it is sometimes difficult to ensure that all of the data is immediately usable. Imbalanced data is one of the obstacles that must be resolved to obtain a more representative dataset. In the aviation industry, airlines that can anticipate delayed flights are able to act quickly before one delay causes another. However, data on delayed flights is not as plentiful as data on flights that arrive on time. This imbalance can be addressed by applying an oversampling method. Execution time also matters when working with big data, so a parallel implementation was chosen in the hope of obtaining better results without drastically increasing the running time of the algorithm. This final project studies the synthetic minority oversampling technique (SMOTE), a commonly used form of oversampling. The parallel implementation uses two partitioning mechanisms: random and locality sensitive hashing (LSH). The variants are evaluated by comparing performance metrics of random forest and K-Nearest Neighbor classification models. The experiments show that partitioning the dataset with locality sensitive hashing slows the growth of execution time as the amount of data increases, although it yields a lower recall than serial SMOTE. Random partitioning, on the other hand, gives unsatisfactory results in terms of both execution time and accuracy.
format Final Project
author Rizal Alifio, Ahmad
title PERFORMANCE COMPARISON OF SERIAL, RANDOM PARALLEL, AND LSH PARALLEL SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) IN FLIGHT DELAY PREDICTION
url https://digilib.itb.ac.id/gdl/view/58043
_version_ 1822930650394525696
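
The description above mentions SMOTE applied to partitions of the data, produced either randomly or by locality sensitive hashing. The following is a minimal sketch of that idea, not the author's implementation. The function names, the random-hyperplane form of LSH, the default k=5, and the sequential loop over partitions (which could be handed to parallel workers, since partitions are independent) are all assumptions made for illustration.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority points by interpolating between
    a randomly chosen point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    if n < 2 or n_new <= 0:
        return np.empty((0, X_min.shape[1]))
    k = min(k, n - 1)
    # Pairwise squared distances; the diagonal is set to inf to exclude self.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def random_partition(X, n_parts, rng=None):
    """Assign each row to one of n_parts partitions uniformly at random."""
    rng = np.random.default_rng(rng)
    return rng.integers(0, n_parts, size=len(X))


def lsh_partition(X, n_bits, rng=None):
    """Random-hyperplane LSH (an assumed LSH family): the sign pattern of
    n_bits projections of the centred data is used as the bucket id."""
    rng = np.random.default_rng(rng)
    planes = rng.normal(size=(X.shape[1], n_bits))
    bits = ((X - X.mean(axis=0)) @ planes > 0).astype(int)
    return bits @ (1 << np.arange(n_bits))


def partitioned_smote(X_min, labels, ratio=1.0, k=5, rng=None):
    """Run SMOTE independently inside each partition and stack the results.
    Partitions with a single point produce no synthetic samples."""
    rng = np.random.default_rng(rng)
    parts = []
    for p in np.unique(labels):
        chunk = X_min[labels == p]
        parts.append(smote(chunk, int(round(ratio * len(chunk))), k, rng))
    return np.vstack(parts)


# Hypothetical usage on random data standing in for minority-class rows.
X_demo = np.random.default_rng(0).normal(size=(40, 3))
syn_lsh = partitioned_smote(X_demo, lsh_partition(X_demo, 2, rng=0), rng=0)
syn_rand = partitioned_smote(X_demo, random_partition(X_demo, 4, rng=1), rng=1)
```

With LSH buckets, nearby points tend to share a partition, so each partition's nearest neighbours are more likely to be close to the true ones. With random partitions, neighbours are drawn from an arbitrary subset, which is one plausible reason the two mechanisms could behave differently.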