PERFORMANCE COMPARISON OF SERIAL, RANDOM PARALLEL, AND LSH PARALLEL SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) IN FLIGHT DELAY PREDICTION

Big data is not always immediately usable. Class imbalance is one obstacle that often has to be resolved before the data is representative enough to work with. In the context of the aviation industry,...

Bibliographic Details
Main Author:	Rizal Alifio, Ahmad
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/58043
Institution:	Institut Teknologi Bandung

id	id-itb.:58043
timestamp	2021-08-30T12:17:26Z
keywords	SMOTE, parallel, classification, oversampling, imbalanced data
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	Big data is not always immediately usable. Class imbalance is one obstacle that often has to be resolved before the data is representative enough to work with. In the aviation industry, airlines that can anticipate delayed flights can act quickly, before one delay causes another. However, records of delayed flights are not as plentiful as records of flights that arrive on time. Oversampling can address this imbalance. Execution time also matters when working with big data, so a parallel implementation was chosen in the hope of giving better results without drastically increasing the running time of the algorithm. This final project studies the synthetic minority oversampling technique (SMOTE), a widely used oversampling method. SMOTE is parallelized under two partitioning mechanisms: random partitioning and locality-sensitive hashing (LSH). The variants are evaluated by comparing performance metrics of random forest and K-Nearest Neighbor classification models. The experiments show that partitioning the dataset with locality-sensitive hashing slows the growth of execution time as the amount of data increases, although it yields lower recall than serial SMOTE. Random partitioning, on the other hand, gives unsatisfactory results in terms of both duration and accuracy.
format	Final Project
author	Rizal Alifio, Ahmad
title	PERFORMANCE COMPARISON OF SERIAL, RANDOM PARALLEL, AND LSH PARALLEL SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) IN FLIGHT DELAY PREDICTION
url	https://digilib.itb.ac.id/gdl/view/58043
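
The description above compares serial SMOTE with SMOTE run independently on random and LSH partitions of the data. The record does not show the thesis's implementation, so what follows is only a minimal sketch of the idea under stated assumptions: the function names (smote, random_partition, lsh_partition, partitioned_smote) are hypothetical, the LSH variant uses random hyperplanes through the origin (the thesis may use a different hash family), neighbours are found by brute force, and partitions are processed in a plain loop where a real system would dispatch them to workers.

```python
import math
import random
from collections import defaultdict

def smote(minority, n_new, k=3, rng=None):
    """Serial SMOTE: interpolate between a sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        return []  # no neighbour available to interpolate with
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Brute-force k nearest neighbours of x, excluding x itself.
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def random_partition(points, n_parts, rng):
    """Shuffle, then deal the points round-robin into n_parts partitions."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]

def lsh_partition(points, n_bits, rng):
    """Random-hyperplane LSH (an assumption): same sign pattern, same bucket.

    Hyperplanes pass through the origin, so features should be roughly centred.
    """
    dim = len(points[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    buckets = defaultdict(list)
    for p in points:
        key = tuple(sum(w * v for w, v in zip(plane, p)) >= 0 for plane in planes)
        buckets[key].append(p)
    return list(buckets.values())

def partitioned_smote(parts, n_new, k=3, seed=0):
    """Run SMOTE on each partition independently, in proportion to its size.

    Partitions do not share state, so each call could run on a separate worker.
    Rounding and single-point partitions mean the total may differ slightly from n_new.
    """
    total = sum(len(part) for part in parts)
    out = []
    for i, part in enumerate(parts):
        share = round(n_new * len(part) / total)
        out.extend(smote(part, share, k, random.Random(seed + i)))
    return out

# Toy minority class: two well-separated clusters of 20 points each.
rng = random.Random(42)
minority = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in [(-5, -5), (5, 5)] for _ in range(20)]

serial = smote(minority, 40)
random_syn = partitioned_smote(random_partition(minority, 4, rng), 40)
lsh_syn = partitioned_smote(lsh_partition(minority, 2, rng), 40)
```

In this sketch a random partition mixes points from distant regions, so a point's nearest neighbour inside its partition may be far from its true nearest neighbour, while an LSH bucket tends to group nearby points. This is only an intuition for why the two partitioning schemes can behave differently; it does not reproduce the thesis's measurements.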


Similar Items