IMPLEMENTATION OF GIBBS SAMPLING TO HANDLE MISSING VALUE IN DATA CLEANING PROCESS WHEN BUILDING A DATA WAREHOUSE

A data warehouse is a collection of databases that decision makers use to analyze information. Its data comes from several sources with different structures and formats. These differences in structure and format are the main concern in maintaining data qu...

Bibliographic Details
Main Author: FEBRIYAN (NIM: 13511067), RAMA
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/23824
Institution: Institut Teknologi Bandung
id id-itb.:23824
spelling id-itb.:23824 2017-10-09T10:28:07Z IMPLEMENTATION OF GIBBS SAMPLING TO HANDLE MISSING VALUE IN DATA CLEANING PROCESS WHEN BUILDING A DATA WAREHOUSE FEBRIYAN (NIM: 13511067), RAMA Indonesia Final Project INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/23824 text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description A data warehouse is a collection of databases that decision makers use to analyze information. Its data comes from several sources with different structures and formats, and these differences are the main concern in maintaining data quality. Missing values are one kind of dirty data that must be handled in the ETL process when building a data warehouse in order to obtain good-quality results. Gibbs sampling is a randomized (Markov chain Monte Carlo) algorithm that can be used to determine missing values during data extraction. It requires that the conditional distribution of each variable given the others be available. As long as that condition is met, a missing value can be filled in based on the other known values. Gibbs sampling was tested for filling missing values on two sample data sets, both prepared so that they contain missing values. The test results show that the method fills every missing value and brings completeness and validity to 100%. The accuracy of the filled data is affected by the number of values an attribute can take and by the random numbers generated during the process.
format Final Project
author FEBRIYAN (NIM: 13511067), RAMA
title IMPLEMENTATION OF GIBBS SAMPLING TO HANDLE MISSING VALUE IN DATA CLEANING PROCESS WHEN BUILDING A DATA WAREHOUSE
url https://digilib.itb.ac.id/gdl/view/23824
_version_ 1822020213503164416
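
The abstract describes filling each missing value from the conditional distribution of its attribute given the other known values. The record does not include the thesis code, so the following is only a minimal Gibbs-style sketch under stated assumptions: the data is a small table of categorical rows, missing cells are marked with None, and each conditional distribution is estimated by counting matching rows with add-alpha smoothing. The function name gibbs_impute and the parameters sweeps, alpha, and seed are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def gibbs_impute(rows, sweeps=50, alpha=1.0, seed=0):
    """Fill None cells in a table of categorical rows (illustrative sketch).

    Assumptions (not from the thesis): rows are equal-length lists,
    missing cells are None, and every column has at least one observed value.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])

    # Candidate values for each column: the values actually observed in it.
    domains = [
        sorted({r[j] for r in rows if r[j] is not None}) for j in range(n_cols)
    ]
    if any(not d for d in domains):
        raise ValueError("every column needs at least one observed value")

    missing = [
        (i, j) for i, r in enumerate(rows) for j in range(n_cols) if r[j] is None
    ]
    data = [list(r) for r in rows]  # work on a copy, leave the input untouched

    # Start from a random guess for every missing cell.
    for i, j in missing:
        data[i][j] = rng.choice(domains[j])

    for _ in range(sweeps):
        for i, j in missing:
            # Estimate P(column j | all other columns of row i) by counting
            # the other rows that agree with row i on every other column.
            counts = Counter(
                other[j]
                for k, other in enumerate(data)
                if k != i
                and all(other[c] == data[i][c] for c in range(n_cols) if c != j)
            )
            # Smoothing keeps every candidate value possible.
            weights = [counts[v] + alpha for v in domains[j]]
            # Resample the cell from the estimated conditional distribution.
            data[i][j] = rng.choices(domains[j], weights=weights, k=1)[0]
    return data


if __name__ == "__main__":
    table = [
        ["a", "x"],
        ["a", "x"],
        ["a", None],
        ["b", "y"],
        [None, "y"],
    ]
    for row in gibbs_impute(table, sweeps=20, seed=1):
        print(row)
```

In this sketch every missing cell always receives some value from its column's observed domain, which is consistent with the reported 100% completeness. Whether the filled value is the correct one depends on the random draws and on how many candidate values the attribute has, which matches the abstract's remark about accuracy.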