IMPLEMENTATION OF GIBBS SAMPLING TO HANDLE MISSING VALUE IN DATA CLEANING PROCESS WHEN BUILDING A DATA WAREHOUSE
A data warehouse is a collection of databases that decision makers use to analyze information. Data for a data warehouse comes from several data sources with different structures and formats. These differences in structure and data format are the main concern in maintaining data qu...
Main Author: | FEBRIYAN (NIM: 13511067), RAMA |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/23824 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:23824 |
---|---|
spelling |
id-itb.:23824 2017-10-09T10:28:07Z |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
A data warehouse is a collection of databases that decision makers use to analyze information. Data for a data warehouse comes from several data sources with different structures and formats. These differences in structure and data format are the main concern in maintaining data quality. Missing values are one kind of dirty data that must be handled in the ETL process when building a data warehouse in order to obtain good-quality results.
Gibbs sampling is a randomized (Markov chain Monte Carlo) algorithm that can be used to estimate missing values during data extraction. It requires that the conditional distribution of each variable be available. As long as this condition is met, a missing value can be filled in based on the other known values.
Gibbs sampling was tested for filling missing values on two sample data sets, both prepared so that they contain missing values. The test results show that the Gibbs sampling method is able to fill in every missing value and bring completeness and validity to 100%. The accuracy of the filled-in data is affected by the number of values an attribute has and by the random numbers generated during the process. |
format |
Final Project |
author |
FEBRIYAN (NIM: 13511067), RAMA |
title |
IMPLEMENTATION OF GIBBS SAMPLING TO HANDLE MISSING VALUE IN DATA CLEANING PROCESS WHEN BUILDING A DATA WAREHOUSE |
url |
https://digilib.itb.ac.id/gdl/view/23824 |
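
The description above says that a missing value can be filled in from the other known values once the conditional distributions are available. The following is a minimal sketch of that idea for categorical attributes, not the implementation from the final project. It assumes each conditional can be approximated with a smoothed naive-Bayes factorization estimated from the current table, and that every column has at least one observed value. The names `gibbs_impute`, `completeness`, `n_sweeps`, and `alpha`, as well as the sample rows, are hypothetical.

```python
import random


def gibbs_impute(rows, n_sweeps=50, alpha=1.0, seed=0):
    """Fill None entries in equal-length categorical rows by Gibbs-style resampling."""
    rng = random.Random(seed)
    n_cols = len(rows[0])
    data = [list(r) for r in rows]
    missing = [(i, j) for i, r in enumerate(rows)
               for j, v in enumerate(r) if v is None]
    # Domain of each column: the values actually observed in that column.
    domains = [sorted({r[j] for r in rows if r[j] is not None})
               for j in range(n_cols)]

    # Start from a random valid value in every missing cell.
    for i, j in missing:
        data[i][j] = rng.choice(domains[j])

    for _ in range(n_sweeps):
        for i, j in missing:
            weights = []
            for v in domains[j]:
                # Rows other than i whose column j currently holds v.
                peers = [r for t, r in enumerate(data) if t != i and r[j] == v]
                # Assumed model: P(v | rest) is proportional to
                # P(v) * prod_k P(x_k | v), with Laplace smoothing alpha.
                w = len(peers) + alpha
                for k in range(n_cols):
                    if k == j:
                        continue
                    match = sum(1 for r in peers if r[k] == data[i][k])
                    w *= (match + alpha) / (len(peers) + alpha * len(domains[k]))
                weights.append(w)
            # Resample the cell from its approximate conditional distribution.
            data[i][j] = rng.choices(domains[j], weights=weights, k=1)[0]
    return data


def completeness(rows):
    """Fraction of cells that are not None."""
    cells = [v for r in rows for v in r]
    return sum(v is not None for v in cells) / len(cells)


# Hypothetical sample table: (city, country_code) with three missing cells.
rows = [
    ("Bandung", "ID"), ("Bandung", "ID"), ("Jakarta", "ID"),
    ("Bandung", None), (None, "ID"), ("Tokyo", "JP"), ("Tokyo", None),
]
filled = gibbs_impute(rows)
print(completeness(rows), completeness(filled))
```

Because every imputed value is drawn from the column's observed domain, the filled table is complete and contains only valid values. Which value lands in a given cell depends on the random draws and on how many values the attribute can take, which is consistent with the remark about accuracy in the description.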