PREPROCESSING TECHNIQUES FOR HANDLING DISCRIMINATION IN BINARY LABEL DATASETS

Data mining is the process of extracting patterns and knowledge from data (Han et al., 2012). Its results can help users decide on their next business steps. However, the predictions are not 100% reliable. One of the reasons is the possibility of unfair...

Bibliographic Details
Main Author: Zabrina Pramata, Nella
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/51429
Institution: Institut Teknologi Bandung
id id-itb.:51429
spelling id-itb.:51429 | 2020-09-28T17:51:34Z | PREPROCESSING TECHNIQUES FOR HANDLING DISCRIMINATION IN BINARY LABEL DATASETS | Zabrina Pramata, Nella | Indonesia | Final Project | Fairness AI preprocessing techniques, discrimination, Binary Label Dataset | INSTITUT TEKNOLOGI BANDUNG | https://digilib.itb.ac.id/gdl/view/51429 | text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Data mining is the process of extracting patterns and knowledge from data (Han et al., 2012). Its results can help users decide on their next business steps. However, the predictions are not 100% reliable. One reason is that the predictions made by a model may be unfair. This unfairness can arise when the data used for training contains sensitive information: the patterns learned during training are influenced by that information, which can lead to discrimination on the basis of the sensitive attributes and is likely to harm specific groups of people. An Artificial Intelligence system that could potentially discriminate on sensitive information is called Unfair Artificial Intelligence, henceforth abbreviated as Unfair AI. Researchers have proposed several ways to handle Unfair AI, in the form of techniques applied at the preprocessing, inprocessing, and postprocessing stages. This research focuses on Fairness AI preprocessing techniques, so that sensitive attributes in the dataset are handled before the training step, and it considers only Binary Label Datasets. The Fairness AI preprocessing techniques used in this research are Uniform Sampling, Preferential Sampling, Massaging the Dataset, Reweighing, Suppression, and four modified Suppression techniques. The results of this study show that Uniform Sampling, Massaging the Dataset, and Reweighing tend to reduce the level of discrimination. The other six Fairness AI preprocessing techniques can still be used to reduce the level of discrimination, although they are not always effective. Based on these results, no single technique is the most suitable for all datasets, so all nine techniques still need to be tried to find out which one suits the dataset at hand.
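As an illustration of one technique named in the description, the following is a minimal Python sketch of Reweighing in a commonly used formulation, where each record receives the weight P(s) * P(y) / P(s, y) for its sensitive-attribute value s and label y. It is not taken from the thesis. The function names, the toy data, the 0/1 encoding, and the discrimination measure (the difference in positive-label rate between the two groups) are all assumptions made for this sketch.

```python
from collections import Counter


def discrimination(rows, weights=None):
    """Weighted positive-label rate of the privileged group minus that of
    the unprivileged group.

    rows is a list of (s, y) pairs. This sketch assumes s == 1 marks the
    privileged group and y == 1 the favourable label.
    """
    if weights is None:
        weights = [1.0] * len(rows)
    total = {0: 0.0, 1: 0.0}
    positive = {0: 0.0, 1: 0.0}
    for (s, y), w in zip(rows, weights):
        total[s] += w
        positive[s] += w * y
    return positive[1] / total[1] - positive[0] / total[0]


def reweighing_weights(rows):
    """Return one weight per row: P(s) * P(y) / P(s, y), estimated from counts."""
    n = len(rows)
    n_s = Counter(s for s, _ in rows)   # rows per group
    n_y = Counter(y for _, y in rows)   # rows per label
    n_sy = Counter(rows)                # rows per (group, label) cell
    return [n_s[s] * n_y[y] / (n * n_sy[(s, y)]) for s, y in rows]


# Toy data, invented for illustration: 6 of 10 privileged rows are positive,
# but only 2 of 10 unprivileged rows are.
rows = [(1, 1)] * 6 + [(1, 0)] * 4 + [(0, 1)] * 2 + [(0, 0)] * 8
weights = reweighing_weights(rows)
before = discrimination(rows)
after = discrimination(rows, weights)
print(f"gap before: {before:.3f}, gap after: {abs(after):.3f}")
```

On this toy data the unweighted gap is 0.4. After weighting, both groups have the same weighted positive rate, so the gap vanishes up to floating-point rounding. The formula requires every (s, y) combination to occur at least once in the data; a learner would then need to accept these values as per-instance weights.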
format Final Project
author Zabrina Pramata, Nella
title PREPROCESSING TECHNIQUES FOR HANDLING DISCRIMINATION IN BINARY LABEL DATASETS
url https://digilib.itb.ac.id/gdl/view/51429