PREPROCESSING TECHNIQUES FOR HANDLING DISCRIMINATION IN BINARY LABEL DATASETS
Data mining is a process for extracting patterns and knowledge from data (Han et al., 2012). Its results can help users decide on their next business steps. However, the prediction results are not 100% reliable. One of the reasons is the possibility of unfair...
Main Author: | Zabrina Pramata, Nella |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/51429 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:51429 |
---|---|
spelling |
id-itb.:51429 2020-09-28T17:51:34Z PREPROCESSING TECHNIQUES FOR HANDLING DISCRIMINATION IN BINARY LABEL DATASETS Zabrina Pramata, Nella Indonesia Final Project Fairness AI preprocessing techniques, discrimination, Binary Label Dataset INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/51429 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Data mining is a process for extracting patterns and knowledge from data (Han
et al., 2012). Its results can help users decide on their next business steps.
However, prediction results are not 100% reliable. One reason is that the
predictions made by a model may be unfair. Unfairness can arise when the data
used for training contains sensitive information: the patterns learned during
training are influenced by that information, which can lead to discrimination
on the basis of the sensitive attributes. Such results are likely to harm
specific groups of people. An Artificial Intelligence system that could
potentially discriminate on the basis of sensitive information is called Unfair
Artificial Intelligence, henceforth abbreviated as Unfair AI.
Researchers have proposed several ways to handle Unfair AI, in the form of
techniques applied at the preprocessing, inprocessing, and postprocessing
stages. This research focuses on the preprocessing techniques, so that sensitive
attributes in the dataset can be handled before the training step. Only Binary
Label Datasets are considered.
The Fairness AI preprocessing techniques used in this research are Uniform
Sampling, Preferential Sampling, Massaging the Dataset, Reweighing, Suppression,
and four modified versions of Suppression. The results of this study show that
Uniform Sampling, Massaging the Dataset, and Reweighing tend to reduce the level
of discrimination. The other six Fairness AI preprocessing techniques can still
be used to reduce the level of discrimination, although the results are not
always effective. Based on these results, no single technique is the most
suitable for all datasets, so all nine techniques still need to be tried to find
out which one suits the dataset to be used.
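To make one of the named techniques concrete, the following is a minimal Python sketch of Reweighing on a toy binary label dataset. It is not taken from the thesis: it assumes the commonly described formulation in which each row receives the weight P(s) * P(y) / P(s, y), and it assumes discrimination is measured as the difference in positive-label rates between two groups, which this record does not specify. The names `reweigh`, `discrimination`, `rows`, and the 0/1 encoding of the sensitive attribute are hypothetical.

```python
from collections import Counter


def reweigh(rows):
    """Return one weight per (s, y) row: P(s) * P(y) / P(s, y).

    Assumption: s is the sensitive attribute and y the binary label;
    probabilities are estimated from empirical counts in `rows`.
    """
    n = len(rows)
    s_count = Counter(s for s, _ in rows)
    y_count = Counter(y for _, y in rows)
    sy_count = Counter(rows)
    return [s_count[s] * y_count[y] / (n * sy_count[(s, y)]) for s, y in rows]


def discrimination(rows, weights=None):
    """Weighted P(y=1 | s=1) - P(y=1 | s=0); an assumed fairness measure."""
    if weights is None:
        weights = [1.0] * len(rows)

    def positive_rate(group):
        total = sum(w for (s, _), w in zip(rows, weights) if s == group)
        positive = sum(
            w for (s, y), w in zip(rows, weights) if s == group and y == 1
        )
        return positive / total

    return positive_rate(1) - positive_rate(0)


# Toy data (hypothetical): s=1 is the favoured group, y=1 the positive label.
rows = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
weights = reweigh(rows)

print("before:", discrimination(rows))
print("after: ", discrimination(rows, weights))
```

On this toy data the unweighted difference in positive rates is 0.5, and the weighted difference is close to zero, because the weights make the label statistically independent of the sensitive attribute. A learner that accepts per-row weights could then be trained on the weighted data; how the thesis itself applies the weights is not stated in this record.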
|
format |
Final Project |
author |
Zabrina Pramata, Nella |
title |
PREPROCESSING TECHNIQUES FOR HANDLING DISCRIMINATION IN BINARY LABEL DATASETS |
url |
https://digilib.itb.ac.id/gdl/view/51429 |
_version_ |
1822928737518223360 |