Improved random forest for feature selection in writer identification

Writer Identification (WI) is a process to determine the writer of a given handwriting sample. A handwriting sample consists of various types of features. These features are unique due to the writer’s characteristics and individuality, which challenges the identification process. Some features do no...

Bibliographic Details
Main Author:	Sukor, Nooraziera Akmal
Format:	Thesis
Language:	English
Published:	2015
Subjects:	T Technology (General); TA Engineering (General). Civil engineering (General)
Online Access:	http://eprints.utem.edu.my/id/eprint/16842/1/Improved%20Random%20Forest%20For%20Feature%20Selection%20In%20Writer%20Identification.pdf http://eprints.utem.edu.my/id/eprint/16842/2/Improved%20random%20forest%20for%20feature%20selection%20in%20writer%20identification.pdf http://eprints.utem.edu.my/id/eprint/16842/ https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=96166
Institution:	Universiti Teknikal Malaysia Melaka

id	my.utem.eprints.16842
record_format	eprints
spelling	my.utem.eprints.16842 2022-06-07T13:30:20Z http://eprints.utem.edu.my/id/eprint/16842/ Improved random forest for feature selection in writer identification Sukor, Nooraziera Akmal T Technology (General) TA Engineering (General). Civil engineering (General) 2015 Thesis NonPeerReviewed text en http://eprints.utem.edu.my/id/eprint/16842/1/Improved%20Random%20Forest%20For%20Feature%20Selection%20In%20Writer%20Identification.pdf text en http://eprints.utem.edu.my/id/eprint/16842/2/Improved%20random%20forest%20for%20feature%20selection%20in%20writer%20identification.pdf Sukor, Nooraziera Akmal (2015) Improved random forest for feature selection in writer identification. Masters thesis, Universiti Teknikal Malaysia Melaka. https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=96166
institution	Universiti Teknikal Malaysia Melaka
building	UTEM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknikal Malaysia Melaka
content_source	UTEM Institutional Repository
url_provider	http://eprints.utem.edu.my/
language	English
topic	T Technology (General) TA Engineering (General). Civil engineering (General)
spellingShingle	T Technology (General) TA Engineering (General). Civil engineering (General) Sukor, Nooraziera Akmal Improved random forest for feature selection in writer identification
description	Writer Identification (WI) is the process of determining the writer of a given handwriting sample. A handwriting sample consists of various types of features. These features are unique because of the writer’s characteristics and individuality, which makes the identification process challenging. Some features do not provide useful information and may degrade the performance of a classifier. Thus, a feature selection process is implemented in the WI process. Feature selection identifies and selects the most significant features from those present in handwriting documents and eliminates the irrelevant ones. Following the WI framework, a discretization process is applied before the feature selection process. Discretization has been shown to increase classification performance and improve identification performance in WI. An Improved Random Forest (IRF) tree algorithm and framework was applied for the feature selection process. An RF tree is a collection of tree predictors that ensembles decision tree models with a randomized selection of features at each split. Tree development involves Classification and Regression Tree (CART). Important features are measured using Variable Importance (VI), while Mean Absolute Error (MAE) values are used to identify the variance between writers. The VI value is used for the splitting process in the tree, and the MAE value is used to ensure that the intra-class (same writer) invariance is lower than the inter-class (different writer) invariance, because lower intra-class invariance indicates accuracy to the real author. The number of selected features and the classification accuracy are used to indicate the performance of the feature selection method. Experimental results show that, on the discretized dataset, the IRF tree identified the third feature (f3) as the most important feature, with an average classification accuracy of 99.19%. On the un-discretized dataset, the first feature (f1) and third feature (f3) are the most important features, with an average classification accuracy of 40.79%.
format	Thesis
author	Sukor, Nooraziera Akmal
author_facet	Sukor, Nooraziera Akmal
author_sort	Sukor, Nooraziera Akmal
title	Improved random forest for feature selection in writer identification
title_short	Improved random forest for feature selection in writer identification
title_full	Improved random forest for feature selection in writer identification
title_fullStr	Improved random forest for feature selection in writer identification
title_full_unstemmed	Improved random forest for feature selection in writer identification
title_sort	improved random forest for feature selection in writer identification
publishDate	2015
url	http://eprints.utem.edu.my/id/eprint/16842/1/Improved%20Random%20Forest%20For%20Feature%20Selection%20In%20Writer%20Identification.pdf http://eprints.utem.edu.my/id/eprint/16842/2/Improved%20random%20forest%20for%20feature%20selection%20in%20writer%20identification.pdf http://eprints.utem.edu.my/id/eprint/16842/ https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=96166
_version_	1735390155451138048
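
The description field says that MAE is used to check that the intra-class (same writer) value is lower than the inter-class (different writer) value. The following is a minimal sketch of that kind of check, not the thesis's own procedure. It assumes that MAE is the mean absolute difference between two feature vectors and that the class-level values are averages over all sample pairs; the record does not give the exact formula. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mae(a, b):
    """Mean absolute error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return mean(abs(x - y) for x, y in zip(a, b))


def intra_inter_mae(samples):
    """Average pairwise MAE for same-writer and different-writer pairs.

    samples: list of (writer_id, feature_vector) tuples (assumed layout).
    Returns (intra, inter).
    """
    intra, inter = [], []
    for (w1, f1), (w2, f2) in combinations(samples, 2):
        # Same writer -> intra-class pair; otherwise inter-class pair.
        (intra if w1 == w2 else inter).append(mae(f1, f2))
    if not intra or not inter:
        raise ValueError("need at least one same-writer and one cross-writer pair")
    return mean(intra), mean(inter)


# Toy data: two writers, two samples each (illustrative values only).
samples = [
    ("A", [1.0, 2.0, 3.0]),
    ("A", [1.0, 2.0, 4.0]),
    ("B", [4.0, 5.0, 6.0]),
    ("B", [4.0, 6.0, 6.0]),
]
intra, inter = intra_inter_mae(samples)
print(f"intra-class MAE: {intra:.3f}, inter-class MAE: {inter:.3f}")
print("intra < inter:", intra < inter)
```

Under these assumptions, a feature set separates writers well when the first value is clearly smaller than the second.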

Similar Items