Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification

Recent research in digital forensics attempts to classify image clusters as JPEG or non-JPEG before recovering JPEG image files. This pre-classification step might improve JPEG recovery accuracy and reduce processing time. In this work, three content-based feature extraction methods are used...

Bibliographic Details
Main Authors: Ali R.R., Al-Dayyeni W.S., Gunasekaran S.S., Mostafa S.A., Abdulkader A.H., Rachmawanto E.H.
Other Authors: 57200536163
Format: Conference Paper
Published: Springer Science and Business Media Deutschland GmbH 2023
Institution: Universiti Tenaga Nasional
id my.uniten.dspace-27232
record_format dspace
spelling my.uniten.dspace-272322023-05-29T17:41:19Z Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification Ali R.R. Al-Dayyeni W.S. Gunasekaran S.S. Mostafa S.A. Abdulkader A.H. Rachmawanto E.H. 57200536163 57225961808 55652730500 37036085800 57545111700 57193850466 Recent research in digital forensic attempts to classify image clusters into JPEG or non-JPEG clusters before recovering JPEG image files. This issue might improve the recovering JPEG image accuracy and reduce the processing time. In this work, three content-based feature extraction methods are used. The Rate of Change (RoC) is used for tracking relevant bytes in the appropriate groups of their orders. Entropy and Byte Frequency Distribution (BFD) are used to produce an image cluster histogram based on the size of the byte value. Subsequently, we deploy the Extreme Learning Machine (ELM) classifier to evaluate these three features. The ELM identifies the type based on the generated feature vector, whether a JPEG file or a non-JPEG file type. The proposed method is implemented in MATLAB 2017a software and tested and evaluated by using the DFRWS dataset. The test results show that the ELM produces high classification accuracy in identifying the file type. The difference in accuracy between the combinations of the tested features is relatively small. The worst accuracy is generated when the entropy method is used, which is 72.62%, and the best accuracy of 93.46% is generated when using a combination of the three features. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Final 2023-05-29T09:41:19Z 2023-05-29T09:41:19Z 2022 Conference Paper 10.1007/978-3-030-98015-3_21 2-s2.0-85126979417 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85126979417&doi=10.1007%2f978-3-030-98015-3_21&partnerID=40&md5=c1f85c0a4dcac107ced9c9d91984ebb4 https://irepository.uniten.edu.my/handle/123456789/27232 439 LNNS 314 325 Springer Science and Business Media Deutschland GmbH Scopus
institution Universiti Tenaga Nasional
building UNITEN Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tenaga Nasional
content_source UNITEN Institutional Repository
url_provider http://dspace.uniten.edu.my/
description Recent research in digital forensics attempts to classify image clusters as JPEG or non-JPEG before recovering JPEG image files. This pre-classification step might improve JPEG recovery accuracy and reduce processing time. In this work, three content-based feature extraction methods are used. The Rate of Change (RoC) is used for tracking relevant bytes in the appropriate groups of their orders. Entropy and Byte Frequency Distribution (BFD) are used to produce an image cluster histogram based on the size of the byte value. Subsequently, we deploy the Extreme Learning Machine (ELM) classifier to evaluate these three features. The ELM identifies, based on the generated feature vector, whether a cluster belongs to a JPEG or a non-JPEG file type. The proposed method is implemented in MATLAB 2017a and evaluated on the DFRWS dataset. The test results show that the ELM produces high classification accuracy in identifying the file type. The difference in accuracy between the combinations of the tested features is relatively small. The lowest accuracy, 72.62%, is obtained when the entropy method is used, and the highest, 93.46%, when the three features are combined. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
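The three content-based features named in the description can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation. It assumes textbook definitions that the record itself does not spell out: BFD as the normalized 256-bin histogram of byte values, entropy as Shannon entropy in bits per byte, and RoC as the mean absolute difference between consecutive bytes. The function names and the layout of the combined feature vector are hypothetical.

```python
import math
from collections import Counter
from typing import List


def byte_frequency_distribution(cluster: bytes) -> List[float]:
    """Relative frequency of each of the 256 possible byte values (assumed BFD definition)."""
    if not cluster:
        return [0.0] * 256
    counts = Counter(cluster)
    total = len(cluster)
    return [counts.get(value, 0) / total for value in range(256)]


def shannon_entropy(cluster: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (range 0 to 8)."""
    bfd = byte_frequency_distribution(cluster)
    return sum(-p * math.log2(p) for p in bfd if p > 0)


def mean_rate_of_change(cluster: bytes) -> float:
    """Mean absolute difference between consecutive bytes (assumed RoC definition)."""
    if len(cluster) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(cluster, cluster[1:])]
    return sum(diffs) / len(diffs)


def feature_vector(cluster: bytes) -> List[float]:
    """Hypothetical combined vector: 256 BFD bins, then entropy, then mean RoC."""
    return (
        byte_frequency_distribution(cluster)
        + [shannon_entropy(cluster), mean_rate_of_change(cluster)]
    )
```

A vector of this kind would then be passed to a classifier such as the ELM. Compressed data such as JPEG scan segments tends to have near-uniform byte frequencies and entropy close to 8 bits per byte, which is one reason entropy alone may separate JPEG from other high-entropy cluster types poorly.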
author2 57200536163
author_facet 57200536163
Ali R.R.
Al-Dayyeni W.S.
Gunasekaran S.S.
Mostafa S.A.
Abdulkader A.H.
Rachmawanto E.H.
format Conference Paper
author Ali R.R.
Al-Dayyeni W.S.
Gunasekaran S.S.
Mostafa S.A.
Abdulkader A.H.
Rachmawanto E.H.
author_sort Ali R.R.
title Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification
publisher Springer Science and Business Media Deutschland GmbH
publishDate 2023
_version_ 1806424276544258048