METHOD OF FILE-TYPE IDENTIFICATION WITH SUMMARIZED N-GRAM USING STAGE SAMPLE SELECTION
File-type identification is a complex problem because of the wide variety of data types and file types. Common software used to identify file types often fails when a file is damaged or modified, because such software relies on the file extension, the signature file, and database software. Several studies on...
Main Author: | Supriyatno, Gigih |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/38786 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:38786 |
---|---|
spelling |
id-itb.:38786 2019-06-17T15:13:12Z METHOD OF FILE-TYPE IDENTIFICATION WITH SUMMARIZED N-GRAM USING STAGE SAMPLE SELECTION Supriyatno, Gigih Indonesia Theses n-gram, number summarization, letter summarization, non-summarization, file-type identification. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/38786 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
File-type identification is a complex problem because of the wide variety of data types and file types. Common software used to identify file types often fails when a file is damaged or modified, because such software relies on the file extension, the signature file, and database software. Several studies on file-type identification have taken different approaches, one of which is n-gram analysis. Studies that use n-grams for file classification generally use only short n-grams (1-grams to 2-grams). In 2011, Mayer developed the summarized n-gram concept to make use of n-grams with length n > 2. His method eliminates short n-grams and uses long n-grams as predictors. In 2013, Burman improved on Mayer's method by including short n-grams in his algorithm. Unfortunately, the two researchers used different learning files to build their predictor models for file classification. Differences in methods and learning files affect the resulting performance. This study develops the methods from both Mayer's and Burman's research by analysing summarization methods and systematically selecting learning files. The results indicate that selecting learning files in stages, combined with an appropriate n-gram extraction method, produces better performance than Burman's experiment. |
format |
Theses |
author |
Supriyatno, Gigih |
spellingShingle |
Supriyatno, Gigih METHOD OF FILE-TYPE IDENTIFICATION WITH SUMMARIZED N-GRAM USING STAGE SAMPLE SELECTION |
title |
METHOD OF FILE-TYPE IDENTIFICATION WITH SUMMARIZED N-GRAM USING STAGE SAMPLE SELECTION |
title_sort |
method of file-type identification with summarized n-gram using stage sample selection |
url |
https://digilib.itb.ac.id/gdl/view/38786 |
_version_ |
1823638317401374720 |
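
The description in this record mentions byte n-gram analysis and "summarized" n-grams without giving details. As a rough illustration only, the Python sketch below counts byte n-grams in a buffer and optionally collapses all ASCII digits into one class and all ASCII letters into another before counting. This mapping is an assumption loosely based on the record's keywords ("number summarization", "letter summarization"); it is not the thesis's, Mayer's, or Burman's actual algorithm, and the names `summarize_byte` and `ngram_counts` are hypothetical.

```python
from collections import Counter


def summarize_byte(b: int) -> int:
    """Map a byte to a class representative (hypothetical scheme, assumed)."""
    if 0x30 <= b <= 0x39:  # ASCII digits all become '0'
        return 0x30
    if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:  # ASCII letters all become 'a'
        return 0x61
    return b  # every other byte is left unchanged


def ngram_counts(data: bytes, n: int, summarize: bool = False) -> Counter:
    """Count overlapping byte n-grams, optionally after summarization."""
    if summarize:
        data = bytes(summarize_byte(b) for b in data)
    # A buffer shorter than n yields an empty range, hence an empty Counter.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


if __name__ == "__main__":
    sample = b"id=12345;name=alice;id=67890;name=bob;"
    plain = ngram_counts(sample, 3)
    summed = ngram_counts(sample, 3, summarize=True)
    # Summarization can only merge n-grams, so it never adds distinct ones.
    print(len(plain), len(summed))
```

Under this assumed scheme, n-grams that differ only in which digit or letter they contain are merged, which is one way longer n-grams (n > 2) could stay frequent enough to act as predictors.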