Collecting and analyzing I/O patterns for data intensive applications

Bibliographic Details
Main Author: Goh, Ming Rui.
Other Authors: School of Computer Engineering
Format: Final Year Project
Language:English
Published: 2012
Subjects:
Online Access:http://hdl.handle.net/10356/48601
Institution: Nanyang Technological University
id sg-ntu-dr.10356-48601
record_format dspace
spelling sg-ntu-dr.10356-486012023-03-03T20:33:12Z Collecting and analyzing I/O patterns for data intensive applications Goh, Ming Rui. School of Computer Engineering He Bingsheng DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems As the reliance on computer systems increases, so do the complexity of the systems and the size of the data. In order to maintain the efficiency of systems and enhance their scalability, different optimization techniques can be employed. This project looks into the locality of reference of applications, in the hope of optimizing performance by keeping data within faster memory such as caches. This project uses the Linux blktrace and blkparse utilities, which capture the block input/output traces of different software applications. The analysis is performed on the Hadoop framework, which connects computer systems to execute tasks in parallel. The preliminary phase of the analysis dealt with familiarization with the blktrace and blkparse utilities. Since the blktrace utility captures all the block input/output traces that occur in the system during a specific period, it is essential to filter only those traces relevant to the analysis. In the process of analyzing the data, several different approaches were taken to retrieve and represent the results with increasing accuracy. Due to the inconsistency between a file's size and the block input/output read, different file systems were also analyzed to verify this observation. The results show that the current method of filtering the block input/output traces of a specific program includes overheads that make the size of the trace larger than the original file size. Analysis of the wordcount function of Hadoop shows that its file accesses exhibit spatial locality. Most subsequent block accesses are relatively fast, in the range of 1-4 milliseconds.
The analysis of the Database Test Suite 2 shows that MySQL exhibits random access behavior in its block I/O accesses. Bachelor of Engineering (Computer Science) 2012-04-27T03:26:42Z 2012-04-27T03:26:42Z 2012 2012 Final Year Project (FYP) http://hdl.handle.net/10356/48601 en Nanyang Technological University 91 p. application/pdf
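The spatial-locality analysis the abstract describes can be sketched as follows: parse blkparse's default per-event output (roughly `maj,min cpu seq timestamp pid action rwbs sector + nblocks [process]`) and compute the fraction of accesses that begin at the sector where the previous access ended. This is a minimal illustrative sketch, not the project's actual tooling; the sample trace lines and function names are assumptions.

```python
# Hypothetical sketch: estimate spatial locality from blkparse output.
# Assumes blkparse's default per-event line format:
#   maj,min cpu seq timestamp pid action rwbs sector + nblocks [process]
# The sample lines below are illustrative, not real project traces.

def parse_events(lines):
    """Extract (timestamp, start sector, block count) from trace lines."""
    events = []
    for line in lines:
        parts = line.split()
        # Skip summary lines and anything not shaped like "sector + nblocks".
        if len(parts) < 10 or parts[8] != "+":
            continue
        events.append((float(parts[3]), int(parts[7]), int(parts[9])))
    return events

def sequential_fraction(events):
    """Fraction of accesses starting exactly where the previous one ended."""
    if len(events) < 2:
        return 0.0
    hits = sum(1 for (_, s0, n0), (_, s1, _) in zip(events, events[1:])
               if s1 == s0 + n0)
    return hits / (len(events) - 1)

sample = [
    "8,0 3 1 0.000000000 697 Q R 223490 + 8 [wordcount]",
    "8,0 3 2 0.001200000 697 Q R 223498 + 8 [wordcount]",
    "8,0 3 3 0.002900000 697 Q R 500000 + 8 [wordcount]",
]
print(sequential_fraction(parse_events(sample)))  # 0.5
```

A fraction near 1.0 would indicate strongly sequential (spatially local) access, as reported for the Hadoop wordcount workload, while a value near 0.0 would match the random access behavior observed for MySQL.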
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems
spellingShingle DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems
Goh, Ming Rui.
Collecting and analyzing I/O patterns for data intensive applications
description As the reliance on computer systems increases, so do the complexity of the systems and the size of the data. In order to maintain the efficiency of systems and enhance their scalability, different optimization techniques can be employed. This project looks into the locality of reference of applications, in the hope of optimizing performance by keeping data within faster memory such as caches. This project uses the Linux blktrace and blkparse utilities, which capture the block input/output traces of different software applications. The analysis is performed on the Hadoop framework, which connects computer systems to execute tasks in parallel. The preliminary phase of the analysis dealt with familiarization with the blktrace and blkparse utilities. Since the blktrace utility captures all the block input/output traces that occur in the system during a specific period, it is essential to filter only those traces relevant to the analysis. In the process of analyzing the data, several different approaches were taken to retrieve and represent the results with increasing accuracy. Due to the inconsistency between a file's size and the block input/output read, different file systems were also analyzed to verify this observation. The results show that the current method of filtering the block input/output traces of a specific program includes overheads that make the size of the trace larger than the original file size. Analysis of the wordcount function of Hadoop shows that its file accesses exhibit spatial locality. Most subsequent block accesses are relatively fast, in the range of 1-4 milliseconds. The analysis of the Database Test Suite 2 shows that MySQL exhibits random access behavior in its block I/O accesses.
author2 School of Computer Engineering
author_facet School of Computer Engineering
Goh, Ming Rui.
format Final Year Project
author Goh, Ming Rui.
author_sort Goh, Ming Rui.
title Collecting and analyzing I/O patterns for data intensive applications
title_short Collecting and analyzing I/O patterns for data intensive applications
title_full Collecting and analyzing I/O patterns for data intensive applications
title_fullStr Collecting and analyzing I/O patterns for data intensive applications
title_full_unstemmed Collecting and analyzing I/O patterns for data intensive applications
title_sort collecting and analyzing i/o patterns for data intensive applications
publishDate 2012
url http://hdl.handle.net/10356/48601
_version_ 1759857771569741824