Performance analysis of data replication and scheduling in data grid

The Grid is an infrastructure that enables dynamic sharing of and coordinated access to resources among different organizations. As a specialization and extension of the Grid, the Data Grid emphasizes the sharing of large-scale data sets and data storage resources. It has evolved to be the solution for...

Bibliographic Details
Main Author: Zhang, Junwei
Other Authors: Lee Bu Sung, Francis
Format: Theses and Dissertations
Language: English
Published: 2010
Subjects:
Online Access: https://hdl.handle.net/10356/38584
Institution: Nanyang Technological University
id sg-ntu-dr.10356-38584
record_format dspace
spelling sg-ntu-dr.10356-38584 2023-03-04T00:40:24Z Performance analysis of data replication and scheduling in data grid Zhang, Junwei Lee Bu Sung, Francis School of Computer Engineering Parallel and Distributed Computing Centre DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems DRNTU::Engineering::Computer science and engineering::Computer systems organization::Computer-communication networks The Grid is an infrastructure that enables dynamic sharing of and coordinated access to resources among different organizations. As a specialization and extension of the Grid, the Data Grid emphasizes the sharing of large-scale data sets and data storage resources. It has evolved to be the solution for data-intensive applications, such as global climate change research, High Energy Physics (HEP), astrophysics, and computational genomics. In these research domains, the size of scientific data is measured in terabytes (1024 gigabytes) or even petabytes (1024 terabytes). Such scientific data are stored as large files and replicated across the Data Grid. Scientists located all over the world are able to download these datasets and analyze them for various purposes. The Hierarchical Data Grid is a class of Data Grid that has been adopted by the European Organization for Nuclear Research (CERN) to support the distribution of large experimental datasets across the globe. Replication algorithms for the Hierarchical Data Grid have been studied extensively. I have developed a probabilistic model of data replication in a Hierarchical Data Grid environment. The model enables us to evaluate the optimality of the replication algorithm in terms of average response time and average bandwidth cost. The accuracy of the model is verified through simulation. DOCTOR OF PHILOSOPHY (SCE) 2010-05-12T04:35:18Z 2010-05-12T04:35:18Z 2009 2009 Thesis Zhang, J. W. (2009). Performance analysis of data replication and scheduling in data grid. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/38584 10.32657/10356/38584 en 156 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems
DRNTU::Engineering::Computer science and engineering::Computer systems organization::Computer-communication networks
spellingShingle DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems
DRNTU::Engineering::Computer science and engineering::Computer systems organization::Computer-communication networks
Zhang, Junwei
Performance analysis of data replication and scheduling in data grid
description The Grid is an infrastructure that enables dynamic sharing of and coordinated access to resources among different organizations. As a specialization and extension of the Grid, the Data Grid emphasizes the sharing of large-scale data sets and data storage resources. It has evolved to be the solution for data-intensive applications, such as global climate change research, High Energy Physics (HEP), astrophysics, and computational genomics. In these research domains, the size of scientific data is measured in terabytes (1024 gigabytes) or even petabytes (1024 terabytes). Such scientific data are stored as large files and replicated across the Data Grid. Scientists located all over the world are able to download these datasets and analyze them for various purposes. The Hierarchical Data Grid is a class of Data Grid that has been adopted by the European Organization for Nuclear Research (CERN) to support the distribution of large experimental datasets across the globe. Replication algorithms for the Hierarchical Data Grid have been studied extensively. I have developed a probabilistic model of data replication in a Hierarchical Data Grid environment. The model enables us to evaluate the optimality of the replication algorithm in terms of average response time and average bandwidth cost. The accuracy of the model is verified through simulation.
author2 Lee Bu Sung, Francis
author_facet Lee Bu Sung, Francis
Zhang, Junwei
format Theses and Dissertations
author Zhang, Junwei
author_sort Zhang, Junwei
title Performance analysis of data replication and scheduling in data grid
title_short Performance analysis of data replication and scheduling in data grid
title_full Performance analysis of data replication and scheduling in data grid
title_fullStr Performance analysis of data replication and scheduling in data grid
title_full_unstemmed Performance analysis of data replication and scheduling in data grid
title_sort performance analysis of data replication and scheduling in data grid
publishDate 2010
url https://hdl.handle.net/10356/38584
_version_ 1759854082326003712