Self-repairing codes : local repairability for cheap and fast maintenance of erasure coded data

Bibliographic Details
Main Authors: Oggier, Frédérique, Datta, Anwitaman
Other Authors: School of Computer Engineering
Format: Article
Language: English
Published: 2014
Online Access: https://hdl.handle.net/10356/103114
http://hdl.handle.net/10220/24319
Institution: Nanyang Technological University
Description
Summary: Networked distributed data storage systems are essential for storing massive volumes of data. The dependability of such a system relies on its fault tolerance (data should remain available when nodes fail) as well as its maintainability (its ability to repair lost data so that redundancy is replenished over time). Erasure codes provide a storage-efficient alternative to replication-based redundancy, ensuring the same fault tolerance at a lower storage overhead. Traditional erasure codes, however, entail a high communication overhead for maintenance when encoded fragments are lost due to storage device failures and need to be replenished on new nodes. We propose a new family of erasure codes called self-repairing codes (SRC), which take into account the peculiarities of distributed storage systems, specifically to improve their maintainability by ‘localizing’ the repairs. SRC have the property that an encoded fragment can be repaired directly from a small subset (typically 2 or 3) of the other encoded fragments. These properties allow bandwidth-efficient and fast recovery even in the presence of multiple failures, in turn translating into better system robustness. A concrete family of such locally repairable codes, namely homomorphic SRC, is proposed, and its properties are studied in detail and compared, quantitatively or qualitatively as appropriate, with other codes, including traditional erasure codes as well as some recent representative codes designed specifically for storage applications.
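
The local-repair property described in the summary can be illustrated with a small sketch. This record does not give the construction, so everything below is an assumption made for illustration: the field GF(2^4), the reduction polynomial, the two data symbols, and the use of a linearized polynomial p(x) = sum_i c_i x^(2^i). Such a polynomial satisfies p(a + b) = p(a) + p(b), which is one way a code can be "homomorphic", and it means the fragment stored at point a XOR b is the XOR of the fragments stored at a and b. All function names are hypothetical.

```python
# Toy sketch only: parameters and names are illustrative assumptions,
# not values taken from the catalogued article.
M = 4                 # work in GF(2^4)
MODULUS = 0b10011     # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^M): carry-less product with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return result


def linearized_eval(coeffs, x):
    """Evaluate p(x) = sum_i coeffs[i] * x^(2^i), which is additive in x."""
    acc = 0
    power = x  # x^(2^0)
    for c in coeffs:
        acc ^= gf_mul(c, power)
        power = gf_mul(power, power)  # squaring maps x^(2^i) to x^(2^(i+1))
    return acc


def encode(coeffs, points):
    """Store one encoded fragment p(point) per nonzero evaluation point."""
    return {pt: linearized_eval(coeffs, pt) for pt in points}


def repair(fragments, lost_point):
    """Rebuild a lost fragment from any surviving pair (a, b) with a ^ b == lost_point."""
    for a in fragments:
        b = a ^ lost_point
        if a != lost_point and b != a and b in fragments:
            return fragments[a] ^ fragments[b]
    raise ValueError("no surviving repair pair for this fragment")


# Usage: encode two data symbols into 15 fragments, lose one, repair it from two others.
data = [0b0110, 0b1011]
fragments = encode(data, range(1, 2 ** M))
lost = 5
original = fragments.pop(lost)
assert repair(fragments, lost) == original
```

In this toy setting every fragment has several disjoint repair pairs, so a repair contacts only two nodes and can still succeed when other fragments are missing as well, which is the behaviour the summary attributes to SRC.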