Scalable Parallelization of Specification Mining using Distributed Computing

Mining specifications from logs of execution traces has attracted much research effort in recent years since the mined specifications, such as program invariants, temporal rules, association patterns, or various behavioral models, may be used to improve program documentation, comprehension, and veri...

Bibliographic Details
Main Authors: WANG, Shaowei; LO, David; JIANG, Lingxiao; Maoz, Shahar; Budi, Aditya
Format: text
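Language: English
Published: Institutional Knowledge at Singapore Management University 2015
Subjects: Dynamic analysis; Execution profiles; Hadoop; MapReduce; Parallelization; Scalability; Specification mining; Computer Sciences; Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/2831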
Institution: Singapore Management University
id sg-smu-ink.sis_research-3831
record_format dspace
spelling sg-smu-ink.sis_research-3831 2015-12-21T03:24:05Z 2015-09-01T07:00:00Z text https://ink.library.smu.edu.sg/sis_research/2831 info:doi/10.1016/B978-0-12-411519-4.00021-5 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Dynamic analysis
Execution profiles
Hadoop
MapReduce
Parallelization
Scalability
Specification mining
Computer Sciences
Databases and Information Systems
Numerical Analysis and Scientific Computing
description Mining specifications from logs of execution traces has attracted much research effort in recent years since the mined specifications, such as program invariants, temporal rules, association patterns, or various behavioral models, may be used to improve program documentation, comprehension, and verification. At the same time, a major challenge faced by most specification mining algorithms is their scalability, specifically when dealing with many large execution traces. To address this challenge, we present a general, distributed specification mining algorithm that can parallelize and distribute repetitive specification mining tasks across multiple computers to achieve speedup proportional to the number of machines used. This general algorithm is designed on the basis of our observation that most specification mining algorithms are data and memory intensive while computationally repetitive. To validate the general algorithm, we instantiate it with five existing sequential specification mining algorithms (CLIPPER, Daikon, k-tails, LM, and Perracotta) on a particular distributed computing model (MapReduce) and one of its implementations (Hadoop) to create five parallelized specification mining algorithms, and demonstrate the much improved scalability of the algorithms over many large traces ranging from 41 MB to 157 GB collected from seven DaCapo benchmark programs. Our evaluation shows that our parallelized Perracotta running on four machines (using up to eight CPU cores in total) speeds up the original sequential one by 3-18 times. The other four sequential algorithms are unable to complete analyzing the large traces, while our parallelized versions can complete the analysis and gain performance improvement by using more machines and cores. We believe that our general, distributed algorithm fits many specification mining algorithms well, and can be instantiated with them to gain further improvements in performance and scalability.
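The map-then-merge idea in the description can be illustrated with a toy sketch in plain Python. This is not the authors' algorithm and does not use Hadoop: the rule type (an ordered event pair "a occurs before b", counted once per trace) is a simplified stand-in for the temporal rules the record mentions, and the names `map_trace`, `merge_counts`, and `mine_pairs` are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_trace(trace):
    """Map step (hypothetical): count each ordered pair (a, b) once per trace,
    where a first appears somewhere before an occurrence of b."""
    pairs = set()
    seen = []  # distinct events in order of first appearance
    for event in trace:
        for earlier in seen:
            if earlier != event:
                pairs.add((earlier, event))
        if event not in seen:
            seen.append(event)
    return Counter(pairs)


def merge_counts(left, right):
    """Reduce step (hypothetical): sum per-trace counts into a global support."""
    return left + right


def mine_pairs(traces, min_support):
    """Run the map step on every trace independently, then merge the results.
    Because each trace is processed in isolation, the map calls could in
    principle be spread across workers; here they run sequentially."""
    partials = map(map_trace, traces)
    totals = reduce(merge_counts, partials, Counter())
    return {pair: n for pair, n in totals.items() if n >= min_support}


traces = [
    ["open", "read", "close"],
    ["open", "close"],
    ["open", "write", "close"],
]
rules = mine_pairs(traces, min_support=3)
```

With the three example traces, only the pair `("open", "close")` appears in every trace, so it is the only rule that meets a minimum support of 3.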
format text
author WANG, Shaowei
LO, David
JIANG, Lingxiao
Maoz, Shahar
Budi, Aditya
author_facet WANG, Shaowei
LO, David
JIANG, Lingxiao
Maoz, Shahar
Budi, Aditya
author_sort WANG, Shaowei
title Scalable Parallelization of Specification Mining using Distributed Computing
title_short Scalable Parallelization of Specification Mining using Distributed Computing
title_full Scalable Parallelization of Specification Mining using Distributed Computing
title_fullStr Scalable Parallelization of Specification Mining using Distributed Computing
title_full_unstemmed Scalable Parallelization of Specification Mining using Distributed Computing
title_sort scalable parallelization of specification mining using distributed computing
publisher Institutional Knowledge at Singapore Management University
publishDate 2015
url https://ink.library.smu.edu.sg/sis_research/2831
_version_ 1770572637353803776