MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture
Background: Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. However, due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified...
Main Authors: | Li, Chuang; Li, Kenli; Li, Keqin; Lin, Feng |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2019 |
Subjects: | Engineering::Computer science and engineering; Peptide Identification; Tandem Mass Spectrometry |
Online Access: | https://hdl.handle.net/10356/84179 http://hdl.handle.net/10220/49783 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-84179 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-84179 2020-03-07T11:50:48Z MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture Li, Chuang Li, Kenli Li, Keqin Lin, Feng School of Computer Science and Engineering Engineering::Computer science and engineering Peptide Identification Tandem Mass Spectrometry Background: Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. However, due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, the current methods for peptide database searching cannot rapidly and thoroughly process large MS/MS spectra datasets. A breakthrough in efficient database search algorithms is crucial for peptide identification in computational proteomics. Results: This paper presents MCtandem, an efficient tool for large-scale peptide identification on Intel Many Integrated Core (MIC) architecture. To support big data processing capability, a novel parallel match scoring algorithm, named MIC-SDP (spectrum dot product), and its two-level parallelization are presented in MCtandem’s design. In addition, a series of optimization strategies on both the host CPU side and the MIC side, including pre-fetching, an optimized communication overlapping scheme, multithreading and hyper-threading, are exploited to improve the execution performance. Conclusions: For fair comparisons, we first set up experiments and verified a 28-fold speedup on a single MIC against the original CPU-based implementation. We then executed MCtandem on a very large dataset on an MIC cluster (a component of the Tianhe-2 supercomputer) and achieved much higher scalability than a benchmark MapReduce-based program, MR-Tandem. MCtandem is an open-source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/LogicZY/MCtandem. Published version 2019-08-27T02:17:18Z 2019-12-06T15:39:58Z 2019-08-27T02:17:18Z 2019-12-06T15:39:58Z 2019 Journal Article Li, C., Li, K., Li, K., & Lin, F. (2019). MCtandem: an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture. BMC Bioinformatics, 20(1), 397-. doi:10.1186/s12859-019-2980-5 https://hdl.handle.net/10356/84179 http://hdl.handle.net/10220/49783 10.1186/s12859-019-2980-5 en BMC Bioinformatics © 2019 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. 13 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering Peptide Identification Tandem Mass Spectrometry |
spellingShingle |
Engineering::Computer science and engineering Peptide Identification Tandem Mass Spectrometry Li, Chuang Li, Kenli Li, Keqin Lin, Feng MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture |
description |
Background: Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. However, due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, the current methods for peptide database searching cannot rapidly and thoroughly process large MS/MS spectra datasets. A breakthrough in efficient database search algorithms is crucial for peptide identification in computational proteomics. Results: This paper presents MCtandem, an efficient tool for large-scale peptide identification on Intel Many Integrated Core (MIC) architecture. To support big data processing capability, a novel parallel match scoring algorithm, named MIC-SDP (spectrum dot product), and its two-level parallelization are presented in MCtandem’s design. In addition, a series of optimization strategies on both the host CPU side and the MIC side, including pre-fetching, an optimized communication overlapping scheme, multithreading and hyper-threading, are exploited to improve the execution performance. Conclusions: For fair comparisons, we first set up experiments and verified a 28-fold speedup on a single MIC against the original CPU-based implementation. We then executed MCtandem on a very large dataset on an MIC cluster (a component of the Tianhe-2 supercomputer) and achieved much higher scalability than a benchmark MapReduce-based program, MR-Tandem. MCtandem is an open-source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/LogicZY/MCtandem. |
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering Li, Chuang Li, Kenli Li, Keqin Lin, Feng |
format |
Article |
author |
Li, Chuang Li, Kenli Li, Keqin Lin, Feng |
author_sort |
Li, Chuang |
title |
MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture |
title_short |
MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture |
title_full |
MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture |
title_fullStr |
MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture |
title_full_unstemmed |
MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture |
title_sort |
mctandem : an efficient tool for large-scale peptide identification on many integrated core (mic) architecture |
publishDate |
2019 |
url |
https://hdl.handle.net/10356/84179 http://hdl.handle.net/10220/49783 |
_version_ |
1681037033833758720 |