Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors
Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog AMS compiler produces...
Main Authors: | Kapre, Nachiket; DeHon, Andre |
---|---|
Other Authors: | School of Computer Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2015 |
Subjects: | Computer Science and Engineering |
Online Access: | https://hdl.handle.net/10356/81245 http://hdl.handle.net/10220/39197 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-81245 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-81245 2020-05-28T07:18:29Z Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors Kapre, Nachiket DeHon, Andre School of Computer Engineering 2009 International Conference on Field Programmable Logic and Applications (FPL) Computer Science and Engineering Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog-AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations where the same model is evaluated several times for all the devices in the circuit. Our compiler uses architecture-specific parallelization strategies (OpenMP for multi-core, PThreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g. unroll factor, vector length) using our performance-tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3-182 times for a Xilinx Virtex5 LX 330T, 1.3-33 times for an IBM Cell, and 3-131 times for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models. Accepted version 2015-12-21T07:52:15Z 2019-12-06T14:26:25Z 2015-12-21T07:52:15Z 2019-12-06T14:26:25Z 2009 Conference Paper Kapre, N., & DeHon, A. (2009). Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors. 2009 International Conference on Field Programmable Logic and Applications. https://hdl.handle.net/10356/81245 http://hdl.handle.net/10220/39197 10.1109/FPL.2009.5272548 en © 2009 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/FPL.2009.5272548]. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Computer Science and Engineering |
spellingShingle |
Computer Science and Engineering Kapre, Nachiket DeHon, Andre Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors |
description |
Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog-AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations where the same model is evaluated several times for all the devices in the circuit. Our compiler uses architecture-specific parallelization strategies (OpenMP for multi-core, PThreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g. unroll factor, vector length) using our performance-tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3-182 times for a Xilinx Virtex5 LX 330T, 1.3-33 times for an IBM Cell, and 3-131 times for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models. |
author2 |
School of Computer Engineering |
author_facet |
School of Computer Engineering Kapre, Nachiket DeHon, Andre |
format |
Conference or Workshop Item |
author |
Kapre, Nachiket DeHon, Andre |
author_sort |
Kapre, Nachiket |
title |
Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors |
title_sort |
performance comparison of single-precision spice model-evaluation on fpga, gpu, cell, and multi-core processors |
publishDate |
2015 |
url |
https://hdl.handle.net/10356/81245 http://hdl.handle.net/10220/39197 |
_version_ |
1681056934061408256 |
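
The description mentions a performance-tuner that explores implementation configurations such as unroll factor and vector length to find the best one per architecture. The record does not say how the authors' tuner works, so the following is only a minimal sketch of one simple approach, exhaustive search over a small configuration grid. The names `tune`, `toy_cost`, and the cost formula are hypothetical illustrations; the cost model is made up and does not reflect any measured data from the paper.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

# (unroll_factor, vector_length); hypothetical parameter names for illustration
Config = Tuple[int, int]


def tune(measure: Callable[[int, int], float],
         unroll_factors: Iterable[int],
         vector_lengths: Iterable[int]) -> Tuple[Config, Dict[Config, float]]:
    """Evaluate every (unroll, vector length) pair and return the cheapest one.

    `measure` stands in for "generate code, run it, and time it"; here it is
    any callable that returns a cost where lower is better.
    """
    results: Dict[Config, float] = {}
    for unroll, vlen in product(unroll_factors, vector_lengths):
        results[(unroll, vlen)] = measure(unroll, vlen)
    best = min(results, key=results.get)
    return best, results


def toy_cost(unroll: int, vlen: int) -> float:
    """Made-up runtime model (assumption, not data from the paper)."""
    work = 1024.0 / (unroll * vlen)        # more parallelism shrinks the work term
    pressure = 0.5 * unroll + 3.0 * vlen   # stand-in for register/resource pressure
    return work + pressure


if __name__ == "__main__":
    best, results = tune(toy_cost, [1, 2, 4, 8, 16], [1, 2, 4, 8])
    print("best (unroll, vector_length):", best)
    print("modelled cost:", results[best])
```

In a real flow, `measure` would be replaced by a function that builds and times the generated code on the target device; exhaustive search is practical only while the configuration grid stays small.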