The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor |
| |
Authors: | Michael Gschwind |
| |
Affiliation: | (1) IBM T.J. Watson Research Center, Yorktown Heights, NY, USA |
| |
Abstract: | As CMOS feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance (e.g.,
deep pipelining) become too expensive and power-hungry, chip multiprocessors (CMPs) become an exciting new direction by which
system designers can deliver increased performance. Exploiting parallelism in such designs is the key to high performance,
and we find that parallelism must be exploited at multiple levels of the system: the thread-level parallelism that has become
popular in many designs fails to exploit all the levels of available parallelism in many workloads for CMP systems. We describe
the Cell Broadband Engine and the multiple levels at which its architecture exploits parallelism: data-level, instruction-level,
thread-level, memory-level, and compute-transfer parallelism. By taking advantage of opportunities at all levels of the system,
this CMP revolutionizes parallel architectures to deliver previously unattained levels of single chip performance. We describe
how the heterogeneous cores make it possible to achieve this performance by parallelizing and offloading computation-intensive
application code onto the Synergistic Processor Element (SPE) cores using a heterogeneous thread model. We also give an example
of scheduling code to be memory-latency tolerant using software pipelining techniques in the SPE.
This paper is based in part on “Chip multiprocessing and the Cell Broadband Engine”, ACM Computing Frontiers 2006. |
| |
Keywords: | Chip multiprocessor; heterogeneous chip multiprocessor; compute-transfer parallelism; multi-level application parallelism; Cell Broadband Engine |
This article is indexed in SpringerLink and other databases. |
|