The Cell Broadband Engine (CBE) is a heterogeneous multicore architecture developed by IBM, Sony, and Toshiba. A CBE (Figure 10.1) consists of a PowerPC (PPU) core, eight Synergistic Processing Elements or Units (SPEs or SPUs), and associated memory-transfer mechanisms. The SPUs are connected in a ring topology, and each SPU has its own local store. However, SPUs have no local cache and no branch-prediction logic. Data may be moved between an SPU’s local store and central memory via a DMA transfer, which is handled by a Memory Flow Controller (MFC). Since the MFC runs independently of the SPUs, data transfer can proceed concurrently with computation. The absence of branch-prediction logic in an SPU and the availability of SIMD instructions that operate on vectors of four numbers pose a challenge when developing high-performance CBE algorithms. © 2014 by Taylor & Francis Group, LLC.
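
To make the last point concrete, the following is a minimal sketch in portable C of how a data-dependent branch can be replaced by a compare followed by a bitwise select over a vector of four numbers. The names vec4f, mask4, cmpgt4, select4, and max4 are hypothetical and chosen only for illustration; they are not part of any Cell SDK. On an actual SPU each helper would presumably map to a single SIMD intrinsic (such as spu_cmpgt and spu_sel), and the fixed four-iteration loops below merely stand in for one vector instruction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit SPU register holding four floats. */
typedef struct { float lane[4]; } vec4f;

/* Per-lane mask: all ones where the comparison held, all zeros otherwise. */
typedef struct { uint32_t lane[4]; } mask4;

/* Lane-wise a > b, producing a full-width bit mask per lane. */
static mask4 cmpgt4(vec4f a, vec4f b)
{
    mask4 m;
    for (int i = 0; i < 4; ++i)
        m.lane[i] = (uint32_t)0 - (uint32_t)(a.lane[i] > b.lane[i]);
    return m;
}

/* Lane-wise select: take bits from b where the mask is set, else from a. */
static vec4f select4(vec4f a, vec4f b, mask4 m)
{
    vec4f r;
    for (int i = 0; i < 4; ++i) {
        uint32_t ua, ub, ur;
        memcpy(&ua, &a.lane[i], sizeof ua);
        memcpy(&ub, &b.lane[i], sizeof ub);
        ur = (ua & ~m.lane[i]) | (ub & m.lane[i]);
        memcpy(&r.lane[i], &ur, sizeof ur);
    }
    return r;
}

/* Lane-wise maximum without an if/else on the data values. */
static vec4f max4(vec4f a, vec4f b)
{
    return select4(b, a, cmpgt4(a, b));
}
```

Calling max4 on two vec4f values yields the larger element in each lane. Both candidate results are always available and the mask picks between them, so no prediction of the comparison outcome is needed, which is the property that matters on a core without branch-prediction logic.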