Chapter 14
Technology-Aware Communication Architecture Design for Parallel Hardware Platforms
14.1 Introduction
The International Technology Roadmap for Semiconductors predicts that by 2024 more than 3000 processing cores will be integrated onto a single silicon die, giving rise to architectures that provide high processing performance at low power even in the embedded computing domain [27]. The two cornerstones of such architectures will be parallel hardware processing and hardware realization of specific functions, which together provide computation scalability at an affordable power cost through specialization.
In this context, the most daunting challenge is to deliver the enormous bandwidth and meet the stringent latency requirements of interconnecting such a huge number of cores. Traditional system-level interconnect fabrics such as shared buses, crossbars, and multilayer interconnects (which combine shared-bus layers and crossbars, Fig. 14.1a) are rapidly running out of steam for a number of reasons:
- risk of spaghetti wiring during place-and-route;
- poor performance and power scalability;
- use of complex and expensive bridges to support system heterogeneity;
- poor support for fault tolerance in on-chip communication.