6.8    A FIELD-PROGRAMMABLE GATE ARRAY FOR SYSTOLIC COMPUTING

6.8.1    Introduction

Recent work [Gray89] [Gokhale91] [Bertin92] has shown that a handful of SRAM-based field-programmable gate arrays (FPGAs) wired together can achieve extraordinary levels of performance, often outperforming supercomputers at a tiny fraction of the cost. And, unlike ASIC solutions, these systems are general purpose and reprogrammable: they can be reconfigured in milliseconds to perform a completely new task.

The CLi6000 series of SRAM FPGAs from Concurrent Logic has evolved through several generations and is the product of two joint-development efforts, one with Apple Computer [Furtek90] and one with National Semiconductor [Furtek92]. The original motivation for the technology, and still a key application area, is accelerating compute-intensive algorithms by exploiting the parallelism inherent in hardware. This section illustrates the technology's capabilities with a massively parallel implementation of motion estimation, an especially compute-intensive algorithm used in digital video compression. Motion estimation, which determines the best estimate of how blocks of pixels move from frame to frame, requires about 4,000 MIPS to run in real time on a standard video signal.
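To make the computation concrete, the following is a minimal software sketch of full-search block matching, one common form of motion estimation. It assumes a sum-of-absolute-differences (SAD) cost, 16 × 16 blocks and a ±8-pixel search window. None of these choices is stated in the text above, and the function names `sad` and `best_motion_vector` are hypothetical. The nested loops over every candidate displacement are what make the algorithm so expensive, and they are also what makes it so parallel.

```python
import random

def sad(prev, curr, top, left, dy, dx, block):
    """Sum of absolute differences between the block at (top, left) in the
    current frame and the block displaced by (dy, dx) in the previous frame."""
    total = 0
    for i in range(block):
        row_c = curr[top + i]
        row_p = prev[top + dy + i]
        for j in range(block):
            total += abs(row_c[left + j] - row_p[left + dx + j])
    return total

def best_motion_vector(prev, curr, top, left, block=16, search=8):
    """Exhaustively try every displacement in [-search, search]^2 and return
    (cost, dy, dx) for the lowest-SAD candidate (assumed cost metric)."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue
            cost = sad(prev, curr, top, left, dy, dx, block)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

# Synthetic example: the current frame is the previous frame with its content
# moved, so curr[i][j] == prev[i - 2][j + 3] (indices wrap at the edges).
random.seed(0)
prev = [[random.randrange(256) for _ in range(64)] for _ in range(64)]
curr = [[prev[(i - 2) % 64][(j + 3) % 64] for j in range(64)] for i in range(64)]
print(best_motion_vector(prev, curr, 24, 24))  # (0, -2, 3)
```

Each candidate SAD is independent of every other one. That independence is what lets a hardware implementation assign candidates to separate processing elements and evaluate them concurrently.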

The algorithm is implemented as a systolic array of 256 processing elements, each of which achieves 100% efficiency, with no wasted clock cycles, through pipelining and a clever synchronization of the input pixel streams. The CLi6000’s high register ...
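The general idea of pipelining with skewed input streams can be illustrated with a cycle-level model of a linear systolic array. This is a hypothetical sketch, not the CLi6000 design. It assumes each processing element (PE) is assigned one candidate displacement; for example, 256 PEs could cover a 16 × 16 search window, but that mapping is an assumption. Pixels of the current block enter PE 0 and shift one PE to the right per clock. PE k's reference stream is delayed by k cycles so that each PE always sees a matching pixel pair. The name `systolic_sad` is illustrative.

```python
import random

def systolic_sad(cur, refs):
    """Cycle-level sketch of a linear systolic array computing one SAD per PE.

    cur  -- flattened pixels of the current block (N values)
    refs -- refs[k] is the flattened candidate block assigned to PE k
    Returns (per-PE SADs, clock cycles used).
    """
    n, p = len(cur), len(refs)
    acc = [0] * p        # SAD accumulator inside each PE
    regs = [None] * p    # current-block pixel latched in each PE's register
    cycles = 0
    for t in range(n + p - 1):
        # Systolic shift: every PE passes its pixel to its right neighbour,
        # and PE 0 latches the next pixel of the current block (if any remain).
        regs = [cur[t] if t < n else None] + regs[:-1]
        for k in range(p):
            i = t - k  # PE k's reference stream is skewed (delayed) by k cycles
            if 0 <= i < n:
                acc[k] += abs(regs[k] - refs[k][i])
        cycles += 1
    return acc, cycles

random.seed(1)
N, P = 16, 8
cur = [random.randrange(256) for _ in range(N)]
refs = [[random.randrange(256) for _ in range(N)] for _ in range(P)]
sads, cycles = systolic_sad(cur, refs)
direct = [sum(abs(a - b) for a, b in zip(cur, r)) for r in refs]
print(sads == direct, cycles)  # True 23
```

Once the pipeline is full, every PE performs one useful absolute-difference-and-accumulate per clock. This isolated-block model still pays P - 1 cycles to fill and drain the pipeline. It does not attempt to reproduce how the original design keeps every PE busy on every cycle.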
