Appendix 1

Presentation of the Benchmarks used in our Experiments

This appendix describes the benchmarks and the data dependence graphs (DDGs) that we used in our experiments. The DDGs were generated by the st200cc compiler from STMicroelectronics, using the option -O3. Superblock formation and loop unrolling were enabled, and instruction selection was performed for the ST231 VLIW processor.

The ST231 processor used in our experiments issues up to four operations per clock cycle, of which at most one may be a control operation (goto, jump, call, return), at most one a memory operation (load, store, prefetch) and at most two multiply operations. All arithmetic instructions operate on integer values, with operands belonging either to the general register (GR) file (64 × 32 bits) or to the branch register (BR) file (8 × 1 bit). Floating-point computations are emulated in software. In order to eliminate some conditional branches, the ST200 architecture also provides conditional selection. The processing time of any operation is a single clock cycle, while the latencies between operations range from 0 to 3 clock cycles.
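The per-cycle issue limits above can be illustrated with a short check. The following Python fragment is only a sketch under stated assumptions: the function name bundle_is_legal, the OP_CLASS table and the mnemonics "mul" and "add" are hypothetical and chosen for illustration, and only the limits quoted in the text are modeled (any further encoding or register-port constraints of the real processor are ignored).

```python
from collections import Counter

# Per-cycle issue limits as stated in the text for the ST231.
ISSUE_WIDTH = 4
CLASS_LIMITS = {"control": 1, "memory": 1, "multiply": 2}

# Hypothetical opcode-to-class table, for illustration only.
# Any opcode not listed here is treated as an unconstrained "other" operation.
OP_CLASS = {
    "goto": "control", "jump": "control", "call": "control", "return": "control",
    "load": "memory", "store": "memory", "prefetch": "memory",
    "mul": "multiply",
}


def bundle_is_legal(ops):
    """Return True if all operations in `ops` may issue in the same cycle."""
    if len(ops) > ISSUE_WIDTH:
        return False
    counts = Counter(OP_CLASS.get(op, "other") for op in ops)
    return all(counts[cls] <= limit for cls, limit in CLASS_LIMITS.items())
```

For instance, under these assumptions bundle_is_legal(["load", "mul", "mul", "add"]) holds, whereas a bundle containing both a load and a store is rejected because it exceeds the limit of one memory operation per cycle.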

Note that we make our DDGs publicly available to help the research community share data and reproduce our performance numbers.

A1.1. Qualitative benchmarks presentation

We consider a representative set of applications covering both high-performance and embedded benchmarks. We chose to optimize the following collections of well-known applications programmed in ...
