Appendix 5
Loop Unroll Degree Minimization: Experimental Results
All our benchmarks were cross-compiled on a regular Dell workstation equipped with an Intel(R) Core(TM)2 CPU running at 2.4 GHz and a Linux operating system (kernel version 2.6, 64-bit).
A5.1. Stand-alone experiments with single register types
This section presents full experiments with a stand-alone tool, considering a single register type only. The tool is independent of the compiler and of the processor architecture. We demonstrate the efficiency of our loop unrolling minimization method for both unscheduled loops (as studied in section 11.4) and scheduled loops (as studied in section 11.6).
A5.1.1. Experiments with unscheduled loops
In this context, our stand-alone tool takes a data dependence graph (DDG) as input, just after the periodic register allocation performed by SIRA, and applies loop unrolling minimization (LUM).
A5.1.2. Results on randomly generated data dependence graphs
First, our stand-alone software generates the number of distinct reuse circuits k and their weights (μ1, …, μk). Afterwards, we calculate the number of remaining registers and the loop unrolling degree ρ = lcm(μ1, …, μk). Finally, we apply our method for minimizing ρ.
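The initial unrolling degree described above can be illustrated with a short sketch. It only computes ρ = lcm(μ1, …, μk) from a list of reuse circuit weights; it does not implement the minimization method itself. The function names, the sample weights and the register count are hypothetical, and the formula used for the remaining registers (available registers minus the sum of the weights) is an assumption made for this illustration.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a // gcd(a, b) * b


def unroll_degree(weights):
    """Initial unrolling degree: rho = lcm(mu_1, ..., mu_k)."""
    return reduce(lcm, weights, 1)


def remaining_registers(available, weights):
    """Registers left after allocation.

    Assumption for this sketch: each reuse circuit of weight mu_i
    consumes mu_i registers, so the remainder is available - sum(mu_i).
    """
    return available - sum(weights)


# Hypothetical instance: k = 3 reuse circuits, 16 available registers.
weights = [2, 3, 4]
print(unroll_degree(weights))            # 12
print(remaining_registers(16, weights))  # 7
```

With weights (2, 3, 4) the loop would have to be unrolled 12 times before any minimization, which shows how quickly ρ can grow compared with the individual weights.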
We performed extensive random generations over many configurations: we varied the number of available registers from 4 to 256, and we considered 10,000 random instances containing multiple ...