11.12 SUMMARY OF WORK DONE IN THIS CHAPTER
At this stage, we are able to specify completely the reduced computation domain associated with the matrix–matrix multiplication algorithm. This domain could represent the concurrent threads required for a software implementation or the processing elements (PEs) required for a systolic array hardware implementation. Below we summarize what we have done and why:
1. We started by expressing matrix multiplication as an iterative equation (Eq. 11.1).
2. The indices of the iterative equation defined the multidimensional computation domain. The facets and vertices of this domain were studied in Sections 11.3 and 11.4.
3. In Section 11.5, we identified the dependence matrix A associated with each variable of the algorithm. From this matrix, we obtained its nullvectors, which define the broadcast subdomain B of the variable. We also identified the points where B intersects the computation domain. These intersection points tell us where input variables can be supplied and where output results can be extracted. At this stage, we can decide whether to broadcast or to pipeline each variable. ...
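Steps 1 and 2 can be made concrete with a short Python sketch. It assumes that Eq. 11.1 has the usual form c(i, j) = Σ_k a(i, k) b(k, j), with 0 ≤ i < I, 0 ≤ j < J, and 0 ≤ k < K. Under that assumption, each index triple (i, j, k) is one point of the computation domain, the domain is a rectangular box, and its eight corners are its vertices. The function names are hypothetical and used only for illustration.

```python
from itertools import product


def matmul_iterative(a, b):
    """Evaluate c(i, j) = sum over k of a(i, k) * b(k, j), one domain point at a time."""
    I, K = len(a), len(a[0])
    J = len(b[0])
    c = [[0] * J for _ in range(I)]
    # Each (i, j, k) triple visited here is one point of the computation domain.
    for i, j, k in product(range(I), range(J), range(K)):
        c[i][j] += a[i][k] * b[k][j]
    return c


def computation_domain(I, J, K):
    """All index points (i, j, k) of the (assumed) box-shaped domain."""
    return list(product(range(I), range(J), range(K)))


def domain_vertices(I, J, K):
    """Corners of the box; a set removes duplicates when a dimension has size 1."""
    return sorted(set(product((0, I - 1), (0, J - 1), (0, K - 1))))


print(matmul_iterative([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(len(computation_domain(2, 3, 4)))  # 24
print(domain_vertices(2, 3, 4))
```

In a software implementation, each domain point (or group of points) could be assigned to a thread; in a systolic array, the points are later projected onto PEs.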
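Step 3 can be illustrated with a small sketch. Here we assume that a variable's dependence matrix A maps a domain point p = [i j k]^T to that variable's indices, so that a(i, k), b(k, j), and c(i, j) each have a 2 × 3 matrix. For a rank-2 2 × 3 matrix, the cross product of its two rows spans the null space, which gives the broadcast direction directly. The helper names, and the choice of the first face for supplying inputs and the last face for extracting outputs, are assumptions for illustration rather than the book's notation.

```python
from itertools import product

# Hypothetical dependence matrices: row r of A selects the r-th index of the variable.
DEPENDENCE = {
    "a": ((1, 0, 0), (0, 0, 1)),  # a(i, k)
    "b": ((0, 0, 1), (0, 1, 0)),  # b(k, j)
    "c": ((1, 0, 0), (0, 1, 0)),  # c(i, j)
}


def matvec(A, p):
    """Compute A p for a small integer matrix A and point p."""
    return tuple(sum(r * x for r, x in zip(row, p)) for row in A)


def cross(u, v):
    """Cross product of two 3-vectors; orthogonal to both rows of A."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def nullvector(A):
    """Nullvector of a rank-2 2x3 matrix, sign-normalized so its first nonzero entry is positive."""
    e = cross(A[0], A[1])
    for x in e:
        if x != 0:
            return e if x > 0 else tuple(-y for y in e)
    return e


def boundary_points(e, bounds, at_start=True):
    """Points where broadcast lines along an axis-aligned nullvector e meet a domain face.

    Assumes e is a unit axis vector and the domain is the box 0 <= p[n] < bounds[n].
    """
    axis = [abs(x) for x in e].index(1)
    fixed = 0 if at_start else bounds[axis] - 1
    ranges = [range(n) for n in bounds]
    ranges[axis] = [fixed]
    return list(product(*ranges))


for name, A in DEPENDENCE.items():
    print(name, nullvector(A))  # a (0, 1, 0), b (1, 0, 0), c (0, 0, 1)
```

Under these assumptions, a(i, k) is shared along the j direction, b(k, j) along the i direction, and c(i, j) along the k direction, which is where its partial sums accumulate. For example, `boundary_points((0, 0, 1), (I, J, K), at_start=False)` lists the points with k = K − 1, where the results c(i, j) could be extracted.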