June 2019
Intermediate to advanced
218 pages
5h 19m
English
We've been using our trusty @btime macro from the BenchmarkTools package to measure performance in this and previous chapters. That works well, but some GPU-specific tools are occasionally preferable.
First, the CuArrays.@sync macro may be used to wait for asynchronous kernels to finish. GPU kernels are launched asynchronously: the call returns to the host as soon as the work has been queued, and the host only blocks when the results are required by downstream processes. A naive measurement can therefore capture little more than the launch overhead, which invalidates it. The CuArrays.@sync macro mitigates this issue by forcing the computation to complete within the timing loop. We show this using a trivial identity function. The difference between the two invocations that follow should be instructive:
julia> a = ...
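As a rough illustration of the comparison described above, here is a minimal sketch. It is not the book's own listing: the array size, element type, and the variable names t_async and t_sync are assumptions chosen for illustration, and it assumes a CUDA-capable GPU with the CuArrays and BenchmarkTools packages installed. It uses @belapsed (the BenchmarkTools macro that returns the minimum time in seconds) instead of @btime so that the two timings can be compared programmatically.

```julia
using BenchmarkTools
using CuArrays  # assumes a CUDA-capable GPU is available

# Hypothetical input: size and element type are illustrative only.
a = cu(rand(Float32, 1024, 1024))

# Without synchronization, the timed expression may return as soon as
# the kernel has been queued, so this can mostly reflect launch overhead.
t_async = @belapsed identity.($a)

# With CuArrays.@sync, the host waits for the GPU to finish the work
# inside the timed expression.
t_sync = @belapsed CuArrays.@sync identity.($a)

println("without @sync: ", t_async, " s")
println("with @sync:    ", t_sync, " s")
```

On a typical setup, the synchronized timing would be expected to be the larger of the two, since it includes the kernel's actual run time, but the exact figures depend on the hardware and package versions.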