We can use a CPU timer to measure the performance of CUDA programs, but it will not give accurate results: it includes thread launch latency and OS scheduling overhead, among many other factors, and its resolution depends on the availability of a high-precision CPU timer. Moreover, since kernel launches are asynchronous, the host is often performing other work while the GPU kernel is running, so a CPU timer may not measure the kernel execution time correctly. To measure GPU kernel execution time accurately, CUDA provides an event API.
A CUDA event is a GPU timestamp recorded at a specified point in your CUDA program. Because the GPU itself records the timestamp, this eliminates the issues that arise when timing kernel executions with CPU timers.
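As a minimal sketch of how the event API is typically used, the following program records a start event and a stop event around a kernel launch and then queries the elapsed time between them. The kernel here (dummyKernel) and its launch configuration are placeholders chosen only to have something to time; the event calls themselves (cudaEventCreate, cudaEventRecord, cudaEventSynchronize, cudaEventElapsedTime, cudaEventDestroy) are the standard CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel used only as the work being timed.
__global__ void dummyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // Create the start and stop events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Record the start event, launch the kernel, then record the stop event.
    // The events are recorded on the GPU in the same stream as the kernel,
    // so they bracket the kernel's execution rather than the host's activity.
    cudaEventRecord(start);
    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);

    // Block the host until the stop event has actually been recorded on the GPU.
    cudaEventSynchronize(stop);

    // Elapsed time between the two events, in milliseconds.
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Kernel time: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```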