You will agree that a lot of the AVX instructions are far from intuitive, especially the different mask layouts that make the code difficult to read and understand. Moreover, the bit masks are sometimes written in hexadecimal notation, so you have to convert them first to binary notation to see what they do.
In this chapter, we will demonstrate that using AVX instructions can dramatically improve performance, and the effort of using AVX pays off in a number of cases. You can find an interesting white paper on benchmarking code at https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf ...