Consider a situation in which a large number of threads try to modify a small region of memory. This happens frequently, and it causes trouble when the modification is a read-modify-write operation. An example of such an operation is d_out[i]++: first d_out[i] is read from memory, then it is incremented, and finally the result is written back to memory. When multiple threads perform this operation on the same memory location, the final value can be wrong.
Suppose a memory location holds an initial value of six, and threads p and q each try to increment it; the final value should then be eight. During execution, however, it may happen that both the p and q threads read this value ...