You want to parallelize encryption, decryption, or keystream generation.
Only some cipher modes are naturally parallelizable in a way that doesn’t break compatibility. In particular, CTR mode is naturally parallelizable, as is decryption with CBC and CFB. There are two basic strategies: one is to treat the message in an interleaved fashion, and the other is to break it up into one contiguous chunk for each parallel process.
The first strategy is generally more practical. However, it is often difficult to make either technique result in a speed gain when processing messages in software.
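As a rough illustration of the two strategies, the sketch below assigns block indices to workers both ways. The function names `interleaved_blocks` and `chunked_blocks` are hypothetical and chosen only for this example; the sketch shows the partitioning alone and performs no cryptographic work.

```python
def interleaved_blocks(n_blocks, n_workers):
    """Interleaved: worker w handles blocks w, w + n_workers, w + 2*n_workers, ..."""
    return [list(range(w, n_blocks, n_workers)) for w in range(n_workers)]


def chunked_blocks(n_blocks, n_workers):
    """Chunked: each worker handles one contiguous run of blocks."""
    base, extra = divmod(n_blocks, n_workers)
    assignments = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional block each.
        size = base + (1 if w < extra else 0)
        assignments.append(list(range(start, start + size)))
        start += size
    return assignments


print(interleaved_blocks(10, 3))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(chunked_blocks(10, 3))      # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Note that the chunked split needs the total number of blocks before any worker can start, whereas the interleaved assignment depends only on the number of workers.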
Parallelizing encryption and decryption does not necessarily result in a speed improvement. To provide any chance of a speedup, you’ll certainly need to ensure that multiple processors are working in parallel. Even in such an environment, data sets may be too small to run faster when they are processed in parallel.
Some cipher modes allow independent parts of the message to be processed independently. In such cases, there is the potential for parallelization. For example, with CTR mode, the keystream is computed in blocks, where each block of keystream is generated by encrypting a unique counter block (the input that the cipher treats as its plaintext). Those blocks can be computed in any order.
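The order independence of CTR keystream blocks can be sketched as follows. This is a minimal illustration, not a real cipher: `keystream_block` is a hypothetical stand-in that uses a truncated SHA-256 hash in place of a block cipher such as AES, and the 8-byte nonce plus 8-byte counter layout is an assumption made for the example. The thread pool only demonstrates that blocks can be produced independently; it is not expected to yield a speedup here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 16  # bytes per keystream block (assumed, matching a 128-bit cipher)


def keystream_block(key, nonce, counter):
    # Stand-in for E_K(nonce || counter). NOT a real block cipher:
    # a truncated hash is used only so the sketch is self-contained.
    counter_block = nonce + counter.to_bytes(8, "big")
    return hashlib.sha256(key + counter_block).digest()[:BLOCK_SIZE]


def keystream_sequential(key, nonce, n_blocks):
    # Blocks generated one after another, in counter order.
    return b"".join(keystream_block(key, nonce, i) for i in range(n_blocks))


def keystream_parallel(key, nonce, n_blocks, n_workers=4):
    # Each block depends only on (key, nonce, counter), so workers may
    # compute them in any order; map() returns results in counter order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        blocks = pool.map(lambda i: keystream_block(key, nonce, i), range(n_blocks))
        return b"".join(blocks)


def xor_bytes(data, keystream):
    # CTR encryption and decryption are the same XOR with the keystream.
    return bytes(d ^ k for d, k in zip(data, keystream))


key = b"k" * 16
nonce = b"n" * 8
message = b"parallel CTR keystream demo message"
n_blocks = -(-len(message) // BLOCK_SIZE)  # ceiling division

ks = keystream_parallel(key, nonce, n_blocks)
ciphertext = xor_bytes(message, ks)
```

Because the parallel and sequential keystreams are identical, a receiver that processes the message sequentially stays compatible with a sender that generated the keystream in parallel.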
In CBC, CFB, and OFB modes, encryption can’t really be parallelized because the ciphertext for a block is necessary to create the ciphertext ...