3.5.6.1.1 Multithreading Architectures
A CPU sometimes has to wait for a memory access to complete, for a cache line to be filled, for the pipeline to refill, and so on. Such stalls are especially critical in high-end systems because of the high CPU operating frequency and the high cost of idling, even for relatively short periods of time. This is where multithreading helps. Multithreading is based on the principle that if a single program cannot fully use all CPU resources, the processor can share these resources among multiple concurrent threads of execution. An individual program does not run faster in a multithreaded environment, but two programs running in parallel finish in significantly less than twice the run time of a single program. If the second program also has to wait for some resource while the first one is still stalled, a third thread brings additional benefit by allowing three programs to run concurrently. The same logic applies to any number N of parallel programs using M threads.
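The effect can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not a description of any real processor: the core executes one thread per cycle, every thread alternates between a fixed compute burst and a fixed stall, and thread switching is free. The function name simulate_core and all of its parameters are hypothetical.

```python
def simulate_core(num_threads, bursts, compute, stall):
    """Return the cycles a single-issue core needs to finish all threads.

    Toy model (all assumptions): each thread runs `bursts` compute bursts
    of `compute` cycles; after each burst it stalls for `stall` cycles
    (for example, waiting for memory). Switching threads costs nothing.
    """
    remaining = [bursts * compute] * num_threads  # compute cycles left per thread
    in_burst = [compute] * num_threads            # cycles left in the current burst
    ready_at = [0] * num_threads                  # first cycle the thread may run again
    cycle = 0
    while any(r > 0 for r in remaining):
        # Run the first thread that still has work and is not stalled.
        for t in range(num_threads):
            if remaining[t] > 0 and ready_at[t] <= cycle:
                remaining[t] -= 1
                in_burst[t] -= 1
                if in_burst[t] == 0:              # burst finished: thread stalls
                    in_burst[t] = compute
                    ready_at[t] = cycle + 1 + stall
                break
        cycle += 1                                # idle cycle if no thread was ready
    return cycle


one = simulate_core(1, bursts=3, compute=2, stall=3)
two = simulate_core(2, bursts=3, compute=2, stall=3)
print(one, two, 2 * one)
```

In this model a single thread leaves the core idle during every stall, while the second thread fills most of those idle cycles, so two threads finish well before twice the single-thread time. Adding threads helps only until the stalls are fully hidden; after that, total run time is bounded by the total amount of compute work.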
There are various ways to implement multithreading. Interleaved multithreading, sometimes called fine-grained multithreading, switches threads on every instruction to achieve thread-level parallelism (TLP). Static interleaving allocates a fixed time slot to every thread, which significantly limits performance when only a single application has to run. Dynamic interleaving allows more flexible time slot allocation, but is significantly more complex ...
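The difference between static and dynamic interleaving can be sketched as two slot-allocation policies. This is a simplified illustration under stated assumptions: thread readiness is given as a fixed list of booleans that does not change over time, and the function names static_interleave and dynamic_interleave are hypothetical, not taken from any real design.

```python
def static_interleave(ready, num_cycles):
    """Fixed round-robin: cycle c belongs to thread c % n.

    If the owning thread is not ready, the slot is wasted (None).
    """
    n = len(ready)
    schedule = []
    for c in range(num_cycles):
        t = c % n
        schedule.append(t if ready[t] else None)
    return schedule


def dynamic_interleave(ready, num_cycles):
    """Round-robin over ready threads only: unused slots are reassigned."""
    n = len(ready)
    schedule = []
    last = -1                                  # thread issued in the previous slot
    for _ in range(num_cycles):
        chosen = None
        for offset in range(1, n + 1):         # search starting after `last`
            t = (last + offset) % n
            if ready[t]:
                chosen = t
                break
        schedule.append(chosen)
        if chosen is not None:
            last = chosen
    return schedule


# Four hardware threads, but only thread 0 has an application to run.
ready = [True, False, False, False]
print(static_interleave(ready, 8))
print(dynamic_interleave(ready, 8))
```

With a single active thread, the static policy gives it only one slot in four, whereas the dynamic policy gives it every slot. The price is the extra selection logic, which in hardware has to make this choice every cycle.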