Shear sort is extremely efficient, but only on a parallel processor. When you see the benchmark output, you'll notice a higher time than the other sorts. However, when operating with more than one processor, a 2D mesh is created. The advantage of the 2D mesh is that sorts can be made concurrently on the rows and columns—you get two sorts for each clock cycle! This algorithm is a perfect example of divide-and-conquer.
class Shear_sort def sort(a) div = 1 i = 1 while i * i <= a.length if a.length % i == 0 div ...