In this section, we define table distribution in the Greenplum context and cover related aspects of distribution, such as data skew.
Greenplum is a massively parallel processing (MPP) data store, and each table's data is distributed across segments according to its distribution strategy.
Every table in Greenplum has a data distribution method; the DISTRIBUTED BY clause defines the distribution strategy. We need to ensure that the chosen distribution key does not introduce data skew on any of the segment hosts.
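
To make the idea of skew concrete, here is a minimal Python sketch of how the choice of distribution key can affect the number of rows each segment receives. It is an illustration only: the segment count, the sample `order_ids` and `regions` columns, and the helper names are all hypothetical, and CRC32 is used as a stand-in hash rather than Greenplum's actual hashing.

```python
import zlib
from collections import Counter

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment number.

    CRC32 is a stand-in here; it is NOT the hash Greenplum uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def rows_per_segment(keys, num_segments=NUM_SEGMENTS):
    """Count how many rows land on each segment for the given key values."""
    counts = Counter(segment_for(k, num_segments) for k in keys)
    return [counts.get(seg, 0) for seg in range(num_segments)]


def skew_ratio(counts):
    """Largest segment divided by the mean; 1.0 means a perfectly even spread."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# Hypothetical table of 10,000 rows with two candidate distribution keys.
order_ids = range(10_000)  # unique value per row
regions = ["EU" if i % 10 else "US" for i in range(10_000)]  # only two values

by_id = rows_per_segment(order_ids)
by_region = rows_per_segment(regions)

print("distributed by order_id:", by_id, "skew ratio:", round(skew_ratio(by_id), 2))
print("distributed by region:  ", by_region, "skew ratio:", round(skew_ratio(by_region), 2))
```

Because `regions` holds only two distinct values, at most two of the four segments can receive any rows, so the region key is skewed no matter which hash is used. A key with many distinct values, such as `order_ids`, would be expected to spread far more evenly. On a real cluster, the per-segment row counts can be obtained by grouping a table's rows on the `gp_segment_id` system column.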
There are two methods of distributing table data across segment hosts: