October 2013
Intermediate to advanced
172 pages
3h 51m
English
In the following section, we will define table distribution in the Greenplum context and detail related aspects of distribution, such as data skew.
Greenplum is a massively parallel processing (MPP) data store, and table data is distributed across segments according to the table's distribution strategy.
Every table in Greenplum has a data distribution method; the DISTRIBUTED BY clause defines the distribution strategy. We need to ensure that the chosen distribution key does not introduce data skew on any of the segment hosts.
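The following Python sketch is a simulation, not Greenplum code, that illustrates why the choice of distribution key matters. It assumes hash-style distribution, where each row's key value is hashed and reduced modulo the number of segments, so rows with the same key value always land on the same segment. CRC32 stands in for Greenplum's internal hash function, and the segment count, the `orders` table, its `order_id` and `status` columns, and the helper names are all hypothetical.

```python
import zlib
from collections import Counter

# Hypothetical cluster size; a real cluster has as many primary segments as configured.
NUM_SEGMENTS = 4


def segment_for(key_value, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment id.

    Assumption: CRC32 stands in for Greenplum's internal hash function;
    only the idea (hash the key, then reduce modulo the segment count) carries over.
    """
    digest = zlib.crc32(str(key_value).encode("utf-8"))
    return digest % num_segments


def rows_per_segment(rows, key, num_segments=NUM_SEGMENTS):
    """Count how many rows each segment would store when hashing on `key`."""
    counts = Counter(segment_for(row[key], num_segments) for row in rows)
    return [counts.get(seg, 0) for seg in range(num_segments)]


def skew_ratio(counts):
    """Busiest segment's row count divided by the mean (1.0 means perfectly even)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# Hypothetical table: a unique order_id and a low-cardinality status column.
orders = [
    {"order_id": i, "status": "shipped" if i % 10 else "pending"}
    for i in range(10_000)
]

for key in ("order_id", "status"):
    counts = rows_per_segment(orders, key)
    print(f"{key:>8}: {counts} skew={skew_ratio(counts):.2f}")
```

Distributing on the unique `order_id` spreads rows roughly evenly. Distributing on `status`, which has only two distinct values, can populate at most two of the four segments, leaving the others idle during parallel scans. In a live Greenplum database, a comparable per-segment count can be obtained by grouping a table's rows by the `gp_segment_id` system column.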
There are two methods of distributing table data across segment hosts: