Getting Started with Greenplum for Big Data Analytics by Sunila Gollapudi

Greenplum table distribution and partitioning

This section defines table distribution in the Greenplum context and covers related aspects of distribution, such as data skew.

Distribution

Greenplum is a massively parallel processing (MPP) data store, and table data is distributed across segments according to the distribution strategy defined for each table.

Every table in Greenplum has a data distribution method, and the DISTRIBUTED BY clause defines the distribution strategy. We need to choose the distribution key so that it does not introduce data skew, where some segments hold far more rows than others.
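To make the effect of the distribution key on skew concrete, here is a minimal Python sketch that simulates hashing key values onto segments. It is an illustration under stated assumptions, not Greenplum's implementation: the CRC32 hash, the segment count of 4, and the helper names `segment_for`, `rows_per_segment`, and `skew_ratio` are all hypothetical and chosen only for this example.

```python
import zlib
from collections import Counter


def segment_for(key, num_segments):
    """Map a distribution-key value to a segment index."""
    # Assumption: CRC32 is a stand-in hash, not Greenplum's actual hash function.
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def rows_per_segment(keys, num_segments):
    """Count how many rows land on each segment for the given key values."""
    counts = Counter(segment_for(k, num_segments) for k in keys)
    return [counts.get(seg, 0) for seg in range(num_segments)]


def skew_ratio(counts):
    """Largest segment's row count divided by the mean; 1.0 means perfectly even."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


NUM_SEGMENTS = 4  # hypothetical cluster size

# A high-cardinality key, such as a unique order id.
unique_keys = range(10_000)

# A low-cardinality key, such as a status flag with one dominant value.
flag_keys = ["open"] * 9_000 + ["closed"] * 1_000

even = rows_per_segment(unique_keys, NUM_SEGMENTS)
skewed = rows_per_segment(flag_keys, NUM_SEGMENTS)

print("unique key:", even, "skew ratio:", round(skew_ratio(even), 2))
print("flag key:  ", skewed, "skew ratio:", round(skew_ratio(skewed), 2))
```

Because every row with the same key value hashes to the same segment, the flag column places at least 9,000 of the 10,000 rows on a single segment, while the remaining segments sit mostly idle. On a real cluster, a comparable check is to count rows grouped by the `gp_segment_id` system column of the table.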

There are two methods of distributing table data across segment hosts:

  • Column oriented/Hash distribution: This is a distribution mechanism that considers ...
