
Getting Started with Amazon Redshift by Stefan Bauer


Distribution keys

The last thing to consider about your data as you build your tables is the distribution key. Redshift both distributes and replicates data among nodes to achieve the massive parallelism that helps produce such good results, and the distribution key is an important part of that process. Try to keep the largest sets of data that you will be joining together on the same node, so that you avoid cross-node joins of large datasets whenever possible. Although the nodes are interconnected on a very high-speed network, the less data you need to combine across servers in large joins, the better off you will ultimately be. The distribution key defines which data is kept together on a given node. ...
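To make the idea of keeping joined data together more concrete, here is a minimal Python sketch that simulates hashing rows to nodes. It is an illustration only, not how Redshift is implemented: the four-node cluster, the `customers` and `orders` tables, and the use of `zlib.crc32` as the hash are all assumptions made for the example. It counts how many order rows would sit on a different node than the customer row they join to, for two possible choices of distribution key.

```python
import zlib

NUM_NODES = 4  # hypothetical cluster size, chosen only for this example


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution key value to a node using a stable hash.

    crc32 is a stand-in here; the hash Redshift uses internally is not
    something this sketch tries to reproduce.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def cross_node_rows(orders, order_dist_key):
    """Count order rows stored on a different node than their customer.

    The customers table is assumed to be distributed on customer_id.
    """
    moved = 0
    for order in orders:
        order_node = node_for(order[order_dist_key])
        customer_node = node_for(order["customer_id"])
        if order_node != customer_node:
            moved += 1
    return moved


# Hypothetical sample data: 24 orders spread over 8 customers.
orders = [{"order_id": 100 + i, "customer_id": (i % 8) + 1} for i in range(24)]

# Same key as the customers table: every order lands next to its customer.
colocated = cross_node_rows(orders, "customer_id")

# Different key: orders are spread without regard to the join column.
scattered = cross_node_rows(orders, "order_id")

print("rows to move when distributed on customer_id:", colocated)
print("rows to move when distributed on order_id:", scattered)
```

When both tables are distributed on the column they are joined on, matching rows always hash to the same node, so the first count is zero by construction. The second count depends on how the hash happens to place each `order_id`, but any non-zero value represents rows that would have to cross the network to complete the join.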
