Pentaho for Big Data Analytics by Feris Thia, Manoj R Patil

Amazon public data sets

If you have an Amazon Web Services (AWS) account and use a service such as Amazon Elastic MapReduce (EMR), you can use the large public datasets that Amazon provides free of charge.

At the time of writing, 54 public datasets are available, including human genome data, the U.S. census, the Freebase data dump, material safety data sheets, and so on. Some of these datasets are too large to download in practice. For example, the 1000 Genomes Project data is about 200 TB in size.
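To put the 200 TB figure in perspective, here is a rough back-of-the-envelope sketch of the transfer time. The function name `download_days` and the link speeds are illustrative assumptions, not values from the book; the calculation also assumes decimal terabytes and a link that stays fully utilized for the whole transfer, which real connections rarely manage.

```python
def download_days(size_tb: float, link_mbps: float) -> float:
    """Estimate days needed to transfer size_tb terabytes over a link_mbps link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Mbit = 10**6 bits) and a
    perfectly sustained transfer rate, so the result is an optimistic lower bound.
    """
    size_bits = size_tb * 1e12 * 8           # terabytes -> bits
    seconds = size_bits / (link_mbps * 1e6)  # bits / (bits per second)
    return seconds / 86_400                  # seconds -> days


if __name__ == "__main__":
    # Hypothetical link speeds chosen only for illustration.
    for mbps in (100, 1_000):
        print(f"{mbps} Mbit/s: about {download_days(200, mbps):.0f} days")
```

Under these assumptions, 200 TB would take roughly 185 days at 100 Mbit/s and about 19 days at 1 Gbit/s, which is why processing such data where it is already hosted tends to be more practical than downloading it.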

For more information, visit http://aws.amazon.com/datasets.
