If you have an account with Amazon Web Services (AWS) and use a service such as Amazon Elastic MapReduce (EMR), you can use the large public datasets that Amazon provides free of charge.
At the time of writing this book, 54 public datasets are available, including human genome data, U.S. census data, the Freebase data dump, and material safety data sheets. Some of these datasets are too large to download in practice. For example, the 1000 Genomes Project data is about 200 TB.
For more information, visit http://aws.amazon.com/datasets.
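To get a feel for why a dataset of about 200 TB is impractical to download, the following minimal sketch estimates the transfer time over a few link speeds. The function name `transfer_days` and the link speeds are illustrative assumptions, not figures from Amazon. The calculation also assumes decimal terabytes and a fully saturated link with no protocol overhead, so real transfers would take longer.

```python
def transfer_days(size_tb: float, link_mbps: float) -> float:
    """Estimate the days needed to move size_tb terabytes over a link
    running at link_mbps megabits per second (hypothetical helper)."""
    size_bits = size_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = size_bits / (link_mbps * 1e6)   # megabits/s -> bits/s
    return seconds / 86_400                   # seconds -> days


if __name__ == "__main__":
    # Link speeds below are assumed for illustration only.
    for mbps in (100, 1_000, 10_000):
        days = transfer_days(200, mbps)
        print(f"{mbps:>6} Mbit/s: {days:8.1f} days")
```

Under these assumptions, a 100 Mbit/s connection would need roughly half a year to copy the data, which is why it is usually more practical to process such datasets inside AWS, close to where they are stored.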