Effective Amazon Machine Learning by Alexis Perrier

Generating the shuffled datasets

We will use the DataRearrangement field, which is set when a datasource is created, to split the data into a training and a validation subset. As a result, we only need to create five files of shuffled data up front.
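As a minimal sketch of what such a split looks like, the snippet below builds two DataRearrangement strings in the JSON form Amazon ML accepts for percentage-based splitting. The 70/30 proportions and the helper name rearrangement are assumptions chosen for illustration, not values taken from the book.

```shell
#!/bin/sh
# Hypothetical helper: print a DataRearrangement JSON string that selects
# the rows between two percentages of the input file.
rearrangement() {
    printf '{"splitting":{"percentBegin":%d,"percentEnd":%d}}' "$1" "$2"
}

# Assumed 70/30 split; adjust the boundaries to taste.
train_split=$(rearrangement 0 70)
valid_split=$(rearrangement 70 100)

echo "training:   $train_split"
echo "validation: $valid_split"
```

Each string would be passed as the DataRearrangement value of the datasource being created, so a single shuffled file can back both the training and the validation datasource.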

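The following shell script creates five shuffled versions of the Ames Housing dataset and uploads the files to S3. Save the code in a file with the .sh extension (datasets_creation.sh) and run it with sh ./datasets_creation.sh. Note that gshuf is the name under which GNU shuf is installed by Homebrew's coreutils on macOS; on Linux the same command is usually available as shuf: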

#!/bin/bash
for k in 1 2 3 4 5
do
    filename="data/ames_housing_shuffled_$k.csv"
    gshuf data/ames_housing_nohead.csv -o data/ames_housing_nohead.csv
    cat data/ames_housing_header.csv data/ames_housing_nohead.csv > tmp.csv;
    mv tmp.csv $filename
    aws s3 cp ./$filename s3://aml.packt/data/ch8/
    ...
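The script above assumes that the header row and the data rows already live in separate files. The following is a small local sketch of that preparation step and of the shuffle itself, run on a made-up four-row CSV instead of the Ames Housing data and without the S3 upload. The file names, the sample rows, and the SHUF variable are all hypothetical.

```shell
#!/bin/sh
# Pick GNU shuf under whichever name is installed (shuf on Linux,
# gshuf with Homebrew coreutils on macOS).
if command -v shuf >/dev/null 2>&1; then SHUF=shuf; else SHUF=gshuf; fi

# Work in a scratch directory with a tiny, made-up dataset.
workdir=$(mktemp -d)
printf 'id,price\n1,100\n2,200\n3,300\n4,400\n' > "$workdir/sample.csv"

# Separate the header row from the data rows so that only data is shuffled.
head -n 1 "$workdir/sample.csv" > "$workdir/header.csv"
tail -n +2 "$workdir/sample.csv" > "$workdir/nohead.csv"

for k in 1 2 3
do
    # Shuffle the data rows into a separate file, then prepend the header.
    "$SHUF" -o "$workdir/body_$k.csv" "$workdir/nohead.csv"
    cat "$workdir/header.csv" "$workdir/body_$k.csv" > "$workdir/shuffled_$k.csv"
done

echo "shuffled files written to $workdir"
```

Unlike the book's script, this sketch writes each shuffle to its own file and leaves the headerless source untouched, which makes the intermediate results easier to inspect.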
