Effective Amazon Machine Learning by Alexis Perrier

Creating an improved datasource

Before uploading the new Titanic dataset to S3 and creating a new datasource in Amazon ML, we need to split it into a training file and a held-out file:

  1. Open this new Titanic dataset in your favorite editor.
  2. Select the first 1047 rows and save them to a new CSV file, ext_titanic_training.csv.
  3. Select the next 263 rows, together with the header row, and save them to ext_titanic_heldout.csv.
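
For readers who prefer a script to an editor, the split described in the steps above could be sketched as follows. This is a minimal sketch, not the book's own method: it assumes the "first 1047 rows" are counted as they appear in an editor, header line included, and the function names and the input file name ext_titanic.csv in the usage comment are hypothetical.

```python
import csv


def split_rows(rows, n_first, n_next):
    """Split rows (rows[0] is the header) into training and held-out parts.

    Assumption: n_first counts file lines, so the header is part of the
    first block; the held-out block gets its own copy of the header.
    """
    header = rows[0]
    training = rows[:n_first]
    heldout = [header] + rows[n_first:n_first + n_next]
    return training, heldout


def split_csv_file(src_path, training_path, heldout_path,
                   n_first=1047, n_next=263):
    """Read src_path and write the two CSV files named in the steps."""
    with open(src_path, newline="") as f:
        rows = list(csv.reader(f))
    training, heldout = split_rows(rows, n_first, n_next)
    for path, part in ((training_path, training), (heldout_path, heldout)):
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(part)
    return len(training), len(heldout)


# Hypothetical usage (the source file name is an assumption):
# split_csv_file("ext_titanic.csv",
#                "ext_titanic_training.csv",
#                "ext_titanic_heldout.csv")
```

Using the csv module rather than splitting on raw lines keeps quoted fields that contain commas or line breaks intact.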

The schema must also be updated to describe the new attributes. Open the schema file titanic_training.csv.schema and add the following entries to the JSON:

{ "attributeName" : "is_age_missing", "attributeType" : "BINARY" },
{ "attributeName" : "log_fare", "attributeType" : "NUMERIC" },
{ "attributeName" : "title", "attributeType" : "CATEGORICAL" },
{ "attributeName" : "deck", "attributeType" ...
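
The same schema edit could also be scripted. The sketch below assumes the schema file keeps its attribute definitions in a top-level "attributes" list; the function names are hypothetical. Only the three entries that appear in full above are included: the deck entry is truncated in the text, so its type is not guessed here and would need to be added by hand.

```python
import json

# Entries copied from the text; the truncated "deck" entry is left out.
NEW_ATTRIBUTES = [
    {"attributeName": "is_age_missing", "attributeType": "BINARY"},
    {"attributeName": "log_fare", "attributeType": "NUMERIC"},
    {"attributeName": "title", "attributeType": "CATEGORICAL"},
]


def add_attributes(schema, new_attributes):
    """Return a copy of schema with new_attributes appended.

    Assumption: attribute definitions live in schema["attributes"].
    Attributes whose name is already present are skipped, so running
    the function twice does not create duplicates.
    """
    current = list(schema.get("attributes", []))
    known = {attr["attributeName"] for attr in current}
    for attr in new_attributes:
        if attr["attributeName"] not in known:
            current.append(attr)
            known.add(attr["attributeName"])
    updated = dict(schema)
    updated["attributes"] = current
    return updated


def update_schema_file(src_path, dst_path, new_attributes=NEW_ATTRIBUTES):
    """Load a schema file, append the new attributes, and write the result."""
    with open(src_path) as f:
        schema = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(add_attributes(schema, new_attributes), f, indent=2)
```

Writing the result to a separate file keeps the original titanic_training.csv.schema available for comparison.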