Creating a Titanic database

We are going to start from scratch and go back to the original Titanic dataset available at https://github.com/alexperrier/packt-aml/blob/master/ch4/original_titanic.csv. Follow these steps to prepare the CSV file:

1. Open the original_titanic.csv file.
2. Remove the header row.
3. Remove the following punctuation characters: comma (,), double quote ("), and parentheses (( and )).
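
The steps above can also be scripted rather than done by hand. The following is a minimal sketch, not the author's method: the function names clean_rows and prepare_for_athena are hypothetical, and it assumes the commas to remove are the ones inside quoted field values (such as passenger names), not the commas that separate columns.

```python
import csv

# Characters named in the steps above: comma, double quote, parentheses.
PUNCTUATION = ',"()'
_STRIP_TABLE = str.maketrans("", "", PUNCTUATION)


def clean_rows(lines, has_header=True):
    """Yield parsed CSV rows with the header dropped and punctuation stripped.

    Assumption: only punctuation inside field values is removed; the
    column-separating commas are kept so the file stays a valid CSV.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the column names
    for row in reader:
        yield [field.translate(_STRIP_TABLE) for field in row]


def prepare_for_athena(src_path, dst_path):
    """Read the original CSV and write the cleaned, header-less version."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(clean_rows(src))
```

Under these assumptions, calling prepare_for_athena("original_titanic.csv", "titanic_for_athena.csv") would produce a file with no header and no quoting, since no field contains a comma or quote once the characters are stripped.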

The file should contain only data, not column names. It is the original dataset with 1309 rows, ordered by pclass and then alphabetically by name. The resulting file is available at https://github.com/alexperrier/packt-aml/blob/master/ch4/titanic_for_athena.csv. Let us create a new athena_data folder in our S3 bucket and upload the titanic_for_athena.csv file to it. Now go to the Athena console. ...
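
The upload to the athena_data folder can be done in the S3 console, or programmatically. The sketch below is one possible way to do it, assuming the boto3 library and configured AWS credentials; the helper names athena_key and upload_for_athena are hypothetical, and the bucket name is a placeholder you would replace with your own. S3 has no real folders, so the "folder" is simply a key prefix.

```python
import os


def athena_key(filename, folder="athena_data"):
    """Build the S3 object key that places the file under the given folder prefix."""
    return f"{folder.strip('/')}/{filename}"


def upload_for_athena(s3_client, bucket, local_path, folder="athena_data"):
    """Upload a local file under the folder prefix and return the key used.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    key = athena_key(os.path.basename(local_path), folder)
    # upload_file(Filename, Bucket, Key) is the boto3 S3 client upload call.
    s3_client.upload_file(local_path, bucket, key)
    return key
```

With boto3 installed, a call such as upload_for_athena(boto3.client("s3"), "my-bucket", "titanic_for_athena.csv") would store the file at athena_data/titanic_for_athena.csv. Passing the client in as an argument keeps the helper easy to exercise without network access.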
