Effective Amazon Machine Learning by Alexis Perrier

Formatting the data

Amazon ML works with comma-separated values (.csv) files, a very simple format in which each row is an observation and each column is a variable, or attribute. However, a few conditions must be met:

  • The data must be encoded in plain text using a character set, such as ASCII, Unicode, or EBCDIC
  • All values must be separated by commas; if a value contains a comma, it should be enclosed by double quotes
  • Each observation (row) must be no larger than 100 KB
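As an illustration of these rules, here is a minimal sketch using Python's standard csv module. The function name to_amazon_ml_csv is hypothetical, and reading "100 KB" as 100 × 1024 bytes of encoded text is an assumption. The writer puts double quotes around any field that contains a comma, and the function rejects rows above the assumed size limit.

```python
import csv
import io

# Assumption: the 100 KB per-observation limit is read as 100 * 1024 bytes
# of encoded text, including the line terminator.
MAX_ROW_BYTES = 100 * 1024


def to_amazon_ml_csv(rows, encoding="utf-8"):
    """Serialize rows to CSV text (hypothetical helper, for illustration).

    Fields containing a comma are wrapped in double quotes, and any row whose
    encoded size exceeds MAX_ROW_BYTES raises ValueError.
    """
    lines = []
    for index, row in enumerate(rows):
        line_buffer = io.StringIO()
        # QUOTE_MINIMAL quotes only fields that contain special characters
        # such as the delimiter, the quote character, or line-terminator characters.
        writer = csv.writer(line_buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
        writer.writerow(row)
        line = line_buffer.getvalue()
        size = len(line.encode(encoding))
        if size > MAX_ROW_BYTES:
            raise ValueError(
                f"row {index} is {size} bytes, above the {MAX_ROW_BYTES}-byte limit"
            )
        lines.append(line)
    return "".join(lines)


if __name__ == "__main__":
    sample = [["id", "city", "price"], [1, "Portland, OR", 250000]]
    print(to_amazon_ml_csv(sample), end="")
```

Run on the sample rows, this prints `id,city,price` followed by `1,"Portland, OR",250000`. Only the field that contains a comma is quoted.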

There are also conditions on the end-of-line characters that separate rows. Take special care when using Excel on OS X (Mac), as explained on this page: http://docs.aws.amazon.com/machine-learning/latest/dg/understanding-the-data-format-for-amazon-ml.html. ...
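The exact end-of-line requirements are on the AWS page linked above. The sketch below assumes two things: the goal is rows terminated by a line feed (LF), and the problem file uses Windows CRLF or bare carriage returns (CR), as some older Excel for Mac CSV exports do. It also assumes an ASCII-compatible encoding such as UTF-8. Both function names are hypothetical. Note that it also rewrites any CR characters that appear inside quoted fields.

```python
from pathlib import Path


def normalize_line_endings(raw: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF (hypothetical helper)."""
    # Replace CRLF first so each Windows line ending becomes one LF, not two.
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(src: str, dst: str) -> None:
    """Write a copy of src to dst with every row terminated by LF."""
    data = Path(src).read_bytes()
    Path(dst).write_bytes(normalize_line_endings(data))


if __name__ == "__main__":
    # In-memory demonstration: a CR-terminated row followed by a CRLF row.
    print(normalize_line_endings(b"a,b\rc,d\r\n"))
```

To fix a file on disk, call something like `normalize_file("export_from_excel.csv", "training_data.csv")`. Both file names are placeholders.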