Using AWS Glue and Amazon Athena

In this section, we will use AWS Glue to create a crawler, an ETL job, and a job that runs the KMeans clustering algorithm on the input data.

We use a publicly available dataset describing students' knowledge status on a subject. The dataset and the field descriptions are available for download from the UCI site.

  1. Log in to the AWS Management Console and go to the Glue console. Click on the Add crawler button.
  2. Specify the Crawler name as User Modeling Data Crawler as shown here. Click on the Next button:
  3. In the Add a data store screen, select S3.
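
The console steps above can also be approximated in code. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured; the role ARN, database name, S3 path, and region are hypothetical placeholders and do not come from the text. Only the crawler name is taken from step 2.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Assemble the keyword arguments for Glue's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes
        "DatabaseName": database,  # Data Catalog database for the discovered tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # S3 data store, as in step 3
    }


def create_crawler(request, region_name="us-east-1"):
    """Send the request to AWS Glue (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the request can be built without the SDK

    glue = boto3.client("glue", region_name=region_name)
    return glue.create_crawler(**request)


request = build_crawler_request(
    name="User Modeling Data Crawler",  # crawler name from step 2
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",  # hypothetical
    database="example_user_modeling_db",  # hypothetical
    s3_path="s3://example-bucket/user-modeling/",  # hypothetical
)

# Uncomment to create the crawler in your own account:
# create_crawler(request)
```

Separating the request from the API call lets you inspect the settings before anything is created in your account.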
