Getting Started with Amazon Athena
Using SQL to query distributed big data
Any application at scale collects a variety of data, and more often than not, this data is stored not in a traditional relational database management system (RDBMS) but as JSON or Parquet files in cloud storage such as Amazon S3. This makes running queries on that data difficult: you have to transform the data into a tabular format, load it into an RDBMS, scale that RDBMS as the data grows, and so on.
To avoid this cumbersome process, Amazon came up with Athena. You can use Athena to query data that’s already present in S3, with minimal to no transformation required. Athena comes in handy when you have to run ad hoc queries on fresh data or even data from a year ago. With almost all features of SQL available in Athena, experienced BI users can easily begin extracting value from the data without needing to learn a new query language.
Join expert Sunny Srinidhi to explore the differences between a traditional RDBMS and Athena and learn how to use Amazon S3 as a data lake and connect it to Athena so that you can query that data from Athena. Over three hours, you’ll get hands-on experience saving data to S3 in the form of CSV, JSON, and Parquet files (either directly to S3 or by using a data pipeline such as Amazon Kinesis Data Firehose), using the Athena console to write ad hoc SQL queries on that data, and using the AWS Python SDK to run the same queries on the same data, but from your own Python service.
What you’ll learn and how you can apply it
By the end of this live online course, you’ll understand:
- The difference between traditional RDBMSs, cloud-native databases, and Athena
- How data is stored for Athena
- How to connect Athena to your data
- How to use the Athena console to query your data
And you’ll be able to:
- Query data in CSV, Parquet, and JSON files stored in S3 from Athena
- Use the Athena console to query data
- Use the AWS SDK in a Python application to run queries on Athena remotely
This course is for you because…
- You’re a business intelligence (BI) professional working with big data (specifically, running ad hoc queries on your data).
- You’re a programmer writing services that have to run remote queries on Athena.
- You already work with SQL but are hesitant to learn proprietary query languages that aren’t standardized.
- You’re interested in working with big data and analytics.
Prerequisites
- A working knowledge of SQL and Python
- An AWS account
Recommended preparation
- Download and explore the “Sales Records” sample CSV dataset
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction (15 minutes)
- Presentation: Overview of the data used in the example; overview of the Python project; demo of the finished product
Creating a table in Athena (40 minutes)
- Presentation: Getting the sample dataset; understanding the sample dataset; creating a bucket in S3 to host the sample dataset; creating a table in Athena; declaring the columns required in Athena
- Jupyter Notebook exercise: Create a table in Athena
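The table-creation step above boils down to pointing a `CREATE EXTERNAL TABLE` DDL statement at an S3 location. Here's a minimal sketch in Python that builds such a statement; the database, table, column, and bucket names are illustrative assumptions, not the course's exact schema for the Sales Records dataset:

```python
# Build a CREATE EXTERNAL TABLE statement for a CSV dataset in S3.
# Columns here are a guess at a few fields from the "Sales Records"
# sample; the actual course schema may differ.
def build_create_table_ddl(database: str, table: str, s3_location: str) -> str:
    """Return Athena DDL for an external table over header-prefixed CSV files."""
    columns = [
        ("region", "string"),
        ("country", "string"),
        ("item_type", "string"),
        ("units_sold", "int"),
        ("unit_price", "double"),
        ("total_revenue", "double"),
    ]
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )

ddl = build_create_table_ddl("demo_db", "sales_records", "s3://my-demo-bucket/sales/")
print(ddl)
```

Note that the table is *external*: Athena only stores the schema, while the data stays in S3, so dropping the table never deletes the underlying files.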
Break (5 minutes)
Querying data in Athena (40 minutes)
- Presentation: Starting with simple SELECT queries; queries with WHERE clauses; aggregation queries; how query results are stored
- Jupyter Notebook exercise: Run sample queries in Athena
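The kinds of queries this section covers look like ordinary SQL. A hedged sketch of the three flavors (simple SELECT, WHERE clause, aggregation), again against a hypothetical `sales_records` table whose name and columns are assumptions:

```python
# Illustrative ad hoc queries of the three kinds covered above.
# Table and column names are assumptions, not the course's exact schema.
SAMPLE_QUERIES = {
    "simple_select": "SELECT region, country, total_revenue FROM sales_records LIMIT 10",
    "where_clause": "SELECT * FROM sales_records WHERE region = 'Europe'",
    "aggregation": (
        "SELECT region, SUM(total_revenue) AS revenue "
        "FROM sales_records GROUP BY region ORDER BY revenue DESC"
    ),
}

for name, sql in SAMPLE_QUERIES.items():
    print(f"-- {name}\n{sql}\n")
```

Each query you run in the console also writes its result set as a CSV file to a results location in S3, which is how the "how query results are stored" topic above comes into play.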
How to use the Jupyter Notebook (15 minutes)
- Demo and Jupyter Notebook exercise: Explore the Jupyter Notebook
Break (5 minutes)
The final Python project (50 minutes)
- Presentation: Configuring the AWS Python SDK; writing the service layer required for Athena; rewriting the sample queries used earlier in the demo; executing these queries remotely and getting results
- Jupyter Notebook exercise: Write a Python service
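The service layer described above follows Athena's asynchronous pattern: start a query, poll its state, then fetch results. A minimal sketch under stated assumptions: in real use you would pass `athena_client=boto3.client("athena")`, and the database and S3 output location shown in the test below are placeholders:

```python
import time

# Sketch of a service-layer function that runs a query on Athena via the
# AWS SDK. The client is injected so the logic can be exercised without
# AWS credentials; in production, pass boto3.client("athena").
def run_athena_query(athena_client, query: str, database: str, output_s3: str,
                     poll_seconds: float = 1.0) -> str:
    """Start a query, poll until it reaches a terminal state, return its ID."""
    start = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    query_id = start["QueryExecutionId"]
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")
    return query_id
```

Once the query succeeds, the returned execution ID can be passed to the SDK's `get_query_results` call (or the result CSV can be read straight from the S3 output location).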
Wrap-up and Q&A (10 minutes)
Sunny Srinidhi is a Senior Software Engineer at Lowe’s India and has been working in the data space for over seven years. He writes microservices to work with data at scale and has experience using a variety of databases, including Oracle, MySQL, MongoDB, and Apache HBase. A frequent blogger on Medium and his personal blog, Sunny is always interested in learning about and exploring the next exciting data-related tool.