Web scraping in 60 minutes
Retrieve, parse, and store data from any website with Python
Websites contain lots of useful data. Extracting that data is often difficult because websites are designed for humans (not bots), each page is different, and some are intentionally difficult to interpret. Learning how to effectively parse HTML is a crucial skill for professional Python developers and Python hobbyists alike.
Web scraping is foundational for product review, price comparison, and reputation tracking applications. It also benefits projects built on internal data, because data becomes far more valuable when it’s matched and fused with other sources.
Expert Max Humber guides you through the web scraping process from start to finish. Join in to build the skills to supercharge your personal and professional projects.
What you'll learn and how you can apply it
By the end of this live online course, you’ll understand:
- How to scrape nearly any website
- How to structure requests to include query strings and headers
- How to effectively manipulate text stored in an HTML document
And you’ll be able to:
- Use the progress bar library tqdm to monitor the performance and speed of your scrapers
- Save the results of a scraper for later use
This training course is for you because...
- You use Python regularly.
- You want to scrape websites for personal and professional projects.
- You want to learn about the latest and greatest scraping tools.
Prerequisites:
- Experience with Python
About your instructor
Max Humber is a distinguished faculty member at General Assembly and the author of Personal Finance with Python. Previously, he was the first data scientist at Borrowell and the second data engineer at Wealthsimple.
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction (7 minutes)
- Group discussion: Introductions; number of websites you’ve scraped before; professional or personal interest
- Lecture: HTML and CSS basics; learning agenda
Retrieve (8 minutes)
- Lecture: Requesting and downloading web page contents; request and response types; URL structure, param payloads, and headers
- Hands-on exercise: Build a request URL
- Q&A
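As a rough preview of the "build a request URL" exercise, the sketch below assembles a query string and attaches a header without sending anything over the network. It uses only the standard library's urllib, which may not be the tool used in class. The endpoint, parameters, and User-Agent value are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameters, chosen only for illustration.
BASE_URL = "https://example.com/search"
params = {"q": "web scraping", "page": 2}

# urlencode turns the dict into a query string and escapes spaces and symbols.
url = f"{BASE_URL}?{urlencode(params)}"

# Headers travel alongside the URL; a User-Agent identifies the client.
headers = {"User-Agent": "my-scraper/0.1"}  # made-up client name

# Building a Request object does not contact the server.
request = Request(url, headers=headers)

print(request.full_url)
print(request.get_method())
```

Passing `request` to `urllib.request.urlopen` would actually send it. Keeping construction separate from sending makes the URL easy to inspect first.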
Parse (25 minutes)
- Lecture: Finding and extracting text based on HTML tag elements and attributes; string manipulation techniques and list comprehensions for scraping; looping, sleeping, and monitoring; hacking HTML tables with pandas
- Hands-on exercise: Scrape a Wikipedia page
- Q&A
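To give a feel for extracting text by tag and attribute and then cleaning it with list comprehensions, here is a minimal sketch. It uses the standard library's html.parser in place of whatever parsing library the course uses. The inline HTML, the class names, and the `CellCollector` helper are all made up for this example.

```python
from html.parser import HTMLParser

# A small inline document stands in for a downloaded page (made-up data).
HTML = """
<table>
  <tr><td class="name"> Widget </td><td class="price">$4.50</td></tr>
  <tr><td class="name"> Gadget </td><td class="price">$12.00</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose class attribute equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.capturing = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "td" and dict(attrs).get("class") == self.wanted:
            self.capturing = True
            self.cells.append("")

    def handle_data(self, data):
        # Text may arrive in pieces, so accumulate into the current cell.
        if self.capturing:
            self.cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self.capturing = False


def extract(html, wanted):
    """Return the raw text of all matching cells, in document order."""
    parser = CellCollector(wanted)
    parser.feed(html)
    parser.close()
    return parser.cells


# List comprehensions handle the string cleanup after extraction.
names = [cell.strip() for cell in extract(HTML, "name")]
prices = [float(cell.strip().lstrip("$")) for cell in extract(HTML, "price")]
rows = list(zip(names, prices))
print(rows)
```

Separating extraction (`extract`) from cleanup (the comprehensions) keeps each step easy to test on its own when a page's markup changes.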
Store (5 minutes)
- Lecture: Saving results with context managers, pandas, and SQLite
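One possible way to save results with context managers and SQLite is sketched below. It uses the standard library's csv module for the flat-file step where the course mentions pandas. The rows, file names, and table name are hypothetical.

```python
import csv
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

# Made-up scraped results, used only for illustration.
rows = [("Widget", 4.5), ("Gadget", 12.0)]

# Write into a fresh temporary directory so the sketch leaves no stray files behind.
workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "products.csv"  # hypothetical file name
db_path = workdir / "products.db"    # hypothetical file name

# `with open(...)` closes the file even if writing raises an exception.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "price"])
    writer.writerows(rows)

# closing() guarantees the connection is closed afterwards; the inner
# `with conn` commits on success and rolls back if an error occurs.
with closing(sqlite3.connect(db_path)) as conn:
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
        conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    saved = conn.execute("SELECT name, price FROM products ORDER BY price").fetchall()

print(saved)
```

Note that a sqlite3 connection used as a context manager only manages the transaction and does not close the connection, which is why the sketch wraps it in `closing()`.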
Wrap-up and Q&A (5 minutes)