Data Scraping and Data Mining from Beginner to Pro with Python

Video description

Data scraping is the technique of extracting data from the Internet. The course Data Scraping and Data Mining from Beginner to Professional covers topics that build some of the most in-demand skills in the workplace and helps you understand the underlying concepts and methodologies as they apply to Python. The course is easy to follow, descriptive, comprehensive, and practical, with live coding, quizzes with solutions, and up-to-date coverage of the field.

This course is designed for beginners. We’ll spend sufficient time on the fundamentals, then gradually go deeper through many practical implementations, with every step explained in detail.

Because this course is essentially a compilation of the basics, you will move ahead at a steady pace. The activities go beyond the lecture material, and most of them are designed to get you up and running with implementations.

The four hands-on projects included are the most important part of this course. These projects allow you to experiment for yourself with trial and error. You will learn from your mistakes. Importantly, you will understand the potential gaps that may exist between theory and practice.

What You Will Learn

  • Understand the difference between synchronous and asynchronous requests
  • Apply BS4 for parsing the response data from the server
  • Explore the different tools used for data scraping, namely Requests, BS4, Scrapy, and Selenium
  • Understand BS4 parser functions for getting the data out of the HTML
  • Learn to use Scrapy to write spiders for crawling websites and extracting data
  • Learn to use Selenium to automate and control web flows
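
To give a rough sense of what parsing with BS4 looks like, here is a minimal sketch. It is not taken from the course: the HTML snippet, the class names (quote, text, author), and the extract_quotes helper are all assumptions made for illustration, and the string stands in for a response body that would normally come from the server.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a server response body (assumed markup).
HTML = """
<html><body>
  <div class="quote">
    <span class="text">Quote one</span>
    <small class="author">Author A</small>
  </div>
  <div class="quote">
    <span class="text">Quote two</span>
    <small class="author">Author B</small>
  </div>
</body></html>
"""


def extract_quotes(html):
    """Return a list of {"text", "author"} dicts found in the given HTML."""
    # Build the parse tree with Python's built-in HTML parser.
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # find_all locates every <div class="quote"> block in the tree.
    for block in soup.find_all("div", class_="quote"):
        # find looks up a tag by name and class within the block.
        text = block.find("span", class_="text").get_text(strip=True)
        # select_one does the same lookup using a CSS selector.
        author = block.select_one("small.author").get_text(strip=True)
        results.append({"text": text, "author": author})
    return results


quotes = extract_quotes(HTML)
for quote in quotes:
    print(quote["author"], "-", quote["text"])
```

In a real scraper the HTML string would typically be the body of a response fetched with Requests, and the selectors would have to match the target site's actual markup.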

Audience

This course is for beginners who are completely new to data scraping and for individuals who want to build smart solutions and learn data scraping with real data using Python.

It is also useful for data scientists, machine learning experts, and drop shippers who are interested in learning data scraping and implementing it in realistic projects.

About The Author

AI Sciences: AI Sciences is a team of experts, PhDs, and artificial intelligence practitioners with backgrounds in computer science, machine learning, and statistics. Some work at large companies such as Amazon, Google, Facebook, Microsoft, KPMG, BCG, and IBM.

AI Sciences produces a series of courses for beginners and newcomers on the techniques and methods of machine learning, statistics, artificial intelligence, and data science. They aim to help those who wish to understand these techniques more easily and to start with less theory and less extended reading. Today, they publish more comprehensive courses on specific topics for wider audiences.

Their courses have successfully helped more than 100,000 students master AI and data science.

Table of contents

  1. Chapter 1 : Introduction
    1. Why Data Scraping
    2. Applications of Data Scraping
    3. Introduction of Instructor
    4. Introduction to Course, Scraping, Tools
    5. Projects Overview
  2. Chapter 2 : Requests
    1. Introduction to Python Requests
    2. Hands-On with Requests
    3. Extracting Quotes Manually
    4. Quiz (Extracting Authors)
    5. Solution (Extracting Authors)
    6. Pagination
    7. Quiz (Extracting Author and Quotes)
    8. Solution 01 (Extracting Author and Quotes)
    9. Solution 02 (Extracting Author and Quotes)
    10. Ajax Requests
    11. Ajax Requests for Cricket Information
    12. Ajax Requests Pagination
    13. Quiz (Extracting Top Stats from Cricket Information)
    14. Solution 01 (Extracting Top Stats from Cricket Information)
    15. Solution 02 (Extracting Top Stats from Cricket Information)
  3. Chapter 3 : Beautiful Soup 4 (BS4)
    1. Introduction to BS4
    2. Quiz (Difference Between Requests and BS4)
    3. Solution (Difference Between Requests and BS4)
    4. Hands-On with BS4
    5. Extracting Data from Tree
    6. Extracting Quotes from the Website
    7. Quiz (Extracting Author Names)
    8. Solution (Extracting Author Names)
    9. Attributes of Tags in BS4
    10. Multi-Valued Attributes of Tags in BS4
    11. Scraping Movie Names from IMDB
    12. Quiz (Getting the Ratings, Year, Name of the Movie)
    13. Solution 01 (Getting the Ratings, Year, Name of the Movie)
    14. Solution 02 (Getting the Ratings, Year, Name of the Movie)
    15. Scraping Time, Genre, and Release Date from IMDB 01
    16. Scraping Time, Genre, and Release Date from IMDB 02
    17. Combining Two Requests Data for IMDB
    18. Movies Recommender System (Creating Movie URL)
    19. Movies Recommender System (Creating Director URL)
    20. Movies Recommender System using BS4 (Getting Top 4 Movies)
    21. Movies Recommender System using BS4 (Merge All Requests Together)
  4. Chapter 4 : CSS Selectors
    1. Introduction to CSS Selectors
    2. CSS Selectors Hands-On (Tags)
    3. Quiz (Tags)
    4. Solution (Tags)
    5. CSS Selectors Hands-On (Descendants, ID, Class)
    6. Quiz (Descendants)
    7. Solution (Descendants)
    8. Quiz (ID)
    9. Solution (ID)
    10. Quiz (Class)
    11. Solution (Class)
    12. CSS Selectors Hands-On (Nested Tags, ID Tags, Class Tags)
    13. Quiz (Class with Tag)
    14. Solution (Class with Tag)
    15. CSS Selectors Hands-On (Comma Separator, Universal Selectors)
    16. Quiz (Combining Two Selectors)
    17. Solution (Combining Two Selectors)
    18. CSS Selectors Hands-On (Sibling Notations and Direct Child)
    19. Quiz (Adjacent Sibling)
    20. Solution (Adjacent Sibling)
    21. Quiz (General Sibling)
    22. Solution (General Sibling)
    23. CSS Selectors Hands-On (Child Selectors)
    24. Quiz (First Child)
    25. Solution (First Child)
    26. Quiz (Only Child)
    27. Solution (Only Child)
    28. Quiz (Last Child)
    29. Solution (Last Child)
    30. CSS Selectors Hands-On (Negations, Attributes)
    31. Quiz (Negation)
    32. Solution (Negation)
    33. CSS Selectors Hands-On (Attributes, Attribute Values)
    34. Quiz (Attribute Values)
    35. Solution (Attribute Values)
    36. CSS Selectors Hands-On (Attributes Wild Cards Values)
    37. Quiz (Attributes Wild Card)
    38. Solution (Attributes Wild Card)
  5. Chapter 5 : Scrapy
    1. Introduction to Scrapy
    2. Comparison of Scrapy and Requests
    3. Scrapy at a Glance Documentation
    4. Getting Started with Scrapy
    5. Running Documentation Spider 1
    6. Running Documentation Spider 2
    7. Writing a Spider from Scratch
    8. Understanding the Response (URL, Status)
    9. Understanding the Response (Headers)
    10. Understanding the Response (Values in Headers)
    11. Understanding the Response (Body)
    12. Understanding the Response (Request)
    13. Understanding the Response (Meta)
    14. Understanding the Response (Flags, Certificate, ip_address, Copy)
    15. Understanding the Response (replace, urljoin, follow, follow_all)
    16. Response CSS and Scrapy Shell
    17. Extracting Quotes
    18. Understanding Nested Selectors
    19. Extracting the Author and Quotes
    20. Checking for Next Page
    21. Checking for Next Page in Spider
    22. Checking for Next Page URL
    23. Scraping Quotes from Next Pages
    24. Exporting Extracted Data
    25. Quiz (Get the Tags)
    26. Solution (Get the Tags)
    27. Next Website
    28. CSS Selectors for Movie Names and URLs
    29. Combined CSS Selectors for Movie Names and URLs
    30. Send Request to the Film Information Page
    31. Merge Data from Two Callbacks
    32. Extracting Movie Duration and Genres
    33. Exporting the Extracted Data
    34. Quiz (Extracting the Year)
    35. Solution (Extracting the Year)
    36. Getting Director Name and URL
    37. Getting Top Four Movies of Directors
    38. Extracting Data
    39. Extracting Data Anomaly (CSS Selector)
    40. Extracting Data Anomaly (dont_filter Flag)
  6. Chapter 6 : Scrapy Project
    1. Hugoboss Website for Scraping
    2. Understanding Site Structure
    3. Writing CSS Selectors for Listings
    4. Listings in Scrapy Shell
    5. Sending Request to Listings URLs
    6. Writing CSS for Getting the Product from the Listings
    7. Extracting Products URL from the Listings
    8. Sending Requests to Products of the Listings
    9. Writing CSS for Getting the Product Information
    10. Getting the Bigger Images of the Product
    11. Adding Pagination to Spider and Running It
    12. Output of the Spider
  7. Chapter 7 : Selenium
    1. Introduction to Selenium
    2. Getting Started with Selenium
    3. Configuring the Webdriver
    4. Extracting Quotes
    5. Extracting Quotes and Author Names
    6. Quiz (Extracting Quotes)
    7. Solution (Extracting Quotes)
    8. Clicking on Button
    9. Pagination and Extracting Data
    10. Exception Handling for Unavailable Elements
    11. Navigating the Website for Login
    12. Quiz (Log In and Extract Quote)
    13. Solution (Log In and Extract Quote)
  8. Chapter 8 : Project Selenium
    1. Overview of Project
    2. Closing the Cookie Button
    3. Setting the Language for Translation
    4. Sending the Text for Translation
    5. Downloading the Translation
    6. Reading Data from File for Translation

Product information

  • Title: Data Scraping and Data Mining from Beginner to Pro with Python
  • Author(s): AI Sciences
  • Release date: September 2021
  • Publisher(s): Packt Publishing
  • ISBN: 9781801818483