We have used web-based APIs to extract data in several previous chapters. For instance, in Chapter 7, Follow Recommendations Using Graph Mining, we used Twitter's API to extract data. Collecting data is a critical part of the data mining pipeline, and web-based APIs are a convenient way to collect data on a wide variety of topics.
There are three things you need to consider when using a web-based API for collecting data: authorization methods, rate limiting, and API endpoints.
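To make these three considerations concrete, here is a minimal sketch of how they might show up in client code. Everything in it is illustrative rather than taken from a real service: the `ApiClient` class, the `https://api.example.com/v1/` base URL, the bearer-token header, and the "2 calls per 10 seconds" limit are all assumptions. No network request is made; the sketch only shows where each concern lives.

```python
import time
from urllib.parse import urlencode, urljoin


class ApiClient:
    """Hypothetical client showing authorization, rate limiting, and endpoints."""

    def __init__(self, base_url, token, max_calls, period_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.base_url = base_url      # root URL under which all endpoints live
        self.token = token            # credential that identifies the collector
        # Spread the allowed calls evenly across the rate-limit window.
        self.min_interval = period_seconds / max_calls
        self._clock = clock           # injectable so the limiter can be tested
        self._sleep = sleep
        self._last_call = None

    def auth_headers(self):
        # Authorization: assumes the API accepts a bearer token in a header.
        return {"Authorization": f"Bearer {self.token}"}

    def endpoint_url(self, path, **params):
        # API endpoints: each resource has its own path plus query parameters.
        url = urljoin(self.base_url, path)
        return f"{url}?{urlencode(params)}" if params else url

    def wait_for_slot(self):
        # Rate limiting: sleep until min_interval has passed since the last
        # call, and return the number of seconds actually waited.
        waited = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_call = self._clock()
        return waited


if __name__ == "__main__":
    client = ApiClient("https://api.example.com/v1/", token="YOUR_TOKEN",
                       max_calls=2, period_seconds=10)
    client.wait_for_slot()
    print(client.endpoint_url("users/show", screen_name="example"))
    print(client.auth_headers())
```

In practice, you would call `wait_for_slot()` before every request and pass the result of `auth_headers()` to whichever HTTP library you use. Real providers differ in how they expect credentials and how they count requests, so treat the values here as placeholders.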
Authorization methods let the data provider know who is collecting the data, so that each collector can be rate-limited appropriately and data access can be tracked. For most websites, a personal account is enough ...