Getting data from blogs and websites

Given the abundance of websites with interesting articles, finding textual data to mine shouldn't be a huge problem. Manually saving one article at a time obviously doesn't scale, so in this section, we will discuss some ways to automate the process of getting data from websites.

First, we will discuss two popular free blogging services, WordPress.com and Blogger, each of which offers a convenient API for interacting with its platform. Second, we will introduce the RSS and Atom web standards, used by many blogs and news publishers to broadcast their content in a machine-readable format. Finally, we will briefly discuss further options, such as connecting to Wikipedia ...
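To give a rough idea of why a feed format such as RSS is easy for a program to read, here is a minimal sketch that extracts the title, link, and description of each entry using only Python's standard library. The feed content, the example.com URLs, and the function name parse_rss_items are made up for illustration. A real feed would be downloaded from a publisher's feed URL and may use Atom or namespaced elements, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post</link>
      <description>Text of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second-post</link>
      <description>Text of the second post.</description>
    </item>
  </channel>
</rss>
"""


def parse_rss_items(xml_text):
    """Return a list of dicts, one per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

Each entry comes back as a plain dictionary, so the text in the description field can be passed straight to whatever text-mining step follows.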

