A hands-on guide to web scraping and text mining for both beginners and experienced users of R
Introduces fundamental concepts of the main architecture of the web and databases, and covers HTTP, HTML, XML, JSON, and SQL.
Provides basic techniques to query web documents and data sets (XPath and regular expressions).
An extensive set of exercises is presented to guide the reader through each technique.
Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.
Case studies are featured throughout, along with examples for each technique presented.
R code and solutions to exercises featured in the book are provided on a supporting website.
Table of contents
- Chapter 1: Introduction
Part One: A Primer on Web and Data Technologies
- Chapter 2: HTML
- Chapter 3: XML and JSON
- Chapter 4: XPath
- Chapter 5: HTTP
- Chapter 6: AJAX
- Chapter 7: SQL and relational databases
- Chapter 8: Regular expressions and essential string functions
Part Two: A Practical Toolbox for Web Scraping and Text Mining
- Chapter 9: Scraping the Web
- Chapter 10: Statistical text processing
- Chapter 11: Managing data projects
Part Three: A Bag of Case Studies
- Chapter 12: Collaboration networks in the US Senate
- Chapter 13: Parsing information from semistructured documents
- Chapter 14: Predicting the 2014 Academy Awards using Twitter
- Chapter 15: Mapping the geographic distribution of names
- Chapter 16: Gathering data on mobile phones
- Chapter 17: Analyzing sentiments of product reviews
- General index
- Package index
- Function index
- End User License Agreement
- Title: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining
- Release date: January 2015
- Publisher(s): Wiley
- ISBN: 9781118834817
You might also like
Spark: The Definitive Guide
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the …
Fluent Python, 2nd Edition
Python’s simplicity lets you become productive quickly, but often this means you aren’t using everything it …
Kafka: The Definitive Guide, 2nd Edition
Every enterprise application creates data, whether it consists of log messages, metrics, user activity, outgoing messages, …
Python for Finance, 2nd Edition
The financial industry has recently adopted Python at a tremendous rate, with some of the largest …