Modern R Programming Cookbook

by Jaynal Abedin
October 2017
Beginner to intermediate
236 pages
7h 38m
English
Packt Publishing

Extracting text data from an HTML page

You have seen an example of reading HTML source code as a text vector in the Extracting unstructured text data from a plain web page recipe in this chapter. In that recipe, further processing is not straightforward because the output object mixes plain text with HTML tags, and stripping the tags from the text is a time-consuming task.
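To illustrate why cleaning tags out of a raw text vector is tedious, here is a minimal sketch in base R. The two lines in `raw_lines` are hypothetical stand-ins for what reading a page as text might return; they are not taken from the actual Wikipedia page, and the regular-expression cleanup is an assumed approach rather than the one used in the earlier recipe.

```r
# Hypothetical stand-in for HTML source that was read as a character vector.
raw_lines <- c(
  "<p>R is a <a href=\"/wiki/Language\">language</a> for statistics.</p>",
  "<p>Tags &amp; entities remain in the text.</p>"
)

# Naive cleanup: delete anything that looks like a tag.
stripped <- gsub("<[^>]+>", "", raw_lines)

# The tags are gone, but HTML entities such as &amp; are left behind,
# and the pattern knows nothing about document structure.
print(stripped)
```

Even in this tiny case the pattern leaves entities untouched, which hints at how much extra handling a full page would need.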

In this recipe, you will read the same web page from the following link:

https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R

However, this time you will use a different strategy so that you can work with the HTML tags directly.
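One way to work with tags directly is to parse the page into a document tree and select elements by tag name. The sketch below assumes the `xml2` package, which the text above does not name, so treat the package choice as an assumption. A small inline HTML fragment (hypothetical content) stands in for the downloaded page so that the example runs offline.

```r
# Sketch only: xml2 is an assumed package choice, not confirmed by the recipe.
library(xml2)

# Inline HTML used in place of the real page. To try the real page,
# pass the URL to read_html() instead:
# page <- read_html("https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R")
html_source <- "<html><body>
  <h1>Sample heading</h1>
  <p>First <b>paragraph</b> of text.</p>
  <p>Second paragraph.</p>
</body></html>"

page <- read_html(html_source)

# Select every <p> element with an XPath expression, then keep only its text.
paragraph_nodes <- xml_find_all(page, "//p")
paragraph_text <- xml_text(paragraph_nodes, trim = TRUE)

# Select the first <h1> element the same way.
heading_text <- xml_text(xml_find_first(page, "//h1"))

print(heading_text)
print(paragraph_text)
```

Because the parser understands the tag structure, the text of nested elements such as `<b>` is folded into its parent paragraph without any manual tag stripping.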


ISBN: 9781787129054