Modern R Programming Cookbook

by Jaynal Abedin
October 2017
Beginner to intermediate
236 pages
7h 38m
English
Packt Publishing
Content preview from Modern R Programming Cookbook

How it works…

Unlike readLines(), the read_html() function does not read the source code line by line. Instead, it reads the entire HTML source into a single object while preserving the original HTML structure. To see the page's content, you have to retrieve the plain text enclosed by the relevant HTML tags.
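The contrast between the two functions can be sketched on a small local file. The file contents and the names page_lines, tmp_file, raw_lines, and parsed_page are hypothetical and used only for illustration; a temporary file stands in for a live web page.

```r
library(rvest)  # loading rvest also makes read_html() available

# Hypothetical three-line HTML page written to a temporary file
page_lines <- c(
  "<html><head><title>Sample page</title></head>",
  "<body><p>First paragraph.</p>",
  "<p>Second paragraph.</p></body></html>"
)
tmp_file <- tempfile(fileext = ".html")
writeLines(page_lines, tmp_file)

# readLines() returns a character vector with one element per line
raw_lines <- readLines(tmp_file)
length(raw_lines)
class(raw_lines)

# read_html() parses the whole file into a single structured document
parsed_page <- read_html(tmp_file)
class(parsed_page)
```

Here raw_lines is plain text with no notion of tags, whereas parsed_page keeps the tag hierarchy, so it can be queried by tag.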

The rvest library provides functions to interact with HTML tags and retrieve the plain text they contain. For example, suppose you want to retrieve the title of the web page. The title is enclosed by the <title>…</title> HTML tag pair. The following code returns the plain text title of the page:

    html_text(html_nodes(htmlTextData, xpath = "//title"))

Notice that there ...
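The same html_nodes() and html_text() pattern can be tried without a network connection. The following is a minimal sketch that assumes a small inline HTML string; sample_html, sample_page, page_title, and paragraphs are hypothetical names, not objects from the recipe.

```r
library(rvest)

# Hypothetical inline HTML, used in place of a downloaded page
sample_html <- paste0(
  "<html><head><title>Sample page</title></head>",
  "<body><h1>Heading</h1>",
  "<p>First paragraph.</p><p>Second paragraph.</p></body></html>"
)
sample_page <- read_html(sample_html)

# Select the <title> node with XPath, then extract its plain text
page_title <- html_text(html_nodes(sample_page, xpath = "//title"))

# The same pattern applies to other tags; "//p" matches every paragraph
paragraphs <- html_text(html_nodes(sample_page, xpath = "//p"))

page_title
paragraphs
```

Because html_nodes() returns every node that matches the XPath expression, html_text() yields one string per match: a single string for the title and one per paragraph.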

Publisher Resources

ISBN: 9781787129054