Modern R Programming Cookbook

by Jaynal Abedin
October 2017
Beginner to intermediate
236 pages
7h 38m
English
Packt Publishing

How to do it…

An HTML file has a tree-like structure in which the data is held in nested nodes. Each node is delimited by a tag pair, such as <p>…</p>. The steps are as follows:

  1. Create an R object that holds the website address as a character string.
  2. Load the XML package into your R session.
  3. Pass the page to the htmlTreeParse() function, and make sure you set useInternalNodes=TRUE.
  4. To extract plain text values from the HTML tree, use the xpathSApply() function with the HTML tag you want to match, such as p.

The R code corresponding to the preceding steps is as follows:

        library(XML)
        sourceURL <- "https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R"
        link2web <- url(sourceURL)
        htmlText <- readLines(link2web)
        close(link2web)
        ...
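
The listing above is cut off before steps 3 and 4, so the following is only a minimal sketch of how those two steps might look, not the author's code. It assumes a small inline HTML snippet in place of the downloaded page so that it runs without network access; the sample markup and the names htmlTree and paragraphs are hypothetical.

```r
library(XML)

# Hypothetical stand-in for the character vector returned by readLines()
htmlText <- c(
  "<html><body>",
  "<h1>Sample page</h1>",
  "<p>First paragraph.</p>",
  "<p>Second paragraph.</p>",
  "</body></html>"
)

# Step 3: build the tree. asText = TRUE marks the input as HTML content
# rather than a file name; useInternalNodes = TRUE allows XPath queries.
htmlTree <- htmlTreeParse(paste(htmlText, collapse = "\n"),
                          asText = TRUE, useInternalNodes = TRUE)

# Step 4: extract the plain text of every <p> node
paragraphs <- xpathSApply(htmlTree, "//p", xmlValue)
print(paragraphs)
```

With a real page, htmlText would be the result of readLines() from the listing above, and the XPath expression "//p" could be swapped for any other tag of interest.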

Publisher Resources

ISBN: 9781787129054