Follow these steps to learn how to use regular expressions in text processing:
- To read text from a URL, you first need to create a connection between your R session and the web page.
- Note that your computer must be connected to the internet to run this recipe. Once the connection between your R session and the web page is open, you are ready to retrieve the page's HTML code. Here is the code for the whole process:
sourceURL <- "https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R"
link2web <- url(sourceURL)
htmlText <- readLines(link2web)
close(link2web)
- Now you have the HTML text in an object called htmlText. This object contains plain text as well as HTML tag pairs. The task ...
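To get a feel for what an object like htmlText holds, the following is a minimal offline sketch. It assumes a small made-up character vector (sampleHtml, hypothetical and not the real Wikipedia page) and reads it through textConnection() instead of url(), so it runs without internet access. The regular expression used to flag lines that contain tags is only one simple possibility.

```r
# Hypothetical stand-in for a downloaded page; NOT the real Wikipedia HTML.
sampleHtml <- c(
  "<html>",
  "<body>",
  "<h1>Example heading</h1>",
  "<p>Some paragraph text.</p>",
  "plain line without markup",
  "</body>",
  "</html>"
)

# textConnection() lets readLines() read from an in-memory character vector,
# mirroring the url() + readLines() + close() pattern without network access.
con <- textConnection(sampleHtml)
htmlText <- readLines(con)
close(con)

# readLines() returns a character vector with one element per line.
print(class(htmlText))
print(length(htmlText))

# Flag lines that contain at least one HTML tag (opening or closing).
hasTag <- grepl("<[^>]+>", htmlText)

# Number of lines with tags, followed by the lines that have none.
print(sum(hasTag))
print(htmlText[!hasTag])
```

With the real page, htmlText has the same shape (one string per line of HTML), only much longer, so the same grepl() call can be applied to it unchanged.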