O'Reilly logo

Instant Web Scraping with Java by Ryan Mitchell

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Writing a simple scraper (Simple)

Wikipedia is not just a helpful resource for researching or looking up information but also a very interesting website to scrape. They make no efforts to prevent scrapers from accessing the site, and, with a very well-marked-up HTML, they make it very easy to find the information you're looking for. In this project, we will scrape an article from Wikipedia and retrieve the first few lines of text from the body of the article.

Getting ready

It is recommended that you complete the first two recipes in this book, or at least have some working knowledge of Java, and the ability to create and execute Java programs at this point.

As an example, we will use the article from the following Wikipedia link:

http://en.wikipedia.org/wiki/Java ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required