Working with CSV and JSON data
Extracting data from HTML pages is done using the techniques in the previous chapter, primarily using XPath through various tools and also with Beautiful Soup. While we will focus primarily on HTML, HTML is a variant of XML (eXtensible Markup Language). XML one was the most popular for of expressing data on the web, but other have become popular, and even exceeded XML in popularity.
Two common formats that you will see are JSON (JavaScript Object Notation) and CSV (Comma Separated Values). CSV is easy to create and a common form for many spreadsheet applications, so many web sites provide data in that for, or you will need to convert scraped data to that format for further storage or collaboration. JSON really ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access