Learning Data Science

by Sam Lau, Joseph Gonzalez, Deborah Nolan
September 2023
Beginner
596 pages
15h 31m
English
O'Reilly Media, Inc.

Chapter 14. Data Exchange

Data can be stored and exchanged in many different formats. Thus far, we’ve focused on plain-text delimited and fixed-width formats (Chapter 8). In this chapter, we expand our horizons a bit and introduce a few other popular formats. While CSV, TSV, and FWF files are useful for organizing data into a dataframe, other file formats can save space or represent more complex data structures. Binary files (binary is a term for formats that aren’t plain text) can be more compact than plain-text data sources. For example, in this chapter we introduce NetCDF, a popular binary format for exchanging large amounts of scientific data. Other plain-text formats like JSON and XML can organize data in more general ways, which makes them well suited to nested or hierarchical data structures. Even HTML web pages, a close cousin to XML, often contain useful information that we can scrape and wrangle into shape for analysis.
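To make the contrast between delimited files and a nested format concrete, here is a minimal sketch that uses Python's standard json module. The station record, its field names, and the flatten helper are all hypothetical and made up for illustration. The sketch shows one JSON document holding a nested location and a variable-length list of readings, which we then flatten into rows that would fit a dataframe.

```python
import json

# Hypothetical JSON document: one station with a nested location
# and a variable number of readings. A single CSV row can't hold this shape.
text = """
{
  "station": "A1",
  "location": {"lat": 37.87, "lon": -122.26},
  "readings": [
    {"time": "2023-01-01T00:00", "pm25": 8.5},
    {"time": "2023-01-01T01:00", "pm25": 9.1}
  ]
}
"""

# json.loads turns the text into nested Python dicts and lists.
record = json.loads(text)


def flatten(record):
    """Turn one nested station record into flat rows, one per reading."""
    rows = []
    for reading in record["readings"]:
        rows.append({
            "station": record["station"],
            "lat": record["location"]["lat"],
            "lon": record["location"]["lon"],
            "time": reading["time"],
            "pm25": reading["pm25"],
        })
    return rows


rows = flatten(record)
for row in rows:
    print(row)
```

Each row repeats the station and location fields. That repetition is the price of forcing hierarchical data into a rectangular table.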

In this chapter, we introduce these popular formats, describe a mental model for their organization, and provide examples. In addition to introducing these formats, we cover programmatic ways to acquire data online. Before the internet, data scientists had to physically move disk drives to share data with one another. Now we can freely retrieve datasets from computers across the world. We introduce HTTP, the primary communication protocol for the web, and REST, an architecture to transfer data. By learning a bit about these web technologies, we can take better advantage of ...
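As a rough illustration of retrieving data over HTTP, the following sketch starts a tiny server on the local machine and then requests a resource from it with a GET, using only Python's standard library. The handler class, the /stations/A1 path, the fetch_json helper, and the returned values are all assumptions invented for this example. A real REST service would define its own URLs and responses.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server that exposes one resource as JSON."""

    def do_GET(self):
        if self.path == "/stations/A1":
            body = json.dumps({"station": "A1", "pm25": 8.5}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "Not Found")

    def log_message(self, format, *args):
        # Silence the default per-request logging to keep output short.
        pass


def fetch_json(url):
    """Send an HTTP GET and return (status code, parsed JSON body)."""
    # An empty ProxyHandler bypasses any system proxy for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

status, data = fetch_json(f"http://127.0.0.1:{port}/stations/A1")

server.shutdown()
server.server_close()

print(status, data)
```

The URL names a resource (a station) and the GET method asks for its current representation, which is the basic pattern behind REST-style data access.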


Publisher Resources

ISBN: 9781098112998