Python and R for the Modern Data Scientist
by Rick J. Scavetta, Boyan Angelov
June 2021
Beginner to intermediate
196 pages
5h 1m
English
O'Reilly Media, Inc.

Chapter 4. Data Format Context

In this chapter we’ll review tools in Python and R for importing and processing data in a variety of formats. We’ll cover a selection of packages, compare and contrast them, and highlight the properties that make them effective. At the end of this tour, you’ll be able to select packages with confidence. Each section illustrates the tool’s capabilities with a specific mini case study, based on tasks that a data scientist encounters daily. If you’re transitioning your work from one language to another or simply want to find out how to get started quickly using complete, well-maintained, and context-specific packages, this chapter will guide you.

Before we dive in, remember that the open source ecosystem is constantly changing. New developments, such as transformer models and explainable artificial intelligence (XAI), seem to emerge every other week. These often aim at lowering the learning curve and increasing developer productivity. This explosion of diversity also applies to related packages, resulting in a nearly constant flow of new and (hopefully) better tools. If you have a very specific problem, there’s probably a package already available for you, so you don’t have to reinvent the wheel. Tool selection can be overwhelming, but at the same time this variety of options can improve the quality and speed of your data science work.

The package selection in this chapter can appear limited in view; hence, it is essential to clarify our ...

ISBN: 9781492093398