
Data Munging with Hadoop

ISBN: 9780134435534

by Ofer Mendelevitch, Casey Stella

November 2015

Beginner to intermediate

31 pages

56m

English

Addison-Wesley Professional

Content preview from Data Munging with Hadoop

Preface

Most people imagine data science to be focused on advanced math and machine learning techniques. In reality, most data scientists spend a significant share of their time (70%–80%) on a variety of tasks often called “data munging,” including data cleansing and normalization, aggregation, sampling, transformation, and other forms of feature generation.

These activities are often considered low-value or “grunt work,” but they are actually interesting and sometimes require machine learning to accomplish. The resulting set of skills is a complex mishmash of normal data cleansing and extraction techniques that most data analysts or software engineers will recognize and more advanced skills that would normally be seen ...



