Machine Learning: Hands-On for Developers and Technical Professionals

by Jason Bell
November 2014
Beginner to intermediate
408 pages
8h 44m
English
Wiley

Chapter 10: Machine Learning as a Batch Process

This chapter investigates using batch processing to mine and learn from larger amounts of data instead of streaming data. After you've considered the size of data and what you're hoping to learn from it, you then look at various tools to extract, transform, and then process the data for useful results.

This chapter covers using Hadoop, Sqoop, and Pig for large-scale batch processing; these tools enable large data sets to be processed with relative ease. The chapter also discusses more traditional methods of creating programs to run batch processes on data.

Is It Big Data?

Although this book is about machine learning, I can't ignore the term “Big Data,” which is an increasingly common topic in business today. The phrase is often touted as a savior because it promises to let companies see new things in their existing data. The term is broad but ultimately reduces to the concept of a data set that has become so large that it is difficult to process with traditional tools.

Depending on whom you ask, you might hear, “It's not Big Data if it's not working on petabytes of data,” or “When it becomes too big for a traditional database, then it's Big Data.” Both statements are true and valid. Personally, I like the term “data” regardless of whether the amount of data is big or small.

As time marches on, the answer to the “What is Big Data?” question will constantly change. The tools will also adapt, improve, and provide different insight. The key question ...

ISBN: 9781118889497