Data Resources – Making Data Work

Welcome to the essential training and information source for data science and big data, with books, in-person and online events, reports, industry news, and much more.

Practical Machine Learning: Innovations in Recommendation by Ted Dunning, Ellen Friedman
Getting Started with Impala by John Russell
Using Flume by Hari Shreedharan
Designing Data-Intensive Applications by Martin Kleppmann
Hadoop Application Architectures
Practical Machine Learning: A New Look at Anomaly Detection by Ted Dunning, Ellen Friedman
Professional Microsoft SQL Server 2014 Administration
SAP BusinessObjects Reporting Cookbook
Learning Storm
Nulls, Three-Valued Logic, and Missing Information
An Introduction to Set Theory
Learning Neo4j


Change the World with Data
Join us at an upcoming O'Reilly Strata Conference

Strata Conference & Hadoop World
New York, NY | October 15-17, 2014

Strata Conference in Barcelona
Barcelona, Spain | November 19-21, 2014

Strata Santa Clara

Data News

Scaling NoSQL databases: 5 tips for increasing performance

By Alex Bordei
September 24, 2014

Editor’s note: this post is a follow-up to a recent webcast, “Getting the Most Out of Your NoSQL DB,” by the post author, Alex Bordei. As product manager for Bigstep’s Full Metal Cloud, I work with a lot of amazing …

Announcing Spark Certification

By Ben Lorica
September 18, 2014

Editor’s note: full disclosure — Ben is an advisor to Databricks. I am pleased to announce a joint program between O’Reilly and Databricks to certify Spark developers. O’Reilly has long been interested in certification, and with this inaugural program, we believe …

How Flash changes the design of database storage engines

By Andy Oram
August 22, 2014

Over the past decade, SSD drives (popularly known as Flash) have radically changed computing at both the consumer level, where USB sticks have effectively replaced CDs for transporting files, and the server level, where they offer a price/performance …

Building pipelines to facilitate data analysis

By Hadley Wickham
August 21, 2014

In every data analysis, you have to string together many tools. You need tools for data wrangling, visualisation, and modelling to understand what’s going on in your data. To use these tools effectively, you need to be able to easily …

More News >

Data Experts

Benjamin Bengfort
Benjamin Bengfort is a data scientist with a passion for massive machine learning involving gigantic natural language corpora, and has been leveraging that passion to develop a keen understanding of recommendation algorithms at Cobrain in Bethesda, MD, where he serves as the Chief Data Scientist. With a professional background in…

Kevin O'Dell
Kevin O’Dell has been an HBase contributor since 2012 and is active in the community. Kevin has spoken at numerous Hadoop User Groups, Hadoop Summit, and HBaseCons. Kevin currently works as a Systems Engineer for Cloudera building Big Data applications with a specialization in HBase. In this role…

Jean-Marc Spaggiari
Jean-Marc Spaggiari, an HBase contributor since 2012, works as an HBase specialist Solutions Architect for Cloudera to support Hadoop and HBase through technical support and consulting work. He has worked with some of the biggest HBase users in North America. Jean-Marc’s prime role is to support HBase users over their…

Dave Cross
Dave Cross is the owner of Magnum Solutions Ltd., a London-based Perl consultancy, and is also the author of the well-respected Data Munging with Perl.

More Data Experts >

Video Compilation - Available Now

Strata Conference video compilation

Get Your Front-Row Access to Strata Conference

Gain a clear perspective on the future of big data--and all the analytics, architectures, techniques, tools, and technologies you need to use data successfully. With this complete video compilation, you'll get a front-row seat to the keynotes, workshops, and sessions at O'Reilly's Strata Conference Santa Clara 2014.

More about this video >

Data Science Starter Kit

Data Science Books

This kit includes everything you need to get started with data analysis, visualization, and management.

"'Data Scientist' is now the hottest job title in Silicon Valley."

– Tim O'Reilly

Learn More

Data Webcasts
Learn directly from data experts. Join us for these free, live webcasts.

Becoming a Utopian Data-Driven Enterprise: Lessons From the Early Adopters of Predictive Intelligence
October 1, 2014 - 9AM PT

Spark 1.1 and Beyond!
October 2, 2014 - 9AM PT

Beating Billion Dollar Fraud Using Anomaly Detection
October 8, 2014 - 10AM PT

Hadoop Means Business: The Changing Role of Hadoop in Business Outcomes
October 14, 2014 - 10AM PT

Get More Value out of Multiple Hadoop Data Centers
October 23, 2014 - 10AM PT

Sentiment Analysis Using Support Vector Machines in Ruby
November 11, 2014 - 10AM PT

More Webcasts >