book

Big Data Now: Current Perspectives from O'Reilly Radar

Name: Big Data Now: Current Perspectives from O'Reilly Radar
Author: O'Reilly Radar Team
ISBN: 9781449315214

by O'Reilly Radar Team

August 2011

Beginner to intermediate

124 pages

3h 30m

English

O'Reilly Media, Inc.

Read now

Unlock full access

Big Data Now
Foreword
1. Data Science and Data Tools
What is data science?What is data science?Where data comes fromWorking with data at scaleMaking data tell its storyData scientistsThe SMAQ stack for big dataMapReduceHadoop MapReduceOther implementationsStorageHadoop Distributed File SystemHBase, the Hadoop DatabaseHiveCassandra and HypertableNoSQL database implementations of MapReduceIntegration with SQL databasesIntegration with streaming data sourcesCommercial SMAQ solutionsQueryPigHiveCascading, the API ApproachSearch with SolrConclusionScraping, cleaning, and selling big dataData hand toolsHadoop: What it is, how it works, and what it can doFour free data tools for journalists (and snoops)WHOISBlekkobit.lyCompeteThe quiet rise of machine learningWhere the semantic web stumbled, linked data will succeedSocial data is an oracle waiting for a questionThe challenges of streaming real-time data
2. Data Issues
Why the term “data science” is flawed but usefulIt’s not a real scienceIt’s an unnecessary labelThe name doesn’t even make senseThere’s no definitionTime for the community to rallyWhy you can’t really anonymize your dataKeep the anonymizationAcknowledge there’s a risk of de-anonymizationLimit the detailLearn from the expertsBig data and the semantic webGoogle and the semantic webMetadata is hard: big data can helpBig data: Global good or zero-sum arms race?The truth about data: Once it’s out there, it’s hard to control
3. The Application of Data: Products and Processes
How the Library of Congress is building the Twitter archiveData journalism, data tools, and the newsroom stackData journalism and data toolsThe newsroom stackBridging the data divideThe data analysis path is built on curiosity, followed by actionHow data and analytics can improve educationData science is a pipeline between academic disciplinesBig data and open source unlock genetic secretsVisualization deconstructed: Mapping Facebook’s friendshipsMapping Facebook’s friendshipsStatic requires storytellingData science democratized
4. The Business of Data
There’s no such thing as big dataBig data and the innovator’s dilemmaBuilding data startups: Fast, big, and focusedSetting the stage: The attack of the exponentialsLeveraging the big data stackFast dataBig analyticsFocused servicesDemocratizing big dataData markets aren’t coming: They’re already hereAn iTunes model for dataData is a currencyBig data: An opportunity in search of a metaphorData and the human-machine connection
Copyright

Content preview from Big Data Now: Current Perspectives from O'Reilly Radar

Chapter 1. Data Science and Data Tools

What is data science?

The future belongs to the companies and people that turn data into products.

by Mike Loukides

We’ve all heard it: according to Hal Varian, statistics is the next sexy job. Five years ago, in What is Web 2.0, Tim O’Reilly said that “data is the next Intel Inside.” But what does that statement mean? Why do we suddenly care about statistics and about data?

In this post, I examine the many sides of data science—the technologies, the companies and the unique skill sets.

What is data science?

The web is full of “data-driven apps.” Almost any e-commerce application is a data-driven application. There’s a database behind a web front end, and middleware that talks to a number of other databases and data services (credit card processing companies, banks, and so on). But merely using data isn’t really what we mean by “data science.” A data application acquires its value from the data itself, and creates more data as a result. It’s not just an application with data; it’s a data product. Data science enables the creation of data products.

One of the earlier data products on the Web was the CDDB database. The developers of CDDB realized that any CD had a unique signature, based on the exact length (in samples) of each track on the CD. Gracenote built a database of track lengths, and coupled it to a database of album metadata (track titles, artists, album titles). If you’ve ever used iTunes to rip a CD, you’ve taken advantage of this database. ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

O’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.

Julian F.

Head of Cybersecurity

I wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.

Addison B.

Field Engineer

I’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.

Amir M.

Data Platform Tech Lead

I'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.

Mark W.

Embedded Software Engineer

Publisher Resources

ISBN: 9781449316297Errata Page

Cloud Computing

Data Engineering

Data Science

AI & ML

Programming Languages

Software Architecture

IT/Ops

Security

Design

Business

Soft Skills

Big Data Now: Current Perspectives from O'Reilly Radar

by O'Reilly Radar Team

Chapter 1. Data Science and Data Tools

What is data science?

What is data science?

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.