Chapter 14

Pragmatic: When Attorneys Influence Technology Even More than Engineers

Early in 2011, the IBM supercomputer Watson thrilled the world by beating two of the most successful Jeopardy! contestants of all time. While IBM basked in the glory, Yahoo! reminded the world that it also deserved a fair amount of credit:

IBM's Watson depends on 200 million pages of content and 500 gigabytes of preprocessed information to answer the Jeopardy! questions. That huge catalog of documents had to be indexed so that Watson could answer questions within the 3-second time limit. On a single computer, generating that large catalog and index would take a lot of time, but dividing the work onto many computers makes it much faster. Apache Hadoop is the industry standard framework for processing large amounts of data on many computers in parallel. By using Hadoop MapReduce, Watson's development team was able to easily and reliably run their application on a large number of computers. For the last 5 years, since the start of Hadoop, Yahoo! has been the primary contributor.1

A comment to that blog sought to minimize that claim: “Watson does not use Hadoop during runtime (e.g., when answering questions)—only to prepare it's (sic) corpus of source materials (text, encyclopedias, etc.).”

Another blog boasted about Yahoo!'s contribution: “Yahoo! created Hadoop and since then has been the most active contributor to Apache Hadoop, contributing over 70 percent of the code and running the world's largest ...
