Big Data Now: 2016 Edition

Name: Big Data Now: 2016 Edition
Author: O'Reilly Media, Inc.
ISBN: 9781491977484

February 2017

Beginner to intermediate

160 pages

3h 43m

English

O'Reilly Media, Inc.

Introduction
1. Careers in Data
Five Secrets for Writing the Perfect Data Science Resume
There’s Nothing Magical About Learning Data Science
Put Aside the Technology Stack
Keep Data Lying Around
Have a Strategy
Hack
Experiment
Data Scientists: Generalists or Specialists?
Early Days
Later Stage
Conclusion
2. Tools and Architecture for Big Data
Apache Cassandra for Analytics: A Performance and Storage Analysis
Wide Spectrum of Storage Costs and Query Speeds
Summary of Methodology for Analysis
Scan Speeds Are Dominated by Storage Format
Storage Efficiency Generally Correlates with Scan Speed
A Formula for Modeling Query Performance
Can Caching Help? A Little Bit.
The Future: Optimizing for CPU, Not I/O
Filtering and Data Modeling
Cassandra’s Secondary Indices Usually Not Worth It
Predicting Your Own Data’s Query Performance
Conclusions
Scalable Data Science with R
Data Science Gophers
Go, a Cure for Common Data Science Pains
The Go Data Science Ecosystem
Data Gathering, Organization, and Parsing
Arithmetic and Statistics
Exploratory Analysis and Visualization
Machine Learning
Get Started with Go for Data Science
Applying the Kappa Architecture to the Telco Industry
What Is Kappa Architecture?
Building the Analytics Pipeline
Incorporating a Bayesian Model to Do Advanced Analytics
Conclusion
3. Intelligent Real-Time Applications
The World Beyond Batch Streaming
Streaming 102
Extend Structured Streaming for Spark ML
Semi-Supervised, Unsupervised, and Adaptive Algorithms for Large-Scale Time Series
Surfacing Anomalies
Adaptive, Online, Unsupervised Algorithms at Scale
Discovering Relationships Among KPIs and Semi-Supervised Learning
Related Resources
Uber’s Case for Incremental Processing on Hadoop
Near-Real-Time Use Cases
Incremental Processing via “Mini” Batches
Challenges of Incremental Processing
Takeaways
4. Cloud Infrastructure
Where Should You Manage a Cloud-Based Hadoop Cluster?
High-Level Differentiators
Cloud Ecosystem Integration
Big Data Is More Than Just Hadoop
Key Takeaways
Spark Comparison: AWS Versus GCP
Submitting Spark Jobs to the Cloud
Configuring Cloud Services
You Get What You Pay For
Performance Comparison
Conclusion
Time-Series Analysis on Cloud Infrastructure Metrics
Infrastructure Usage Data
Scheduled Auto Scaling
Dynamic Auto Scaling
Assess Cost Savings First
5. Machine Learning: Models and Training
What Is Hardcore Data Science—in Practice?
Computing Recommendations
Bringing Mathematical Approaches into Industry
Understanding Data Science Versus Production
Why Start Small?
Distinguishing a Production System from Data Science
Data Scientists and Developers: Modes of Collaboration
Constantly Adapt and Improve
Training and Serving NLP Models Using Spark MLlib
Constructing Predictive Models with Spark
The Process of Building a Machine-Learning Product
Operationalization
Spark’s Role
Fitting It Into Our Existing Platform with IdiML
Faster, Flexible Performant Systems
Three Ideas to Add to Your Data Science Toolkit
Use a Reusable Holdout Method to Avoid Overfitting During Interactive Data Analysis
Use Random Search for Black-Box Parameter Tuning
Explain Your Black-Box Models Using Local Approximations
Related Resources
Introduction to Local Interpretable Model-Agnostic Explanations (LIME)
Intuition Behind LIME
Examples
Conclusion
6. Deep Learning and AI
The Current State of Machine Intelligence 3.0
Ready Player World
Why Even Bot-Her?
On to 11111000001
Peter Pan’s Never-Never Land
Inspirational Machine Intelligence
Looking Forward
Hello, TensorFlow!
Names and Execution in Python and TensorFlow
The Simplest TensorFlow Graph
The Simplest TensorFlow Neuron
See Your Graph in TensorBoard
Making the Neuron Learn
Flowing Onward
Compressing and Regularizing Deep Neural Networks
Current Training Methods Are Inadequate
Deep Compression
DSD Training
Generating Image Descriptions
Advantages of Sparsity

Content preview from Big Data Now: 2016 Edition

Chapter 5. Machine Learning: Models and Training

In this chapter, Mikio Braun looks at how data-driven recommendations are computed, how they are brought into production, and how they can add real business value. He goes on to explore broader questions such as what the interface between data science and engineering looks like. Michelle Casbon then discusses the technology stack used to perform natural language processing at startup Idibon, and some of the challenges they’ve tackled, such as combining Spark functionality with their unique NLP-specific code. Next, Ben Lorica offers techniques to address overfitting, hyperparameter tuning, and model interpretability. Finally, Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin introduce local interpretable model-agnostic explanations (LIME), a technique to explain the predictions of any machine-learning classifier.

What Is Hardcore Data Science—in Practice?

By Mikio Braun

You can read this post on oreilly.com.

During the past few years, data science has become widely accepted across a broad range of industries. It began as more of a research topic, with early roots in scientists’ efforts to understand human intelligence and to create artificial intelligence, and it has since proven that it can add real business value.

As an example, we can look at the company I work for—Zalando, one of Europe’s biggest fashion retailers—where data science is heavily used to provide data-driven recommendations, among other things. ...
