Big Data Now: 2016 Edition

Name: Big Data Now: 2016 Edition
Author: O'Reilly Media, Inc.
ISBN: 9781491977484

February 2017

Beginner to intermediate

160 pages

3h 43m

English

O'Reilly Media, Inc.

Introduction
1. Careers in Data
Five Secrets for Writing the Perfect Data Science Resume
There’s Nothing Magical About Learning Data Science
Put Aside the Technology Stack
Keep Data Lying Around
Have a Strategy
Hack
Experiment
Data Scientists: Generalists or Specialists?
Early Days
Later Stage
Conclusion
2. Tools and Architecture for Big Data
Apache Cassandra for Analytics: A Performance and Storage Analysis
Wide Spectrum of Storage Costs and Query Speeds
Summary of Methodology for Analysis
Scan Speeds Are Dominated by Storage Format
Storage Efficiency Generally Correlates with Scan Speed
A Formula for Modeling Query Performance
Can Caching Help? A Little Bit.
The Future: Optimizing for CPU, Not I/O
Filtering and Data Modeling
Cassandra’s Secondary Indices Usually Not Worth It
Predicting Your Own Data’s Query Performance
Conclusions
Scalable Data Science with R
Data Science Gophers
Go, a Cure for Common Data Science Pains
The Go Data Science Ecosystem
Data Gathering, Organization, and Parsing
Arithmetic and Statistics
Exploratory Analysis and Visualization
Machine Learning
Get Started with Go for Data Science
Applying the Kappa Architecture to the Telco Industry
What Is Kappa Architecture?
Building the Analytics Pipeline
Incorporating a Bayesian Model to Do Advanced Analytics
Conclusion
3. Intelligent Real-Time Applications
The World Beyond Batch Streaming
Streaming 102
Extend Structured Streaming for Spark ML
Semi-Supervised, Unsupervised, and Adaptive Algorithms for Large-Scale Time Series
Surfacing Anomalies
Adaptive, Online, Unsupervised Algorithms at Scale
Discovering Relationships Among KPIs and Semi-Supervised Learning
Related Resources:
Uber’s Case for Incremental Processing on Hadoop
Near-Real-Time Use Cases
Incremental Processing via “Mini” Batches
Challenges of Incremental Processing
Takeaways
4. Cloud Infrastructure
Where Should You Manage a Cloud-Based Hadoop Cluster?
High-Level Differentiators
Cloud Ecosystem Integration
Big Data Is More Than Just Hadoop
Key Takeaways
Spark Comparison: AWS Versus GCP
Submitting Spark Jobs to the Cloud
Configuring Cloud Services
You Get What You Pay For
Performance Comparison
Conclusion
Time-Series Analysis on Cloud Infrastructure Metrics
Infrastructure Usage Data
Scheduled Auto Scaling
Dynamic Auto Scaling
Assess Cost Savings First
5. Machine Learning: Models and Training
What Is Hardcore Data Science—in Practice?
Computing Recommendations
Bringing Mathematical Approaches into Industry
Understanding Data Science Versus Production
Why Start Small?
Distinguishing a Production System from Data Science
Data Scientists and Developers: Modes of Collaboration
Constantly Adapt and Improve
Training and Serving NLP Models Using Spark MLlib
Constructing Predictive Models with Spark
The Process of Building a Machine-Learning Product
Operationalization
Spark’s Role
Fitting It Into Our Existing Platform with IdiML
Faster, Flexible Performant Systems
Three Ideas to Add to Your Data Science Toolkit
Use a Reusable Holdout Method to Avoid Overfitting During Interactive Data Analysis
Use Random Search for Black-Box Parameter Tuning
Explain Your Black-Box Models Using Local Approximations
Related Resources
Introduction to Local Interpretable Model-Agnostic Explanations (LIME)
Intuition Behind LIME
Examples
Conclusion
6. Deep Learning and AI
The Current State of Machine Intelligence 3.0
Ready Player World
Why Even Bot-Her?
On to 11111000001
Peter Pan’s Never-Never Land
Inspirational Machine Intelligence
Looking Forward
Hello, TensorFlow!
Names and Execution in Python and TensorFlow
The Simplest TensorFlow Graph
The Simplest TensorFlow Neuron
See Your Graph in TensorBoard
Making the Neuron Learn
Flowing Onward
Compressing and Regularizing Deep Neural Networks
Current Training Methods Are Inadequate
Deep Compression
DSD Training
Generating Image Descriptions
Advantages of Sparsity

Overview

Now in its sixth edition, O’Reilly’s annual Big Data Now report recaps the trends, tools, applications, and forecasts we’ve examined throughout 2016. This collection of blog posts, authored by leading thinkers and experts in the field, reflects a unique set of themes we’ve identified as gaining significant attention and traction.

Our list of topics for 2016 includes:

Careers in data
Tools and architecture for big data
Intelligent real-time applications
Cloud infrastructure
Machine learning: models and training
Deep learning and artificial intelligence

Publisher Resources

ISBN: 9781492049197