Data Science and Engineering at Enterprise Scale

Name: Data Science and Engineering at Enterprise Scale
Author: Jerome Nilmeier
ISBN: 9781492039334

by Jerome Nilmeier

April 2019

Beginner to intermediate

89 pages

1h 55m

English

O'Reilly Media, Inc.

Foreword
Preface
    What This Book Will Cover and How It Will Help You with Your Daily Work
    Conventions Used in This Book
    Using Code Examples
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
1. Sharing Information Across Disciplines in the Enterprise
    The Overlap Between Data Scientist and Data Engineer
    How Notebooks Bridge the Gap
    Notebooks as a Medium of Communication
    Example: Validating Statistical Functions and Developing Unit Tests
    Evaluating a Validated Desktop-Scale Function
    Understanding the Logic to Be Used at Scale
    Generating a Unit Test with a Smaller Sample
    Writing the Scalable Code
    Summary
2. Setting Up Your Notebook Environment
    Quick Start with Watson Studio
    Creating a Project and Importing a Notebook
    Setting Up Your Own Environment
    Using Docker Images
    Installing Apache Spark, TensorFlow, and Notebooks
    Installing Spark
    Installing Java
    Installing Spark Binary
    Creating the Python Environment
    Installing Jupyter
    Installing Deep Learning Frameworks
    Summary
3. Data Science Technologies
    Apache Spark
    Spark Core: Executors, Cluster Configurations, and More
    RDDs, Datasets, and DataFrames: How to Use Them
    Example: Creating and Calculating with an RDD
    Caching Results
    Spark SQL and DataFrames
    Summary
4. Introduction to Machine Learning
    Linear Regression as a Machine Learning Model
    Defining the Loss Function
    Solving for Parameters
    A “trick” for linear models: The normal equation
    Numerical Optimization, the Workhorse of All Machine Learning
    Feature Scaling
    Letting the Libraries Do Their Job
    The Data Scientist Has a Job to Do Too
    Summary
5. Classic Machine Learning Examples and Applications
    Supervised Learning Models
    The Activation Function: From a Value to a Label
    Using Labeled Data for Training Your Model
    Making Predictions with the Trained Model
    Evaluating Model Performance and Deploying the Model
    Collaborative Filtering
    Understanding the Model as a Latent Feature Model
    Unsupervised Learning Models
    K Means Clustering
    From Clusters to Topics: Text Analytics with Unsupervised Learning
    K Means Clustering Using Word2Vec
    The Latent Dirichlet Allocation
    Summary
6. Advanced Machine Learning Examples and Applications
    Deep Learning Models with Spark and TensorFlow
    The Neural Network
    Training the Neural Network
    Graph Analytics
    What Is a Graph and Why Should We Care?
    Summary


Overview

As enterprise-scale data science sharpens its focus on data-driven decision making and machine learning, new tools have emerged to support these processes. This practical ebook shows data scientists and enterprise developers how the notebook interface, Apache Spark, and other collaboration tools are particularly well suited to bridging the communication gap between their teams.

Through a series of real-world examples, author Jerome Nilmeier demonstrates how to generate a model that enables data scientists and developers to share ideas and project code. You’ll learn how data scientists can approach real-world business problems with Spark and how developers can then implement the solution in a production environment.

Dive deep into data science technologies, including Spark, TensorFlow, and the Jupyter Notebook
Learn how Spark and Python notebooks enable data scientists and developers to work together
Explore how the notebook environment works with Spark SQL for structured data
Use notebooks and Spark as a launchpad to pursue supervised, unsupervised, and deep learning data models
Learn additional Spark functionality, including graph analysis and streaming
Explore the use of analytics in the production environment, particularly when creating data pipelines and deploying code

Publisher Resources

ISBN: 9781492039341
