Data science – O’Reilly

Core technologies and tools for AI, big data, and cloud computing

By Ben Lorica

Highlights and use cases from companies that are building the technologies needed to sustain their use of analytics and machine learning.

Deep automation in machine learning

By Ben Lorica and Mike Loukides

We need to do more than automate model building with autoML; we need to automate tasks at every stage of the data pipeline.

Handling real-time data operations in the enterprise

By Jesse Anderson

Getting DataOps right is crucial to your late-stage big data projects.

It’s time to establish big data standards

By Rohit Jain

The deployment of big data tools is being held back by the lack of standards in a number of growth areas.

Case studies in data ethics

By Mike Loukides, Hilary Mason and DJ Patil

These studies provide a foundation for discussing ethical issues so we can better integrate data ethics in real life.

Data collection and data markets in the age of privacy and machine learning

By Ben Lorica

While models and algorithms garner most of the media coverage, this is a great time to be thinking about building data tools.

Testing machine learning explanation techniques

By Patrick Hall, Navdeep Gill and Lingyao Meng

The importance of testing your tools, using multiple tools, and seeking consistency across various interpretability techniques.

5 key drivers for getting more value from your data

By Michael Li and Matt Maccaux

Considerations based on experience with Fortune 500 clients.

How to build analytic products in an age when data privacy has become critical

By Ben Lorica

Privacy-preserving analytics is not only possible; with GDPR about to take effect, incorporating privacy into your data products will become necessary.

How to run a custom version of Spark on hosted Kubernetes

By Holden Karau and Alena Hall

Learn how Spark 2.3.0+ integrates with K8s clusters on Google Cloud and Azure.

How companies around the world apply machine learning

By Ben Lorica

Strata Data London will introduce technologies and techniques, showcase use cases, and highlight the importance of ethics, privacy, and security.

It’s time for data ethics conversations at your dinner table

By Lucy C. Erickson, Natalie Evans Harris and Meredith M. Lee

In an era where fake news travels faster than the truth, our communities are at a critical juncture.

Interpreting predictive models with Skater: Unboxing model opacity

By Pramit Choudhary

A deep dive into model interpretation as a theoretical concept and a high-level overview of Skater.

Data governance and the death of schema on read

By Barbara Eckman

Comcast’s system of storing schemas and metadata enables data scientists to find, understand, and join data of interest.

Identifying budding big data talent in your company

By Michael Li

How to find promising candidates for upskilling within your organization.

Comparing production-grade NLP libraries: Accuracy, performance, and scalability

By Saif Addin Ellafi

A comparison of the accuracy and performance of Spark-NLP vs. spaCy, and some use case recommendations.

Comparing production-grade NLP libraries: Running Spark-NLP and spaCy pipelines

By Saif Addin Ellafi

A step-by-step guide to building and running a natural language processing pipeline.

Comparing production-grade NLP libraries: Training Spark-NLP and spaCy pipelines

By Saif Addin Ellafi

A step-by-step guide to initializing the libraries, loading the data, and training a tokenizer model using Spark-NLP and spaCy.

Big, fast, easy data with KSQL

By Michael Noll

A look at the new streaming SQL engine for Apache Kafka.

Rapid data production with a multi-model database

By Joel Ruisi

Ingest the data you need in an agile manner.

4 trends in security data science for 2018

By Ram Shankar Siva Kumar and Hyrum Anderson

A glimpse into what lies ahead for response automation, model compliance, and repeatable experiments.

When two trends fuse: PyTorch and recommender systems

By Mo Patel

A look at the rise of the deep learning library PyTorch and simultaneous advancements in recommender systems.

How self-service data avoids the dangers of “shadow analytics”

By Kelly Stirman

Without the proper cataloging, curation, and security that self-service data platforms provide, companies are left vulnerable to cybersecurity threats and misinformation.

What it means to be data-driven in the media industry

By Nicole Tache

O’Reilly Media Podcast: David Hsieh, of Qubole, in conversation with John Slocum, of MediaMath.