Human-guided ML pipelines for data unification and cleaning might be the only way to provide complete and trustworthy data sets for effective analytics.
Using a single cloud provider is a thing of the past.
Practical questions to help you make a decision.
Tamr’s Eliot Knudsen on algorithms that work alongside human experts.
A multi-model approach to transforming data from a liability to an asset.
A framework for moving from data to wisdom.
Authors Julia Silge and David Robinson discuss the power of tidy data principles, sentiment lexicons, and what they're up to at Stack Overflow.
Recapping winners of the Strata San Jose Startup Showcase.
Stewart Rogers on building and managing products with embedded analytics.
A new architecture for today’s data-rich modern applications.
Integrate and access any form of data using a multi-model database.
Exploring a reference architecture solution.
Overcome three types of debt to ship quality machine learning code.
A new role focused on creating data products and making data science work in production.
The O'Reilly Podcast: Ken Krupa on the challenge of data integration, and a solution.
Nothing says machine learning can't outperform humans, but it's important to realize perfect machine learning doesn't, and won't, exist.
Bas Geerdink details the technology stack for real-time account forecasting at ING, and outlines how Spark is used for outbound communications.
Access to critical data in real time enables workers to generate insights from large amounts of information.
Metadata is central to a modern data architecture.
A possible solution to the complexities that plague big data projects.
June Andrews talks about simple, cost-effective algorithmic computing at scale.
Kurt Brown discusses services in use, such as Genie, Metacat, Charlotte, and Microbots.
There’s money to be made in exhaust data (not just data exhaust).
Scientific use cases show promise, but challenges remain for complex data analytics.
Andra Keay discusses the five laws of robotics design.
Michael Jordan on developing a new platform to support real-time decision-making.
O'Reilly Podcast: Ian Fyfe of Zoomdata on the importance of “speed-of-thought analysis” in modern data environments.
Tips and tools for data janitors.
The present and future of data integration in the cloud.
Transform the way you approach analytics.
Mix-and-match approaches for visualizing data and interpreting machine learning models and results.
How we created an illustrated guide to help you find your way through the data landscape.
Flash flood prediction using machine learning has proven capable in the U.S. and Europe; we're now bringing it to East Africa.
An interview with Greg Meddles, technical lead for healthcare.gov.
The better prepared you are to utilize all the data in your data lake, the more likely you are to be successful.
Validating your data requires asking the right questions and using the right data.
A peek into the clickstream analysis and production pipeline for processing tens of millions of daily clicks across thousands of articles.
What data scientists need to know about production—and what production should expect from their data scientists.
Best practices and scalable workflows for reproducible data science.
Putting deep learning into practice with new tools, frameworks, and future developments.
Drew Paroski and Gary Orenstein on the rapid spread of machine learning and predictive analytics.
How bots, threat intelligence, adversarial machine learning, and deep learning are impacting the security landscape.
Evaluating the state and development of Scala from a data engineering perspective.
The telecommunication industry’s unique position for new revenue opportunities in big data, IoT, and VR.
Telcos must regain value from over-the-top services and develop new sources of revenue by leveraging their data and infrastructure.
The O’Reilly Podcast: John Thuma on how businesses can get more than “what happened” from their data.
The O’Reilly Podcast: Bob Montemurro on planning data systems to match needs.
Technical and policy considerations in combatting algorithmic bias.
Learning to act based on long-term payoffs.
Rather than hiring data scientists from outside, consider training your proto data scientists.
It's important in this age of big data to return to the original meaning of serendipity and talk about it as a skill.
Deeper neural nets often yield harder optimization problems.
O'Reilly Podcast: Working with databases that go beyond traditional models.
A look at the data pipeline architecture for five key NERSC projects.
Close the time gap between analysis and action to bring about the next wave of improvements in efficiency and reliability—and magic.
This report explores how political data science helps to drive everything from overall strategy and messaging to individual voter contacts and advertising.
Why cross-channel analytics are crucial to empowering business teams with a behavioral view of your customer.
Start planning now to reap the many benefits of connected manufacturing.
The anatomy of an architecture to bring data science into production.
Analytic Ops—DevOps for data science—turns data analysis into a continually evolving process that meets business needs.
Rohit Jain takes an in-depth look at the possibilities and the challenges for companies that long for a single query engine to rule them all.