Why model development does not equal software development.
Considerations based on experience with Fortune 500 clients.
Christine Foster discusses how today’s academic papers turn into tomorrow’s data science.
Having worked in both research and industry, Mikio Braun shares insights into what's the same, what's different, and how deep learning might change the game.
Zubin Siganporia explains how the KISS principle (“Keep It Simple, Stupid”) applies to solving problems and convincing end-users to adopt data-driven solutions to their challenges.
Martha Lane Fox considers the unintended consequences of technology.
Louise Beaumont explores the five characteristics of companies that choose to succeed.
One of our goals is to bring Jupyter’s enterprise use cases and practices into one place.
The O’Reilly Data Show Podcast: A special episode to mark the 100th episode.
Jean-François Puget explains why human context should be embraced as a guide to building better and smarter systems.
Eva Kaili outlines the fundamentals of GDPR and applications of blockchain.
May 25 is an important day for data protection in the EU and elsewhere. Alison Howard explains how Microsoft has prepared for May 25 and beyond.
Pierre Romera explores the challenges in making 1.4 TB of data securely available to journalists all over the world.
Ben Lorica looks at the problems we’re facing as we collect and store data, particularly when our machine learning models require huge amounts of labeled data.
Mick Hollison, Sven Löffler, and Robert Neumann explain how Deutsche Telekom is harnessing machine learning and analytics in the cloud to build Europe’s largest IoT data marketplace.
Watch highlights covering machine learning, GDPR, data protection, and more. From the Strata Data Conference in London 2018.
Answers to the three most commonly asked questions about maintaining GDPR-compliant machine learning programs.
The O’Reilly Data Show Podcast: Jason Dai on the first year of BigDL and AI in China.
Privacy-preserving analytics is not only possible; with GDPR about to come online, it will become necessary to incorporate privacy in your data products.
The O’Reilly Data Show Podcast: Jerry Overton on organizing data teams, agile experimentation, and the importance of ethics in data science.
Both reproducible science and open source are necessary for collaboration at scale—the nexus for that intermingling is Jupyter.
Learn how Spark 2.3.0+ integrates with K8s clusters on Google Cloud and Azure.
A failed analytics startup post-mortem.
Discover how data-driven organizations are using Jupyter to analyze data, share insights, and foster practices for dynamic, reproducible data science.
The O’Reilly Data Show Podcast: Guillaume Chaslot on bias and extremism in content recommendations.
The two positions are not interchangeable—and misperceptions of their roles can hurt teams and compromise productivity.
In an era where fake news travels faster than the truth, our communities are at a critical juncture.
Strata Data London will introduce technologies and techniques; showcase use cases; and highlight the importance of ethics, privacy, and security.
The O’Reilly Data Show Podcast: Jesse Anderson and Paco Nathan on organizing data teams and next-generation messaging with Apache Pulsar.
A deep dive into model interpretation as a theoretical concept and a high-level overview of Skater.
Comcast’s system of storing schemas and metadata enables data scientists to find, understand, and join data of interest.
The O’Reilly Data Show Podcast: Ameet Talwalkar on large-scale machine learning.
Ajey Gore explains why GO-JEK is focusing its attention beyond urban Indonesia to help people across the country’s rural areas.
Using silly data sets as examples, Janelle Shane talks about ways that algorithms fail.
Eric Colson explains why companies must now think very differently about the role and placement of data science in organizations.
William Vambenepe walks through an interesting use case of machine learning in action and discusses the central role AI will play in big data analysis moving forward.
Seth Stephens-Davidowitz explains how to use Google searches to uncover behaviors or attitudes that may be hidden from traditional surveys.
Anoop Dawar shares principles successful companies are using to inspire an insight-driven ethos and build data-competent organizations.
How to find promising candidates for upskilling within your organization.
Natalie Evans Harris discusses the Community Principles on Ethical Data Practices (CPEDP), a code of ethics for data collection, sharing, and utilization.
Tobias Ternstrom explains why you should objectively evaluate the problem you're trying to solve before choosing the tool to fix it.
Watch highlights covering machine learning, business intelligence, data privacy, and more. From the Strata Data Conference in San Jose 2018.
Ben Lorica explores emerging security best practices for business intelligence, machine learning, and mobile computing products.
Nancy Lublin and Bob Filbin explore findings from crisis data.
Li Fan shows how Pinterest is using AI to predict what’s in an image, what a user wants, and what they’ll want next.
Dinesh Nirmal explains how real-world machine learning reveals assumptions embedded in business processes that cause expensive misunderstandings.
Alex Smola shares lessons learned from AWS SageMaker, an integrated framework for handling all stages of analysis.
The O’Reilly Data Show Podcast: Ofer Ronen on the current state of chatbots.
A product manager's guide to employing data as a feature.
The O’Reilly Data Show Podcast: Danny Lange on how reinforcement learning can accelerate software development and how it can be democratized.
A comparison of the accuracy and performance of Spark-NLP vs. spaCy, and some use case recommendations.
A step-by-step guide to building and running a natural language processing pipeline.
A step-by-step guide to initializing the libraries, loading the data, and training a tokenizer model using Spark-NLP and spaCy.
A look at the new streaming SQL engine for Apache Kafka.
Attend a day-long exploration of Jupyter's best practices and practical use cases in business and industry.
Ingest the data you need in an agile manner.
The O’Reilly Data Show Podcast: Leo Meyerovich on building large-scale, interactive applications that enable visual investigations.
Alysa Hutnik discusses the Fair Credit Reporting Act, the Equal Credit Opportunity Act, the Gramm-Leach-Bliley Act, and the FTC’s focus on FinTech.
How companies such as athenahealth can transform legacy data into insights.
Gain agility by loading first and transforming later.