Getting DataOps right is crucial to your late-stage big data projects.
If we’re going to think about the ethics of data and how it’s used, then we have to take into account how data flows.
The deployment of big data tools is being held back by the lack of standards in a number of growth areas.
New survey results highlight the ways organizations are handling machine learning's move to the mainstream.
These studies provide a foundation for discussing ethical issues so we can better integrate data ethics in real life.
We can build a future we want to live in, or we can build a nightmare. The choice is up to us.
Five framing guidelines to help you think about building data products.
Recognizing the interest in ML, the Strata Data Conference program is designed to help companies adopt ML across large sections of their existing operations.
While models and algorithms garner most of the media coverage, this is a great time to be thinking about building tools in data.
Oaths have their value, but checklists will help put principles into practice.
Data scientists, data engineers, AI and ML developers, and other data professionals need to live ethical values, not just talk about them.
The importance of testing your tools, using multiple tools, and seeking consistency across various interpretability techniques.
Why model development does not equal software development.
Considerations based on experience with Fortune 500 clients.
Answers to the three most commonly asked questions about maintaining GDPR-compliant machine learning programs.
Privacy-preserving analytics is not only possible, but with GDPR about to come online, it will become necessary to incorporate privacy in your data products.
Learn how Spark 2.3.0+ integrates with K8s clusters on Google Cloud and Azure.
The two positions are not interchangeable—and misperceptions of their roles can hurt teams and compromise productivity.
Strata Data London will introduce technologies and techniques; showcase use cases; and highlight the importance of ethics, privacy, and security.
In an era where fake news travels faster than the truth, our communities are at a critical juncture.
A deep dive into model interpretation as a theoretical concept and a high-level overview of Skater.
Comcast’s system of storing schemas and metadata enables data scientists to find, understand, and join data of interest.
How to find promising candidates for upskilling within your organization.
A comparison of the accuracy and performance of Spark-NLP vs. spaCy, and some use case recommendations.
A step-by-step guide to building and running a natural language processing pipeline.
A step-by-step guide to initializing the libraries, loading the data, and training a tokenizer model using Spark-NLP and spaCy.
A look at the new streaming SQL engine for Apache Kafka.
Ingest the data you need in an agile manner.
A glimpse into what lies ahead for response automation, model compliance, and repeatable experiments.
Decoding simple regex features to match complex text patterns.
A look at the rise of the deep learning library PyTorch and simultaneous advancements in recommender systems.
Without the proper cataloging, curation, and security that self-service data platforms allow, companies are left vulnerable to cybersecurity threats and misinformation.
O’Reilly Media Podcast: David Hsieh, of Qubole, in conversation with John Slocum, of MediaMath.
A survey of usage, access methods, projects, and skills.
Drawing parallels and distinctions around neural networks, data sets, and hardware.
Analyzing tweets and posts around Trump, Russia, and the NFL using information entropy, network analysis, and community detection algorithms.
Reduce troubleshooting time from days to seconds.
The convergence of big data, artificial intelligence, and business intelligence.
Solving challenges of data analytics to make data accessible to all.
Fast data and virtualization are shifting the way telcos approach the IoT.
The right AI solution is the one that fits the skill set of the users and solves the highest-priority problems for the business.
To become a “machine learning company,” you need tools and processes to overcome challenges in data, engineering, and models.
The O'Reilly Podcast: Han Yang on the importance of investment, innovation, and improvisation.
Applying methods from Agile software development to data science projects.
Untangling data pipelines with a streaming platform.
Become more agile with business intelligence and data analytics.
How human-in-the-loop data analytics is accelerating the discovery of insights.
The O’Reilly Podcast: Achieving greater reliability and security when integrating data.
The O'Reilly Podcast: Gary Orenstein on developing a data infrastructure that enables the latest applications in machine learning and AI.
Utilizing GPU power to improve performance and agility.
A deep dive into Uber's engineering effort to optimize geospatial queries in Presto.
The O'Reilly Podcast: Dave Cassel on building a unified enterprise database to store and query any type of data.
Six lessons learned to get a quick start on productivity.
A look at the Layer API, TFLearn, and Keras.
Building a production-grade real-time image classification system.
Applications of CNNs for real-time image classification in the enterprise.
Why machine learning needs real-time data infrastructure.
Recent trends in practical use and a discussion of key bottlenecks in supervised machine learning.
The toughest part of machine learning with Spark isn't what you think it is.
Human-guided ML pipelines for data unification and cleaning might be the only way to provide complete and trustworthy data sets for effective analytics.