Press Release: January 29, 2019
New O’Reilly Report Explores Tools and Best Practices for Advanced Analytics and Artificial Intelligence
Boston, MA—O’Reilly, the premier source for insight-driven learning on technology and business, today announced the results of its “Evolving Data Infrastructure” survey, which explores the tools companies are using for their advanced analytics and Artificial Intelligence (AI) projects—and the best practices they have acquired along the way.
The research, which will be released in full at O’Reilly’s upcoming Strata Data Conference in San Francisco, found that more than half (58 percent) of today’s companies are either building or evaluating data science platforms—which are essential for companies that are keen on growing their data science teams and machine learning capabilities—while 85 percent of companies already have data infrastructure in the cloud.
Other key findings from the research include:
- Companies are building or evaluating solutions in foundational technologies needed to sustain success in analytics and AI. These include data integration and Extract, Transform and Load (ETL) (60 percent), data preparation and cleaning (52 percent), data governance (31 percent), metadata analysis and management (28 percent) and data lineage management (21 percent).
- Companies are building data infrastructure in the cloud. Eighty-five percent indicated that they had data infrastructure in at least one of the seven top cloud providers, with nearly two-thirds (63 percent) using Amazon Web Services (AWS). The results also showed that users of AWS, Microsoft Azure or Google Cloud Platform (GCP) tended to use multiple cloud providers.
- The use of durable cloud storage is prevalent. Sixty-two percent of all respondents indicated they used at least one of the following: Amazon S3 or Glacier, Azure Storage, or Google Cloud Storage.
- Data scientists and data engineers are in demand. When asked what skills their teams needed to strengthen, 44 percent said data science and 41 percent said data engineering.
- Respondents used a variety of streaming and data processing technologies. About half of the respondents (49 percent) used either Apache Spark or Spark Streaming, while other popular tools included open source projects (Apache Kafka, Apache Hadoop) and their related managed services in the cloud (Elastic MapReduce, AWS Kinesis).
- Business intelligence uses a mix of open source and managed services. When it comes to SQL, respondents favored open source tools (Spark SQL, Apache Hive) and managed services in the cloud (Amazon Redshift, Google BigQuery).
- Although a majority (60 percent) aren’t using serverless technologies, nearly one-third (30 percent) are already using AWS Lambda. In fact, 38 percent indicated that they were using at least one serverless technology, a pattern that remained consistent across geographic regions.
“It is clear that in 2019 companies are planning to invest in implementing analytics, AI and automation tools,” said Ben Lorica, O’Reilly’s chief data scientist and chair of the Strata Data Conference. “However, in order to do so successfully, initial investments must be made in the foundational technologies and infrastructure needed to sustain success. Our research shows that a majority of companies understand this and are already building, or at the very least evaluating, platform solutions and tools to make this possible.”
For more information and to register to download a copy of the report, please visit: https://www.oreilly.com/data/free/evolving-data-infrastructure.csp.
O’Reilly will present the full research findings at the upcoming Strata Data Conference, taking place March 25–28, 2019, at the Moscone Center in San Francisco. This year’s conference will bring together cutting-edge science and new business fundamentals to help attendees build a solid foundation for their AI strategy and machine learning initiatives. The event programming offers a deep dive into emerging data science techniques and technologies, including case studies, in-depth tutorials and new best practices.