If you’re an IT professional, software engineer, or software product manager, you’ve likely spent the past few years considering modern data platforms such as Apache Hadoop; NoSQL databases like MongoDB, Cassandra, and Kudu; search databases like Solr and Elasticsearch; in-memory systems like Spark and MemSQL; and cloud data stores such as Amazon Redshift, Google BigQuery, and Snowflake. But are these modern data technologies here to stay, or are they a flash in the pan, with the traditional relational database still reigning supreme?
In the spring of 2017, Zoomdata commissioned O’Reilly Media to create and execute a survey assessing the state of the data and analytics industry. The focus was on understanding the penetration of modern big data and streaming data technologies, how users consume data analytics, and which skills organizations are most interested in hiring for. Nearly 900 people from a diverse set of industries, as well as government and academia, responded to the survey. Below is a preview of some of the insights the survey provided.
Modern data platforms have eclipsed relational databases as a main data source
Of course, relational databases continue to be the core of online transactional processing (OLTP) systems. However, one of the most interesting findings was that when asked about their organization’s main data sources, fewer than one-third of survey respondents listed the relational database, while around two-thirds selected non-relational sources. This is a clear indication that these non-relational data platforms have firmly crossed the chasm from early adopters into mainstream use.
Of further interest is the fact that just over 40% of respondents indicated their organizations are using what could be categorized as “modern data sources”—such as Hadoop, in-memory, NoSQL, and search databases—as a main data source. These modern data sources are optimized to handle what is often referred to as the “three V’s” of big data: very large data volumes, high-velocity streaming data, and a high variety of unstructured and semi-structured data, such as text and log files.
Drilling further into the details, analytic databases (19%) and Hadoop (14%) were the two most popular non-relational sources. Analytic databases are a category of SQL-based data stores such as Teradata, Vertica, and MemSQL that typically make use of column-store and/or massively parallel processing (MPP) to greatly speed up the kinds of large aggregate queries used when analyzing data. Hadoop, as many readers know, is a software framework used for distributed storage and processing of very large structured and unstructured data sets on computer clusters built from commodity hardware.
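As a loose illustration of why column-oriented storage favors the large aggregate queries mentioned above, here is a toy sketch in Python (the records, field names, and figures are invented for illustration; real analytic databases layer compression, vectorized execution, and MPP on top of this basic idea):

```python
# Toy illustration: the same records stored row-wise vs. column-wise.
# (Data and field names are invented for this sketch.)

rows = [
    {"region": "east", "product": "widget", "revenue": 120.0},
    {"region": "west", "product": "widget", "revenue": 95.0},
    {"region": "east", "product": "gadget", "revenue": 240.0},
]

# Column store: one contiguous list per field.
columns = {
    "region":  [r["region"] for r in rows],
    "product": [r["product"] for r in rows],
    "revenue": [r["revenue"] for r in rows],
}

# An aggregate like SELECT SUM(revenue): a row store must touch
# every field of every row...
total_row_store = sum(r["revenue"] for r in rows)

# ...while a column store scans only the single column it needs,
# which is why columnar layouts speed up analytic queries.
total_column_store = sum(columns["revenue"])

assert total_row_store == total_column_store == 455.0
```

Both layouts return the same answer; the difference is how much data must be read to produce it, which dominates query time at big-data scale.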
Download the full report to learn about other findings we uncovered in this survey, including:
- The proportion of organizations with big data projects in production and under development
- How important different levels of data freshness are to organizations
- The most popular streaming data platforms
- The leading technical skills for which organizations are staffing
- Whether organizations are consuming analytics via standalone BI apps or as analytic components embedded into other business applications and processes
This post is part of a collaboration between O'Reilly and Zoomdata. See our statement of editorial independence.