Hybrid transactional/analytic systems and the quest for database nirvana

The O’Reilly Data Show Podcast: Rohit Jain on the challenges of hybrid data management systems.

By Ben Lorica, June 16, 2016 • 00:37:08 listen

In this episode of the O’Reilly Data Show, I spoke with data management industry veteran Rohit Jain, currently the CTO of Esgyn. We talked about his years at HP Labs, and his recent project to bring hybrid transactional/analytic technologies into the Hadoop ecosystem.

Here are some highlights from our conversation:

SQL to NoSQL to NewSQL

I think if you look at proprietary systems, you have had work mostly focused on OLTP and on operational workloads. But when people got into data warehousing, they changed the architecture and came out with MPPs, mostly focused on BI and analytics.
As you start looking at the workloads that are running now even on Hadoop, you are seeing people demanding more and more operational and real-time types of responses from the database. … Some of the NoSQL implementations have been designed to provide these operational-type workloads to service those goals, except that they did it without SQL and without transactional support in some cases, and with a different data model. What people are realizing is that now they can leverage SQL, and now that the SQL companies have learned the lessons about what needs to be supported from a NoSQL perspective, we are seeing the blending of relational abstraction as well as providing the semi-structured and unstructured data integration.
… I think that SQL could still form a pretty nice query engine on the top of these different storage engines that are being provided now.

Query and storage engines in hybrid systems

In the past, proprietary databases did everything: they had both the query engine and the storage engine. The exception was MySQL, which had this concept of a query engine into which you could plug different storage engines on the back end. Now what has happened is that you’ve got these different table formats, column stores, search and graph databases, and so forth. These are different structures, but since they reside in HDFS, in effect, they are acting like storage engines. The query engine essentially allows clients to connect and submit queries, and it distributes these connections across the cluster. It compiles the query, it executes the query, and it returns the results. That seems pretty simple, but of course, that’s where the optimizer is, that’s where the query plan is optimized, and then we come up with a really good execution engine to be able to execute that. That’s a pretty important piece that brings it all together.
… In a hybrid transactional/analytic system, the storage engine then has to provide a lot of other capabilities, such as the storage structures and the partitioning and the automatic rebalancing of those partitions and all that. It also has transactional support that the query engine has to leverage. There is compression, encryption, backup, restore—all the things that you expect in an enterprise-type deployment for disaster recovery. The storage engine is providing some capabilities; the query engine has to provide other capabilities. And there has to be an integration between the two in order to provide the capabilities from the operational to the analytic side as well as enterprise-type capabilities if you are going to deploy it in production.
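
The split described above, one query engine sitting on top of interchangeable storage engines, can be illustrated with a minimal Python sketch. Everything here is hypothetical and chosen only for illustration: the `StorageEngine` interface, the `RowStore` and `ColumnStore` classes, and the `QueryEngine` methods are not taken from Esgyn, Hadoop, MySQL, or any other real system, and the sketch leaves out the optimizer, transactions, partitioning, and the enterprise features mentioned in the conversation.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class StorageEngine(ABC):
    """Hypothetical interface: the storage engine owns the physical layout."""

    @abstractmethod
    def insert(self, row: Row) -> None:
        ...

    @abstractmethod
    def scan(self) -> Iterator[Row]:
        ...


class RowStore(StorageEngine):
    """Keeps each record together, as an operational store typically would."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def insert(self, row: Row) -> None:
        self._rows.append(dict(row))

    def scan(self) -> Iterator[Row]:
        for row in self._rows:
            yield dict(row)


class ColumnStore(StorageEngine):
    """Keeps each column together, as an analytic store typically would."""

    def __init__(self, columns: List[str]) -> None:
        self._cols: Dict[str, List[object]] = {c: [] for c in columns}

    def insert(self, row: Row) -> None:
        for name, values in self._cols.items():
            values.append(row.get(name))

    def scan(self) -> Iterator[Row]:
        names = list(self._cols)
        # Reassemble rows from the per-column lists.
        for values in zip(*(self._cols[n] for n in names)):
            yield dict(zip(names, values))


class QueryEngine:
    """Compiles a request into a plan and executes it against any engine."""

    def __init__(self, storage: StorageEngine) -> None:
        self.storage = storage

    def compile(self, columns: List[str],
                predicate: Callable[[Row], bool]) -> Callable[[], Iterator[Row]]:
        # The "plan" is just scan -> filter -> project; a real engine
        # would choose among many plans in its optimizer.
        def plan() -> Iterator[Row]:
            for row in self.storage.scan():
                if predicate(row):
                    yield {c: row[c] for c in columns}
        return plan

    def execute(self, columns: List[str],
                predicate: Callable[[Row], bool]) -> List[Row]:
        return list(self.compile(columns, predicate)())


if __name__ == "__main__":
    orders = [{"id": 1, "amount": 40}, {"id": 2, "amount": 250}]
    for store in (RowStore(), ColumnStore(["id", "amount"])):
        for order in orders:
            store.insert(order)
        # The same query runs unchanged over either storage layout.
        print(QueryEngine(store).execute(["id"], lambda r: r["amount"] > 100))
```

The point of the sketch is only that the query side never needs to know which layout holds the data; the integration work described in the interview (transactions, rebalancing, backup, and so on) is exactly what this toy interface omits.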

Hybrid transactional/analytic systems and open source in China

We essentially have two sister companies based in China and Milpitas and we have our customers [in China], and certainly it’s taking off. It mimics a lot of the things that we’re doing in the U.S. … You’ve got companies like Alibaba—all these companies are doing a lot of these operational-type things, so their focus is in trying to do operational workloads. There is emphasis from the government to really look into open source and move in that direction. There is a lot of interest now in open source technologies because of that.

Related resources:

In search of database nirvana: Rohit Jain’s presentation at Strata San Jose
Resolving transactional access and analytic performance trade-offs
Specialized and hybrid data management and processing engines
“In Search of Database Nirvana: The Challenges of Delivering HTAP” (free report)

Post topics: AI & ML•Data•O'Reilly Data Show
