Chapter 8. The Future of Data Virtualization

In this chapter, we discuss areas of active research and technology development in modern data virtualization systems. Of the many areas of ongoing research, this chapter focuses on two: hybrid push-pull systems, and data lakehouses and icehouses.

Hybrid Push-Pull Systems

As discussed in previous chapters, data virtualization systems typically provide the end user with a unified interface that lists all the datasets available in the underlying data sources to which the system is connected. Upon receiving a query, the DV System generates subqueries against the underlying data sources that contain data relevant to that query, consumes the results, and performs any further processing required to complete the query. The simplest subquery is a scan operation that extracts a raw dataset and ships it to the DV System. Alternatively, more advanced subqueries can be submitted to the underlying systems in order to push more of the query processing effort closer to the data sources and potentially reduce the amount of data that needs to be shipped over the network. As described in Chapters 2 and 3, push-based DV Systems typically push more work down to the underlying systems, while pull-based DV Systems pull data into the DV System and perform the processing there.
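The difference between the two styles of subquery described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for a remote underlying data source, and the `orders` table and the function names (`make_source`, `pull_based_total`, `push_based_total`) are hypothetical, not part of any DV System's API. Both functions answer the same query (the total order amount for one region) and also report how many rows the source had to ship.

```python
import sqlite3


def make_source():
    """Build an in-memory SQLite database standing in for a remote data source.

    The schema and data are hypothetical and exist only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    rows = [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1, 101)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def pull_based_total(conn, region):
    """Pull style: the subquery is a plain scan of the raw dataset.

    Filtering and aggregation happen in the DV layer after the rows arrive.
    Returns (total, number_of_rows_shipped).
    """
    shipped = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    total = sum(amount for _, row_region, amount in shipped if row_region == region)
    return total, len(shipped)


def push_based_total(conn, region):
    """Push style: the filter and aggregate are pushed down to the source.

    Only the final aggregate row crosses the (simulated) network.
    Returns (total, number_of_rows_shipped).
    """
    shipped = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", (region,)
    ).fetchall()
    total = shipped[0][0] or 0.0  # SUM yields NULL when no rows match
    return total, len(shipped)


if __name__ == "__main__":
    source = make_source()
    print("pull:", pull_based_total(source, "EU"))
    print("push:", push_based_total(source, "EU"))
```

Both functions return the same total, but in this sketch the pull version ships all 100 rows to the DV layer while the push version ships a single aggregate row. A real system would weigh this saving against the load that the pushed-down work places on the underlying source.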

There are clear advantages to pushing more query processing work down to the underlying systems and closer to where ...

Data Virtualization in the Cloud Era by Daniel Abadi, Andrew Mott