Breaking down data and technical silos

Specialized technical tools are great, but sometimes a general contractor is the best approach.

By Rachel Wolfson
July 19, 2016
Ceiling from Ellora, cave 10, teaching Buddha. (source: Arian Zwegers on Flickr)

Much has been made of the importance of “tearing down silos”: giving everyone in the organization broad access to data. This is, of course, hugely important. Broad access, combined with a governance structure that creates a “single source of truth,” allows effective cross-team communication and healthy competition in analysis, and together these build the foundation of a data-driven business. Disagreement about what the numbers mean, which can be resolved by experiment, is much better than disagreement about what the numbers are.

But there is another, less visible type of siloization to worry about as well: the siloization of data analytics tools. The past decade or so has seen the development of powerful, purpose-built tools, so the “modern” data stack is a jigsaw puzzle of tools that each handle one specific need extremely well.

This would be perfect if humans were perfect. People, however, specialize and develop habits too, and even small hurdles to using the best tool can nudge them toward an inferior one. Tableau and MicroStrategy illustrate the point: each offers exceptional support in its own arena (accessibility and data governance, respectively), but neither covers both. There is no question that the years of development that have gone into their approaches have paid off.

Tableau provides end-users with the raw beauty of data visualizations and ease of use in making them, but lacks support for data governance, collaboration, and multiplatform accessibility. MicroStrategy primarily focuses on governance, maintenance, and scalability, yet it lacks the flexibility to respond to changing business needs and data sources.

While Tableau and MicroStrategy are working to broaden their capabilities, we can see an example of the integrated approach in platforms like Looker, which aim to provide the UI and governance needed for near real-time, self-service data discovery that both analysts and business teams can access together.

Defining the parts of the job is not the same as defining the whole job

To deliver powerful business intelligence, tools need to offer a lot more functionality. Analytical tools should support today’s needs while accommodating future growth. Additionally, companies need their front-end analytics platforms to run flexibly in the cloud or on-premises, to provide easy access to a wide range of data stores, to offer a centralized common data model for various business teams, to provide company-wide governance of, and access to, various back-end systems, and to operate using an industry-standard language. These requirements need to exist together, in support of the overall goal of making data-driven decisions, so the technical requirements for analytics tools (visualization, governance, access, scalability, future-proofing) should be interpreted through this lens.

The baseline is, of course, access: easy access to a wide range of data stores and rich SQL support for back-end engines are vital. Admins should be able to easily connect to any structured or unstructured data exposed over standard JDBC/ODBC or an HTTP URL. Further, any solution should support any SQL-compatible database or query engine, including Spark, Hive, Tez, Amazon Redshift, and Google’s BigQuery.

Support for Presto

One technology that should be mentioned in this context is Presto, the open source SQL query engine optimized for high-speed interactive analytics. Presto is used by companies such as Facebook, Airbnb, and Dropbox. Teradata also recently announced a multi-year commitment to contribute to Presto’s open source development.

Presto federates access to data across many—and varied—sources, enabling not just de-siloization, but fast access to data. 
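To make the idea of federated access concrete, here is a minimal sketch in Python. It does not use Presto: it substitutes SQLite's ATTACH DATABASE as a stand-in, so that one connection can join tables living in two separate databases. The table names, the "crm" alias, and the sample rows are all assumptions made up for illustration. In Presto, the analogous mechanism is addressing tables by catalog-qualified names in a single query.

```python
import sqlite3


def build_federated_connection():
    """Return one connection that can see two separate in-memory databases."""
    conn = sqlite3.connect(":memory:")
    # Attach a second, independent in-memory database under the
    # hypothetical alias "crm". This stands in for a second data source.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
    # Made-up sample data, for illustration only.
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [(1, 20.0), (1, 5.0), (2, 12.5)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    return conn


def revenue_by_customer(conn):
    """Join across both databases in a single SQL statement."""
    query = """
        SELECT c.name, SUM(o.amount) AS revenue
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()
```

The point of the sketch is that the analyst writes one query and never copies data between stores by hand; the engine resolves where each table lives.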

Dillon Morrison, product manager at Looker, elaborated in an interview: “Looker’s integration with Presto is significant, not only because it allows Looker to utilize one of the most powerful querying engines on the market, but also because it reflects Looker’s ability to respond to and leverage new advancements in data technology.”

Shared modeling language

A shared modeling language enables effective communication about data. Admins need to make high-level decisions about the significance of data sources and the relationships among them, lest individuals and smaller teams develop ad hoc approaches that can result in miscommunication and error. To continue with the Looker example, its modeling language, LookML, lets admins either generate code by point-and-click or drop down and edit or override the generated code when necessary. The language is particularly friendly for those familiar with SQL, yet it is also easy to experiment with, extend, and use to customize individual queries, which makes it approachable for those unfamiliar with SQL as well.

“Unlike some proprietary analytics languages, LookML doesn’t try to reinvent the wheel,” Morrison explained. “It’s written by people who know and love SQL, so LookML aims to keep all the power and flexibility of SQL, but to smooth off some of its rough edges. It simplifies the process of conducting complex analysis by compiling modular pieces of SQL through an intuitive drag-and-drop interface. Complex functions like pivots and even simple formatting are all available based on the definitions in LookML. For admins, it provides a centralized governance layer to help ensure end users access the same metrics.”
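The notion of a centralized layer that compiles modular pieces of SQL can be illustrated with a small Python sketch. This is not LookML and makes no claim about how Looker works internally: the MODEL dictionary, the compile_query function, and the sample table are hypothetical names invented here to show the general pattern of defining each metric once and generating every query from that single definition.

```python
import sqlite3

# Hypothetical central model: every team builds queries from these
# definitions instead of hand-writing its own version of each metric.
MODEL = {
    "table": "orders",
    "dimensions": {"region": "region"},
    "measures": {
        "total_revenue": "SUM(amount)",
        "order_count": "COUNT(*)",
    },
}


def compile_query(model, dimension, measure):
    """Assemble a SQL string from named pieces of the shared model.

    Looking up a name that the model does not define raises KeyError,
    so an undefined metric cannot slip into a query unnoticed.
    """
    dim_sql = model["dimensions"][dimension]
    measure_sql = model["measures"][measure]
    return (
        f"SELECT {dim_sql} AS {dimension}, {measure_sql} AS {measure} "
        f"FROM {model['table']} GROUP BY {dim_sql} ORDER BY {dim_sql}"
    )


def run_example():
    """Run a compiled query against made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10.0), ("east", 15.0), ("west", 7.5)],
    )
    sql = compile_query(MODEL, "region", "total_revenue")
    return conn.execute(sql).fetchall()
```

Because "total_revenue" is defined in exactly one place, two teams asking for it get the same SQL, which is the governance benefit described above. A real modeling layer would also need to handle joins, access control, and SQL dialect differences, none of which this sketch attempts.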

Turning big data analytics into business intelligence (BI) insight is a major goal for modern enterprises. After all, understanding the data that drives the business can boost sales, increase ROI, and provide answers to even the toughest questions. Tools that enable broad access and clear, effective communication about data are the basis for real data-driven decision-making. Thinking holistically about these tools is key to avoiding new analytical silos while taking down historical ones.

This post is a collaboration between O’Reilly Media and Looker. View our statement of editorial independence.
