The advantages of open source in analytics

Trusted Analytics Platform, the open source analytics platform founded by Intel, helps companies access the newest tools and frameworks for data analytics.

By Nan Barber
April 26, 2016
Image: Surface (source: Pixabay)

Faced with the challenge of using big data to innovate in their fields and grow their businesses, companies are moving toward open source data analytics tools. The Trusted Analytics Platform (TAP) builds on this trend, bringing those tools together into a cloud-based collaborative workflow. TAP combines existing components, optimized to speed up the ingestion and analysis of data from multiple sources and in multiple formats. Making the platform open source gives companies the freedom to create a custom workbench and engages a community of developers who can work together to accelerate innovation.

In a previous interview, Rachel Roumeliotis, OSCON chair and O’Reilly strategic content director, spoke with Chuck Freedman, chief developer advocate in the Big Data Solutions group at Intel, about how TAP’s architecture is built and how it helps organizations build analytics capabilities into cloud-based applications.

As a follow-up to the initial interview, Roumeliotis talked with Freedman and Iman Saleh, Intel’s big data evangelist, about Intel’s aims for TAP as a data analytics platform in the context of the open source community. To help showcase leading contributions to the open source project, the TAP team is holding a contest. The winning code commits will be presented at OSCON (May 16-19, in Austin, Texas). The contest is described in the previous blog post and in the TAP team’s announcement.

The full conversation between Roumeliotis, Freedman, and Saleh is available in the embed below. Highlights from their discussion follow.

Note: This interview was edited and condensed for clarity.

Why would an enterprise go for an open source solution?

Freedman: Sometimes vendor software and platforms are supported well while in contract, but if the company wants to go its own way, the relationship changes a lot. Going with an open source platform allows the company to be independent from someone else’s proprietary software stack. It also allows them to be part of a community. What’s great about the open source movement, particularly in analytics, is that the developers in companies using open source platforms help advance the project by contributing to it, adding features as they need them, and collectively moving the platform forward.

Saleh: Open source has shown over the past few years that it’s the best way to push technology forward because it opens the door for innovation. It’s not only for one company to decide the best way to do things; it’s up to the community to decide.

How has Intel incorporated the flexibility of open source into TAP?

Saleh: We have all these amazing open source algorithms for machine learning. There is always a case where customers have their own library and machine learning algorithm, and they want to integrate it into the platform. The design of our analytics toolkit supports that in various ways. The library is extensible, and the customer can mix and match algorithms that we provide with algorithms they already have. We have the notion of plug-ins so they can implement their own plug-in around an existing algorithm and use it within TAP.
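As a rough illustration of the plug-in idea Saleh describes, the sketch below shows one common way to let platform-provided and customer-provided algorithms share a single registry. It is not TAP's actual API: the names register, run, available, and legacy_range are hypothetical and exist only for this example.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical plug-in registry; none of these names come from TAP itself.
Algorithm = Callable[[Sequence[float]], float]
_REGISTRY: Dict[str, Algorithm] = {}


def register(name: str) -> Callable[[Algorithm], Algorithm]:
    """Return a decorator that stores an algorithm under the given name."""
    def decorator(fn: Algorithm) -> Algorithm:
        _REGISTRY[name] = fn
        return fn
    return decorator


@register("mean")  # stands in for an algorithm the platform ships with
def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def legacy_range(values: Sequence[float]) -> float:
    """Stands in for code a customer already owns and wants to reuse."""
    return max(values) - min(values)


# Wrap the existing function as a plug-in without modifying it.
register("customer_range")(legacy_range)


def run(name: str, values: Sequence[float]) -> float:
    """Look up an algorithm by name and apply it to the data."""
    return _REGISTRY[name](values)


def available() -> List[str]:
    """List every registered algorithm, built-in or plugged in."""
    return sorted(_REGISTRY)
```

Under these assumptions, a caller mixes and matches by name: run("mean", data) and run("customer_range", data) go through the same lookup, so the calling code does not need to know which algorithms shipped with the platform and which were added later.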

What does TAP do for the big data industry?

Freedman: For years Intel has been contributing to open source projects and platforms, like Linux and the Python framework in general. In recent years, it’s become a leading contributor to Apache Spark and Apache Hadoop, which are the cornerstones of today’s analytics workflow. We want to make sure that everyone realizes the benefits of accelerated analytics on top of the Intel chips that are found in most data centers.

TAP brings all the disparate pieces together into one cohesive platform where the benefits of Spark and Hadoop are realized when they’re running in data centers. That gets combined with other performance work that Intel has contributed to, like Python as it’s risen to be a more prominent language in the data science space. Also, on the solutions side, developers will be able to take the data that data scientists work with in TAP and build valuable solutions either for their own enterprise or for their customers.

How is Intel building a community around TAP?

Saleh: We have a community website where developers can access documentation and step-by-step instructions, and get help installing and using the platform. The website also serves as an open source repository to which they can contribute. We work with developers very closely through meetups and webinars to guide them through how to contribute to TAP. We’re having a contest around TAP as well.

Freedman: We’re really out there to embrace what the community does. If there’s a developer who’s made an outstanding contribution to TAP, we’re in a position to recognize them and make them prominent not just within our community, but in the overall analytics, data science, and developer communities.

This post is part of a collaboration between O’Reilly and Intel. See our statement of editorial independence. Intel’s contest is in no way connected with O’Reilly Media, Inc.
