Editor’s note: full disclosure — Ben is an advisor to Databricks.

I am pleased to announce a joint program between O’Reilly and Databricks to certify Spark developers. O’Reilly has long been interested in certification, and with this inaugural program, we believe we have the right combination — an ascendant framework and a partnership with the team behind the technology. The founding team of Databricks comprises members of the UC Berkeley AMPLab team that created Spark.

The certification exam will be offered at Strata events, through Databricks’ Spark Summits, and at training workshops run by Databricks and its partner companies. A variety of O’Reilly resources will accompany the certification program, including books, training days, and videos targeted at developers and companies interested in the Apache Spark ecosystem.

Offering certification in Spark reinforces O’Reilly’s commitment to helping companies and developers keep pace with the latest innovations in the big data space. Over the past 18 months, Apache Spark has become the most active open source project in big data. Through June 2014, there were more than 300 contributors from more than 50 companies.

Contributions to Apache Spark vs. other open source projects in the last six months. Source: Matei Zaharia, June 2014.

Spark-related proposals for our Strata conferences have surged: Spark was a trending topic among submissions for the upcoming NYC and Barcelona events. These proposals come from companies that already run Spark in production and use it to solve fundamental problems. Interest in Spark Camp (a training day at Strata in NYC and Barcelona) has been strong, and we plan to offer Spark Camp at future Strata events as well.

Many of the companies I interact with are using components of the Spark ecosystem (some companies have built their entire “data stack” out of these components). Earlier this year, the different Hadoop distributions and vendors rallied to make Spark the “standard processing engine for big data.” At the most recent Spark Summit, Databricks demonstrated how complex data applications can easily be built on top of Spark.

When I first started using Spark a few years ago, one still had to learn Scala, and the analytic libraries were still fairly limited. From the outset, I was attracted to its speed, scalability, and the fact that I could use the same programming model for a variety of problems (batch, real time, interactive, iterative). Today, the Apache Spark ecosystem has a much richer set of libraries for machine learning, graph analytics, and interactive (SQL) analysis. APIs in Python and Java have significantly broadened its user base (I have met many avid users of PySpark).

The Apache Spark ecosystem continues to grow — new libraries are announced in every release. There are now Spark meetups in cities in the U.S., Europe, and Asia. Through our publishing program and this new certification, we hope to help nurture current and future users and contributors to Spark.

If you want to learn more, or if you wish to sign up for updates, please visit our Spark Certification page.