Best of Strata + Hadoop World 2012: Analyzing Millions of GitHub Commits
What Makes Developers Happy, Angry, and Everything in Between?
Date: This event took place live on August 08 2013
Presented by: Ilya Grigorik
Duration: Approximately 60 minutes.
Join us for an exclusive presentation by Ilya Grigorik, recorded live at Strata + Hadoop World 2012.
Open source developers all over the world contribute to millions of projects every day on GitHub: writing and reviewing code, filing and discussing bug reports, updating documentation and project wikis, and so forth. The data generated from this activity can reveal interesting trends across many industries, including popularity of programming languages over time, defect rates, contribution metrics, and popularity of specific frameworks and libraries.
To help extract insights from the public GitHub timeline, which generates hundreds of thousands of events daily, we imported the entire dataset into Google BigQuery. This makes data about tens of millions of open source commits and discussions accessible to the world for quick interactive analysis. With that, we can run our analysis:
In this session, we will answer the above questions and much more. We will also discuss our experience in using BigQuery, how we modeled the GitHub event data, and the lessons learned in importing and making the data available.
About Ilya Grigorik
Ilya Grigorik is a web performance engineer and advocate at Google, an open-source evangelist, an analytics geek, and a proverbial early adopter of all things digital. Prior to focusing on web performance, Ilya was the founder and CTO of PostRank, a social analytics company that became the core of social analytics within Google Analytics.