
Understanding SQL on Hadoop and Distributed R

Date: This event took place live on March 10, 2015.

Presented by: Steve Sarsfield, Sunil Venkayala

Duration: Approximately 60 minutes.

Cost: Free

Questions? Please send email to


R and SQL are powerful, well-established tools for analytics, backed by years of investment in both skills and software development, but both face scalability limitations in a Hadoop/big data environment. Today's market demands for predictive analytics, and the sheer size of the data to be analyzed, are outpacing the development of the software and skills needed to handle that data. One solution is to build tools that extend the capabilities of R and SQL.

In this webcast, you'll learn:

  • How to leverage multiple nodes for predictive analytics to vastly improve performance
  • How to perform R analysis on larger data sets while overcoming R's scalability limitations
  • How to use HP Haven on Hadoop capabilities to perform advanced SQL-based analytics on Hadoop data

About Steve Sarsfield

Steve Sarsfield is an author and expert in data quality and data governance. His book "The Data Governance Imperative" is a comprehensive exploration of data governance from the business perspective. Steve draws practical wisdom and inspiration from his colleagues at HP and its customers as they venture into their own data analytics projects.

About Sunil Venkayala

Sunil Venkayala is a Senior Technical Product Manager at HP in Cambridge, Mass. He leads the Distributed R open-source technology initiative and the advanced analytics features of the HP Vertica platform. Before joining HP, he was a product manager and architect of the Oracle Fusion Sales Configurator application. Prior to that, he was a member of the expert group for the Java Data Mining (JDM) standards and led development of many modules of Oracle's Data Mining platform.

Sunil is a co-author of the book "Java Data Mining" and has published several articles.