Sameer Farooqui

Spark + Cassandra: Technical Integration Details

Date: This event took place live on November 12, 2014

Presented by: Sameer Farooqui

Duration: Approximately 60 minutes.

Cost: Free

Questions? Please send email to

Hosted By: Ben Lorica

Description:

This webcast is an architectural deep dive into how the Apache Cassandra database integrates with the Apache Spark computation engine.

We will cover:

  • Ideal use cases for Cassandra + Spark
  • Details of how Cassandra's Murmur3 partitioning maps to a Spark RDD's internal partitioning
  • Considerations when using caching in Spark against C* tables
  • Specific configuration settings relevant to Cassandra + Spark integration
  • The DataStax open source Spark connector for Cassandra 2.x and how it works
  • Introduction to a free ~100-page 'DevOps' lab document (licensed under Creative Commons) that Databricks has released on how the integration works
  • Live demo of a Cassandra + Spark cluster (how to read data from a C* table into a Spark RDD, apply some transformations to the RDD, and write the results back into a Cassandra table)
  • Upcoming features in future versions of the connector, and current issues to be aware of
  • Q & A

About Sameer Farooqui

Sameer is a Client Services Engineer at Databricks, where he works with customers on Apache Spark deployments. He has extensive industry expertise in the Hadoop ecosystem, Cassandra, Couchbase, and the general NoSQL domain. Prior to Databricks, Sameer worked for two years as a freelance big data consultant and trainer globally and taught 100+ big data courses. Before that, Sameer was a Systems Architect at Hortonworks, an Emerging Data Platforms Consultant at Accenture R&D, and an Enterprise Consultant for Symantec/VERITAS (specializing in VCS, VVR, SF-HA).

About Ben Lorica

Ben Lorica is the Chief Data Scientist and Director of Content Strategy for Data at O'Reilly Media, Inc. He has applied Business Intelligence, Data Mining, Machine Learning, and Statistical Analysis in a variety of settings, including Direct Marketing, Consumer and Market Research, Targeted Advertising, Text Mining, and Financial Engineering. His background includes stints with an investment management company, internet startups, and financial services. He is an advisor to Databricks.
