Get the most out of the popular Apache Spark framework to perform efficient analytics on your real-time data
About This Video
- Learn Apache Spark fundamentals and leverage the power of Spark to perform efficient data processing and analytics on your data
- Get a quick overview of Apache Hadoop and the Scala programming language to get started with Apache Spark
- This comprehensive tutorial will help you get the most out of the trending big data framework for all your data processing needs
This video is a comprehensive tutorial to help you learn all the fundamentals of Apache Spark, one of the trending big data processing frameworks on the market today. We will introduce you to the various components of the Spark framework to efficiently process, analyze, and visualize data.
You will also get a brief introduction to Apache Hadoop and the Scala programming language before you start writing Spark programs. You will learn about Apache Spark programming fundamentals such as Resilient Distributed Datasets (RDDs) and see which operations can be used to perform a transformation or an action on an RDD. We'll show you how to load and save data from various data sources, such as different types of files, NoSQL databases, and RDBMS databases. We’ll also explain advanced Spark programming concepts such as managing key-value pairs and accumulators. Finally, you'll discover how to create an effective Spark application and execute it on a Hadoop cluster to process the data and gain insights to make informed business decisions.
By the end of this video, you will be well versed in all the fundamentals of Apache Spark and in implementing them in Spark applications.
What you will learn
- History of Apache Spark and an introduction to Spark components
- Learn how to get started with Apache Spark
- Introduction to Apache Hadoop, its processes and components: HDFS, YARN, and MapReduce
- Introduction to the Scala programming language and Scala fundamentals such as classes, objects, and collections
- Apache Spark programming fundamentals and Resilient Distributed Datasets (RDD)
- See which operations can be used to perform a transformation or an action on an RDD
- Find out how to load and save data in Spark
- Write a Spark application in Scala and execute it on a Hadoop cluster
Who should take this course
This course is for data scientists, big data technology developers and analysts who want to learn the fundamentals of Apache Spark from a single, comprehensive source, instead of spending countless hours on the internet trying to take bits and pieces from different sources. Some familiarity with Scala would be helpful.
About the author
Nishant Garg has over 16 years of software architecture and development experience in various technologies, such as Java Enterprise Edition, SOA, Spring, Hadoop, Hive, Flume, Sqoop, Oozie, Spark, YARN, Impala, Kafka, Storm, Solr/Lucene, NoSQL databases (such as HBase, Cassandra, and MongoDB), and MPP databases (such as Greenplum).
He received his MS in software systems from the Birla Institute of Technology and Science, Pilani, India, and is currently working as a senior technical architect for the Big Data R&D Labs with Impetus Infotech Pvt. Ltd. Previously, Nishant has enjoyed working with some of the most recognizable names in IT services and financial industries, employing full software life cycle methodologies such as Agile and SCRUM.
Nishant has undertaken many speaking engagements on big data technologies and is also the author of Learning Apache Kafka and HBase Essentials, both from Packt Publishing.
About Packt Video
Packt Video publishes friendly, practical video tutorials, packed with practical skills, concepts and guidance to help you succeed with new technologies and tasks. Packt Video’s series include Learn, Hands-On, Mastering, In 7 Days, Troubleshooting, and more. Our courses cover web and software development, security and ethical hacking, data science, and other key tech topics. We exist to make cutting-edge topics accessible for all.
Table of Contents
- Chapter 1: Introducing Spark
- Chapter 2: Hadoop and Spark
- Chapter 3: Scala from 30,000 feet
- Chapter 4: Spark Programming
- Chapter 5: Advanced Spark Programming
- Title: Apache Spark Fundamentals
- Release date: June 2017
- Publisher(s): Packt Publishing
- ISBN: 9781787283862