Overview
In this 14-hour course, you will explore the essentials of Hadoop and its ecosystem, mastering the design and management of distributed systems that handle big data. Through real-world examples, you will learn to harness tools such as Spark, Flume, and Kafka to meet contemporary data challenges.
What I will be able to do after this course
- Understand and implement the core principles of Hadoop, including HDFS and YARN.
- Process big data with scripting tools such as Pig and with Spark's programming libraries (see the sketch after this list).
- Utilize databases like HBase and MongoDB for storing and retrieving non-relational data.
- Manage and analyze streaming data with technologies such as Kafka and Spark Streaming.
- Gain hands-on experience with advanced Hadoop cluster management and ecosystem components.
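To give a flavor of the hands-on work, here is a minimal sketch of the kind of Spark code the course covers: a word count over a file in HDFS using PySpark. This is an illustrative example, not course material; the application name and the HDFS path are placeholders.

```python
from pyspark.sql import SparkSession

# Entry point for a Spark application (app name is a placeholder).
spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()

# Read lines from HDFS; the path below is hypothetical.
lines = spark.sparkContext.textFile("hdfs:///user/demo/input.txt")

# Classic word count: split lines into words, pair each word with 1,
# then sum the counts per word across the cluster.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Print a small sample of the results on the driver.
for word, count in counts.take(10):
    print(word, count)

spark.stop()
```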
Course Instructor(s)
Frank Kane is an experienced instructor with a background in data science and distributed computing, having worked extensively in the tech industry. His teaching style combines his practical experience with clear explanations, making complex technical topics accessible. Learners benefit from his expertise and friendly guidance throughout the course.
Who is it for?
This course caters to technology enthusiasts, from software developers to project managers, who want to deepen their knowledge of big data solutions. It is ideal for learners with basic Python or Scala proficiency and familiarity with the Linux command line, and it will help participants strengthen their system design capabilities and data handling skills, especially for analyzing large datasets.