In this O'Reilly training video, the "Hadoop Application Architectures" authors present an end-to-end case study of a clickstream analytics engine to provide a concrete example of how to architect and implement a complete solution with Hadoop. In this segment, they provide an overview of the complete architecture. Presenters: Mark Grover, Gwen Shapira, Jonathan Seidman, Ted Malaska
Mark is a committer on Apache Bigtop, a committer and PMC member on Apache Sentry (incubating), and a contributor to the Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume projects. He is also a section author of O’Reilly’s book on Apache Hive, Programming Hive.
Gwen is a software engineer at Cloudera. She has 15 years of experience working with customers to design scalable data architectures, having worked as a data warehouse DBA, ETL developer, and senior consultant.
She specializes in migrating data warehouses to Hadoop, integrating Hadoop with relational databases, building scalable data processing pipelines, and scaling complex data analysis algorithms.
Ted Malaska is a solutions architect at Cloudera and has worked on close to 100 clusters for two to three dozen clients, covering hundreds of use cases. Ted has 18 years of professional experience working for startups, the US government, a number of the world’s largest banks, commercial firms, bio firms, retail firms, hardware appliance firms, and the largest nonprofit financial regulator in the US. He has architecture experience across topics such as Hadoop, Web 2.0, mobile, SOA (ESB, BPM), and big data. Ted is a regular contributor to...