How to use Apache Spark’s Resilient Distributed Dataset (RDD) API.
Ted Malaska is a solutions architect at Cloudera and has worked on close to 100 clusters for two to three dozen clients, covering hundreds of use cases. Ted has 18 years of professional experience working for startups, the US government, a number of the world’s largest banks, commercial firms, bio firms, retail firms, hardware appliance firms, and the largest nonprofit financial regulator in the US. He has architecture experience across topics such as Hadoop, Web 2.0, mobile, SOA (ESB, BPM), and big data. Ted is a regular contributor to the Hadoop, HBase, and Spark projects, a regular committer to Flume, Avro, Pig, and YARN, and the coauthor of O’Reilly Media’s Hadoop Application Architectures.
Shared-nothing architectures: Giving Hadoop's data processing frameworks scalability and fault tolerance
A look at the tools and patterns for accessing and processing data in Hadoop.
Mark Grover and Ted Malaska offer an overview of projects for streaming applications, including Kafka, Flume, and Spark Streaming, and discuss the available architectural patterns, such as Lambda and Kappa.
How decoupling, optimization, and specialization resemble connective systems in our bodies.
Ted Malaska explains how long hours of training, blisters, and shin splints relate to life-changing lessons in software architecture.
Good code comes from motivation and fresh minds.
In this O'Reilly training video, the "Hadoop Application Architectures" authors present an end-to-end case study of a clickstream analytics engine to provide a concrete example of how to architect and implement a complete solution with Hadoop. In this segment, they provide an overview of the complete architecture. Presenters: Mark Grover, Gwen Shapira, Jonathan Seidman, Ted Malaska