Unboxing Data Startups - Michael Abbott
Introduction to Clickstream Case Study
Architect and Build Big Data Applications
Instructor
Ben Lorica
Introduction to Big Data
Overview
Introduction To Big Data
Introduction To The Course
58s
About The Author
1m 20s
Big Data Challenges
2m 4s
Big Data Characteristics
5m 50s
Problems In Capitalizing On Big Data
3m 10s
Solving Big Data Problems
2m 10s
The Challenges Of Relational Databases
3m 38s
MapReduce And Hadoop
MapReduce And Hadoop
2m 27s
MapReduce Algorithm
6m 8s
Introducing Hadoop
3m 26s
Hadoop Distributed File System
Hadoop Distributed File System
5m 41s
Interacting With HDFS
2m 57s
Hadoop Infrastructure
Hadoop Infrastructure
3m 48s
YARN
2m 19s
Programming Hadoop
Programming Hadoop
4m 54s
Hive
Hive
6m 12s
Hive Architecture
4m 45s
Hive Data Model
5m 19s
Hive Queries
3m 9s
When To Use Hive
3m 6s
Pig
Pig
3m 1s
Pig Data Model
3m 30s
Pig Latin
5m 56s
Pig Example
3m 40s
When To Use Pig
3m 12s
Scalding
Scalding
3m 14s
Programming With Scalding
5m 43s
When To Use Scalding
2m 47s
Hadoop Ecosystem
Hadoop Ecosystem
8m 49s
HBase
8m 19s
When To Use HBase
2m 8s
Beyond Classic Hadoop - Spark And Flink
7m 15s
NoSQL
NoSQL Stores
7m 48s
Key-Value Stores
2m 51s
Columnar Stores
3m 25s
Document Stores
2m 36s
Graph Stores
2m 28s
Data Modeling For NoSQL Stores
3m 34s
Streaming
Streaming
1m 58s
Storm
3m 45s
Spark And Flink Streaming
2m 9s
Lambda Architecture
2m 50s
Big Data And NoSQL In The Enterprise
Introducing Big Data And NoSQL In The Enterprise
6m 34s
Polyglot Persistence
5m 27s
Seven Habits Of Successful Big Data And NoSQL Projects
2m 41s
Wrap Up
Wrap-Up
17s
Learning Apache Cassandra
Overview
Introduction To Cassandra
Introducing The Course
4m 41s
Understanding What Cassandra Is
4m 58s
Learning What Cassandra Is Being Used For
4m 56s
Understanding The System Requirements
6m 54s
How To Access Your Working Files
1m 15s
Opening The Main Virtual Machine
2m 53s
Pop Quiz
1m 24s
Getting Started With The Architecture
Understanding That Cassandra Is A Distributed Database
2m 23s
Learning What Snitch Is For
3m 53s
Learning What Gossip Is For
1m 52s
Learning How Data Gets Distributed
5m 35s
Learning About Replication
2m 12s
Learning About Virtual Nodes
3m 1s
Pop Quiz
1m 25s
Installing Cassandra
Downloading Cassandra
2m 48s
Ensuring Oracle Java 7 Is Installed
2m 2s
Installing Cassandra
3m 44s
Viewing The Main Configuration File
2m 46s
Providing Cassandra With Permission To Directories
1m 46s
Starting Cassandra
3m 41s
Checking Status
4m
Accessing The Cassandra system.log File
2m 6s
Pop Quiz
1m 28s
Communicating With Cassandra
Understanding Ways To Communicate With Cassandra
3m 47s
Using CQLSH
2m 29s
Pop Quiz
1m 8s
Creating A Database
Understanding A Cassandra Database
1m 54s
Defining A Keyspace
4m 57s
Deleting A Keyspace
52s
Pop Quiz
1m 53s
Lab: Create A Second Database
2m 39s
Creating A Table
Creating A Table
1m 49s
Defining Columns And Data Types
2m 48s
Defining A Primary Key
1m 49s
Recognizing A Partition Key
2m 44s
Specifying A Descending Clustering Order
3m 2s
Pop Quiz
1m 54s
Lab: Create A Second Table
2m 33s
Inserting Data
Understanding Ways To Write Data
1m 28s
Using The INSERT INTO Command
4m 45s
Using The COPY Command
5m 53s
How Data Is Stored In Cassandra
4m 21s
How Data Is Stored On Disk
5m 29s
Pop Quiz
2m 15s
Lab: Insert Data
9m 10s
Modeling Data
Understanding Data Modeling In Cassandra
1m 21s
Using A WHERE Clause
4m 17s
Understanding Secondary Indexes
2m 18s
Creating A Secondary Index
1m 38s
Defining A Composite Partition Key
9m 34s
Pop Quiz
3m 34s
Creating An Application
Understanding Cassandra Drivers
2m 31s
Exploring The DataStax Java Driver
3m 14s
Setting Up A Development Environment
4m 4s
Creating An Application Page
4m 51s
Acquiring The DataStax Java Driver Files
3m 24s
Getting The DataStax Java Driver Files Through Maven
2m 23s
Providing The DataStax Java Driver Files Manually
2m 36s
Connecting To A Cassandra Cluster
3m 39s
Executing A Query
7m 47s
Displaying Query Results - Part 1
5m 59s
Displaying Query Results - Part 2
7m 20s
Using An MVC Pattern
4m 59s
Pop Quiz
2m 50s
Lab: Create A Second Application - Part 1
5m 20s
Lab: Create A Second Application - Part 2
9m 49s
Lab: Create A Second Application - Part 3
3m 8s
Updating And Deleting Data
Updating Data
3m 39s
Understanding How Updating Works
3m 55s
Deleting Data
7m 10s
Understanding Tombstones
7m 18s
Using TTLs
5m 9s
Updating A TTL
2m 38s
Pop Quiz
2m 38s
Lab: Update And Delete Data
7m
Selecting Hardware
Understanding Hardware Choices
30s
Understanding RAM And CPU Recommendations
2m 45s
Selecting Storage
4m 8s
Deploying In The Cloud
4m 7s
Pop Quiz
2m 6s
Adding Nodes To A Cluster
Understanding Cassandra Nodes
3m 39s
Having A Network Connection - Part 1
5m 35s
Having A Network Connection - Part 2
5m 2s
Having A Network Connection - Part 3
4m 46s
Specifying The IP Address Of A Node In Cassandra
4m 12s
Specifying Seed Nodes
6m 30s
Bootstrapping A Node
6m 18s
Cleaning Up A Node
2m 59s
Using cassandra-stress
10m 33s
Pop Quiz
1m 39s
Lab: Add A Third Node
10m 42s
Monitoring A Cluster
Understanding Cassandra Monitoring Tools
46s
Using Nodetool
4m 54s
Using JConsole
3m 24s
Learning About OpsCenter
3m 24s
Pop Quiz
1m 49s
Repairing Nodes
Understanding Repair
5m 17s
Repairing Nodes
4m 17s
Understanding Consistency - Part 1
6m 26s
Understanding Consistency - Part 2
4m 33s
Understanding Hinted Handoff
3m 30s
Understanding Read Repair
1m 58s
Pop Quiz
3m 30s
Lab: Repair Nodes For A Keyspace
5m 45s
Removing A Node
Understanding Removing A Node
54s
Decommissioning A Node
4m 36s
Putting A Node Back Into Service
6m 38s
Removing A Dead Node
6m 42s
Pop Quiz
4m 10s
Lab: Put A Node Back Into Service
5m
Redefining A Cluster For Multiple Data Centers
Redefining For Multiple Data Centers - Part 1
4m 50s
Redefining For Multiple Data Centers - Part 2
5m 59s
Changing Snitch Type
5m 25s
Modifying cassandra-rackdc.properties
7m 45s
Changing Replication Strategy - Part 1
5m 55s
Changing Replication Strategy - Part 2
3m 58s
Pop Quiz
2m 30s
Resources For Further Learning
Accessing Documentation
2m 51s
Reading Blogs And Books
4m 53s
Watching Video Recordings
4m 5s
Posting Questions
4m 10s
Attending Events
3m
Wrap Up
1m 3s
Introduction to Apache Kafka
Overview
The Case for Kafka
11m 23s
The Basics
9m 10s
Setting up a Kafka Cluster
15m 30s
Writing a Kafka Producer
14m 33s
Writing a Kafka Consumer
16m 34s
Using Kafka from Python
8m 3s
Troubleshooting Kafka
29m 29s
Integrating Kafka and Hadoop with Flafka
26m 6s
Kafka Availability and Consistency
22m 38s
Kafka Ecosystem
13m 13s
Future of Kafka
8m 53s
Introduction to Apache Spark
Overview
Pre-Flight Check
13m 8s
Spark Deconstructed
14m 31s
A Brief History
23m 28s
Simple Spark Apps
25m 7s
Spark Essentials
35m 18s
Spark Examples
21m 55s
Unifying the Pieces - Spark SQL
24m 7s
Unifying the Pieces - Spark Streaming
14m 48s
Unifying the Pieces - MLlib and GraphX
20m
Unified Workflows Demo
22m 35s
The Full SDLC
4m 1s
Developer Certification
6m 10s
Resources
4m 44s
Introduction - Why DataFrames?
2m 28s
ETL to Prepare the Data from Capital Bikeshare
2m 46s
Create a DataFrame, Explore using SQL
2m 47s
Data Preparation for Machine Learning Models
5m 33s
Build a Classifier Using Naive Bayes
4m 43s
Build a Classifier Using Decision Trees
2m 26s
Build a Classifier Using Random Forests
2m 20s
Use a DataFrame to Compare Models
4m 15s
Parquet as a Best Practice with DataFrames
58s
How to Store a DataFrame with Parquet
3m 25s
How to Read a DataFrame Back in From Parquet
2m 57s
Use SQL to Estimate Route Durations
1m 41s
Data Preparation for GraphX - Model Route Costs
4m 43s
Use PageRank to Rank Popular Stations
3m 14s
Optimize Routes to Columbus Circle
3m 43s
Compare Results with Google Maps
1m 58s
Analyze a Popular Tourist Route
2m 30s
Examples of How to Use DataFrames in Python
2m 57s
Summary - The New DataFrames Features in Spark
1m 3s
Building Big Data Platforms
Overview
Introduction - Building big data platforms at Strata+Hadoop World - Ben Lorica
53s
Big Data at Netflix: Faster and Easier - Kurt Brown
40m 27s
Building Interactive Data Applications at Scale - Fangjin Yang and Vadim Ogievetsky
42m 56s
Open Source Real Time BI using Storm, Hadoop, Titan, Druid & D3 - Anil Madan
50m 36s
Building Real-time Data Products at LinkedIn with Apache Samza - Martin Kleppmann
49m 42s
An Open Source Approach to Gathering and Analyzing Device Sourced Health Data - Ian Eslick
41m 41s
Ticketmaster: Marketing and Selling the World's Tickets - John Carnahan
39m 35s
Unlocking Big Data at CERN - Matthias Braeger and Manish Devgan
41m 13s
Unboxing Data Startups - Michael Abbott
38m 50s
Architectural Considerations for Hadoop Applications
Overview
Introduction to Clickstream Case Study
11m 19s
Requirements
8m 4s
Data Modeling
14m 55s
Data Ingest
16m 16s
Data Processing Engines - Part 1
16m 23s
Data Processing Engines - Part 2
10m 59s
Data Processing Patterns
9m 32s
Orchestration
14m 34s
Putting It All Together
3m 8s
Demo
21m 47s
Q&A
24m 35s
An Introduction to Time Series with Team Apache
Overview
Introduction
Introduction to Time Series Problems
9m 58s
Kafka
Kafka Architecture and Deployment
11m 33s
Kafka Usage
3m 42s
Spark
Introduction to Spark
15m 43s
Spark Architecture
12m 2s
Spark Streaming
Spark Streaming: Windows & Slides
8m 35s
Spark Streaming: Ingestion Sources & Using Kafka
8m 32s
Spark Streaming: Operations on the Stream
1m 30s
Cassandra
Introduction to Cassandra
8m 56s
Cassandra Basic Architecture
11m 59s
Replication, High Availability and Multi Datacenter
14m 6s
Cassandra Weather Website Example
11m 46s
Cassandra Query Language (CQL)
18m
Cassandra Partitions & Clustering
8m 22s
Cassandra Read and Write Path
12m 17s
Working with Cassandra
6m 32s
Cassandra Drivers and Access Patterns
10m 37s
Spark and Cassandra
Spark and Cassandra Architecture
12m
Analyzing Cassandra Data & Spark SQL
12m 12s
Spark and Cassandra DataStax Enterprise
4m 31s
Real World Use Cases
Real World Use Cases: Streaming Problems
17m 11s
Real World Use Cases: In-place Analytic Problems
10m 58s