Advanced Architecture for Big Data Applications

Video description

Sharpen your architectural skills by understanding the challenges in the main areas of distributed systems: storage, computation, messaging, timing, and consensus. You’ll learn how to develop highly scalable big data applications using Apache Accumulo, to model and design an agile data warehouse, and to use Elasticsearch to search, aggregate, analyze, and scale large-volume datastores. You’ll also learn how to identify security weaknesses in your big data cluster and fix them using MIT Kerberos, Active Directory authentication, and authorization.

Table of contents

  1. Welcome
  2. What Distributed Systems Are, and Why They Exist
  3. Read Replication
  4. Sharding
  5. Consistent Hashing
  6. CAP Theorem
  7. Distributed Transactions
  8. Distributed Computation Introduction
  9. Map Reduce
  10. Hadoop
  11. Spark
  12. Storm
  13. Lambda Architecture
  14. Synchronization
  15. Network Time Protocol
  16. Vector Clocks
  17. Distributed Consensus: Paxos
  18. Messaging Introduction
  19. Kafka
  20. ZooKeeper
  21. Wrap-Up
  22. Getting Started
    1. About The Course
    2. About The Author
    3. What Is A Data Warehouse?
    4. Comparing Operational Applications And Data Warehouses
  23. Data Warehouse Overview
    1. Development Approach
    2. Data Sources
    3. Staging Tables
    4. Data Warehouse Model
    5. Data Warehouse Design
    6. Data Warehouse Data
    7. End User Access, Old Data, And Metadata Management
    8. Introduction To The Case Study
  24. Data Sources
    1. Data Modeling Review - Part 1
    2. Data Modeling Review - Part 2
    3. Data Sources Overview
    4. Source Data: Menu Definition
    5. Source Data: Miscellaneous Metadata
    6. Source Data: Customer Order
    7. Source Data: Customer Account
    8. Source Data: Customer Prospect
    9. Source Data: Vendor Procurement
    10. Case Study: Assess Source Data
  25. Staging Tables
    1. Staging Tables Overview
    2. Case Study: Create Staging Model
  26. Data Warehouse Modeling Basics
    1. The Star Schema
    2. Dimension
    3. Fact
    4. Surrogate Keys
    5. The Bus Architecture
    6. Dimensional Modeling And Agile Development
    7. Practical Tips
    8. Self Assessment Test
    9. Case Study: Business Requirements
    10. Case Study: Bus Architecture
  27. Recurrent Dimensions
    1. Date
    2. Time
    3. Customer
    4. Account
    5. Employee
    6. Unit of Measure
    7. Product
    8. Currency
    9. Audit
    10. Case Study: Initial Warehouse Model - Part 1
    11. Case Study: Initial Warehouse Model - Part 2
    12. Case Study: Initial Warehouse Model - Part 3
  28. DW Modeling - Advanced Dimension
    1. Kinds of Conformed Dimensions
    2. Junk Dimension
    3. Degenerate Dimension
    4. Slowly Changing Dimension - Part 1
    5. Slowly Changing Dimension - Part 2
    6. Snowflake, Outrigger, and Bridge
    7. Swappable Dimension
    8. Master Dimension
    9. Hierarchy
    10. Practical Tips
    11. Self Assessment Test
    12. Case Study: Elaborate Dimensions
  29. DW Modeling - Advanced Fact
    1. Kinds Of Facts
    2. Transaction Fact
    3. Periodic Snapshot
    4. Accumulating Snapshot
    5. Aggregate Fact
    6. Consolidated Fact
    7. Practical Tips
    8. Case Study: Elaborate Facts
  30. Data Warehouse Modeling Recap
    1. Warehouse Modeling Review
    2. Common Warehouse Modeling Mistakes
  31. Data Warehouse Design
    1. Conceptual, Logical, Physical Models
    2. System Attributes - Part 1
    3. System Attributes - Part 2
    4. Data Types And Domains
    5. Nullability
    6. Constraints
    7. Data Warehouse Tuning - Part 1
    8. Data Warehouse Tuning - Part 2
    9. Views - Part 1
    10. Views - Part 2
    11. Miscellaneous Aspects Of Design
    12. Practical Tips
    13. Self Assessment Test
    14. Case Study: Create Staging SQL
    15. Case Study: Execute Staging SQL
    16. Case Study: Create Warehouse SQL
    17. Case Study: Execute Warehouse SQL
  32. Data Warehouse Data
    1. Warehouse Data Overview
    2. Source-To-Target Mappings
    3. Data Profiling
    4. Loading Staging Tables - Part 1
    5. Loading Staging Tables - Part 2
    6. Loading The Date and Time Dimensions - Part 1
    7. Loading The Date and Time Dimensions - Part 2
    8. Initial Warehouse Loading: Dimensions
    9. Initial Warehouse Loading: Facts
    10. Updating The Warehouse
    11. Warehouse Data Processing And Agile Development
    12. Case Study: Load Warehouse Data
  33. End User Access
    1. End User Access Overview
    2. Case Study: Analyze Data - Part 1
    3. Case Study: Analyze Data - Part 2
  34. Data And Metadata Management
    1. Offload Of Old Data
    2. Metadata Management
  35. Conclusion
    1. Course Wrap-Up
  36. In Search Of Database Nirvana
    1. The Swinging Database Pendulum
    2. Hybrid Transaction/Analytical Processing Workloads
    3. Query Versus Storage Engines
    4. The Challenges Of HTAP
  37. Getting Started
    1. Introduction To Elasticsearch
    2. About The Author
  38. Basic Operations
    1. Installing And Configuring Elasticsearch
    2. Document CRUD - Creating, Retrieving, Updating And Deleting
    3. Running Searches And Aggregations
  39. Data Structure
    1. Mappings And Predefined Fields
    2. Core Types For Your Own Fields
    3. Using Predefined And Custom Analyzers
  40. Queries And Relevance
    1. Returning Specific Fields, Sorting And Pagination
    2. Full-Text Search With Match And Multi-Match Queries
    3. Using The Lucene Query Syntax In Query Strings
    4. Combining Full-Text And Term-Oriented Queries With The Bool Query
    5. Tuning Relevance
  41. Aggregations
    1. Using Queries And Aggregations Together In A Cluster
    2. Combining Different Kinds Of Aggregations
    3. Important Aggregation Types
  42. Document Relationships
    1. Objects And Nested Documents
    2. Parent-Child Relations
    3. Denormalizing And Application-Side Joins
  43. Performance And Scaling
    1. Optimizing Indexing And Searching
    2. Optimizing Node Settings
    3. Configuring Shards And Replicas
    4. Scaling Strategies
  44. Monitoring And Administration
    1. Easy Maintenance With Aliases And Index Templates
    2. Tuning Your Cluster For Stability
    3. Monitoring Elasticsearch Logs And Metrics
    4. Backups And Upgrades
  45. Conclusion
    1. Course Wrap-Up
  46. Data Model And Architecture
    1. Introduction To Accumulo
    2. About The Author
    3. The Accumulo Data Model
    4. Architecture
  47. Working With Accumulo
    1. Installation And Configuration
    2. Running And Monitoring
    3. Using The Shell
  48. Basic Application Development
    1. Starting Development
    2. Writing Data
    3. Reading Data
    4. Table API
  49. Application Security
    1. Authentication
    2. Authorization
  50. Intermediate Application Development
    1. Updates And Deletes
    2. Writing Secondary Indexes
    3. Reading Secondary Indexes
    4. Handling Hardware Failure
  51. Advanced Application Development
    1. MapReduce
    2. Spark
    3. Iterators
    4. Thrift Proxy
  52. Performance
    1. Table Design
    2. Optimization Features
  53. Administration
    1. Monitoring
    2. Table Management
    3. Importing And Exporting Tables
    4. Cluster Changes
    5. Replication
  54. Conclusion
    1. Conclusion
  55. Course Overview
    1. About This Course
    2. About The Instructor
    3. Course Sittings
    4. At The End Of This Course
  56. Tooling
    1. Initializing Amazon Web Services
    2. Using Cloudera Director To Spin Up A Test Cluster
    3. Crash Course In Cloudera Manager
  57. Hadoop Insecurities
    1. Permissions And Encryption
    2. Where Permissions Stop
    3. Hive: Transform Harmful
  58. Authentication With MIT Kerberos
    1. Installing MIT Kerberos
    2. Enabling Kerberos Authentication
    3. Using MIT Kerberos
    4. Submitting Jobs And Running Queries With Kerberos Auth
  59. Authentication With Active Directory
    1. Installing An AD Server
    2. Preparing AD Server For Hadoop
    3. Impala LDAP Authentication With Active Directory
    4. Using Hue With Active Directory
    5. Preparing Cluster With Kerberos Authentication
    6. Running The CM Wizard
    7. Using Kerberos
    8. Sharing Kerberos Tickets With Active Directory
  60. Authorization
    1. No Authorization
    2. Enabling Sentry Authorization
    3. Using Sentry - Defining Roles
    4. Using Sentry - Querying With Hue
    5. Custom Code And Hive UDFs With Sentry
    6. HDFS Extended ACLs
    7. HDFS Sentry Sync
    8. Sentry Authentication With Solr - Part 1
    9. Sentry Authentication With Solr - Part 2
  61. Encryption
    1. Creating An HDFS Encryption Zone
    2. Using HDFS Encryption Zones
    3. SSL: Crash Course In SSL Tools
    4. SSL: Preparing A Cluster For SSL Using A Self-Signed Root CA
    5. SSL: Enabling SSL For HDFS And YARN
    6. SSL: Verifying SSL With HDFS And YARN
    7. SASL Hive And HiveServer2
    8. SSL With HBase And Oozie
    9. SSL With Impala
    10. SSL With Hue
  62. Developer Topics
    1. UserGroupInformation Basics
    2. Delegation Tokens
    3. Secure Impersonation
  63. Administrator Topics
    1. Role Assignments And Gateway Isolation
    2. HBase ACLs
    3. Audits
    4. Sqoop
    5. Joining An AD Domain
  64. Secure Hadoop Topics
    1. The Secure Hadoop Market
    2. Cheats
  65. Conclusion
    1. Wrap Up

Product information

  • Title: Advanced Architecture for Big Data Applications
  • Author(s): O'Reilly Media, Inc.
  • Release date: December 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491978658