Hands-on DevOps

Book Description

Transform yourself into a specialist in DevOps adoption for big data on the cloud

About This Book

  • Learn the concepts of big data and DevOps and implement them
  • Get acquainted with DevOps frameworks, methodologies, and tools
  • Take a practical approach to building and working efficiently with your big data cluster
  • Get introduced to a range of vendor tools and platforms for Hadoop, cloud, containers, and IoT
  • Gain an in-depth understanding of data science, microservices, and big data

Who This Book Is For

If you are a big data architect, solutions provider, or any stakeholder working in a big data environment and want to implement a DevOps strategy, then this book is for you.

What You Will Learn

  • Learn about the DevOps culture, its frameworks, maturity, and design patterns
  • Get acquainted with multiple niche technologies: microservices, containers, Kubernetes, IoT, and cloud
  • Build big data clusters, enterprise applications and data science models
  • Apply DevOps concepts for continuous integration, delivery, deployment and monitoring
  • Get introduced to open source tools and service offerings from multiple vendors
  • Start your digital journey by applying DevOps concepts to migrate big data, cloud, microservices, IoT, security, and ERP systems

In Detail

DevOps strategies have become an important factor in big data environments.

This book begins with an introduction to big data, DevOps, and cloud computing, along with the need for DevOps strategies in big data environments. We move on to explore the adoption of DevOps frameworks and business scenarios. We then build a big data cluster, deploy it on the cloud, and explore DevOps activities such as CI/CD and containerization. Next, we cover big data concepts such as ETL for data sources, Hadoop clusters, and their applications. Towards the end of the book, we explore ERP applications useful for migrating to DevOps frameworks and examine a few case studies for migrating big data and prediction models.

By the end of this book, you will have mastered implementing DevOps tools and strategies for your big data clusters.

Style and approach

A clear, concise, and straightforward book that will enable you to implement DevOps for big data and improve efficiency.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the color images of this book
      2. Errata
      3. Piracy
      4. Questions
  2. Introduction to DevOps
    1. DevOps application - business scenarios
    2. Business drivers for DevOps adoption to big data
      1. Data explosion
      2. Cloud computing
      3. Big data
      4. Data science and machine learning
      5. In-memory computing
    3. Planning the DevOps strategy
    4. Benefits of DevOps
    5. Summary
  3. Introduction to Big Data and Data Sciences
    1. Big data
    2. In-memory technology
      1. In-memory database (IMDB)
      2. Hardware technology advances adopted for in-memory systems
      3. Software technology advances adopted for in-memory systems
        1. Data compression
        2. No aggregate tables
        3. Insert-only tables
        4. Column, row, and hybrid storage
        5. Partitioning
    3. NoSQL databases
      1. Benefits of NoSQL
    4. Data visualization
      1. Data visualization application
    5. Data science
    6. Summary
  4. DevOps Framework
    1. DevOps process
    2. DevOps best practices
      1. DevOps process
        1. Source Code Management (SCM)
        2. Code review
        3. Configuration Management
        4. Build management
        5. Artifacts repository management
        6. Release management
        7. Test automation
        8. Continuous integration
        9. Continuous delivery
        10. Continuous deployment
        11. Infrastructure as Code
        12. Routine automation
        13. Key application performance monitoring/indicators
    3. DevOps frameworks
      1. DevOps maturity life cycle
      2. DevOps maturity map
      3. DevOps progression framework/readiness model
      4. DevOps maturity checklists
      5. Agile framework for DevOps process projects
        1. Agile ways of development
    4. Summary
  5. Big Data Hadoop Ecosystems
    1. Big data Hadoop ecosystems
      1. Inbuilt tools and capabilities in Hadoop ecosystem
    2. Big data clusters
      1. Hadoop cluster attributes
        1. High availability
        2. Load balancing
        3. High availability and load balancing
        4. Distributed processing and parallel processing
      2. Usage of Hadoop big data cluster
    3. Hadoop big data cluster nodes
      1. Types of nodes and their roles
    4. Commercial Hadoop distributions
      1. Hadoop Cloudera enterprise distribution
        1. Data integration services
        2. Hadoop data storage
        3. Data access services
        4. Database
        5. Unified (common) services
        6. Cloudera proprietary services and operations/cluster management
      2. A Hadoop Hortonworks framework
        1. Data governance and schedule pipeline
        2. Cluster management
        3. Data access
        4. Data workflow
      3. A Hadoop MapR framework
        1. Machine learning
        2. SQL stream
        3. Storage, retrieval, and access control
        4. Data integration and access
        5. Provisioning and coordination
      4. Pivotal Hadoop platform HD Enterprise
      5. A Hadoop ecosystem on IBM big data
      6. A Hadoop ecosystem on AWS
      7. Microsoft Hadoop platform is HDInsight hosted on Microsoft Azure
    5. Capacity planning for systems
      1. Guideline for estimating and capacity planning
      2. Cluster-level sizing estimates
        1. For master node
        2. Worker node
        3. Gateway node
    6. Summary
  6. Cloud Computing
    1. Cloud computing technologies
      1. Cloud technology concepts
        1. Authentication and security
    2. Multi-tier cloud architecture model
      1. Presentation tier
      2. Business logic tier
      3. Data tier
      4. Relational databases
      5. NoSQL database
      6. Data storage
    3. Cloud architectures
      1. Public cloud
      2. Private cloud
      3. Hybrid cloud
      4. Community cloud model
    4. Cloud offerings
      1. Software as a Service (SaaS)
        1. Single tenant
        2. Multi-tenancy
        3. Multi-instance
        4. Benefits of SaaS
      2. Platform as a Service (PaaS)
        1. Development as a Service (DaaS)
        2. Data as a Service with PaaS
        3. Database as a Service with PaaS
        4. PaaS tied to SaaS environment
        5. PaaS tied to an operating environment
        6. Open-platform PaaS
        7. Microsoft Azure Portal
        8. Amazon Web Services
        9. Salesforce offerings on cloud
      3. Network as a Service (NaaS)
      4. Identity as a Service (IDaaS)
        1. Single Sign-On
        2. Federated Identity Management (FIDM)
        3. OpenID
    5. Cloud security
      1. Data encryption
        1. Encryption in transit
        2. Encryption-at-rest
        3. End-to-end encryption
    6. Backup and recovery
    7. Summary
  7. Building Big Data Applications
    1. Traditional enterprise architecture
    2. Principles to build big data enterprise applications
    3. Big data systems life cycle
      1. Data discovery into the system
        1. Data discovery stages
      2. Data quality
        1. Batch processing
          1. RDBMS to NoSQL
          2. Flume
          3. Stream processing
          4. Real-time
          5. Lambda architecture
      3. The data storage layer
      4. Data storage - best practices for better organization and effectiveness
        1. Landing
        2. Raw
        3. Work
        4. Gold
        5. Quarantine
        6. Business
        7. Outgoing
      5. Computing and analyzing data
      6. Apache Spark analytic platform
        1. Spark Core Engine
        2. Spark SQL
        3. Spark Streaming
      7. Visualization with big data systems
      8. Data governance
        1. People and collaboration in accordance with DevOps core concept
        2. Environment management
        3. Documentation
        4. Architecture board
        5. Development and build best practices
        6. Version control
        7. Release management
    4. Building enterprise applications with Spark
      1. Client-services presentation tier
      2. Data catalog services
      3. Workflow catalog
      4. Usage and tracking
      5. Security catalog
      6. Processing framework
      7. Ingestion services
    5. Data science
      1. Approach to data science
        1. Supervised models
          1. Neural network
          2. Multilayer perceptron
          3. Decision tree
        2. Unsupervised models
          1. Clusters
          2. Distances
          3. Normalization
          4. K-means
    6. Summary
  8. DevOps - Continuous Integration and Delivery
    1. Best practices for CI/CD
    2. Jenkins setup
      1. Prerequisites to install Jenkins
        1. Standalone Installation
          1. Linux system installation on Ubuntu
    3. Git (SCM) integration with Jenkins
      1. Integrating GitHub with Jenkins
    4. Maven (Build) tool Integration with Jenkins
    5. Building jobs with Jenkins
    6. Source code review - Gerrit
    7. Installation of Gerrit
    8. Repository management
    9. Testing with Jenkins
      1. Setting up unit testing
      2. Automated test suite
    10. Continuous delivery - Build Pipeline
    11. Jenkins features
      1. Security in Jenkins
    12. Summary
  9. DevOps Continuous Deployment
    1. Chef
      1. Chef landscape components
        1. Chef server
          1. Features of Chef server
          2. Chef client on nodes
          3. Ohai
          4. Workstations
          5. Chef repo
      2. Extended features of Chef
        1. Habitat
        2. InSpec
      3. Chef Automate workflow
      4. Compliance
    2. Ansible
      1. Prominent features
      2. Benefits of Ansible
      3. Ansible terminology, key concepts, workflow, and usage
        1. CMDB
        2. Playbooks
        3. Modules
        4. Inventory
        5. Plugins
        6. Ansible Tower
        7. Ansible Vault
        8. Ansible Galaxy
      4. Testing strategies with Ansible
    3. Monitoring
    4. Splunk
    5. Nagios monitoring tool for infrastructure
      1. Nagios – enterprise server and network monitoring software
    6. Integrated dashboards for network analysis, monitoring, and bandwidth
    7. Summary
  10. Containers, IoT, and Microservices
    1. Virtualization
      1. Hypervisor
      2. Types of virtualization
        1. Emulation
        2. Paravirtualization
        3. Container-based virtualization
    2. Containers
      1. Docker containers
      2. Java EE containers as a part of Java EE
        1. Java EE server and containers
      3. Amazon ECS container service
        1. Containers and images
        2. Task definitions
        3. Tasks and scheduling
        4. Clusters
        5. Container agent
      4. Pivotal container services
      5. Google container services
    3. Container orchestration
      1. Orchestration tools
      2. Kubernetes
      3. Docker orchestration tools
    4. Internet of Things (IoT)
      1. IoT - ecosystem
        1. Standard devices
      2. Data synthesis
      3. Data collection
      4. Device integration
      5. Real-time analytics
      6. Application and process extension
      7. Technology and protocols
      8. IoT - application in multiple fields
      9. IoT platforms for development
        1. ThingWorx
        2. Virtualized Packet Core (VPC)
        3. Electric Imp
        4. Predix
        5. Eclipse IoT
        6. SmartHome
        7. Eclipse SCADA
        8. Contiki
          1. Contiki communication
        9. Dynamic module loading
        10. The Cooja network simulator
    5. Microservices
      1. Microservices core patterns
      2. Microservices architecture
      3. Microservice decision
      4. Microservices deployment patterns
      5. Distribution patterns
      6. Microservice chassis
      7. Communication mode
      8. Data management options
      9. API interface
      10. Service discovery
    6. Summary
  11. DevOps for Digital Transformation
    1. Digital transformation
    2. Big data and DevOps
      1. Planning effectively on software updates
      2. Lower error rates
      3. Consistency of development and production environments
      4. Prompt feedback from production
      5. Agility of big data projects
        1. Big Data as a service
        2. The ETL data models
          1. Methodology 1
          2. Methodology 2
          3. Methodology 3
          4. Methodology 4
          5. Methodology 5
          6. Methodology 6
    3. Cloud migration - DevOps
      1. Migration strategy/approach
    4. Migration to microservices - DevOps
      1. Strategy 1 - standalone microservice
      2. Strategy 2 - separate frontend and backend
      3. Strategy 3 - extraction of services
        1. Prioritizing the modules for conversion to services
        2. The process to extract a module
          1. Stage 1
          2. Stage 2
    5. Apps modernization
    6. Architecture migration approach
      1. Data coupling
      2. Microservices scalability
    7. Best practices for architectural and implementation considerations
      1. Domain modeling
      2. Service size
      3. Testing
      4. Service discovery
      5. Deployment
      6. Build and release pipeline
      7. Feature flags
      8. Developer productivity with microservices adoption
      9. Monitoring and operations
      10. Organizational considerations
    8. DevOps for data science
      1. The DevOps continuous analytics environment
    9. DevOps for authentication and security
      1. Kerberos realm
        1. The user and the authentication server
      2. Client and the HTTP service
    10. DevOps for IoT systems
      1. Security by design
    11. Summary
  12. DevOps Adoption by ERP Systems
  13. DevOps Periodic Table
  14. Business Intelligence Trends
  15. Testing Types and Levels
  16. Java Platform SE 8