O'Reilly logo

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

MySQL 8 for Big Data

Book Description

Uncover the power of MySQL 8 for Big Data

About This Book

  • Combine the powers of MySQL and Hadoop to build a solid Big Data solution for your organization
  • Integrate MySQL with different NoSQL APIs and Big Data tools such as Apache Sqoop
  • A comprehensive guide with practical examples on building a high performance Big Data pipeline with MySQL

Who This Book Is For

This book is intended for MySQL database administrators and Big Data professionals looking to integrate MySQL 8 and Hadoop to implement a high performance Big Data solution. Some previous experience with MySQL will be helpful, although the book will highlight the newer features introduced in MySQL 8.

What You Will Learn

  • Explore the features of MySQL 8 and how they can be leveraged to handle Big Data
  • Unlock the new features of MySQL 8 for managing structured and unstructured Big Data
  • Integrate MySQL 8 and Hadoop for efficient data processing
  • Perform aggregation using MySQL 8 for optimum data utilization
  • Explore different kinds of join and union in MySQL 8 to process Big Data efficiently
  • Accelerate Big Data processing with Memcached
  • Integrate MySQL with the NoSQL API
  • Implement replication to build highly available solutions for Big Data

In Detail

With organizations handling large amounts of data on a regular basis, MySQL has become a popular solution to handle this structured Big Data. In this book, you will see how DBAs can use MySQL 8 to handle billions of records, and load and retrieve data with performance comparable or superior to commercial DB solutions with higher costs.

Many organizations today depend on MySQL for their websites and a Big Data solution for their data archiving, storage, and analysis needs. However, integrating them can be challenging. This book will show you how to implement a successful Big Data strategy with Apache Hadoop and MySQL 8. It will cover real-time use case scenario to explain integration and achieve Big Data solutions using technologies such as Apache Hadoop, Apache Sqoop, and MySQL Applier. Also, the book includes case studies on Apache Sqoop and real-time event processing.

By the end of this book, you will know how to efficiently use MySQL 8 to manage data for your Big Data applications.

Style and approach

Step by Step guide filled with real-world practical examples.

Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Introduction to Big Data and MySQL 8
    1. The importance of Big Data
      1. Social media
      2. Politics
      3. Science and research
      4. Power and energy
      5. Fraud detection
      6. Healthcare
      7. Business mapping
    2. The life cycle of Big Data
      1. Volume
      2. Variety
      3. Velocity
      4. Veracity
      5. Phases of the Big Data life cycle
        1. Collect
        2. Store
        3. Analyze
        4. Governance
    3. Structured databases
    4. Basics of MySQL
      1. MySQL as a relational database management system
      2. Licensing
      3. Reliability and scalability
      4. Platform compatibility
      5. Releases
    5. New features in MySQL 8
      1. Transactional data dictionary
      2. Roles
      3. InnoDB auto increment
      4. Supporting invisible indexes
      5. Improving descending indexes
      6. SET PERSIST
      7. Expanded GIS support
      8. The default character set
      9. Extended bit-wise operations
      10. InnoDB Memcached
      11. NOWAIT and SKIP LOCKED
    6. Benefits of using MySQL
      1. Security
      2. Scalability
      3. An open source relational database management system
      4. High performance
      5. High availability
      6. Cross-platform capabilities
    7. Installing MySQL 8
      1. Obtaining MySQL 8
      2. MySQL 8 installation
      3. MySQL service commands
    8. Evolution of MySQL for Big Data
      1. Acquiring data in MySQL
      2. Organizing data in Hadoop
      3. Analyzing data
      4. Results of analysis
    9. Summary
  3. Data Query Techniques in MySQL 8
    1. Overview of SQL
    2. Database storage engines and types
      1. InnoDB
        1. Important notes about InnoDB
      2. MyISAM
        1. Important notes about MyISAM tables
      3. Memory
      4. Archive
      5. Blackhole
      6. CSV
      7. Merge
      8. Federated
      9. NDB cluster
    3. Select statement in MySQL 8
      1. WHERE clause
        1. Equal To and Not Equal To
        2. Greater than and Less than
        3. LIKE
        4. IN/NOT IN
        5. BETWEEN
      2. ORDER BY clause
      3. LIMIT clause
      4. SQL JOINS
        1. INNER JOIN
        2. LEFT JOIN
        3. RIGHT JOIN
        4. CROSS JOIN
      5. UNION
        1. Subquery
      6. Optimizing SELECT statements
    4. Insert, replace, and update statements in MySQL 8
      1. Insert
      2. Update
      3. Replace
    5. Transactions in MySQL 8
    6. Aggregating data in MySQL 8
      1. The importance of aggregate functions
        1. GROUP BY clause
        2. HAVING clause
        3. Minimum
        4. Maximum
        5. Average
        6. Count
        7. Sum
    7. JSON
      1. JSON_OBJECTAGG
      2. JSON_ARRAYAGG
    8. Summary
  4. Indexing your data for High-Performing Queries
    1. MySQL indexing
      1. Index structures
        1. Bitmap indexes
        2. Sparse indexes
        3. Dense indexes
        4. B-Tree indexes
        5. Hash indexes
      2. Creating or dropping indexes
        1. UNIQUE | FULLTEXT | SPATIAL
        2. Index_col_name
        3. Index_options
          1. KEY_BLOCK_SIZE
          2. With Parser
          3. COMMENT
          4. VISIBILITY
          5. index_type
        4. algorithm_option
        5. lock_option
        6. When to avoid indexing
    2. MySQL 8 index types
      1. Defining a primary index
        1. Primary indexes
        2. Natural keys versus surrogate keys
      2. Unique keys
      3. Defining a column index
        1. Composite indexes in MySQL 8
        2. Covering index
        3. Invisible indexes
        4. Descending indexes
        5. Defining a foreign key in the MySQL table
          1. RESTRICT
          2. CASCADE
          3. SET NULL
          4. NO ACTION
          5. SET DEFAULT
        6. Dropping foreign keys
      4. Full-text indexing
        1. Natural language fulltext search on InnoDB and MyISAM
        2. Fulltext indexing on InnoDB
        3. Fulltext search in Boolean mode
        4. Differentiating full-text indexing and like queries
      5. Spatial indexes
    3. Indexing JSON data
      1. Generated columns
        1. Virtual generated columns
        2. Stored generated columns
      2. Defining indexes on JSON
    4. Summary
  5. Using Memcached with MySQL 8
    1. Overview of Memcached
    2. Setting up Memcached
      1. Installation
      2. Verification
    3. Using of Memcached
      1. Performance tuner
      2. Caching tool
      3. Easy to use
    4. Analyzing data stored in Memcached
    5. Memcached replication configuration
    6. Memcached APIs for different technologies
      1. Memcached with Java
      2. Memcached with PHP
      3. Memcached with Ruby
      4. Memcached with Python
    7. Summary
  6. Partitioning High Volume Data
    1. Partitioning in MySQL 8
      1. What is partitioning?
      2. Partitioning types
        1. Horizontal partitioning
        2. Vertical partitioning
    2. Horizontal partitioning in MySQL 8
      1. Range partitioning
      2. List partitioning
      3. Hash partitioning
      4. Column partitioning
        1. Range column partitioning
        2. List column partitioning
      5. Key partitioning
      6. Sub partitioning
    3. Vertical partitioning
      1. Splitting data into multiple tables
        1. Data normalization
          1. First normal form
          2. Second normal form
          3. Third normal form
          4. Boyce-Codd normal form
          5. Fourth normal form
          6. Fifth normal form
    4. Pruning partitions in MySQL
      1. Pruning with list partitioning
      2. Pruning with key partitioning
    5. Querying on partitioned data
      1. DELETE query with the partition option
      2. UPDATE query with the partition option
      3. INSERT query with the partition option
    6. Summary
  7. Replication for building highly available solutions
    1. High availability
      1. MySQL replication
      2. MySQL cluster
      3. Oracle MySQL cloud service
      4. MySQL with the Solaris cluster
    2. Replication with MySQL
      1. Benefits of replication in MySQL 8
        1. Scalable applications
        2. Secure architecture
        3. Large data analysis
        4. Geographical data sharing
      2. Methods of replication in MySQL 8
        1. Replication using binary logs
        2. Replication using global transaction identifiers
      3. Replication configuration
        1. Replication with binary log file
          1. Replication master configuration
          2. Replication slave configuration
        2. Replication with GTIDs
          1. Global transaction identifiers
          2. The gtid_executed table
          3. GTID master's side configurations
          4. GTID slave's side configurations
        3. MySQL multi-source replication
          1. Multi-source replication configuration
        4. Statement-based versus row-based replication
    3. Group replication
      1. Requirements for group replication
      2. Group replication configuration
        1. Group replication settings
        2. Choosing a single master or multi-master
        3. Host-specific configuration settings
      3. Configuring a Replication User and enabling the Group Replication Plugin
      4. Starting group replication
        1. Bootstrap node
    4. Summary
  8. MySQL 8 Best Practices
    1. MySQL benchmarks and configurations
      1. Resource utilization
      2. Stretch your timelines of benchmarks
      3. Replicating production settings
      4. Consistency of throughput and latency
      5. Sysbench can do more
      6. Virtualization world
      7. Concurrency
      8. Hidden workloads
      9. Nerves of your query
      10. Benchmarks
    2. Best practices for MySQL queries
      1. Data types
      2. Not null
      3. Indexing
        1. Search fields index
        2. Data types and joins
        3. Compound index
        4. Shorten up primary keys
        5. Index everything
      4. Fetch all data
      5. Application does the job
      6. Existence of data
      7. Limit yourself
      8. Analyze slow queries
      9. Query cost
    3. Best practices for the Memcached configuration
      1. Resource allocation
      2. Operating system architecture
      3. Default configurations
      4. Max object size
      5. Backlog queue limit
      6. Large pages support
      7. Sensitive data
      8. Restrict exposure
      9. Failover
      10. Namespaces
      11. Caching mechanism
      12. Memcached general statistics
    4. Best practices for replication
      1. Throughput in group replication
      2. Infrastructure sizing
      3. Constant throughput
      4. Contradictory workloads
      5. Write scalability
    5. Summary
  9. NoSQL API for Integrating with Big Data Solutions
    1. NoSQL overview
      1. Changing rapidly over time
      2. Scaling
      3. Less management
      4. Best for big data
    2. NoSQL versus SQL
    3. Implementing NoSQL APIs
      1. NoSQL with the Memcached API layer
        1. Prerequisites
        2. NoSQL API with Java
        3. NoSQL API with PHP
        4. NoSQL API with Python
        5. NoSQL API with Perl
      2. NDB Cluster API
        1. NDB API for NodeJS
        2. NDB API for Java
        3. NDB API with C++
    4. Summary
  10. Case study: Part I - Apache Sqoop for exchanging data between MySQL and Hadoop
    1. Case study for log analysis
      1. Using MySQL 8 and Hadoop for analyzing log
    2. Apache Sqoop overview
    3. Integrating Apache Sqoop with MySQL and Hadoop
      1. Hadoop
        1. MapReduce
        2. Hadoop distributed file system
        3. YARN
      2. Setting up Hadoop on Linux
      3. Installing Apache Sqoop
      4. Configuring MySQL connector
    4. Importing unstructured data to Hadoop HDFS from MySQL
      1. Sqoop import for fetching data from MySQL 8
      2. Incremental imports using Sqoop
    5. Loading structured data to MySQL using Apache Sqoop
      1. Sqoop export for storing structured data from MySQL 8
      2. Sqoop saved jobs
    6. Summary
  11. Case study: Part II - Real time event processing using MySQL applier
    1. Case study overview
      1. MySQL Applier
      2. SQL Dump and Import
      3. Sqoop
      4. Tungsten replicator
      5. Apache Kafka
      6. Talend
      7. Dell Shareplex
      8. Comparison of Tools
    2. MySQL Applier overview
      1. MySQL Applier installation
        1. libhdfs
        2. cmake
        3. gcc
        4. FindHDFS.cmake
        5. Hive
    3. Real-time integration with MySQL Applier
    4. Organizing and analyzing data in Hadoop
    5. Summary