Advanced Data Management

Book Description

Advanced data management has always been at the core of efficient database and information systems. Recent trends like big data and cloud computing have intensified the need for sophisticated and flexible data storage and processing solutions.
This book provides comprehensive coverage of the principles of data management developed over the last decades, with a focus on data structures and query languages. It treats a wealth of different data models and surveys the foundations of structuring, processing, storing and querying data according to these models.

Starting with database design, the book discusses weaknesses of the relational data model and then conveys the basics of graph data; tree-structured XML data; key-value pairs and nested, semi-structured JSON data; columnar and record-oriented data; and object-oriented data. The final chapters round the book off with an analysis of fragmentation, replication and consistency strategies for data management in distributed databases, as well as recommendations for handling polyglot persistence in multi-model databases and multi-database architectures.

While primarily geared towards students of Master's-level courses in Computer Science and related areas, this book may also benefit practitioners looking for a reference book on data modeling and query processing. It provides both theoretical depth and a concise treatment of open-source technologies currently on the market.

Table of Contents

  1. Cover
  2. Title
  3. Copyright
  4. Dedication
  5. Preface
  6. Overview
  7. Table of Contents
  8. List of Figures
  9. List of Tables
  10. Part I: Introduction
    1. 1 Background
      1. 1.1 Database Properties
      2. 1.2 Database Components
      3. 1.3 Database Design
        1. 1.3.1 Entity-Relationship Model
        2. 1.3.2 Unified Modeling Language
      4. 1.4 Bibliographic Notes
    2. 2 Relational Database Management Systems
      1. 2.1 Relational Data Model
        1. 2.1.1 Database and Relation Schemas
        2. 2.1.2 Mapping ER Models to Schemas
      2. 2.2 Normalization
      3. 2.3 Referential Integrity
      4. 2.4 Relational Query Languages
      5. 2.5 Concurrency Management
        1. 2.5.1 Transactions
        2. 2.5.2 Concurrency Control
      6. 2.6 Bibliographic Notes
  11. Part II: NOSQL And Non-Relational Databases
    1. 3 New Requirements, “Not only SQL” and the Cloud
      1. 3.1 Weaknesses of the Relational Data Model
        1. 3.1.1 Inadequate Representation of Data
        2. 3.1.2 Semantic Overloading
        3. 3.1.3 Weak Support for Recursion
        4. 3.1.4 Homogeneity
      2. 3.2 Weaknesses of RDBMSs
      3. 3.3 New Data Management Challenges
      4. 3.4 Bibliographic Notes
    2. 4 Graph Databases
      1. 4.1 Graphs and Graph Structures
        1. 4.1.1 A Glimpse on Graph Theory
        2. 4.1.2 Graph Traversal and Graph Problems
      2. 4.2 Graph Data Structures
        1. 4.2.1 Edge List
        2. 4.2.2 Adjacency Matrix
        3. 4.2.3 Incidence Matrix
        4. 4.2.4 Adjacency List
        5. 4.2.5 Incidence List
      3. 4.3 The Property Graph Model
      4. 4.4 Storing Property Graphs in Relational Tables
      5. 4.5 Advanced Graph Models
      6. 4.6 Implementations and Systems
        1. 4.6.1 Apache TinkerPop
        2. 4.6.2 Neo4J
        3. 4.6.3 HyperGraphDB
      7. 4.7 Bibliographic Notes
    3. 5 XML Databases
      1. 5.1 XML Background
        1. 5.1.1 XML Documents
        2. 5.1.2 Document Type Definition (DTD)
        3. 5.1.3 XML Schema Definition (XSD)
        4. 5.1.4 XML Parsers
        5. 5.1.5 Tree Model of XML Documents
        6. 5.1.6 Numbering Schemes
      2. 5.2 XML Query Languages
        1. 5.2.1 XPath
        2. 5.2.2 XQuery
        3. 5.2.3 XSLT
      3. 5.3 Storing XML in Relational Databases
        1. 5.3.1 SQL/XML
        2. 5.3.2 Schema-Based Mapping
        3. 5.3.3 Schemaless Mapping
      4. 5.4 Native XML Storage
        1. 5.4.1 XML Indexes
        2. 5.4.2 Storage Management
        3. 5.4.3 XML Concurrency Control
      5. 5.5 Implementations and Systems
        1. 5.5.1 eXistDB
        2. 5.5.2 BaseX
      6. 5.6 Bibliographic Notes
    4. 6 Key-value Stores and Document Databases
      1. 6.1 Key-Value Storage
        1. 6.1.1 Map-Reduce
      2. 6.2 Document Databases
        1. 6.2.1 Java Script Object Notation
        2. 6.2.2 JSON Schema
        3. 6.2.3 Representational State Transfer
      3. 6.3 Implementations and Systems
        1. 6.3.1 Apache Hadoop MapReduce
        2. 6.3.2 Apache Pig
        3. 6.3.3 Apache Hive
        4. 6.3.4 Apache Sqoop
        5. 6.3.5 Riak
        6. 6.3.6 Redis
        7. 6.3.7 MongoDB
        8. 6.3.8 CouchDB
        9. 6.3.9 Couchbase
      4. 6.4 Bibliographic Notes
    5. 7 Column Stores
      1. 7.1 Column-Wise Storage
        1. 7.1.1 Column Compression
        2. 7.1.2 Null Suppression
      2. 7.2 Column striping
      3. 7.3 Implementations and Systems
        1. 7.3.1 MonetDB
        2. 7.3.2 Apache Parquet
      4. 7.4 Bibliographic Notes
    6. 8 Extensible Record Stores
      1. 8.1 Logical Data Model
      2. 8.2 Physical storage
        1. 8.2.1 Memtables and immutable sorted data files
        2. 8.2.2 File format
        3. 8.2.3 Redo logging
        4. 8.2.4 Compaction
        5. 8.2.5 Bloom filters
      3. 8.3 Implementations and Systems
        1. 8.3.1 Apache Cassandra
        2. 8.3.2 Apache HBase
        3. 8.3.3 Hypertable
        4. 8.3.4 Apache Accumulo
      4. 8.4 Bibliographic Notes
    7. 9 Object Databases
      1. 9.1 Object Orientation
        1. 9.1.1 Object Identifiers
        2. 9.1.2 Normalization for Objects
        3. 9.1.3 Referential Integrity for Objects
        4. 9.1.4 Object-Oriented Standards and Persistence Patterns
      2. 9.2 Object-Relational Mapping
        1. 9.2.1 Mapping Collection Attributes to Relations
        2. 9.2.2 Mapping Reference Attributes to Relations
        3. 9.2.3 Mapping Class Hierarchies to Relations
        4. 9.2.4 Two-Level Storage
      3. 9.3 Object Mapping APIs
        1. 9.3.1 Java Persistence API (JPA)
        2. 9.3.2 Apache Java Data Objects (JDO)
      4. 9.4 Object-Relational Databases
      5. 9.5 Object Databases
        1. 9.5.1 Object Persistence
        2. 9.5.2 Single-Level Storage
        3. 9.5.3 Reference Management
        4. 9.5.4 Pointer Swizzling
      6. 9.6 Implementations and Systems
        1. 9.6.1 DataNucleus
        2. 9.6.2 ZooDB
      7. 9.7 Bibliographic Notes
  12. Part III: Distributed Data Management
    1. 10 Distributed Database Systems
      1. 10.1 Scaling horizontally
      2. 10.2 Distribution Transparency
      3. 10.3 Failures in Distributed Systems
      4. 10.4 Epidemic Protocols and Gossip Communication
        1. 10.4.1 Hash Trees
        2. 10.4.2 Death Certificates
      5. 10.5 Bibliographic Notes
    2. 11 Data Fragmentation
      1. 11.1 Properties and Types of Fragmentation
      2. 11.2 Fragmentation Approaches
        1. 11.2.1 Fragmentation for Relational Tables
        2. 11.2.2 XML Fragmentation
        3. 11.2.3 Graph Partitioning
        4. 11.2.4 Sharding for Key-Based Stores
        5. 11.2.5 Object Fragmentation
      3. 11.3 Data Allocation
        1. 11.3.1 Cost-based allocation
        2. 11.3.2 Consistent Hashing
      4. 11.4 Bibliographic Notes
    3. 12 Replication And Synchronization
      1. 12.1 Replication Models
        1. 12.1.1 Master-Slave Replication
        2. 12.1.2 Multi-Master Replication
        3. 12.1.3 Replication Factor and the Data Replication Problem
        4. 12.1.4 Hinted Handoff and Read Repair
      2. 12.2 Distributed Concurrency Control
        1. 12.2.1 Two-Phase Commit
        2. 12.2.2 Paxos Algorithm
        3. 12.2.3 Multiversion Concurrency Control
      3. 12.3 Ordering of Events and Vector Clocks
        1. 12.3.1 Scalar Clocks
        2. 12.3.2 Concurrency and Clock Properties
        3. 12.3.3 Vector Clocks
        4. 12.3.4 Version Vectors
        5. 12.3.5 Optimizations of Vector Clocks
      4. 12.4 Bibliographic Notes
    4. 13 Consistency
      1. 13.1 Strong Consistency
        1. 13.1.1 Write and Read Quorums
        2. 13.1.2 Snapshot Isolation
      2. 13.2 Weak Consistency
        1. 13.2.1 Data-Centric Consistency Models
        2. 13.2.2 Client-Centric Consistency Models
      3. 13.3 Consistency Trade-offs
      4. 13.4 Bibliographic Notes
  13. Part IV: Conclusion
    1. 14 Further Database Technologies
      1. 14.1 Linked Data and RDF Data Management
      2. 14.2 Data Stream Management
      3. 14.3 Array Databases
      4. 14.4 Geographic Information Systems
      5. 14.5 In-Memory Databases
      6. 14.6 NewSQL Databases
      7. 14.7 Bibliographic Notes
    2. 15 Concluding Remarks
      1. 15.1 Database Reengineering
      2. 15.2 Database Requirements
      3. 15.3 Polyglot Database Architectures
        1. 15.3.1 Polyglot Persistence
        2. 15.3.2 Lambda Architecture
        3. 15.3.3 Multi-Model Databases
      4. 15.4 Implementations and Systems
        1. 15.4.1 Apache Drill
        2. 15.4.2 Apache Druid
        3. 15.4.3 OrientDB
        4. 15.4.4 ArangoDB
      5. 15.5 Bibliographic Notes
  14. Bibliography
  15. Index