Building Data Integration Solutions

Name: Building Data Integration Solutions
Author: Jay Borthen
ISBN: 9781098173067

October 2025

Intermediate to advanced

284 pages

English

O'Reilly Media, Inc.

Preface
Overview of the Book Structure and What Readers Can Expect to Learn; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
I. Foundations of Data Integration
1. Introduction to Data Integration
Data Integration and Data Management; Defining Data Integration; Why Data Integration Is Important; The Evolution of Data Integration; Data Integration Use Cases and Case Studies; Healthcare; Tax Administration; Immigration and Border Control; Conclusion
2. Key Concepts in Data Integration
Data Properties; Data Types; Data Structure Types; Metadata; Data Orientation; Encodings; File Formats; Data Context; Data Stores; Types of Storage; Data Models and Management Systems; Hybrid and Multicloud Storage; Data Movement and Transformation; Connectors and Connections; Migration; Ingestion; Replication; Batches, Streams, and Events; Pipelines; Conditioning; Change Data Capture; Integration Management; Data Services; Data Orchestration; Conclusion
3. Data Integration Challenges
Organizational Issues; Technical Challenges; Data Quality; Data Processing; Security and Compliance; Conclusion
4. Models, Architectures, Methods, and Patterns
Models; Conceptual Data Integration Models; Logical Data Integration Models; Physical Data Integration Methods; Architectures; Hub-and-Spoke; Point-to-Point; Enterprise Service Bus; Federation; Methods; Patterns; Ingestion Patterns; Data Consolidation Pattern; Data Replication and Propagation Pattern; Data Virtualization Pattern; Event-Driven Integration Pattern; Conclusion
II. Tools, Technologies, and Frameworks
5. Data Integration Tool Options
Open Source Versus Commercial Solutions; Advantages of Open Source Solutions; Advantages of Commercial Solutions; Programming Languages Versus Low-Code/No-Code Platforms; Cloud Versus On-Premises Architectures; On-Premises Considerations; Cloud Service Providers; Distributed Versus Centralized Data Systems; In-Memory Processing; Security and Compliance; Conclusion
6. Data Stores and Management Systems
Relational Databases; IBM Db2; Microsoft SQL Server; MySQL and MariaDB; Oracle Database; PostgreSQL; SQLite; Sybase and SAP; Non-Relational Databases; Document Stores and Key-Value Storage; Graph Databases; Vector Databases; Wide-Column Databases; Data Warehouses; Amazon Redshift; Apache Doris, Druid, Hadoop, and Hive; Cloudera Data Warehouse; IBM Db2 Warehouse; Snowflake; Data Lakes and Lakehouses; Amazon Simple Storage Service; Apache Hudi and Iceberg; Azure Blob Storage; Delta Lake; Google Cloud Storage; IBM Cloud Storage Services; Conclusion
7. Data Ingestion and Streaming Tools
Apache Beam, Flink, Spark, and Storm; Apache NiFi; AWS Glue and Amazon Kinesis; Azure Event Hubs; Confluent and Kafka; Conclusion
8. Comprehensive Integration Suites
AWS Glue, Amazon Elastic MapReduce, and Amazon Q; Azure Data Factory; Databricks; Fivetran; IBM DataStage and App Connect; IICS and PowerCenter; Microsoft SQL Server Integration Services; MuleSoft; Oracle Data Integrator and GoldenGate; Pentaho; Qlik, Talend, and Stitch; TIBCO; Conclusion
III. Introducing the Example Data Integration Solution
9. Introducing the Example Solution
Objectives; Initial State; Planned Architecture; Conclusion
10. Implementing a Batch Solution
Setting Up Qlik Replicate; Setting Up a Windows Server EC2 Instance for Qlik Replicate; Installing and Downloading Qlik Replicate; Setting Up Endpoint Connections; Setting Up Databricks; Setting Up Databricks in AWS; Connecting Databricks; Conclusion
11. Implementing a Streaming Solution
Raspberry Pi and Sensor Setup; Bill of Materials; Sensor Configuration; Creating a Confluent Cloud Cluster; Creating a Local Python Environment; Cluster Settings; Creating a Topic; Configuring a Client; Creating the Python Producer and Consumer Applications; Setting Up a Connector; Conclusion
A. Setting Up the Data Integration Solution Example
Ubuntu Linux and PostgreSQL; Setting Up Ubuntu AWS EC2 Instances; Creating a PostgreSQL Database in the Rocky Linux AWS EC2 Instance; A Final Thought on Sensor Devices
B. References
Key Terms Glossary
Acronyms Glossary
Index
About the Author

Content preview from Building Data Integration Solutions

Chapter 6. Data Stores and Management Systems

Data storage does not necessarily need to impose structure on data. For example, a filesystem may store data as raw files with no inherent organization other than folder structure. Throughout most of this book, however, data storage is equated with databases. As discussed in Part I, there are two primary types of contemporary databases: relational and non-relational. For structured data that can be organized into rows and columns, relational databases are the obvious choice because of their ubiquity and efficiency. The relational databases described in this chapter include IBM Db2, Microsoft SQL Server, MySQL and MariaDB, Oracle Database, PostgreSQL, SQLite, and Sybase and SAP.
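
As a rough illustration of what "organized into rows and columns" means in practice, the following minimal sketch uses SQLite (one of the relational databases named above) through Python's standard sqlite3 module. The patients table, its columns, and the sample rows are hypothetical and are not taken from the book.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the schema fixes the columns that every row must have.
conn.execute(
    "CREATE TABLE patients ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, birth_year INTEGER)"
)
conn.executemany(
    "INSERT INTO patients (id, name, birth_year) VALUES (?, ?, ?)",
    [(1, "Ada", 1984), (2, "Grace", 1991), (3, "Linus", 1979)],
)

# Because the structure is known up front, rows can be filtered and
# sorted declaratively instead of by parsing raw files.
rows = conn.execute(
    "SELECT name, birth_year FROM patients "
    "WHERE birth_year < ? ORDER BY birth_year",
    (1990,),
).fetchall()
print(rows)
conn.close()
```

The same query against raw files in a folder would require custom parsing code for each file layout, which is the practical difference between storage that imposes structure and storage that does not.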

For unstructured data, such as images, videos, audio files, and text files, non-relational databases are more appropriate. This chapter covers several kinds of non-relational databases: document stores and key-value storage, graph databases, vector databases, and wide-column stores. The specific products covered include Amazon DynamoDB and DocumentDB, Apache Ignite, MongoDB, Redis, Amazon Neptune, Neo4j, TigerGraph, Pinecone, Apache Cassandra and HBase, Amazon Keyspaces, and Google Bigtable.
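
To make the key-value and document models a little more concrete, here is a minimal sketch in plain Python. It does not use any of the products listed above: ordinary dictionaries stand in for the stores, and the keys, field names, and sample values (including the "_id" field) are assumptions chosen for illustration only.

```python
import json

# A plain dict stands in for a key-value store: values are looked up
# by key, and the store does not interpret what a value contains.
kv_store = {}
kv_store["session:42"] = b"\x00\x01raw-bytes"

# Documents are self-describing records. Two documents in the same
# collection do not have to share the same fields (hypothetical data).
documents = [
    {"_id": 1, "type": "image", "path": "scans/a.png", "width": 640, "height": 480},
    {"_id": 2, "type": "note", "text": "Follow up in two weeks", "tags": ["urgent"]},
]

# Serialize each document to JSON and key it by its identifier.
collection = {doc["_id"]: json.dumps(doc) for doc in documents}

# Retrieval is by key; the fields are only discovered at read time.
note = json.loads(collection[2])
fields_by_id = {key: sorted(json.loads(value)) for key, value in collection.items()}
print(note["tags"])
print(fields_by_id)
```

Unlike a relational table, nothing here enforces a shared schema, which is why this style of storage suits records whose shape varies from one item to the next.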

Data warehouses, data lakes, and data lakehouses are also discussed in this chapter and include product offerings such as Amazon Redshift, Apache Doris, Apache Druid, ...

Publisher Resources

ISBN: 9781098173050
