Book description
DW 2.0: The Architecture for the Next Generation of Data Warehousing is the first book on the new generation of data warehouse architecture, DW 2.0, by the father of the data warehouse. The book describes the future of data warehousing that is technologically possible today, at both the architectural and the technology level.
The perspective of the book is from the top down: looking at the overall architecture and then delving into the issues underlying the components. This allows people who are building or using a data warehouse to see what lies ahead and determine what new technology to buy, how to plan extensions to the data warehouse, what can be salvaged from the current system, and how to justify the expense at the most practical level. This book gives experienced data warehouse professionals everything they need in order to implement the new generation DW 2.0.
It is designed for professionals in the IT organization, including data architects, DBAs, systems design and development professionals, as well as data warehouse and knowledge management professionals.
- First book on the new generation of data warehouse architecture, DW 2.0
- Written by the "father of the data warehouse", Bill Inmon, a columnist and newsletter editor of The Bill Inmon Channel on the Business Intelligence Network
- Long-overdue, comprehensive coverage of the implementation of technology and tools that enable the new generation of the DW: metadata, temporal data, ETL, unstructured data, and data quality control
Table of contents
- Cover
- Title Page
- Copyright
- Dedication
- Preface
- Acknowledgments
- About the Authors
- Table of Contents
- Chapter 1: A brief history of data warehousing and first-generation data warehouses
- Data Base Management Systems
- Online Applications
- Personal Computers and 4GL Technology
- The Spider Web Environment
- Evolution from the Business Perspective
- The Data Warehouse Environment
- What Is a Data Warehouse?
- Integrating Data—a Painful Experience
- Volumes of Data
- A Different Development Approach
- Evolution to the DW 2.0 Environment
- The Business Impact of the Data Warehouse
- Various Components of the Data Warehouse Environment
- The Evolution of Data Warehousing from the Business Perspective
- Other Notions About a Data Warehouse
- The Active Data Warehouse
- The Federated Data Warehouse Approach
- The Star Schema Approach
- The Data Mart Data Warehouse
- Building a “Real” Data Warehouse
- Summary
- Chapter 2: An introduction to DW 2.0
- DW 2.0—a New Paradigm
- DW 2.0—from the Business Perspective
- The Life Cycle of Data
- Reasons for the Different Sectors
- Metadata
- Access of Data
- Structured Data/Unstructured Data
- Textual Analytics
- Blather
- The Issue of Terminology
- Specific Text/General Text
- Metadata—a Major Component
- Local Metadata
- A Foundation of Technology
- Changing Business Requirements
- The Flow of Data Within DW 2.0
- Volumes of Data
- Useful Applications
- DW 2.0 and Referential Integrity
- Reporting In DW 2.0
- Summary
- Chapter 3: DW 2.0 components—about the different sectors
- Chapter 4: Metadata in DW 2.0
- Reusability of Data and Analysis
- Metadata In DW 2.0
- Active Repository/Passive Repository
- The Active Repository
- Enterprise Metadata
- Metadata and the System of Record
- Taxonomy
- Internal Taxonomies/External Taxonomies
- Metadata In the Archival Sector
- Maintaining Metadata
- Using Metadata—an Example
- From the End-User Perspective
- Summary
- Chapter 5: Fluidity of the DW 2.0 technology infrastructure
- The Technology Infrastructure
- Rapid Business Changes
- The Treadmill of Change
- Getting Off the Treadmill
- Reducing the Length of Time for IT to Respond
- Semantically Temporal, Semantically Static Data
- Semantically Temporal Data
- Semantically Stable Data
- Mixing Semantically Stable and Unstable Data
- Separating Semantically Stable and Unstable Data
- Mitigating Business Change
- Creating Snapshots of Data
- A Historical Record
- Dividing Data
- From the End-User Perspective
- Summary
- Chapter 6: Methodology and approach for DW 2.0
- Spiral Methodology—a Summary of Key Features
- The Seven Streams Approach—an Overview
- Enterprise Reference Model Stream
- Enterprise Knowledge Coordination Stream
- Information Factory Development Stream
- Data Profiling and Mapping Stream
- Data Correction Stream (Previously Called the Data Cleansing Stream)
- Infrastructure Stream
- Total Information Quality Management Stream
- Summary
- Chapter 7: Statistical processing and DW 2.0
- Two Types of Transactions
- Using Statistical Analysis
- The Integrity of the Comparison
- Heuristic Analysis
- Freezing Data
- Exploration Processing
- The Frequency of Analysis
- The Exploration Facility
- The Sources for Exploration Processing
- Refreshing Exploration Data
- Project-Based Data
- Data Marts and the Exploration Facility
- A Backflow of Data
- Using Exploration Data Internally
- From the Perspective of the Business Analyst
- Summary
- Chapter 8: Data models and DW 2.0
- An Intellectual Road Map
- The Data Model and Business
- The Scope of Integration
- Making the Distinction Between Granular and Summarized Data
- Levels of the Data Model
- Data Models and the Interactive Sector
- The Corporate Data Model
- A Transformation of Models
- Data Models and Unstructured Data
- From the Perspective of the Business User
- Summary
- Chapter 9: Monitoring the DW 2.0 environment
- Chapter 10: DW 2.0 and security
- Chapter 11: Time-variant data
- All Data In DW 2.0—Relative To Time
- Time Relativity In the Interactive Sector
- Data Relativity Elsewhere In DW 2.0
- Transactions In the Integrated Sector
- Discrete Data
- Continuous Time Span Data
- A Sequence of Records
- Nonoverlapping Records
- Beginning and Ending a Sequence of Records
- Continuity of Data
- Time-Collapsed Data
- Time Variance In the Archival Sector
- From the Perspective of the End User
- Summary
- Chapter 12: The flow of data in DW 2.0
- The Flow of Data Throughout the Architecture
- Entering the Interactive Sector
- The Role of ETL
- Data Flow Into the Integrated Sector
- Data Flow Into the Near Line Sector
- Data Flow Into the Archival Sector
- The Falling Probability of Data Access
- Exception-Based Flow of Data
- From the Perspective of the Business User
- Summary
- Chapter 13: ETL processing and DW 2.0
- Changing States of Data
- Where ETL Fits
- From Application Data to Corporate Data
- ETL In Online Mode
- ETL In Batch Mode
- Source and Target
- An ETL Mapping
- Changing States—an Example
- More Complex Transformations
- ETL and Throughput
- ETL and Metadata
- ETL and An Audit Trail
- ETL and Data Quality
- Creating ETL
- Code Creation or Parametrically Driven ETL
- ETL and Rejects
- Changed Data Capture
- ELT
- From the Perspective of the Business User
- Summary
- Chapter 14: DW 2.0 and the granularity manager
- Chapter 15: DW 2.0 and performance
- Good Performance—a Cornerstone For DW 2.0
- Online Response Time
- Analytical Response Time
- The Flow of Data
- Queues
- Heuristic Processing
- Analytical Productivity and Response Time
- Many Facets to Performance
- Indexing
- Removing Dormant Data
- End-User Education
- Monitoring the Environment
- Capacity Planning
- Metadata
- Batch Parallelization
- Parallelization for Transaction Processing
- Workload Management
- Data Marts
- Exploration Facilities
- Separation of Transactions Into Classes
- Service Level Agreements
- Protecting the Interactive Sector
- Partitioning Data
- Choosing the Proper Hardware
- Separating Farmers and Explorers
- Physically Group Data Together
- Check Automatically Generated Code
- From the Perspective of the Business User
- Summary
- Chapter 16: Migration
- Houses and Cities
- Migration In a Perfect World
- The Perfect World Almost Never Happens
- Adding Components Incrementally
- Adding the Archival Sector
- Creating Enterprise Metadata
- Building the Metadata Infrastructure
- “Swallowing” Source Systems
- ETL as a Shock Absorber
- Migration to the Unstructured Environment
- From the Perspective of the Business User
- Summary
- Chapter 17: Cost justification and DW 2.0
- Is DW 2.0 Worth It?
- Macro-Level Justification
- A Micro-Level Cost Justification
- Company B Has DW 2.0
- Creating New Analysis
- Executing the Steps
- So How Much Does All of This Cost?
- Consider Company B
- Factoring the Cost of DW 2.0
- Reality of Information
- The Real Economics of DW 2.0
- The Time Value of Information
- The Value of Integration
- Historical Information
- First-Generation DW and DW 2.0—The Economics
- From the Perspective of the Business User
- Summary
- Chapter 18: Data quality in DW 2.0
- Chapter 19: DW 2.0 and unstructured data
- DW 2.0 and Unstructured Data
- Reading Text
- Where to Do Textual Analytical Processing
- Integrating Text
- Simple Editing
- Stop Words
- Synonym Replacement
- Synonym Concatenation
- Homographic Resolution
- Creating Themes
- External Glossaries/Taxonomies
- Stemming
- Alternate Spellings
- Text Across Languages
- Direct Searches
- Indirect Searches
- Terminology
- Semistructured Data/Value = Name Data
- The Technology Needed to Prepare the Data
- The Relational Data Base
- Structured/Unstructured Linkage
- From the Perspective of the Business User
- Summary
- Chapter 20: DW 2.0 and the system of record
- Chapter 21: Miscellaneous topics
- Chapter 22: Processing in the DW 2.0 environment
- Chapter 23: Administering the DW 2.0 environment
- The Data Model
- Architectural Administration
- Defining the Moment When an Archival Sector Will Be Needed
- Determining Whether the Near Line Sector Is Needed
- Metadata Administration
- Data Base Administration
- Stewardship
- Systems and Technology Administration
- Management Administration of the DW 2.0 Environment
- Prioritization and Prioritization Conflicts
- Budget
- Scheduling and Determination of Milestones
- Allocation of Resources
- Managing Consultants
- Summary
- Index
- Instructions for online access
Product information
- Title: DW 2.0: The Architecture for the Next Generation of Data Warehousing
- Author(s):
- Release date: July 2010
- Publisher(s): Morgan Kaufmann
- ISBN: 9780080558332