Book description
Entity Information Life Cycle for Big Data walks you through the ins and outs of managing entity information so you can successfully achieve master data management (MDM) in the era of big data. This book explains big data’s impact on MDM and the critical role of an entity information management system (EIMS) in successful MDM. Expert authors Dr. John R. Talburt and Dr. Yinle Zhou provide a thorough background in the principles of managing the entity information life cycle, along with practical tips and techniques for implementing an EIMS, strategies for exploiting distributed processing to handle big data for EIMS, and examples from real applications. Additional material on the theory of EIIM and methods for assessing and evaluating EIMS performance also makes this book appropriate for use as a textbook in courses on entity and identity management, data management, customer relationship management (CRM), and related topics.
- Explains the business value and impact of an entity information management system (EIMS) and directly addresses the problem of EIMS design and operation, a critical issue organizations face when implementing MDM systems
- Offers practical guidance to help you design and build an EIM system that will successfully handle big data
- Details how to measure and evaluate entity integrity in MDM systems and explains the principles and processes that comprise EIM
- Provides an understanding of features and functions an EIM system should have that will assist in evaluating commercial EIM systems
- Includes chapter review questions, exercises, tips, and free downloads of demonstrations that use the OYSTER open source EIM system
- Executable code (Java .jar files), control scripts, and synthetic input data illustrate various aspects of the CSRUD life cycle, such as identity capture, identity update, and assertions
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- Foreword
- Preface
- Acknowledgements
- Chapter 1. The Value Proposition for MDM and Big Data
- Chapter 2. Entity Identity Information and the CSRUD Life Cycle Model
- Chapter 3. A Deep Dive into the Capture Phase
- Chapter 4. Store and Share – Entity Identity Structures
- Chapter 5. Update and Dispose Phases – Ongoing Data Stewardship
- Chapter 6. Resolve and Retrieve Phase – Identity Resolution
- Chapter 7. Theoretical Foundations
- Chapter 8. The Nuts and Bolts of Entity Resolution
- Chapter 9. Blocking
- Chapter 10. CSRUD for Big Data
  - Large-Scale ER for MDM
  - The Transitive Closure Problem
  - Distributed, Multiple-Index, Record-Based Resolution
  - An Iterative, Nonrecursive Algorithm for Transitive Closure
  - Iteration Phase: Successive Closure by Reference Identifier
  - Deduplication Phase: Final Output of Components
  - ER Using the Null Rule
  - The Capture Phase and IKB
  - The Identity Update Problem
  - Persistent Entity Identifiers
  - The Large Component and Big Entity Problems
  - Identity Capture and Update for Attribute-Based Resolution
  - Concluding Remarks
- Chapter 11. ISO Data Quality Standards for Master Data
- Appendix A. Some Commonly Used ER Comparators
- References
- Index
Product information
- Title: Entity Information Life Cycle for Big Data
- Author(s): John R. Talburt, Yinle Zhou
- Release date: April 2015
- Publisher(s): Morgan Kaufmann
- ISBN: 9780128006658