Streaming Integration

Book description

Data is being generated at an unrelenting pace, and data storage capacity can’t keep up. Enterprises must modernize the way they use and manage data by collecting, processing, and analyzing it in real time—in other words, streaming. This practical report explains everything organizations need to know to begin their streaming integration journey and make the most of their data.

Authors Steve Wilkes and Alok Pareek detail the key attributes and components of an enterprise-grade streaming integration platform, along with stream processing and analysis techniques that will help companies reap immediate value from their data and solve their most pressing business challenges.

  • Learn how to collect and handle large volumes of data at scale
  • See how streams move data between threads, processes, servers, and data centers
  • Get your data in the form you need and analyze it in real time
  • Dive into the pros and cons of data targets such as databases, Hadoop, and cloud services for specific use cases
  • Ensure your streaming integration infrastructure scales, is secure, works 24/7, and can handle failure

Table of contents

  1. Preface
    1. Acknowledgments
  2. Introduction
    1. A Batch of Problems
    2. Under Pressure
    3. Time Value of Data
    4. The Rise of Real-Time Processing
  3. Chapter 1. Streaming Integration
    1. Real Time
    2. Continuous Collection
    3. Continuous Movement
    4. Any Enterprise Data
    5. Extreme Volumes
    6. At Scale
    7. High Throughput
    8. Low Latency
    9. Processing
      1. Filtering
      2. Transformation
      3. Aggregation and Change Detection
      4. Enrichment
      5. Implementation Options
    10. Analysis
      1. Time-Series and Statistical Analysis
      2. Event Processing and Pattern Detection
      3. Real-Time Scoring of Machine Learning Algorithms
    11. Correlation
    12. Continuous Delivery
    13. Value
    14. Visibility
    15. Reliable
    16. Verifiable
    17. A Holistic Architecture
  4. Chapter 2. Real-Time Continuous Data Collection
    1. Databases and Change Data Capture
      1. CDC Methods
      2. Log-Based CDC Best Suited for Streaming Integration
    2. Files and Logs
      1. Data Collection from Filesystems
    3. Messaging Systems
      1. Data Collection from Messaging Systems
    4. Cloud and APIs
    5. Devices and the IoT
      1. Data Collection from IoT Devices
      2. IoT Scalability Considerations
  5. Chapter 3. Streaming Data Pipelines
    1. Moving Data
    2. The Power of Pipelines
    3. Persistent Streams
  6. Chapter 4. Stream Processing
    1. In-Memory
    2. Continuous Queries
    3. SQL-Based Processing
      1. Consider the Users
      2. User Interface–Based Processing
    4. Multitemporality
    5. Transformations
    6. Filtering
      1. Filtering for Data Reduction
      2. Filtering for Writing
      3. Analytics
    7. Windows
    8. Enrichment
    9. Distributed Caches
    10. Correlation
  7. Chapter 5. Streaming Analytics
    1. Aggregation
    2. Pattern Matching
    3. Statistical Analysis
    4. Integration with Machine Learning
    5. Anomaly Detection and Prediction
  8. Chapter 6. Data Delivery and Visualization
    1. Databases and Data Warehouses
    2. Files
    3. Storage Technologies
    4. Messaging Systems
    5. Application Programming Interfaces
    6. Cloud Technologies
    7. Visualization
  9. Chapter 7. Mission Criticality
    1. Clustering
    2. Scalability and Performance
    3. Failover and Reliability
    4. Recovery
    5. Exactly-Once Processing
    6. Security
      1. Access Control
      2. Encrypting Data in Flight
      3. Encrypting Data at Rest
  10. Chapter 8. Conclusion

Product information

  • Title: Streaming Integration
  • Author(s): Steve Wilkes, Alok Pareek
  • Release date: April 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492045816