HPC, Big Data, and AI Convergence Towards Exascale

Book description

This book provides an updated vision of the most advanced computing, storage, and interconnection technologies that underpin the convergence of the HPC, Cloud, Big Data, and AI domains. It offers insight into the challenges of integrating these technologies and of achieving performance targets towards the exascale level.

Table of contents

  1. Cover
  2. Half-Title Page
  3. Title Page
  4. Copyright Page
  5. Contents
  6. Foreword by Anders Dam Jensen – EuroHPC
  7. Foreword by Jean-Pierre Panziera – ETP4HPC
  8. Preface by Vít Vondrák – IT4Innovations
  9. Preface by Marco Mezzalama – Links Foundation
  10. Acknowledgments
  11. Editors
  12. Contributors
  13. 1 Toward the Convergence of High-Performance Computing, Cloud, and Big Data Domains
    1. 1.1 Introduction
      1. 1.1.1 History of Cloud Computing
      2. 1.1.2 History of HPC
      3. 1.1.3 Evolution of Big Data
      4. 1.1.4 Evolution of Big Data Storage and Tools
    2. 1.2 Exploiting Convergence
      1. 1.2.1 CYBELE Project
      2. 1.2.2 DeepHealth Project
      3. 1.2.3 EVOLVE Project
      4. 1.2.4 LEXIS Project
    3. Acknowledgment
    4. References
  14. 2 The LEXIS Platform for Distributed Workflow Execution and Data Management
    1. 2.1 Motivation
    2. 2.2 Architecture (Codesign) and Interfaces
    3. 2.3 Security
    4. 2.4 Accounting and Billing
    5. 2.5 Easy Access to HPC/Cloud through a Specialized Web Portal
    6. 2.6 Market Analysis
      1. 2.6.1 LEXIS Project Impact
    7. Acknowledgment
    8. References
  15. 3 Enabling the HPC and Artificial Intelligence Cross-Stack Convergence at the Exascale Level
    1. 3.1 Introduction
    2. 3.2 The Rise of Convergent Infrastructures
    3. 3.3 The ACROSS Approach to the HPC, Big Data, and AI Convergence
      1. 3.3.1 Heterogeneous Infrastructural Support
      2. 3.3.2 The Management of the Convergent Platform
    4. 3.4 Related Works
    5. 3.5 Conclusions
    6. Acknowledgment
    7. Notes
    8. Bibliography
  16. 4 Data System and Data Management in a Federation of HPC/Cloud Centers
    1. 4.1 Introduction: Data Federation of European HPC/Cloud Centers
    2. 4.2 Requirements on the LEXIS DDI
      1. 4.2.1 Unified Data Access
      2. 4.2.2 Usage and Federation of Diverse Data Backend Systems
      3. 4.2.3 Reliability and Redundancy
      4. 4.2.4 AAI Support
      5. 4.2.5 APIs
      6. 4.2.6 State-of-the-art Research Data Management
    3. 4.3 Federation via a DDI Based on iRODS
      1. 4.3.1 Relevant Basic Properties of iRODS
      2. 4.3.2 iRODS HA Setup
      3. 4.3.3 iRODS Zones Federation across Centers and Data Movement
      4. 4.3.4 Storage Tiering and Underlying Data Storage
      5. 4.3.5 Logical Structure of the DDI
    4. 4.4 Hardware
      1. 4.4.1 Storage Systems for HPC and Infrastructure-as-a-Service Cloud Clusters
      2. 4.4.2 Storage Systems Dedicated to LEXIS
      3. 4.4.3 HPC–Cloud-Storage Interconnect and Data Node/Burst Buffer Concept
        1. 4.4.3.1 SBF (Smart Bunch of Flash)
        2. 4.4.3.2 SBB
    5. 4.5 Unified Access to the Platform Based on an AAI
      1. 4.5.1 LEXIS Identity and Access Management (IAM) Solution, SSO, and AAI
      2. 4.5.2 Platform Services vs. AAI: Separation of Concerns
      3. 4.5.3 LEXIS DDI and IAM/AAI System
    6. 4.6 Data Management via APIs
      1. 4.6.1 Data Search, Upload, and Download APIs
      2. 4.6.2 Staging API
      3. 4.6.3 Replication and PID Assignment API
      4. 4.6.4 Helper APIs
      5. 4.6.5 Compression/Decompression/Encryption/Decryption API
    7. 4.7 Integration with EUDAT Services
      1. 4.7.1 EUDAT B2HANDLE
      2. 4.7.2 EUDAT B2SAFE
      3. 4.7.3 EUDAT B2STAGE
    8. 4.8 Conclusion
    9. Acknowledgment
    10. References
  17. 5 Distributed HPC Resources Orchestration for Supporting Large-Scale Workflow Execution
    1. 5.1 Introduction
    2. 5.2 Federated Execution Platforms
    3. 5.3 WMSs and Implementation in LEXIS
      1. 5.3.1 Dynamic Workflow Orchestration
      2. 5.3.2 Resource Management Metrics
    4. 5.4 Workflow Data Management
    5. 5.5 LEXIS Pilot Use Cases and Orchestration
    6. 5.6 Related Works
    7. 5.7 Conclusion
    8. Acknowledgment
    9. Notes
    10. Bibliography
  18. 6 Advanced Engineering Platform Supporting CFD Simulations of Aeronautical Engine Critical Parts
    1. 6.1 Introduction: Background and LEXIS Aeronautics Pilot
    2. 6.2 Engineering Case Studies in the LEXIS Aeronautics Pilot
    3. 6.3 The Turbomachinery Case Study
      1. 6.3.1 Engineering Context
      2. 6.3.2 Digital Technology Deployment
        1. 6.3.2.1 Application Workflow
        2. 6.3.2.2 Main Application Software and HW Resources
      3. 6.3.3 First Results
      4. 6.3.4 Benefit–Cost Analysis of HW Acceleration
      5. 6.3.5 Next Steps
    4. 6.4 The Rotating Parts Case Study
      1. 6.4.1 Engineering Context
      2. 6.4.2 Digital Technology Deployment
        1. 6.4.2.1 Application Workflow
        2. 6.4.2.2 Main Application Software and HW Resources
      3. 6.4.3 First Results
        1. 6.4.3.1 SPH Liquid-Phase Simulation
        2. 6.4.3.2 SPH Gas-Phase Simulation
      4. 6.4.4 Next Steps
    5. 6.5 Final Remarks
    6. Acknowledgment
    7. Notes
    8. References
  19. 7 Event-Driven, Time-Constrained Workflows: An Earthquake and Tsunami Pilot
    1. 7.1 Introduction
    2. 7.2 Event-Driven, Time-Constrained Workflows
      1. 7.2.1 Requirements
      2. 7.2.2 Background
      3. 7.2.3 Overall View of the Workflow
    3. 7.3 Workflow Components
      1. 7.3.1 Shakemap and Exposure Dataset
      2. 7.3.2 Tsunami Simulations
      3. 7.3.3 SEM
    4. 7.4 Technological Layers
      1. 7.4.1 Technology Layer 1: Orchestration
      2. 7.4.2 Technology Layer 2: Heterogeneous Compute
      3. 7.4.3 Technology Layer 3: Data
    5. 7.5 Conclusion
    6. Acknowledgment
    7. Note
    8. References
  20. 8 Exploitation of Multiple Model Layers within LEXIS Weather and Climate Pilot: An HPC-Based Approach
    1. 8.1 Introduction: Background and Driving Forces
    2. 8.2 The Weather and Climate Pilot
    3. 8.3 Observational Data
    4. 8.4 LEXIS DDI and Weather and Climate Data API
    5. 8.5 LEXIS Orchestration System
    6. 8.6 Weather and Climate Pilot Workflows
      1. 8.6.1 WRF–ERDS Workflow Examples
    7. 8.7 Conclusion
    8. Acknowledgment
    9. References
  21. 9 Data Convergence for High-Performance Cloud
    1. 9.1 Introduction
    2. 9.2 Motivations
    3. 9.3 Design and Implementation
    4. 9.4 Karvdash
    5. 9.5 DataShim
      1. 9.5.1 Overview
      2. 9.5.2 Dataset Custom Resource Definition
      3. 9.5.3 DatasetInternal Custom Resource Definition
      4. 9.5.4 DataShim Operator and Admission Controller
      5. 9.5.5 Caching Plugin
      6. 9.5.6 Object Caching on Ceph
      7. 9.5.7 Ceph-Based Caching Plugin Implementation
      8. 9.5.8 Evaluation of the Ceph-Based Caching Plugin
    6. 9.6 H3
      1. 9.6.1 Overview
      2. 9.6.2 Data and Metadata Organization
      3. 9.6.3 The H3 Ecosystem
    7. 9.7 Integration
    8. 9.8 Related Work
    9. 9.9 Conclusions
    10. Note
    11. References
  22. 10 The DeepHealth HPC Infrastructure: Leveraging Heterogeneous HPC and Cloud-Computing Infrastructures for AI-Based Medical Solutions
    1. 10.1 Introduction
    2. 10.2 The Parallel Execution of EDDL Operations
      1. 10.2.1 COMPSs
      2. 10.2.2 StreamFlow
    3. 10.3 Cloud Infrastructures
      1. 10.3.1 Hybrid Cloud
      2. 10.3.2 Parallel Execution on Cloud Environments
        1. 10.3.2.1 Parallel Cloud Execution Based on COMPSs
        2. 10.3.2.2 Parallel Cloud Execution Based on StreamFlow
    4. 10.4 Acceleration Devices: GPUs and FPGAs
      1. 10.4.1 FPGA Acceleration
        1. 10.4.1.1 The DeepHealth FPGA Infrastructure
        2. 10.4.1.2 An Optimized FPGA Board Design for DL
        3. 10.4.1.3 FPGA-Based Algorithms
      2. 10.4.2 Many-Core and GPU Acceleration
    5. 10.5 Conclusions
    6. Notes
  23. 11 Applications of AI and HPC in the Health Domain
    1. 11.1 Introduction
    2. 11.2 AI and HPC in the Health Domain in 2020
    3. 11.3 DeepHealth Concept
    4. 11.4 DeepHealth Use Cases
    5. 11.5 Use of HPC and Cloud in Medical Pilots
      1. 11.5.1 UC2 – UNITOPatho
      2. 11.5.2 UC3 – UNITOBrain
      3. 11.5.3 UC4 – Chest
      4. 11.5.4 UC5 – UNITO Deep Image Annotation
      5. 11.5.5 UC12 – Skin Cancer Melanoma Detection
    6. 11.6 DeepHealth Value Proposition
    7. 11.7 Conclusions
    8. Notes
  24. 12 CYBELE: On the Convergence of HPC, Big Data Services, and AI Technologies
    1. 12.1 Introduction: Background and Driving Forces
    2. 12.2 Identified Gaps: Motivating the CYBELE Vision
    3. 12.3 Materializing the Solution: Convergence of HPC, Big Data, and AI
      1. 12.3.1 Data and Infrastructure Access Security Layer
      2. 12.3.2 Embedded Experiments Composition Layer
      3. 12.3.3 Parallel and Distributed Execution Management Layer
      4. 12.3.4 Data Services Layer
      5. 12.3.5 Visualization and Reporting Layer
    4. 12.4 Key Takeaways and Conclusions
    5. Note
    6. References
  25. 13 CYBELE: A Hybrid Architecture of HPC and Big Data for AI Applications in Agriculture
    1. 13.1 Introduction: Vision and Challenges
    2. 13.2 Background
      1. 13.2.1 AI in Big Data Analytics on Cloud
      2. 13.2.2 AI on HPC Systems
    3. 13.3 Hybrid Big Data and HPC Resource for AI Applications in CYBELE
    4. 13.4 Parallelization and Deployment of AI Applications on HPC Systems
      1. 13.4.1 Pilot Soybean Farming
        1. 13.4.1.1 Pilot Description
        2. 13.4.1.2 Application Parallelization for HPC Systems
      2. 13.4.2 Pilot Wheat Ear
        1. 13.4.2.1 Pilot Description
        2. 13.4.2.2 Application Parallelization for HPC Systems
    5. 13.5 Performance Evaluation for Pilot Soybean Farming and Pilot Wheat Ear
    6. 13.6 Discussion
    7. 13.7 Concluding Remarks and Future Work
    8. Acknowledgments
    9. Notes
    10. References
  26. 14 European Processor Initiative: Europe’s Approach to Exascale Computing
    1. 14.1 Introduction
    2. 14.2 European Processor Initiative
      1. 14.2.1 Global Technical Panstream
      2. 14.2.2 GPP Stream
      3. 14.2.3 Accelerator Stream
      4. 14.2.4 Automotive Stream
    3. 14.3 Conclusion
    4. Acknowledgment
    5. Bibliography
  27. Index

Product information

  • Title: HPC, Big Data, and AI Convergence Towards Exascale
  • Author(s): Olivier Terzo, Jan Martinovič
  • Release date: January 2022
  • Publisher(s): CRC Press
  • ISBN: 9781000485172