Book description
This IBM® Redbooks® publication provides information about aspects of performing infrastructure health checks, such as checking the configuration and verifying the functionality of the common subsystems (nodes or servers, switch fabric, parallel file system, job management, problem areas, and so on).
This IBM Redbooks publication documents how to monitor the overall health of the cluster infrastructure so that technical computing clients can be delivered cost-effective, highly scalable, and robust solutions.
This IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for delivering cost-effective Technical Computing and IBM High Performance Computing (HPC) solutions to optimize business results, product development, and scientific discoveries. This book provides a broad understanding of a new architecture.
Table of contents
- Front cover
- Notices
- Preface
- Chapter 1. Introduction
  - 1.1 Overview of the IBM HPC solution
  - 1.2 Why we need a methodical approach for cluster consistency checking
  - 1.3 Tools and interpreting their results for HW and SW states
  - 1.4 Tools and interpreting their results for identifying performance inconsistencies
  - 1.5 Template of diagnostics steps that can be used (checklists)
- Chapter 2. Key concepts and interdependencies
- Chapter 3. The health lifecycle methodology
- Chapter 4. Cluster components reference model
  - 4.1 Overview of installed cluster systems
  - 4.2 ClusterA nodes hardware description
  - 4.3 ClusterA software description
  - 4.4 ClusterB nodes hardware description
  - 4.5 ClusterB software description
  - 4.6 ClusterC nodes hardware description
  - 4.7 ClusterC software description
  - 4.8 Interconnect infrastructure
  - 4.9 GPFS cluster
- Chapter 5. Toolkits for verifying health (individual diagnostics)
- Appendix A. Commonly used tools
- Appendix B. Tools and commands outside of the toolkit
- Related publications
- Back cover
Product information
- Title: IBM High Performance Computing Cluster Health Check
- Author(s):
- Release date: February 2014
- Publisher(s): IBM Redbooks
- ISBN: None
You might also like
book
Algorithms and Parallel Computing
There is a software gap between the hardware potential and the performance that can be attained …
book
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition
Through a series of recent breakthroughs, deep learning has boosted the entire field of machine learning. …
book
Python Crash Course, 2nd Edition
This is the second edition of the best selling Python book in the world. Python Crash …
book
Head First Design Patterns, 2nd Edition
You know you don’t want to reinvent the wheel, so you look to design patterns—the lessons …