AI Data Center Network Design and Technologies
by Mahesh Subramaniam, Michal Styszynski, Himanshu Tambakuwala
Overview
Artificial intelligence is redefining the scale, architecture, and performance expectations of modern data centers. Training large ML models demands infrastructure capable of moving massive data sets through highly parallel, compute-intensive environments, where traditional data center designs simply can't keep up.
AI Data Center Network Design and Technologies is the first comprehensive, vendor-agnostic guide to the design principles, architectures, and technologies that power AI training and inference clusters. Written by leading experts in AI data center design, this book helps engineers, architects, and technology leaders understand how to design and scale networks purpose-built for the AI era.
INSIDE, YOU'LL LEARN HOW TO
Architect scalable, high-radix network fabrics to support xPU-based (GPU, TPU) AI clusters
Integrate lossless Ethernet/IP fabrics for high-throughput, low-latency data movement
Align network design with AI/ML workload characteristics and server architectures
Address challenges in cooling, power, and interconnect design for AI-scale computing
Evaluate emerging technologies from the Ultra Ethernet Consortium (UEC) and their effect on future AI data centers
Apply best practices for deployment, validation, and performance measurement in AI/ML environments
With broad coverage of both foundational concepts and emerging innovations, this book bridges the gap between network engineering and AI infrastructure design. It empowers readers to understand not only how AI data centers work, but why they must evolve.