AI Data Center Network Design and Technologies
by Mahesh Subramaniam, Michal Styszynski, Himanshu Tambakuwala
Chapter 8. IP Routing for AI/ML Fabrics
The key technologies used in AI data centers today are efficient load-balancing techniques (such as dynamic load balancing and global load balancing), congestion management techniques based on Data Center Quantized Congestion Notification (DCQCN) (such as DSCP-based Priority Flow Control and Explicit Congestion Notification), and RDMA over Converged Ethernet version 2 (RoCEv2). However, the dynamic IP routing protocol is also an essential factor to assess when working on a new network engineering project. Making the IP routing protocol more plug-and-play can affect the convergence, scale, and simplicity of deployments, much as it does in InfiniBand networks. From a topology perspective, most AI data center cluster ...