Cloud Native Data Center Networking

Book Description

If you want to study, build, or simply validate your thinking about modern cloud native data center networks, this is your book. Whether you’re pursuing a multitenant private cloud, a network for running machine learning, or an enterprise data center, author Dinesh Dutt takes you through the steps necessary to design a data center that’s affordable, high capacity, easy to manage, agile, and reliable.

Ideal for network architects, data center operators, and network and containerized application developers, this book mixes theory with practice to guide you through the architecture and protocols you need to create and operate a robust, scalable network infrastructure. The book offers a vendor-neutral way to look at network design. For those interested in open networking, this book is chock-full of examples using open source software, from FRR to Ansible.

In the context of a cloud native data center, you’ll examine:

  • Clos topology
  • Network disaggregation
  • Network operating system choices
  • Routing protocol choices
  • Container networking
  • Network virtualization and EVPN
  • Network automation

Table of Contents

  1. Preface
    1. Audience
    2. How This Book Is Organized
    3. Software Used in This Book
    4. Conventions Used in This Book
    5. Using Code Examples
    6. O’Reilly Online Learning
    7. How to Contact Us
    8. Acknowledgments
  2. 1. The Motivations for a New Network Architecture
    1. The Application-Network Shuffle
    2. The Network Design from the Turn of the Century
      1. The Charms of Bridging
      2. Building Scalable Bridging Networks
    3. The Trouble with the Access-Aggregation-Core Network Design
      1. Unscalability
      2. Complexity
      3. Failure Domain
      4. Unpredictability
      5. Inflexibility
      6. Lack of Agility
    4. The Stories Not Told
    5. Summary
  3. 2. Clos: Network Topology for a New World
    1. Introducing the Clos Topology
    2. A Deeper Dive into the Clos Topology
      1. Use of Homogeneous Equipment
      2. Routing as the Fundamental Interconnect Model
      3. Oversubscription in a Clos Topology
      4. Interconnect Link Speeds
      5. Practical Constraints
      6. Fine-Grained Failure Domain
    3. Scaling the Clos Topology
    4. Comparing the Two Three-Tier Models
      1. Application Matchup
      2. Data Center Build Out
    5. Implications of the Clos Topology
      1. Rethinking Failures and Troubleshooting
      2. Cabling
      3. Simplified Inventory Management
      4. Network Automation
    6. Some Best Practices for a Clos Network
      1. Use of Multiple Links Between Switches
      2. Use of Spines as Only a Connector
      3. Use of Chassis as a Spine Switch
    7. Host Attach Models
    8. Summary
    9. References
  4. 3. Network Disaggregation
    1. What Is Network Disaggregation?
    2. Why Is Network Disaggregation Important?
      1. Controlling Costs
      2. Avoiding Vendor Lock-In
      3. Standardization of Features
    3. What Made Network Disaggregation Possible Now?
    4. Difference in Network Operations with Disaggregation
      1. Purchase and Support
      2. First Boot
    5. Open Network Installer Environment
      1. How Does ONIE Work?
    6. The Players in Network Disaggregation: Hardware
      1. Packet-Switching Silicon
      2. ODMs
      3. CPU Complex
      4. The Standards Bodies
    7. Common Myths About Network Disaggregation
    8. Some Best Practices for Engaging with Network Disaggregation
    9. Summary
    10. References
  5. 4. Network Operating System Choices
    1. Requirements of a Network Device
    2. The Rise of Software-Defined Networking and OpenFlow
      1. More Details About SDN and OpenFlow
      2. The Trouble with OpenFlow
      3. OVS
      4. The Effect of SDN and OpenFlow on Network Disaggregation
    3. NOS Design Models
      1. Location of Switch Network State
      2. Programming the Switching Silicon
      3. API
      4. The Reasons Behind the Different Answers
    4. User Interface
    5. Comparing the NOS Models with Cloud Native NOS Requirements
      1. Illustrating the Models with an Example
    6. What Else Is Left for a NOS to Do?
    7. Summary
    8. References
  6. 5. Routing Protocol Choices
    1. Routing Overview
      1. How Routing Table Lookups Work
      2. How Routes Are Chosen
      3. Types of Routing Table Entries
      4. RIB and FIB
    2. Routing Protocols Overview
    3. Distance Vector Protocols Versus Link-State Protocols
      1. Distance Vector Dissected
      2. Link-State Dissected
      3. Summarizing Distance Vector Versus Link-State Route Exchange
    4. Comparing Distance Vector and Link-State Protocols
      1. Scaling in Link-State and Distance Vector Protocols
      2. Multipathing in Distance Vector and Link-State Protocols
      3. No News Is Good News
      4. Propagation Delay in Link-State and Distance Vector Protocols
      5. Multiprotocol Support
      6. Unnumbered Interfaces
      7. Routing Configuration Complexity
    5. Routing Protocols in Clos Networks
      1. Link-State Versus Distance Vector When Links or Nodes Fail
      2. Route Summarization in Clos Networks
      3. Security and Safeguards
    6. Bidirectional Forwarding Detection
    7. Requirements of a Routing Protocol in the Data Center
      1. Basic Requirements
      2. Advanced Requirements
      3. Rare or Futuristic Requirements
    8. Choosing the Routing Protocol for Your Network
    9. Summary
    10. References
  7. 6. Network Virtualization
    1. What Is Network Virtualization?
    2. Uses of Network Virtualization in the Data Center
      1. Forcing Traffic to Take a Certain Path
      2. Applications That Require L2 Adjacency
      3. Cloud
    3. Separating Switch Management Network from Data Traffic
    4. Network Virtualization Models
      1. Service Abstraction: L2 or L3
      2. Inline Versus Overlay Virtual Networks
    5. Network Tunnels: The Fundamental Overlay Construct
      1. Benefits of Network Tunnels
      2. The Drawbacks of Network Tunnels
    6. Network Virtualization Solutions for the Data Center
      1. VLAN
      2. VRF
      3. VXLAN
      4. Other Network Virtualization Solutions
    7. Practical Limits on the Number of Virtual Networks
      1. Size of Virtual Network ID in Packet Header
      2. Hardware Limitations
      3. Scalability of Control Plane and Software
      4. Deployment Model
    8. Control Protocols for Network Virtualization
      1. Relationship of Virtual and Physical Control Plane
      2. The Centralized Control Model
      3. The Protocol-Based Control Model
    9. Vendor Support for Network Virtualization
      1. Merchant Silicon
      2. Software
      3. Standards
    10. Illustrating VXLAN Bridging and Routing
      1. VXLAN Bridging Example: H1 to H5
      2. VXLAN and Routing: H1 to H6
      3. Summarizing VXLAN Bridging and Routing
    11. Summary
  8. 7. Container Networking
    1. Introduction to Containers
    2. Namespaces
      1. Network Namespaces
    3. Virtual Ethernet Interfaces
    4. Container Networking: Diving In
      1. Single-Host Container Networking
      2. Multihost Container Networking
    5. Comparing Different Container Network Solutions
    6. Kubernetes Networking
    7. Summary
  9. 8. Multicast Routing
    1. Multicast Routing: Overview
      1. The Uses of Multicast Routing
    2. Problems to Solve in Multicast Routing
      1. Building a Multicast Tree
      2. Multicast Routing Protocol
    3. PIM Sparse Mode
      1. Rendezvous Point
      2. Building a Multicast Distribution Tree
      3. Multiple RPs and MSDP
    4. PIM-SM in the Data Center
      1. PIM-SM and Unnumbered
    5. Summary
  10. 9. Life on the Edge of the Data Center
    1. The Problems
    2. Connectivity Models
      1. Why Connect to the External World?
      2. Bandwidth Requirements for External Connectivity
      3. Connecting the Clos Topology to the External World
      4. Routing at the Edge
      5. Services
    3. Hybrid Cloud Connectivity
    4. Summary
  11. 10. Network Automation
    1. What Is Network Automation?
    2. Who Needs Network Automation?
    3. Does Network Automation Mean Learning Programming?
    4. Why Is Network Automation Difficult?
      1. The Trouble with IP Addresses and Interfaces
      2. Scale
      3. Network Protocol Configuration Complexity
      4. Lack of Programmatic Access
      5. Traditional Network OS Limitations
    5. What Can Network Developers Do to Help Network Automation?
    6. Tools for Network Automation
    7. Automation Best Practices
    8. Ansible: An Overview
      1. Inventory
      2. Playbooks
      3. Ad Hoc Commands
      4. Structuring Playbooks
    9. A Typical Automation Journey
      1. Glorified File Copy
      2. Automate the Configuration That Was Not Device Specific
      3. Template the Routing and Interface Configuration
      4. More Templating and Roles
      5. Some Observations from Fellow Journeymen
    10. Validating the Configuration
      1. Single Source of Truth
      2. Commit/Rollback in the Age of Automation
      3. Vagrant and Network Testing
      4. Automating Verification
    11. Summary
    12. References
  12. 11. Network Observability
    1. What Is Observability?
    2. The Current State of Network Observability
      1. The Disenchantments of SNMP
      2. Box-by-Box Approach to Network Observability
    3. Why Is Observability Difficult with Networking?
    4. Observability in Data Center Networks: Special Characteristics
    5. Decomposing Observability
    6. The Mechanics of Telemetry
      1. What Do We Gather?
      2. How Do We Gather?
      3. When Do We Gather?
      4. Storing the Data
    7. The Uses for Multiple Data Sources
    8. Of Alerts and Dashboards
    9. Summary
    10. References
  13. 12. Rethinking Network Design
    1. Standard, Simple Building Blocks
      1. Network Disaggregation
    2. Failure: Missing the Forest for the Trees
      1. L2 Failure Model Versus L3 Failure Model
      2. Simple Versus Complex Failures
      3. Handling Upgrades
    3. The Pursuit of Less
      1. How the Right Architecture Helps
      2. Feature Set Essentialism
    4. Constraints on the Cloud Native Network Design Principles
    5. Summary
  14. 13. Deploying OSPF
    1. Why OSPF?
    2. The Problems to Be Addressed
      1. Determining Link-State Flooding Domains
      2. Numbered Versus Unnumbered OSPF
      3. Support for IPv6
      4. Support for VRFs
      5. Requirements for Running OSPF on Servers
    3. OSPF Route Types
      1. The Messiness of Stubbiness
    4. OSPF Timers
    5. Dissecting an OSPF Configuration
      1. Configuration for Leaf-Spine in a Two-Tier Clos Topology: IPv4
      2. Configuration for Leaf-Spine in a Two-Tier Clos Topology: IPv6
      3. Configuration with Three-Tier Clos Running OSPF
      4. Configuration with Servers Running OSPF: IPv4
      5. Summarizing Routes in OSPF
      6. OSPF and Upgrades
    6. Best Practices
    7. Summary
  15. 14. BGP in the Data Center
    1. Basic BGP Concepts
      1. BGP Protocol Overview
      2. BGP Peering
      3. BGP State Machine
      4. Autonomous System Number
      5. BGP Capabilities
      6. BGP Attributes, Communities, Extended Communities
      7. BGP Best-Path Computation
      8. Support for Multiple Protocols
      9. BGP Messages
    2. Adapting BGP to the Data Center
      1. eBGP Versus iBGP
      2. eBGP: Flying Solo
      3. Private ASNs
      4. BGP’s ASN Numbering Scheme
      5. Multipath Selection
      6. Fixing BGP’s Convergence Time
    3. Summary
  16. 15. Deploying BGP
    1. Core BGP Configuration Concepts
    2. Traditional Configuration for a Two-Tier Clos Topology: IPv4
    3. Peer Group
    4. Routing Policy
      1. Route Maps: Implementation of Routing Policy
    5. Providing Sane Defaults for the Data Center
    6. BGP Unnumbered: Eliminating Pesky Interface IP Addresses
      1. A remote-as by Any Name
      2. How Unnumbered Interfaces Work with BGP
      3. Final Observations on BGP Configuration in FRR
      4. Unnumbered BGP Support in Routing Stacks
      5. Summary
    7. Configuring IPv6
    8. BGP and VRFs
    9. Peering with BGP Speakers on the Host
      1. BGP Dynamic Neighbors
    10. BGP and Upgrades
      1. AS_PATH Prepend
      2. GRACEFUL_SHUTDOWN Community
      3. Max-MED
    11. Best Practices
    12. Summary
  17. 16. EVPN in the Data Center
    1. Why Is EVPN Popular?
    2. The Problems a Network Virtualization Control Plane Must Address
    3. Where Does a VTEP Reside?
    4. One Protocol to Rule Them All, Or…?
      1. iBGP Characteristics
      2. Separate Underlay and Overlay Protocols
      3. eBGP Only
    5. BGP Constructs to Support Virtual Network Routes
      1. Route Distinguisher
      2. Route Target
      3. FRR’s use of RD and RT
      4. EVPN Route Types
      5. Communicating Choice of BUM Handling
    6. EVPN and Bridging
      1. EVPN Bridging with Ingress Replication
      2. EVPN Bridging with Routed Multicast Underlay
      3. Handling MAC Moves
    7. Support for Dual-Attached Hosts
      1. Host-Switch Interconnect Model
      2. VXLAN Model for Dual-Attached Hosts
      3. Switch Peering Options
      4. Handling Link Failures
      5. Avoiding Duplicate Multidestination Frames
    8. ARP/ND Suppression
    9. EVPN and Routing
      1. Centralized Versus Distributed Routing
      2. Symmetric Versus Asymmetric Routing
      3. Route Advertisements
      4. The Use of VRFs
    10. Deploying EVPN in Large Networks
    11. Summary
  18. 17. Deploying Network Virtualization
    1. The Configuration Scenarios
    2. Device-Local Configuration
    3. Single eBGP Session
    4. OSPF Underlay, iBGP Overlay
      1. allowas-in Versus Separate ASN
      2. PIM/MSDP Configuration
    5. EVPN on the Host
    6. Best Practices
    7. Summary
  19. 18. Validating Network Configuration
    1. Validating the Network State
    2. System Validation
    3. Cabling Validation
      1. Using Ansible to Validate Cabling
    4. Interface Configuration Validation
      1. Automating Interface Configuration Validation
    5. Routing Configuration Validation
      1. Validating an OSPF Configuration
      2. Validating a BGP Configuration
      3. Stripping the Private ASNs
    6. Validating Network Virtualization
    7. Application’s Network Validation
    8. Data-Plane Validation
    9. Summary
  20. 19. Coda
  21. Glossary
  22. Index