Key Questions for Model Serving
    What Will the Load on Our Model Be?
    What Are the Prediction Latency Needs of Our Model?
    Where Does the Model Need to Live?
    What Are the Hardware Needs for Our Model?
    How Will the Serving Model Be Stored, Loaded, Versioned, and Updated?
    What Will Our Feature Pipeline for Serving Look Like?
Model Serving Architectures
    Offline Serving (Batch Inference)
    Online Serving (Online Inference)
    Model as a Service
    Serving at the Edge
    Choosing an Architecture
Model API Design
Testing
Serving for Accuracy or Resilience?
Scaling
    Autoscaling
    Caching
Disaster Recovery
Ethics and Fairness Considerations
Conclusion