Chapter 9. Scaling: Hardware, Infrastructure, and Resource Management
Deploying and managing LLMs presents unique challenges and opportunities in the realm of infrastructure and resource management. LLMs, as you’ve seen throughout this book, are computationally intensive, requiring substantial hardware, storage, and network resources to operate efficiently. Whether you’re leveraging LLMs as a cloud-based service, deploying pretrained models in on-premises data centers, or training your own models from scratch, your infrastructure decisions will influence their performance, scalability, and cost-effectiveness.
Effective resource management for LLMs involves optimizing compute power, memory, and storage. In this chapter, we will explore the key components of infrastructure for LLMs, including hardware requirements and deployment strategies. We’ll also discuss best practices for optimizing resource use, managing costs, and maintaining reliability in production environments. This chapter will help you understand the trade-offs involved in managing resources for large-scale AI applications.
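To make these requirements concrete, here is a minimal back-of-envelope sketch of the GPU memory needed just to serve a model. The architecture figures (7B parameters, 32 layers, a 4,096-dimensional hidden size) are illustrative assumptions, not measurements of any specific model.

# Back-of-envelope GPU memory estimate for serving an LLM.
# The 7B / 32-layer / 4096-hidden figures below are illustrative
# assumptions, not the specs of any particular model.

def weights_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights (fp16/bf16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, hidden: int, seq_len: int,
                batch: int, bytes_per_val: int = 2) -> float:
    """KV cache: 2 tensors (K and V) per layer, each seq_len x hidden,
    held per sequence in the batch."""
    return 2 * n_layers * hidden * seq_len * batch * bytes_per_val / 1e9

if __name__ == "__main__":
    w = weights_gb(7e9)                  # ~14 GB of weights at fp16
    kv = kv_cache_gb(32, 4096, 4096, 8)  # batch of 8 at a 4K context
    print(f"weights: {w:.1f} GB, kv cache: {kv:.1f} GB, "
          f"total: {w + kv:.1f} GB")

Even before activations and framework overhead, the weights and KV cache in this sketch come to roughly 31 GB, more than a single 24 GB GPU holds, which is why memory optimization dominates serving decisions.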
Choosing the Right Approach
Selecting the appropriate method for using LLMs depends on the requirements of the application you’re building. For startups or small-scale applications, using models directly from the cloud may be the quickest and most cost-effective solution; the sketch below contrasts this with self-hosting. For enterprises with specialized requirements or high workloads, deploying LLMs on cloud infrastructure can help ...
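These two options differ more in operations than in code. As a rough sketch, assuming the openai and transformers packages are installed and using placeholder model names (not recommendations), the hosted path is a single API call, while the self-hosted path loads weights onto hardware you provision:

# Two ways to get a completion: a hosted API vs. a self-hosted model.
# Assumes the `openai` and `transformers` packages are installed;
# the model names are placeholders, not recommendations.

def hosted_completion(prompt: str) -> str:
    """Cloud-based service: no hardware to manage, pay per token."""
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def self_hosted_completion(prompt: str) -> str:
    """Self-managed deployment: you provision the GPU, weights stay local."""
    from transformers import pipeline
    pipe = pipeline("text-generation",
                    model="mistralai/Mistral-7B-Instruct-v0.2")
    return pipe(prompt, max_new_tokens=128)[0]["generated_text"]

The hosted call offloads all hardware concerns but meters every token; the self-hosted path trades that for a fixed infrastructure cost and full control over the model and the data it processes.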