Chapter 19. Productionizing NLP Applications
Throughout this book we have covered many approaches and techniques for building an NLP application, along with how to plan and develop one. Now, let’s talk about deploying NLP applications.
In particular, we will talk about deploying models in production environments. Before we talk about how to deploy the models, we need to know the requirements of our product. Whether the model is used in a batch process or by a web service for individual evaluations changes how we want to deploy it. We also need to know what kind of hardware the models will require. Some of the things we discuss here, such as the hardware available in production, should be considered before modeling begins.
The easiest situation is one where your application runs as a batch process on an internal cluster. In that case, your performance requirements are set only by internal users (those in your organization), and securing the data is also simpler. But not everything is this simple.
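To make the batch-versus-service distinction concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: `predict`, `run_batch`, and `handle_request` are hypothetical names, and the keyword rule stands in for a real trained model. The point is only that the same model logic ends up behind two different entry points with different requirements.

```python
from typing import Dict, Iterable, List


def predict(text: str) -> str:
    """Hypothetical stand-in for a trained model: labels text by a keyword."""
    return "positive" if "good" in text.lower() else "negative"


def run_batch(texts: Iterable[str]) -> List[str]:
    """Batch mode: score a whole collection in one pass.

    Overall throughput matters more than the time spent on any single document.
    """
    return [predict(text) for text in texts]


def handle_request(payload: Dict[str, str]) -> Dict[str, str]:
    """Service mode: score one document per call.

    Per-request latency matters, because a caller is waiting on the response.
    """
    return {"label": predict(payload["text"])}
```

In a real deployment the batch entry point would typically be invoked by a scheduled job on the cluster, while the request handler would sit behind a web framework; both details are left out of this sketch.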
Another important part of deploying a production-quality system is making sure that the application runs fast enough to meet user needs without consuming too many resources. In this chapter we will discuss how to optimize the performance of your NLP system. First, we need to consider what we want to optimize.
When people talk about performance testing, they generally mean testing how long it takes for the program ...