3

Processing Data in LLMOps Tools

Preparing textual data, whether structured, semi-structured, or unstructured, involves a series of steps designed to understand the dataset’s characteristics, identify patterns, and ready the data for further analysis or modeling. For large language models (LLMs), this preparation is crucial for ensuring the data’s quality and relevance before it is used for training. This chapter outlines an end-to-end workflow for preparing textual data and explores the following topics:

  • Collecting data
  • Transforming data
  • Preparing data
  • Automating data

Collecting data

Data collection is a critical first step in preparing a dataset for training LLMs. Let’s go through an example of preparing a dataset for an ...

