Raw data
Raw data used to train an ML model can be text files, CSV files, images, videos, or custom-formatted files, or a combination of these file types. It can also be sequential data, such as time series data, or vector representations of text, such as word embeddings. It's important to manage the raw input data before it's fed into the model, since how the data is prepared can affect the efficiency of the model's training at runtime.
In many cases, raw data is stored in a database, such as MySQL, MS SQL, MongoDB, and so on. For the purposes of this book, it's assumed that even tabular, SQL, or NoSQL data is raw data and that it needs to be split and converted into TFRecords for machine/deep ...
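As a rough illustration of the "split" step for database-backed raw data, the following is a minimal sketch that uses only the Python standard library. It is not the book's own code. The in-memory SQLite table `samples`, its columns, the helper `split_rows`, and the 80/10/10 ratios are all assumptions chosen for illustration, and serializing each split to TFRecords is deliberately left out.

```python
import random
import sqlite3


def split_rows(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows deterministically and split them into train/val/test lists.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed gives a reproducible split
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # whatever remains becomes the test split
    return train, val, test


# Hypothetical table standing in for raw data held in MySQL, MS SQL, and so on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, feature REAL, label INTEGER)"
)
conn.executemany(
    "INSERT INTO samples (feature, label) VALUES (?, ?)",
    [(i * 0.5, i % 2) for i in range(100)],
)
rows = conn.execute("SELECT id, feature, label FROM samples").fetchall()
conn.close()

train, val, test = split_rows(rows)
print(len(train), len(val), len(test))
```

Each resulting list could then be handed to a separate conversion step that writes one output file per split. Shuffling with a fixed seed before slicing keeps the split reproducible across runs.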