Chapter 3. Schema

Schema Deep Dive Introduction

What do you want your AI system to do? How will it accomplish this? What methods are you going to use?

In this chapter, I dive into some of the foundational concepts around the schema, a map between human meaning and machine learning.

The real world is messy. Commercial applications require a level of detail that’s hyper-domain-specific. There are many ways to structure all this complexity. In general, these structures are defined in the schema. Further, the schema provides “pivot points” to adapt and change subcomponents over time to better fit current needs.

The schema is important to get right because the rest of the system, including raw data, is defined in relation to the schema.

The schema is the paradigm for encoding all of your commercial knowledge. This can broadly be thought of as labels and attributes (what something is), spatial representations (where something is), their relations to each other, and their relations to external concepts (e.g., series, time). An effective schema relates well to your business needs and your raw data.
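To make these pieces concrete, here is a minimal sketch of a schema as a plain data structure. It is an illustration under stated assumptions, not the format of any particular tool: the class names (`Schema`, `Label`, `Attribute`, `Relation`, `SpatialType`), the field names, and the strawberry example are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SpatialType(Enum):
    # Hypothetical set of spatial representations ("where something is").
    BOX = "box"
    POLYGON = "polygon"
    FULL_IMAGE = "full_image"


@dataclass
class Attribute:
    # A named property that refines a label ("what something is" in more detail).
    name: str
    options: List[str]


@dataclass
class Label:
    # A label pairs a meaning with the spatial form used to locate it.
    name: str
    spatial_type: SpatialType
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Relation:
    # A link from a label to another label or to an external concept
    # such as time or a series. All three fields are illustrative.
    source: str
    target: str
    kind: str


@dataclass
class Schema:
    labels: List[Label] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def label_names(self) -> set:
        # The set of label names defined in this schema.
        return {label.name for label in self.labels}

    def relations_for(self, label_name: str) -> List[Relation]:
        # All relations that start from the given label.
        return [r for r in self.relations if r.source == label_name]


# Hypothetical example: a small agricultural schema.
schema = Schema(
    labels=[
        Label("strawberry", SpatialType.BOX,
              [Attribute("ripeness", ["ripe", "unripe"])]),
        Label("leaf", SpatialType.POLYGON),
    ],
    relations=[
        Relation("leaf", "strawberry", "part_of"),   # label to label
        Relation("strawberry", "time", "tracked_over"),  # label to external concept
    ],
)
```

Keeping labels, attributes, spatial types, and relations as separate pieces is one way to get the "pivot points" mentioned earlier: an attribute's options or a label's spatial type can change without redefining the rest of the schema.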

More generally, the schema is the overall representation of labels, attributes, and spatial information, and their relations to each other. It’s how we think about and represent the meaning of what something is, where it is, and more. This builds on the high-level concepts of labels and attributes introduced in Chapter 1. Afterward, I will map these training data concepts back to machine learning ...
