CHAPTER 4
Data Curation and Governance
An intelligent system has two main components. The first is the collection of algorithms that build the machine learning models underlying the technology. The second is the data that is fed into these algorithms. The data is what provides the specific intelligence for the system.
Historically, the field of machine learning has focused its research on improving the algorithms to produce increasingly better models over time. Recently, however, the algorithms have improved to a point where they are no longer the bottleneck in the race for improved AI technology. These algorithms are now capable of consuming vast amounts of data and storing that intelligence in complex internal structures. Today, the race for improved AI systems has turned its focus to improvements in data, both in quality and volume.
Because of this shift in focus, the first step in building your own AI system is to identify data sources and gather all the data the system needs. Data that is used to build AI systems is typically referred to as ground truth, that is, the truth that underpins the knowledge in an AI system. Good ground truth typically comes from or is produced by organizational systems already in use. For instance, if a system is trying to predict what genre of music a user might like at a particular time of day, that system's ground truth can be pulled from the history of what users have selected to play throughout the ...
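To make the music example concrete, here is a minimal sketch of how ground truth might be assembled from an existing play history. Everything in it is an assumption made for illustration: the record layout (user, hour of day, genre), the sample data, and the helper names `build_ground_truth` and `most_played_genre` are hypothetical. A simple frequency count stands in for a trained model.

```python
from collections import Counter, defaultdict

# Hypothetical play-history records: (user_id, hour_of_day, genre).
# In practice these would be pulled from a system already in use.
play_history = [
    ("u1", 8, "jazz"),
    ("u1", 8, "jazz"),
    ("u1", 8, "pop"),
    ("u1", 22, "ambient"),
    ("u2", 8, "rock"),
    ("u2", 22, "rock"),
    ("u2", 22, "metal"),
    ("u2", 22, "rock"),
]


def build_ground_truth(records):
    """Count how often each genre was played per (user, hour) pair."""
    counts = defaultdict(Counter)
    for user_id, hour, genre in records:
        counts[(user_id, hour)][genre] += 1
    return counts


def most_played_genre(ground_truth, user_id, hour):
    """Return the user's most played genre at that hour, or None if no history exists."""
    genre_counts = ground_truth.get((user_id, hour))
    if not genre_counts:
        return None
    return genre_counts.most_common(1)[0][0]


ground_truth = build_ground_truth(play_history)
print(most_played_genre(ground_truth, "u1", 8))
print(most_played_genre(ground_truth, "u2", 22))
```

The point of the sketch is that the labels come directly from recorded user behavior, so the quality and volume of that history bound what any model built on it can learn.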