CHAPTER 8
Measuring AI Performance
“If you can't measure it (and measure it accurately), you can't improve it.”
Peter Drucker
For a retail leader, this chapter is probably the single most important one in this book. Many retailers struggle to measure performance when it comes to AI, and I always stress how critical it is to get this right. Why?
You will have to trial many different AI vendors, and you will need to compare the results of two or more AI solutions in a clear and objective way. You will also meet snake oil salesmen who will try to convince you of absurdities. If you apply the information in this chapter, you will not fall victim to these tactics. Having run more than 50 of these proofs of concept (POCs) over the last seven years, I have seen that almost no retailer knows how to properly assess the accuracy of an AI solution, mostly because they are not aware of the F1 score.
Scoring AI
Scoring in AI competitions, medical applications, and other "classification" problems (e.g., do you have cancer or not?) typically relies on the F1 score, because it is the correct way to compare two different "classifiers." Most retailers, however, use accuracy, recall, or a made-up metric of their own instead.
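For reference, these are the standard definitions, written in terms of a classifier's confusion-matrix counts: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). F1 is the harmonic mean of precision and recall, so it is high only when both are high, and unlike accuracy it ignores true negatives, which dominate whenever the event you are predicting is rare.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP} \\[4pt]
\text{Recall} &= \frac{TP}{TP + FN} \\[4pt]
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\[4pt]
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{aligned}
```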
Let's look at a common example in retail. Say you want to use AI to predict out‐of‐stocks in your store. You set up a bake‐off among several vendors. ...
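The sketch below is a hypothetical bake-off using made-up counts, not real vendor results. It shows why accuracy or recall alone can mislead when out-of-stocks are rare. It assumes 1,000 audited shelf checks, 50 of which were truly out of stock, and three imaginary vendors. The names `Confusion`, `scores`, and `safe_div`, and the choice to score any 0/0 ratio as 0, are illustrative assumptions, not any vendor's or library's API.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical counts from comparing a vendor's predictions to audited shelf checks."""
    tp: int  # predicted out-of-stock, and the shelf really was out of stock
    fp: int  # predicted out-of-stock, but the shelf was in stock
    fn: int  # predicted in stock, but the shelf was out of stock
    tn: int  # predicted in stock, and the shelf was in stock


def safe_div(num: float, den: float) -> float:
    # Assumption: score any 0/0 ratio as 0 (e.g., precision for a vendor that never flags a shelf).
    return num / den if den else 0.0


def scores(c: Confusion) -> dict[str, float]:
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    f1 = safe_div(2 * precision * recall, precision + recall)  # harmonic mean of precision and recall
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical bake-off: 1,000 audited shelf checks, 50 of them truly out of stock.
vendors = {
    "A (never flags)": Confusion(tp=0, fp=0, fn=50, tn=950),
    "B (flags 100)": Confusion(tp=40, fp=60, fn=10, tn=890),
    "C (flags all)": Confusion(tp=50, fp=950, fn=0, tn=0),
}

for name, c in vendors.items():
    s = scores(c)
    print(f"{name:16} " + "  ".join(f"{k}={v:.3f}" for k, v in s.items()))
```

With these invented numbers, Vendor A wins on accuracy (0.950 vs. 0.930 for B) even though it never catches a single out-of-stock, so its F1 is 0. Vendor C achieves perfect recall by flagging every shelf, yet its precision is only 0.050 and its F1 about 0.095. Vendor B, with an F1 of about 0.533, is the only genuinely useful solution; F1 ranks it first, whereas accuracy favors A and recall favors C.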