Chapter 3: Data Labeling with Amazon SageMaker Ground Truth

One of the biggest barriers to ML projects in most companies is access to labeled training data. At one company we worked with, we were trying to identify consumer-impacting outages. The customer had a lot of data from each layer of their application stack, but they couldn't agree on how to define an outage. Is it an outage when a load balancer is down? Probably not – we have redundancy in the infrastructure layer. Is it an outage when a customer can't access the service for over 10 minutes? That's probably too granular; a single customer might have problems due to local network connectivity issues. So, what exactly do we mean by an outage? How can we automatically label our training data ...
