8 Gathering data at scale for real-world AI

This chapter covers

  • Selecting sources of data for AI applications
  • Building a serverless web crawler to find sources for large-scale data
  • Extracting data from websites using AWS Lambda
  • Understanding compliance, legal aspects, and politeness considerations for large-scale data gathering
  • Using CloudWatch Events as a bus for event-driven serverless systems
  • Performing service orchestration using AWS Step Functions

In chapter 7, we applied natural language processing (NLP) techniques to product reviews. We showed how to perform sentiment analysis and text classification with Amazon Comprehend on streaming data in a serverless architecture. In this chapter, we are concerned ...
