Natural Language Processing with AWS AI Services

Book description

Work through interesting real-life business use cases to uncover valuable insights from unstructured text using AWS AI services

Key Features

  • Get to grips with AWS AI services for NLP and find out how to use them to gain strategic insights
  • Run Python code to use Amazon Textract and Amazon Comprehend to accelerate business outcomes
  • Understand how you can integrate human-in-the-loop workflows for custom NLP use cases with Amazon A2I
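
As a rough illustration of running Python code against Amazon Textract and Amazon Comprehend, the sketch below chains Textract's synchronous `detect_document_text` call with Comprehend's `detect_sentiment`. It is a minimal sketch, not code from the book. It assumes the following:
  • boto3 clients are created elsewhere and passed in (for example `boto3.client("textract")`).
  • The input is a single-page image already stored in S3.
  • The text is English.
The helper names `extract_lines` and `sentiment_of` are hypothetical. The 5,000-byte truncation limit is also an assumption, so check the current Comprehend quotas before relying on it.

```python
from typing import Any, List

# Assumption: 5,000 bytes is the DetectSentiment per-request limit; verify against current quotas.
MAX_SENTIMENT_BYTES = 5000


def extract_lines(textract_client: Any, bucket: str, key: str) -> List[str]:
    """Detect text in a single-page image stored in S3 and return its lines."""
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Textract returns PAGE, LINE and WORD blocks; keep only the LINE text.
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]


def sentiment_of(comprehend_client: Any, text: str,
                 max_bytes: int = MAX_SENTIMENT_BYTES) -> str:
    """Return Comprehend's overall sentiment label for English text."""
    # Truncate on a UTF-8 byte boundary and drop any partially cut character.
    safe = text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
    response = comprehend_client.detect_sentiment(Text=safe, LanguageCode="en")
    return response["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL or MIXED


# Example usage (hypothetical bucket and key; requires AWS credentials):
#   import boto3
#   lines = extract_lines(boto3.client("textract"), "my-example-bucket", "scans/letter.png")
#   print(sentiment_of(boto3.client("comprehend"), " ".join(lines)))
```

The sketch passes the clients in as arguments rather than creating them inside the functions. That keeps the helpers easy to test with stubs and lets you choose the AWS Region and credentials in one place.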

Book Description

Natural language processing (NLP) uses machine learning to extract information from unstructured data. This book will help you to move quickly from business questions to high-performance models in production.

You'll start by understanding the importance of NLP in today's business applications and learning how to use the features of Amazon Comprehend and Amazon Textract to build NLP models with Python and Jupyter notebooks. The book then shows you how to integrate AI into applications with just a few lines of code to accelerate business outcomes. Throughout the book, you'll work through NLP use cases such as smart text search, compliance and controls for processing confidential documents, and real-time text analytics. You'll deploy and monitor scalable NLP models in production for both real-time and batch requirements. As you advance, you'll explore strategies for including humans in the loop for different purposes in a document processing workflow. Finally, you'll learn best practices for auto scaling your NLP inference to handle enterprise traffic.

Whether you're new to ML or an experienced practitioner, by the end of this NLP book, you'll have the confidence to use AWS AI services to build powerful NLP applications.

What you will learn

  • Automate various NLP workflows on AWS to accelerate business outcomes
  • Use Amazon Textract for text, tables, and handwriting recognition from images and PDF files
  • Gain insights from unstructured text in the form of sentiment analysis, topic modeling, and more using Amazon Comprehend
  • Set up end-to-end document processing pipelines to understand the role of humans in the loop
  • Develop NLP-based intelligent search solutions with just a few lines of code
  • Create both real-time and batch document processing pipelines using Python
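
For batch processing of multi-page PDFs, Textract provides asynchronous operations. The hypothetical helper below is a minimal sketch that assumes a boto3 Textract client and a PDF already stored in S3. It works in three steps:
  • It starts a job with `start_document_text_detection`.
  • It polls `get_document_text_detection` until the job leaves `IN_PROGRESS`.
  • It follows `NextToken` to collect every `LINE` block.
Polling keeps the example short. A production pipeline would more likely pass a `NotificationChannel` so that Amazon SNS signals job completion. The function name and polling defaults are illustrative assumptions.

```python
import time
from typing import Any, List


def extract_pdf_lines(textract_client: Any, bucket: str, key: str,
                      poll_seconds: float = 5.0, max_polls: int = 120) -> List[str]:
    """Run asynchronous Textract text detection on a PDF in S3 and return all lines."""
    job_id = textract_client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )["JobId"]

    # Poll until the job is no longer IN_PROGRESS, or give up after max_polls attempts.
    for _ in range(max_polls):
        response = textract_client.get_document_text_detection(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    # This sketch treats anything other than SUCCEEDED (including PARTIAL_SUCCESS) as a failure.
    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated: collect LINE blocks from each page, following NextToken.
    lines: List[str] = []
    while True:
        lines += [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
        token = response.get("NextToken")
        if not token:
            return lines
        response = textract_client.get_document_text_detection(JobId=job_id, NextToken=token)
```

The synchronous call in a real-time path returns results immediately for single pages. The asynchronous pattern trades latency for the ability to process large, multi-page documents, which is why it tends to suit batch pipelines.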

Who this book is for

If you're an NLP developer or data scientist looking to get started with AWS AI services to implement various NLP scenarios quickly, this book is for you. It will show you how easy it is to integrate AI into applications with just a few lines of code. A basic understanding of machine learning (ML) concepts is required, and experience with Jupyter notebooks and Python will be helpful.

Table of contents

  1. Natural Language Processing with AWS AI Services
  2. Acknowledgments
  3. Foreword
  4. Contributors
  5. About the authors
  6. About the reviewers
  7. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Code in Action
    7. Conventions used
    8. Get in touch
    9. Share Your Thoughts
  8. Section 1: Introduction to AWS AI NLP Services
  9. Chapter 1: NLP in the Business Context and Introduction to AWS AI Services
    1. Introducing NLP
    2. Overcoming the challenges in building NLP solutions
    3. Understanding why NLP is becoming mainstream
    4. Introducing the AWS ML stack
    5. Summary
    6. Further reading
  10. Chapter 2: Introducing Amazon Textract
    1. Technical requirements
    2. Setting up your AWS environment
      1. Signing up for an AWS account
      2. Creating an Amazon S3 bucket and a folder and uploading objects
      3. Creating an Amazon SageMaker Jupyter notebook instance
      4. Changing IAM permissions and trust relationships for the Amazon SageMaker notebook execution role
    3. Overcoming challenges with document processing
    4. Understanding how Amazon Textract can help
    5. Presenting Amazon Textract's product features
      1. Uploading sample document(s)
      2. Raw text or text extraction
      3. Form data and key/value pairs
      4. Table extraction
      5. Multiple language support
      6. Handwriting detection
      7. Human in the loop
    6. Using Amazon Textract with your applications
      1. Textract APIs
      2. Textract API demo with a Jupyter notebook
      3. Building applications using Amazon Textract APIs
    7. Summary
  11. Chapter 3: Introducing Amazon Comprehend
    1. Technical requirements
    2. Understanding Amazon Comprehend and Amazon Comprehend Medical
      1. Challenges associated with setting up ML preprocessing for NLP
      2. Exploring the benefits of Amazon Comprehend and Comprehend Medical
      3. Detecting insights in text using Comprehend and Comprehend Medical without preprocessing
      4. Using these services to gain insights from OCR documents from Amazon Textract
    3. Exploring Amazon Comprehend and Amazon Comprehend Medical product features
      1. Discovering Amazon Comprehend
      2. Deriving diagnoses from a doctor-patient transcript with Comprehend Medical
    4. Using Amazon Comprehend with your applications
      1. Architecting applications with Amazon API Gateway, AWS Lambda, and Comprehend
    5. Summary
  12. Section 2: Using NLP to Accelerate Business Outcomes
  13. Chapter 4: Automating Document Processing Workflows
    1. Technical requirements
    2. Automating document processing workflows
    3. Setting up compliance and control
      1. Setting up to solve the use case
      2. Additional IAM prerequisites
      3. Automating documents for control and compliance
    4. Processing real-time document workflows versus batch document workflows
    5. Summary
    6. Further reading
  14. Chapter 5: Creating NLP Search
    1. Technical requirements
    2. Creating NLP-powered smart search indexes
    3. Building a search solution for scanned images using Amazon Elasticsearch
      1. Prerequisites
      2. Uploading documents to Amazon S3
      3. Inspecting the AWS Lambda function
      4. Searching for and discovering data in the Kibana console
    4. Setting up an enterprise search solution using Amazon Kendra
      1. Getting started
      2. Walking through the solution
      3. Searching in Amazon Kendra with enriched filters from Comprehend
    5. Summary
    6. Further reading
  15. Chapter 6: Using NLP to Improve Customer Service Efficiency
    1. Technical requirements
    2. Introducing the customer service use case
    3. Building an NLP solution to improve customer service
      1. Setting up to solve the use case
      2. Additional IAM prerequisites
      3. Preprocessing the customer service history data
    4. Summary
    5. Further reading
  16. Chapter 7: Understanding the Voice of Your Customer Analytics
    1. Technical requirements
    2. Challenges of setting up a text analytics solution
    3. Setting up a Yelp review text analytics workflow
      1. Setting up to solve the use case
      2. Walking through the solution using Jupyter Notebook
    4. Summary
    5. Further reading
  17. Chapter 8: Leveraging NLP to Monetize Your Media Content
    1. Technical requirements
    2. Introducing the content monetization use case
    3. Building the NLP solution for content monetization
      1. Setting up to solve the use case
      2. Additional IAM prerequisites
      3. Uploading the sample video and converting it for broadcast
      4. Running transcription, finding topics, and creating a VAST ad tag URL
      5. Inserting ads and testing our video
    4. Summary
    5. Further reading
  18. Chapter 9: Extracting Metadata from Financial Documents
    1. Technical requirements
    2. Extracting metadata from financial documents
    3. Setting up the use case
      1. Setting up the notebook code and S3 bucket creation
      2. Analyzing the output of Comprehend Events
    4. Summary
    5. Further reading
  19. Chapter 10: Reducing Localization Costs with Machine Translation
    1. Technical requirements
    2. Introducing the localization use case
    3. Building a multi-language web page using machine translation
      1. Setting up to solve the use case
      2. Running the notebook
    4. Summary
    5. Further reading
  20. Chapter 11: Using Chatbots for Querying Documents
    1. Technical requirements
    2. Introducing the chatbot use case
    3. Creating an Amazon Kendra index with Amazon S3 as a data source
    4. Building an Amazon Lex chatbot
    5. Deploying the solution with AWS CloudFormation
    6. Summary
    7. Further reading
  21. Chapter 12: AI and NLP in Healthcare
    1. Technical requirements
    2. Introducing the automated claims processing use case
    3. Understanding how to extract and validate data from medical intake forms
    4. Understanding clinical data with Amazon Comprehend Medical
    5. Understanding invalid medical form processing with notifications
    6. Understanding how to create a serverless pipeline for medical claims
    7. Summary
    8. Further reading
  22. Section 3: Improving NLP Models in Production
  23. Chapter 13: Improving the Accuracy of Document Processing Workflows
    1. Technical requirements
    2. The need for setting up HITL processes with document processing
    3. Seeing the benefits of using Amazon A2I for HITL workflows
    4. Adding human reviews to your document processing pipelines
      1. Creating an Amazon S3 bucket
      2. Creating a private work team in the AWS Console
      3. Creating a human review workflow in the AWS Console
      4. Sending the document to Amazon Textract and Amazon A2I by calling the Amazon Textract API
    5. Summary
    6. Further reading
  24. Chapter 14: Auditing Named Entity Recognition Workflows
    1. Technical requirements
    2. Authenticating loan applications
    3. Building the loan authentication solution
      1. Setting up to solve the use case
      2. Additional IAM prerequisites
      3. Training an Amazon Comprehend custom entity recognizer
      4. Creating a private team for the human loop
      5. Extracting sample document contents using Amazon Textract
      6. Detecting entities using the Amazon Comprehend custom entity recognizer
      7. Setting up an Amazon A2I human workflow loop
      8. Reviewing and modifying detected entities
      9. Retraining the Comprehend custom entity recognizer
      10. Storing decisions for downstream processing
    4. Summary
    5. Further reading
  25. Chapter 15: Classifying Documents and Setting up Human in the Loop for Active Learning
    1. Technical requirements
    2. Using Comprehend custom classification with human in the loop for active learning
    3. Building the document classification workflow
      1. Setting up to solve the use case
      2. Creating an Amazon Comprehend classification training job
      3. Creating Amazon Comprehend real-time endpoints and testing a sample document
      4. Setting up active learning with a Comprehend real-time endpoint using human in the loop
    4. Summary
    5. Further reading
  26. Chapter 16: Improving the Accuracy of PDF Batch Processing
    1. Technical requirements
    2. Introducing the PDF batch processing use case
    3. Building the solution
      1. Setting up for the solution build
      2. Additional IAM prerequisites
      3. Creating a private team for the human loop
      4. Creating an Amazon S3 bucket
      5. Extracting the registration document's contents using Amazon Textract
      6. Setting up an Amazon A2I human workflow loop
      7. Storing results for downstream processing
    4. Summary
    5. Further reading
  27. Chapter 17: Visualizing Insights from Handwritten Content
    1. Technical requirements
    2. Extracting text from handwritten images
      1. Creating the SageMaker Jupyter notebook
      2. Additional IAM prerequisites
      3. Creating an Amazon S3 bucket
      4. Extracting text using Amazon Textract
    3. Visualizing insights using Amazon QuickSight
    4. Summary
  28. Chapter 18: Building Secure, Reliable, and Efficient NLP Solutions
    1. Technical requirements
    2. Defining best practices for NLP solutions
    3. Applying best practices for optimization
      1. Using an Amazon S3 data lake
      2. Using AWS Glue for data processing and transformation tasks
      3. Using Amazon SageMaker Ground Truth for annotations
      4. Using Amazon Comprehend with PDF and Word formats directly
      5. Enforcing least privilege access
      6. Obfuscating sensitive data
      7. Protecting data at rest and in transit
      8. Using Amazon API Gateway for request throttling
      9. Setting up auto scaling for Amazon Comprehend endpoints
      10. Automating monitoring of custom training metrics
      11. Using Amazon A2I to review predictions
      12. Using Async APIs for loose coupling
      13. Using Amazon Textract Response Parser
      14. Persisting prediction results
      15. Using AWS Step Functions for orchestration
      16. Using AWS CloudFormation templates
    4. Summary
    5. Further reading
    6. Why subscribe?
  29. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Natural Language Processing with AWS AI Services
  • Author(s): Mona M, Premkumar Rangarajan
  • Release date: November 2021
  • Publisher(s): Packt Publishing
  • ISBN: 9781801812535