Learning Microsoft Cognitive Services - Second Edition

Book Description

Learn to build interactive and efficient applications by leveraging 24 effective cognitive services APIs powered by Microsoft

About This Book

  • Explore the capabilities of 24 of the APIs released as part of the Cognitive Services platform
  • Build intelligent apps that combine the power of computer vision, speech recognition, and language processing
  • Give your apps human-like cognitive intelligence with this hands-on guide

Who This Book Is For

.NET developers who want to add AI capabilities to their applications will find this book useful. No knowledge of machine learning or AI is necessary to work through this book.

What You Will Learn

  • Identify a person through visual inspection and audio
  • Reduce user effort by utilizing AI-like capabilities
  • Understand how to analyze images and text in different ways
  • Find out how to analyze images using Vision APIs
  • Add video analysis to applications using Vision APIs
  • Utilize Search to find anything you want
  • Analyze text to extract information and explore text structure

In Detail

Microsoft has revamped its Project Oxford to launch the all-new Cognitive Services platform, a set of 30 APIs to add speech, vision, language, and knowledge capabilities to apps.

This book will introduce you to 24 of the APIs released as part of the Cognitive Services platform and show you how to leverage their capabilities. More importantly, you'll see how the power of these APIs can be combined to build real-world apps that have cognitive capabilities. The book is split into three sections: computer vision, speech recognition and language processing, and knowledge and search.

You will start with the vision APIs, since they are highly visual and not too complex. The next part revolves around speech and language, which are somewhat connected. The last part is about adding real-world intelligence to apps by connecting them to the Knowledge and Search APIs.

By the end of this book, you will understand what Microsoft Cognitive Services can offer and how to use the different APIs.

Style and approach

This book takes you through essential API capabilities and shows how to utilize them to suit the needs of your application.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Getting Started with Microsoft Cognitive Services
    1. Cognitive Services in action for fun and life-changing purposes
    2. Setting up boilerplate code
    3. Detecting faces with the Face API
    4. An overview of what we are dealing with
      1. Vision
        1. Computer Vision
        2. Emotion
        3. Face
        4. Video
        5. Video Indexer
        6. Content Moderator
        7. Custom Vision Service
      2. Speech
        1. Bing Speech
        2. Speaker Recognition
        3. Custom Recognition
        4. Translator Speech API
      3. Language
        1. Bing Spell Check
        2. Language Understanding Intelligent Service (LUIS)
        3. Linguistic Analysis
        4. Text Analysis
        5. Web Language Model
        6. Translator Text API
      4. Knowledge
        1. Academic
        2. Entity Linking
        3. Knowledge Exploration
        4. Recommendations
        5. QnA Maker
        6. Custom Decision Service
      5. Search
        1. Bing Web Search
        2. Bing Image Search
        3. Bing Video Search
        4. Bing News Search
        5. Bing Autosuggest
        6. Bing Entity Search
    5. Getting feedback on detected faces
    6. Summary
  3. Analyzing Images to Recognize a Face
    1. Learning what an image is about using the Computer Vision API
      1. Setting up a chapter example project
      2. Generic image analysis
      3. Recognizing celebrities using domain models
      4. Utilizing Optical Character Recognition
      5. Generating image thumbnails
    2. Diving deep into the Face API
      1. Retrieving more information from the detected faces
      2. Deciding whether two faces belong to the same person
      3. Finding similar faces
      4. Grouping similar faces
    3. Adding identification to our smart-house application
      1. Creating our smart-house application
      2. Adding people to be identified
      3. Identifying a person
    4. Automatically moderating user content
      1. The content to moderate
        1. Image moderation
        2. Text moderation
      2. Moderation tools
        1. Using the review tool
        2. Other tools
    5. Summary
  4. Analyzing Videos
    1. Knowing your mood using the Emotion API
      1. Getting images from a web camera
      2. Letting the smart-house know your mood
    2. Diving into the Video API
      1. Video operations as common code
      2. Getting operation results
      3. Wiring up the execution in the ViewModel
      4. Detecting and tracking faces in videos
      5. Detecting motion
      6. Stabilizing shaky videos
      7. Generating video thumbnails
    3. Analyzing emotions in videos
    4. Unlocking video insights using Video Indexer
      1. General overview
        1. Typical scenarios
        2. Key concepts
          1. Breakdowns
          2. Summarized insights
          3. Keywords
          4. Sentiments
          5. Blocks
      2. How to use Video Indexer
        1. Through a web portal
        2. Video Indexer API
    5. Summary
  5. Letting Applications Understand Commands
    1. Creating language-understanding models
      1. Registering an account and getting a license key
      2. Creating an application
      3. Recognizing key data using entities
      4. Understanding what the user wants using intents
      5. Simplifying development using prebuilt models
      6. Prebuilt domains
    2. Training a model
      1. Training and publishing the model
      2. Connecting to the smart-house application
      3. Model improvement through active usage
        1. Visualizing performance
        2. Resolving performance problems
          1. Adding model features
          2. Adding labeled utterances
          3. Looking for incorrect utterance labels
          4. Changing the schema
        3. Active learning
    3. Summary
  6. Speaking with Your Application
    1. Converting text to audio and vice versa
      1. Speaking to the application
      2. Letting the application speak back
        1. Audio output format
        2. Error codes
        3. Supported languages
      3. Utilizing LUIS based on spoken commands
    2. Knowing who is speaking
      1. Adding speaker profiles
      2. Enrolling a profile
      3. Identifying the speaker
    3. Verifying a person through speech
    4. Customizing speech recognition
      1. Creating a custom acoustic model
      2. Creating a custom language model
      3. Deploying the application
    5. Summary
  7. Understanding Text
    1. Setting up a common core
      1. New project
      2. Web requests
      3. Data contracts
    2. Correcting spelling errors
    3. Natural Language Processing using the Web Language Model
      1. Breaking a word into several words
      2. Generating the next word in a sequence of words
      3. Learning if a word is likely to follow a sequence of words
      4. Learning if certain words are likely to appear together
    4. Extracting information through textual analysis
      1. Detecting language
      2. Extracting key phrases from text
      3. Learning if a text is positive or negative
    5. Exploring text using linguistic analysis
      1. Introduction to linguistic analysis
      2. Analyzing text from a linguistic viewpoint
    6. Summary
  8. Extending Knowledge Based on Context
    1. Linking entities based on context
    2. Providing personalized recommendations
      1. Creating a model
      2. Importing catalog data
      3. Importing usage data
      4. Building a model
      5. Consuming recommendations
        1. Recommending items based on prior activities
    3. Summary
  9. Querying Structured Data in a Natural Way
    1. Tapping into academic content using the Academic API
      1. Setting up an example project
      2. Interpreting natural language queries
      3. Finding academic entities from query expressions
      4. Calculating the distribution of attributes from academic entities
      5. Entity attributes
    2. Creating the backend using the Knowledge Exploration Service
      1. Defining attributes
      2. Adding data
      3. Building the index
      4. Understanding natural language
      5. Local hosting and testing
      6. Going for scale
        1. Hooking into Microsoft Azure
        2. Deploying the service
    3. Answering FAQs using QnA Maker
      1. Creating a knowledge base from frequently asked questions
      2. Training the model
      3. Publishing the model
      4. Improving the model
    4. Summary
  10. Adding Specialized Searches
    1. Searching the web from the smart-house application
      1. Preparing the application for web searches
      2. Searching the web
    2. Getting the news
      1. News from queries
      2. News from categories
      3. Trending news
    3. Searching for images and videos
      1. Using a common user interface
      2. Searching for images
      3. Searching for videos
    4. Helping the user with auto suggestions
      1. Adding Autosuggest to the user interface
      2. Suggesting queries
    5. Search commonalities
      1. Languages
      2. Pagination
      3. Filters
        1. Safe search
        2. Freshness
      4. Errors
    6. Summary
  11. Connecting the Pieces
    1. Connecting the pieces
      1. Creating an intent
      2. Updating the code
        1. Executing actions from intents
        2. Searching news on command
        3. Describing news images
    2. Real-life applications using Microsoft Cognitive Services
      1. Uber
      2. DutchCrafters
      3. CelebsLike.me
      4. Pivothead - wearable glasses
      5. Zero Keyboard
      6. The common theme
    3. Where to go from here
    4. Summary
  12. LUIS Entities and Additional Information on Linguistic Analysis
    1. LUIS pre-built entities
    2. Part-of-speech tags
    3. Phrase types
  13. License Information
    1. Video Frame Analyzer
    2. OpenCvSharp3
    3. Newtonsoft.Json
    4. NAudio
      1. Definitions
      2. Grant of Rights
        1. Conditions and Limitations