Chapter 4. Vision
We live in a world of objects, but identifying them in pictures today can be challenging. Digital images are represented as arrays of pixels and color values with no data describing the objects that those pixels represent. However, advancements in machine learning on images are removing this barrier by providing powerful tools for extracting meaning and information from these pixels.
The Cognitive Services Vision APIs provide operations that take image data as input and return labeled content you can use in your app, whether it’s text from a menu, the expression on someone’s face, or a description of what’s going on in a video. These same services are used to power Bing’s image search, extract optical character recognition (OCR) text from images in OneNote, and index video in Azure Streams, making them tried and tested at scale.
The Vision category includes six services: Computer Vision, Custom Vision, Face, Form Recognizer, Ink Recognizer, and Video Indexer. We will provide a brief introduction to each.
Computer Vision
Computer Vision provides tools for analyzing images, enabling a long list of insights including detection of objects, faces, color composition, tags, and landmarks. Behind the APIs is a set of deep neural networks trained to perform functions like image classification, scene and activity recognition, celebrity and landmark recognition, OCR, and handwriting recognition.
Many of the computer vision tasks are provided by the Analyze Image API, which supports the most common image recognition scenarios. When you make a call to the different endpoints in the API namespace, the appropriate neural network is used to classify your image. In some cases, this may mean the image passes through more than one model, first to recognize an object and then to extract additional information.
Bundling all these features into one operation means you can make one call and accomplish many tasks. For example, using a picture of a shelf in a supermarket you can identify the packaging types on display, the brands being sold, and even whether the specific products are laid out in the right order (something that is often both time-consuming and expensive to audit manually).
The Analyze Image API attempts to detect and tag various visual features, marking detected objects with a bounding box. The tasks it performs include:
- Tagging visual features
- Detecting objects
- Detecting brands
- Categorizing images
- Describing images
- Detecting faces
- Detecting image types
- Detecting domain-specific content
- Detecting color schemes
- Generating thumbnails
- Detecting areas of interest
The process of working with an API through an SDK is much the same for every API. Using version 5 of the C# SDK, do the following:
- Create a client, specifying your subscription key and endpoint:

  ComputerVisionClient computerVision = new ComputerVisionClient(
      new ApiKeyServiceClientCredentials("<Your Subscription Key>"))
  {
      Endpoint = "<Your Service Endpoint>"
  };
- Choose the features you want to analyze in the image:

  private static readonly List<VisualFeatureTypes> features =
      new List<VisualFeatureTypes>()
  {
      VisualFeatureTypes.Categories, VisualFeatureTypes.Description,
      VisualFeatureTypes.Faces, VisualFeatureTypes.ImageType,
      VisualFeatureTypes.Tags
  };
- Call the API:

  ImageAnalysis analysis = await computerVision.AnalyzeImageAsync(
      "http://example.com/image.jpg", features);
- Extract the response information. Here we extract the caption, but many other features are also returned:

  Console.WriteLine(analysis.Description.Captions[0].Text + "\n");
Tagging Visual Features
Tagging an image is one of the most obvious uses of the Computer Vision service. This functionality provides an easy way to extract descriptors of the image that can be used later by your application. By providing many different tags for each image, you can create complex indexes for your image sets that can then be used, for example, to describe the scene depicted or find images of specific people, objects, or logos in an archive.
To use this feature, you need to upload a still image or provide a link to an image. The API returns a JSON document that contains a list of recognized objects, along with a confidence score for each. For example, an excerpt of the tags from the response for a picture of a home with a lawn (Figure 4-1) will look something like this:
"tags": [ { "name": "tree", "confidence": 0.9999969005584717 }, { "name": "grass", "confidence": 0.9999740123748779 } ]
The names of the objects are easy enough to extract, and you can use the confidence score as a cutoff to define when to apply a tag (or when to show the tag to your users). The threshold choice is up to you and your specific use case. We suggest using a high threshold to avoid false positives and poor matches cluttering up the tags and search results.
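To make the cutoff concrete, here is a minimal sketch that filters the tags returned by the AnalyzeImageAsync call shown earlier. The 0.9 threshold is an arbitrary example value, and the Tags, Name, and Confidence member names reflect the C# SDK's ImageAnalysis model; verify them against the SDK version you install.

  // Keep only high-confidence tags before surfacing them to users.
  // 0.9 is an example cutoff; tune it for your own precision/recall needs.
  const double tagThreshold = 0.9;

  foreach (var tag in analysis.Tags)
  {
      if (tag.Confidence >= tagThreshold)
      {
          Console.WriteLine($"{tag.Name} ({tag.Confidence:P1})");
      }
  }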
When you call the /tag endpoint, tags that could have multiple meanings may include a hint to scope the tag to a usage. When a picture of a cyclist is tagged “riding,” the hint will note that the domain is sport (rather than geography, to avoid confusion with the Ridings, which are areas of Yorkshire), for example.
You may want to add code to convert the image tags into different terms that are more specific to your application before showing them to users, or at least go beyond a basic list structure.
Object Detection
Like the tagging API, the object detection API takes an image or an image URL and returns a JSON document with a list of detected objects, which in this case are accompanied by bounding box coordinates. The coordinates let you understand how objects are related. For example, you can determine if a cup is to the right or the left of a vase. You can also see how many instances of an object there are in a picture: unlike with the tagging API, which just returns “truck” even if there are multiple trucks in a picture, with the object detection API you get the location of each one. There are some limitations to be aware of, however; for example, it’s not possible to detect small objects or objects that are close together.
You call the object detection API via the Analyze Image API by setting the query type to "objects" in the visualFeatures request parameter, or via the standalone /detect endpoint. Here’s an excerpt of the JSON response for one of the objects in Figure 4-2:
"objects": [ { "rectangle": { "x": 1678, "y": 806, "w": 246, "h": 468 }, "object": "vase", "confidence": 0.757, "parent": { "object": "Container", "confidence": 0.759 } }, ]
As with the tagging API, the service returns hints to put classifications in context, in this case showing that a “vase” is a “container.”
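As an illustration of working with the bounding boxes, here is a hedged sketch that checks whether a detected cup sits to the left of a detected vase. It assumes the C# SDK's Objects, ObjectProperty, and Rectangle (X, Y, W, H) members, that VisualFeatureTypes.Objects was included in the requested features, and that System.Linq is available; treat these names as assumptions to verify.

  // Find two detected objects and compare their bounding boxes.
  var cup = analysis.Objects.FirstOrDefault(o => o.ObjectProperty == "cup");
  var vase = analysis.Objects.FirstOrDefault(o => o.ObjectProperty == "vase");

  if (cup != null && vase != null)
  {
      // The cup is "to the left" if its box ends before the vase's box begins.
      bool cupLeftOfVase = cup.Rectangle.X + cup.Rectangle.W <= vase.Rectangle.X;
      Console.WriteLine(cupLeftOfVase
          ? "The cup is to the left of the vase."
          : "The cup is not to the left of the vase.");
  }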
Detecting Brands
The brand detection API is a specialized version of the object detection API that has been trained on thousands of different product logos from around the world and can be used to detect brands in both still images and video.
Like the object detection API, it returns details of the brand detected and the bounding box coordinates indicating where in the image it can be found. For example, you could run both object and brand detection on an image to identify a computer on a table as a Microsoft Surface laptop.
You call the brand detection API with the Analyze Image API, setting the query type to "brands" in the visualFeatures request parameter. Detected brands are returned in the JSON document’s “brands” block. The response for Figure 4-3 looks like this:
"brands": [ { "name": "Microsoft", "confidence": 0.659, "rectangle": { "x": 177, "y": 707, "w": 223, "h": 235 } } ]
Categorizing an Image
The Computer Vision API can also categorize an image. This is a high-level approach, useful for filtering a large image set to quickly determine if an image is relevant and whether you should be using more complex algorithms.
There are 86 different categories, organized in a parent/child hierarchy. For example, you can get an image category of “food_pizza” in the “food_” hierarchy. If you’re building a tool to determine pizza quality to assess whether restaurant franchises are following specifications, any image that doesn’t fit the category because it’s not a pizza can be rejected without spending more time on it.
It’s a quick and easy API to use, and one that is ideal for quickly parsing a large catalog of images, as well as for an initial filter. If you need more powerful categorization tools for images, PDFs, and other documents, consider the Cognitive Search tools covered in Chapter 7. An excerpt from the JSON response returned for the photograph of a crowd of people shown in Figure 4-4 follows the image.
"categories": [ { "name": "people_crowd", "score": 0.9453125 } ]
Describing an Image
Most of the Computer Vision tools return machine-readable information, using JSON documents to deliver results that can then be processed by your code to deliver the results you need. However, you may at times need a more human-oriented response, like text that can be used as a caption. This is ideal for assistive technologies, or for providing the human-readable elements of an image catalog.
You can access the image description feature via either the /analyze endpoint or the standalone /describe endpoint. Descriptions are returned in a JSON document as a list ordered by confidence, with associated tags that can give additional context. Following is an excerpt of the response for a photograph of the New York City skyline (see Figure 4-5):
"description": { "tags": [ "outdoor", "photo", "large", "white", "city", "building", "black", "sitting", "water", "big", "tall", "skyscraper", "old", "boat", "bird", "street", "parked", "river" ], "captions": [ { "text": "a black and white photo of a large city", "confidence": 0.9244712774886765 } ] }
You can use the confidence level to have the tool automatically choose the highest-ranked description if you always want to get a single result, or you may prefer to show users multiple possible descriptions when the confidence levels are lower so that they can pick the most appropriate one manually.
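Here is a minimal sketch of that logic against the Description.Captions list returned earlier; the 0.8 cutoff is an example value, the member names are assumptions from the C# SDK, and System.Linq is required for the ordering.

  // Auto-select the best caption when the model is confident,
  // otherwise show all candidates so a user can choose.
  var captions = analysis.Description.Captions
      .OrderByDescending(c => c.Confidence)
      .ToList();

  if (captions.Count > 0 && captions[0].Confidence >= 0.8)
  {
      Console.WriteLine(captions[0].Text);
  }
  else
  {
      foreach (var caption in captions)
      {
          Console.WriteLine($"{caption.Text} ({caption.Confidence:P1})");
      }
  }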
Detecting Faces
While the Face API offers a suite of more powerful face recognition services, you can get quick access to basic facial analysis capabilities through the Analyze Image API. This detects the faces in an image, along with an indication of age and gender and bounding box coordinates.
Data is returned using the familiar JSON document format, with different responses for single and multiple faces. Your code will need to be able to work with responses with one or more face blocks, because images may contain multiple faces (as in Figure 4-6).
Here is an example of the “faces” block of the JSON response for the picture of two people in Figure 4-6:
"faces": [ { "age": 30, "gender": "Male", "faceRectangle": { "left": 1074, "top": 292, "width": 328, "height": 328 } }, { "age": 28, "gender": "Female", "faceRectangle": { "left": 947, "top": 619, "width": 308, "height": 308 } } ]
You may find this familiar—this API was the basis of the popular “How Old” service.
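A short, hedged sketch of reading those face blocks from the SDK's ImageAnalysis result; the Faces, Age, Gender, and FaceRectangle member names are assumptions to verify, and VisualFeatureTypes.Faces must be among the requested features (as it was in the earlier example).

  foreach (var face in analysis.Faces)
  {
      Console.WriteLine(
          $"{face.Gender}, about {face.Age}, at left={face.FaceRectangle.Left}, top={face.FaceRectangle.Top}");
  }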
Detecting Image Types
Sometimes it’s useful to be able to categorize the type of image that’s being analyzed. The Analyze Image API can detect whether an image is clip art or a line drawing, returning the responses (on a simple 0 to 3 scale) in the imageType field. A value of 0 indicates that the image is not of that type, while a value of 3 indicates a high likelihood that it is clip art or a line drawing (as in Figure 4-7).
A sketch of a rose like the one in Figure 4-7 might return the following image type information in the JSON response:
"imageType": { "clipArtType": 3, "lineDrawingType": 1 }
To detect photographs, use the same API: a 0 return value for both image types is an indication it’s neither clip art nor a line drawing.
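A minimal sketch of that check, assuming the SDK's ImageType, ClipArtType, and LineDrawingType members:

  // Print the raw scores; 0 for both suggests the image is a photograph.
  Console.WriteLine($"clipArtType: {analysis.ImageType.ClipArtType}, " +
                    $"lineDrawingType: {analysis.ImageType.LineDrawingType}");

  bool isPhotograph = analysis.ImageType.ClipArtType == 0 &&
                      analysis.ImageType.LineDrawingType == 0;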
Detecting Domain-Specific Content
While most of the Computer Vision tools are designed for general-purpose image classification, a small set of APIs are trained to work against specific image sets. Currently there are two domain-specific models available: for celebrities and for landmarks. You can use them as standalone categorization tools, or as an extension to the existing toolset.
Like the other APIs, these domain-specific models can be called via REST, using the models/<model>/analyze URI within the Computer Vision namespace. Results are in the standard JSON document format and include a bounding box for the recognized object, the name, and a confidence level.
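Because the SDK helpers for domain models vary by version, here is a hedged raw REST sketch of calling the celebrities model with HttpClient; the v2.0 path segment, the request body shape, and the image URL are assumptions to check against the current Computer Vision documentation, and System.Net.Http and System.Text are required.

  var http = new HttpClient();
  http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<Your Subscription Key>");

  // models/<model>/analyze, here with the celebrities model.
  var uri = "<Your Service Endpoint>/vision/v2.0/models/celebrities/analyze";
  var body = new StringContent(
      "{\"url\":\"http://example.com/celebrity.jpg\"}",
      Encoding.UTF8, "application/json");

  HttpResponseMessage response = await http.PostAsync(uri, body);
  Console.WriteLine(await response.Content.ReadAsStringAsync());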
Detecting the Color Scheme
Image analysis isn’t only useful for detecting people or objects; much of the information in an image can be used in your applications. For example, if you’re looking for anomalies using computer vision, a change in color can be a useful indicator. The color scheme analysis feature in the Analyze Image API extracts the dominant foreground and background colors, as well as a set of dominant colors for an image. It also details the most vibrant color in the image as an accent color. Dominant colors are chosen from a set of 12 possibilities, while the accent is shown as an HTML color code.
The JSON response also contains a Boolean value, isBWImg, that is used to indicate whether an image is in color or black and white. Here is an example excerpt of the JSON response for the sunset image in Figure 4-8:
"color": { "dominantColorForeground": "Brown", "dominantColorBackground": "Black", "dominantColors": [ "Brown", "Black" ], "accentColor": "C69405", "isBWImg": false }
Generating a Thumbnail
Naively cropping an image or reducing the resolution to create thumbnails can lead to the loss of valuable information. The thumbnail API allows you to first identify the area of interest in the image for cropping. The result is a more useful thumbnail for your users. For example, if you start with a photograph of a hummingbird feeding at a flower, the API will generate a thumbnail showing the bird. The response from the service is the binary data of a cropped and resized image you can download and use in your applications.
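A hedged sketch of that call with the C# SDK: GenerateThumbnailAsync, its parameter order, and the smart-cropping flag are assumptions to verify against your SDK version, the image URL is a placeholder, and System.IO is required for saving the result.

  // Request a 100x100 smart-cropped thumbnail and save the returned bytes to disk.
  using (Stream thumbnail = await computerVision.GenerateThumbnailAsync(
             100, 100, "http://example.com/hummingbird.jpg", smartCropping: true))
  using (FileStream file = File.Create("thumbnail.jpg"))
  {
      await thumbnail.CopyToAsync(file);
  }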
Getting the Area of Interest
If you want to highlight an area of the image for further processing rather than cropping it, the area of interest API uses the same underlying algorithm as generating a thumbnail but returns the bounding box coordinates for you to work with.
Extracting Text from Images
The Computer Vision API has three different tools for handling text in images. The first, OCR, is an older model that uses synchronous recognition to extract small amounts of text from images. Using a standard uploaded image, it will recognize text that’s rotated up to 40 degrees from any vertical. It can return coordinates of the bounding boxes around words, so you can reconstruct sentences. However, there are issues with partial recognition and busy images. It’s best used when there’s a small amount of text in an image.
A second API, Recognize Text, is in preview and being deprecated in favor of the newer, more modern Read API; however, it’s still available if you need it. The Read API offers the best performance, but currently only supports English. Designed for text-heavy documents, it can recognize a range of text styles in both printed (PDF) and handwritten documents. The API follows a standard asynchronous process, which can take some time. The initial call returns an operationLocation that is used to construct a URL to retrieve the recognized text.
If a process is running it will return a “running” status code. Once you get “succeeded” you will also receive a JSON object that contains the recognized text as a string along with document analytics. Each separate word has a bounding box, a rotation, and an indicator of whether the recognition has a low confidence score.
To make a call to the Read API using the C# SDK, you first need to instantiate the client:
ComputerVisionClient computerVision = new ComputerVisionClient(
    new ApiKeyServiceClientCredentials("<Your Subscription Key>"),
    new System.Net.Http.DelegatingHandler[] { })
{
    Endpoint = "<Your Service Endpoint>"
};
Then you can start the async process to extract the text. Here we show how to do this with an image URL, but you could also upload an image from a file:
const string imageUrl = "https://example.com/image.jpg";

BatchReadFileHeaders textHeaders = await computerVision.BatchReadFileAsync(
    imageUrl, TextRecognitionMode.Handwritten);
Since this process happens asynchronously, the service will respond with an operation location from which you can extract an OperationId to check the status of your request:
// Extract the OperationId from the operation location returned above
// (the assignment from textHeaders was not shown in the original snippet).
string operationLocation = textHeaders.OperationLocation;

const int numberOfCharsInOperationId = 36;
string operationId = operationLocation.Substring(
    operationLocation.Length - numberOfCharsInOperationId);

// Fetch an initial status so the loop has something to test,
// then poll until the operation finishes or we run out of retries.
ReadOperationResult result = await computerVision.GetReadOperationResultAsync(operationId);

int i = 0;
int maxRetries = 10;

while ((result.Status == TextOperationStatusCodes.Running ||
        result.Status == TextOperationStatusCodes.NotStarted) && i++ < maxRetries)
{
    await Task.Delay(1000); // brief pause between polls
    result = await computerVision.GetReadOperationResultAsync(operationId);
}
Once the service is done, you can display the results. Here we just print the extracted text, but other information such as the bounding box for each word may be included in the response:
var recResults = result.RecognitionResults;

foreach (TextRecognitionResult recResult in recResults)
{
    foreach (Line line in recResult.Lines)
    {
        Console.WriteLine(line.Text);
    }
}
Here is an excerpt from the JSON response for the words recognized in the image of a shopping list in Figure 4-9:
{ "boundingBox": [ 2260, 841, 2796, 850, 2796, 994, 2259, 998 ], "text": "Grocery" },
Custom Vision
For many business-specific use cases, you may find that the general image tagging and object detection services provided by the Computer Vision API are not accurate enough. The Custom Vision API solves this problem by letting you build your own custom classifier based on a relatively small set of labeled images that show the objects, conditions, and concepts you need to recognize. For example, you can use this service for very specific use cases like identifying a circuit board that wasn’t soldered correctly or distinguishing between an infected leaf and a healthy one. You can even export these models to a smartphone and give employees an app that provides real-time feedback.
Custom Vision uses a machine learning technique called transfer learning to fine-tune the generalized models based on the sample images you provide. This process lets you get great performance using only a small number of images (versus the millions used to train the general classifier). For the best results, your training set needs at least 30 to 50 images, ideally with a good range of camera angles, lighting, and backgrounds. These images should match how they will be captured in your production application. If the camera angle or background will be fixed, label common objects that will always be in the shot.
To get started building a Custom Vision model, you first need to choose whether you want a model for detecting objects or classifying the entire image. If your use case is particularly complex, you can create multiple models and layer them to improve discrimination in classes that are easy to confuse (like tomatoes and bell peppers or sandwiches and layer cakes). Where possible, it’s important to have a similar number of images for each tag. You can also add images that you tag as “negative samples,” to tell the classifier that it shouldn’t match any of your tags to these types of images.
After you choose which models you are going to create, you need to provide training data or examples of the objects or classes to the service. You can create the model and upload images through the service’s website, or in code via API calls.
After training (which takes only a few minutes), you can see the precision and recall performance of your model on the website. Precision shows what percentage of classifications are correct. To illustrate this concept, imagine if the model identified 1,000 images as bananas, but only 974 were actually pictures of bananas—that’s 97.4% precision. Recall measures the percentage of all examples of a class that were correctly identified. For example, if you had 1,000 images of bananas but the model only identified 933, the recall would be 93.3%; if the model correctly identified 992 of the 1,000 images of bananas, then the recall would be 99.2%. Figure 4-10 shows the easy-to-read graphic in the Custom Vision portal. Here you can see a breakdown of the overall precision and recall of the model, as well as the performance per tag.
Classifications are determined by a threshold you set on the returned probability for each class. For each image the model analyzes, it will return the predicted classifications (or objects) and a corresponding probability between 0 and 1. The probability is a measure of the model’s confidence that the classification is correct. A probability of 1 means the model is very confident. The service will consider any predictions with a probability greater than the threshold you set as a predicted class. Setting the threshold high favors precision over recall—classifications will be more accurate, but fewer of them will be found. Setting it low will favor recall—most of the classifications will be found, but there will be more false positives. Experiment with this and use the threshold value that best suits your project. Before launching your application, you’ll want to test your model with new images and verify performance.
For challenging data sets or where you need very fine-grained classification, the Advanced Training option in the portal lets you specify how long you want the Custom Vision service to spend training the model. In general, the longer the model trains, the better the performance is. Once you’re happy with the performance of a model, you can publish it as a prediction API from the Performance tab in the portal (or via an API) and get the prediction URL and prediction key to call in your code.
How to Train and Call a Custom Vision Model
The following code snippet shows how to train and call a Custom Vision model using version 1 of the C# SDK. First-time users may also want to walk through the steps on the website.
First, instantiate the client:
CustomVisionTrainingClient trainingApi = new CustomVisionTrainingClient()
{
    ApiKey = "<Your Training Key>",
    Endpoint = "<Your Service Endpoint>"
};
Next, create a new project:
var project = trainingApi.CreateProject("My New Project");
Create the image tags to apply to recognized images:
var japaneseCherryTag = trainingApi.CreateTag(project.Id, "Japanese Cherry");
And load the training images from disk. It is often helpful to put images with different classes in separate folders:
var japaneseCherryImages = Directory.GetFiles(
    Path.Combine("Images", "Japanese Cherry")).ToList();
We’re uploading the images in a single batch:
var imageFiles = japaneseCherryImages.Select(img =>
    new ImageFileCreateEntry(Path.GetFileName(img), File.ReadAllBytes(img))).ToList();

trainingApi.CreateImagesFromFiles(project.Id,
    new ImageFileCreateBatch(imageFiles, new List<Guid>() { japaneseCherryTag.Id }));
Now we can start training the Custom Vision model:
var iteration = trainingApi.TrainProject(project.Id);
Training happens asynchronously, and we will keep querying to find out when the training is complete:
while (iteration.Status == "Training")
{
    Thread.Sleep(1000);
    iteration = trainingApi.GetIteration(project.Id, iteration.Id);
}
Once the iteration is trained, we publish it to the prediction endpoint (you can find the prediction resource ID in the Custom Vision portal under Settings):
var publishedModelName = "<Published Model Name>";
var predictionResourceId = "<Prediction Resource ID>";

trainingApi.PublishIteration(project.Id, iteration.Id,
    publishedModelName, predictionResourceId);
Now we can start making predictions that classify images. First, we need to create a new prediction client:
CustomVisionPredictionClient endpoint = new CustomVisionPredictionClient()
{
    ApiKey = "<Your Prediction Key>",
    Endpoint = "<Your Service Endpoint>"
};
Then we can make a prediction:
// testImage is a Stream containing the image you want to classify.
var result = endpoint.ClassifyImage(project.Id, publishedModelName, testImage);
And we can loop over each prediction and write out the results:
foreach (var c in result.Predictions)
{
    Console.WriteLine($"\t{c.TagName}: {c.Probability:P1}");
}
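To apply the probability threshold discussed earlier, you can filter the predictions before acting on them. This is a minimal sketch: the 0.75 cutoff is an example value to tune, and System.Linq is required.

  const double threshold = 0.75;

  var accepted = result.Predictions.Where(p => p.Probability >= threshold);

  foreach (var prediction in accepted)
  {
      Console.WriteLine($"{prediction.TagName}: {prediction.Probability:P1}");
  }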
Face
The Face API delivers much more detailed information than the simple face recognition feature included in the Computer Vision API. You can also use it to compare two faces or to search by face for images of the same person.
At the heart of the Face API is a set of detection tools to extract the human faces from an image and provide a bounding box to indicate where each face is in the image. It’ll also give you additional information, including details on the pose position, gender, approximate age, emotion, smile intensity, facial hair, and whether or not the person is wearing glasses (and what type). You can even extract a 27-point array of face landmarks, which can be used to give further information about a face.
The API is powerful: up to 64 different faces can be returned per image. The more faces you’re detecting, though, the more time detection takes—especially if you are extracting additional attributes. For large groups it’s better to get the minimum information your app needs, and then run deeper analysis on a face-by-face basis.
The face verification endpoint lets you verify whether two faces belong to the same person or whether a face image belongs to a specific person. This is a useful tool for identifying a user and providing a personalized experience. For offline scenarios, this endpoint is available as a container.
The tools for finding similar faces might seem like those used for face verification, but they don’t operate at the same level. Here a verified target face is compared to an array of candidate faces, helping you track down other instances of that person. Faces returned may or may not be the same person. You can switch between a more accurate matchPerson mode and a less accurate matchFace mode, which only looks for similarities.
In both cases, the API also returns a confidence score that you can use to set the cutoff point for verification or similar faces. You will need to think about what level of confidence is acceptable in your scenario. For example, do you need to err on the side of protecting sensitive information and resources?
The person identification capability is a more generalized case of face verification. For this scenario, a large database of tagged data is used to identify individuals—for example, identifying people in a photo library where a known group of friends can be used to automatically apply tags as the images are uploaded. This can be a large-scale database, with up to a million people in a group and with up to 248 different faces per person. You will need to train the API with your source data, and once trained it can be used to identify individuals in an uploaded image.
If you’ve got a group of faces and no verified images to test against, you have the option of using the face grouping tools to extract similar faces from a set of faces. The results are returned in several groups, though the same person may appear in multiple groups as they are being sorted by a specific trait (for example, a group where all the members are smiling, or one where they all have blond hair).
How to Use the Face API
The following sample code shows how to use the Face API. Don’t forget to substitute in your subscription key and image URL, as well as choosing an Azure Cognitive Services endpoint.
First, we initialize a Face client:
FaceClient faceClient = new FaceClient(
    new ApiKeyServiceClientCredentials("<Your Subscription Key>"),
    new System.Net.Http.DelegatingHandler[] { });

faceClient.Endpoint = "<Your Service Endpoint>";
Now we can detect faces and extract attributes:
IList<DetectedFace> faceList = await faceClient.Face.DetectWithUrlAsync(
    "<Remote Image URL>",
    true,   // return face IDs
    false,  // skip face landmarks
    new List<FaceAttributeType>() { FaceAttributeType.Age, FaceAttributeType.Gender });
Here we extract the age and gender attributes returned by the model:
string attributes = string.Empty;

foreach (DetectedFace face in faceList)
{
    double? age = face.FaceAttributes.Age;
    string gender = face.FaceAttributes.Gender.ToString();
    attributes += gender + " " + age + " ";
}
We can display the face attributes like so:
Console.WriteLine("<Remote Image URL>");
Console.WriteLine(attributes + "\n");
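Building on the faceClient and faceList above, here is a hedged sketch of the face verification call described earlier. VerifyFaceToFaceAsync, FaceId, IsIdentical, and Confidence are names assumed from the Face C# SDK, the detection call must have requested face IDs (as it did above), and the 0.7 cutoff is an example value.

  // Compare the first two detected faces and apply your own confidence cutoff.
  Guid faceId1 = faceList[0].FaceId.Value;
  Guid faceId2 = faceList[1].FaceId.Value;

  var verification = await faceClient.Face.VerifyFaceToFaceAsync(faceId1, faceId2);

  if (verification.IsIdentical && verification.Confidence >= 0.7)
  {
      Console.WriteLine("These two faces belong to the same person.");
  }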
Form Recognizer
Many businesses have mountains of unstructured data sitting in PDFs, images, and paper documents. While these resources may contain the data and insights needed to drive the business forward, they often sit unutilized due to the immense cost and complexity of converting them into structured data. Form Recognizer lowers this barrier and can accelerate your business processes by automating the information extraction steps. Using this service you can turn PDFs or images of forms into usable data at a fraction of the usual time and cost, so you can focus on acting on the information rather than compiling it.
The service uses advanced machine learning techniques to accurately extract text, key/value pairs, and tables from documents. It includes a prebuilt model for reading sales receipts that pulls out key information such as the time and date of the transaction, merchant information, amount of tax, and total cost—and with just a few samples, you can customize the model to understand your own documents. When you submit your input data, the algorithm clusters the forms by type, discovers what keys and tables are present, and associates values to keys and entries to tables. The service then outputs the results as structured data that includes the relationships in the original file. After you train the model, you can test and retrain it and eventually use it to reliably extract data from more forms according to your needs.
As with all the Cognitive Services, you can use the trained model by calling the simple REST APIs or using the client libraries.
Ink Recognizer
Natural user interfaces are the next evolution in the way we interact with computers. A natural interface is one that mimics or aligns with our own natural behavior, relying for example on speech, hand gestures, or handwriting detection. One of the barriers to providing a seamless natural interface for users is understanding and digitizing a person’s writings and drawings. The Ink Recognizer service provides a powerful ready-to-use tool for recognizing and understanding digital ink content. Unlike other services that analyze an image of the drawing, it uses digital ink stroke data as input. Digital ink strokes are time-ordered sets of 2D points (x,y coordinates) that represent the motion of input tools such as digital pens or fingers. The service analyzes this data, recognizes the shapes and handwritten content, and returns a JSON response containing all the recognized entities (as shown in Figure 4-11).
This powerful tool lets you easily create applications with capabilities like converting handwriting to text and making inked content searchable.
Video Indexer
Not every image you’ll want to analyze is a still image. The Video Indexer service provides both APIs and an interactive website to extract information from videos. Once you upload a video, the service will run it through a large number of models to extract useful data such as faces, emotions, and detected objects. This metadata can then be used to index or control playback. Put it all together and you can take a one-hour video, extract the people and topics, add captions, and put in links that start the video playing in the right place.
Video Indexer is a cloud application built on top of the Media Analytics service, Azure Search, and Cognitive Services. You can explore all the features the service has to offer through the website, and you can also automate video processing using the REST APIs.
Some features, like face identification, let you create custom models, but most work in much the same way as the still image analysis tools available through the Cognitive Services.
As the Video Indexer is a mix of services, you’ll need to register and obtain tokens before you can use it. You will need to generate a new access token every hour. Videos are best uploaded to public cloud services like OneDrive or an Azure Blob. Once uploaded, you must provide the location URL to the Video Indexer APIs.
Once you’ve logged in, you can start to process your videos and gain new insights. The insights cover both video and audio. Video insights include detecting faces and individuals, extracting thumbnail images, identifying objects, and extracting text. There are also insights specific to produced videos, such as identifying the opening or closing credits of a show, key frames, and blank frames (see Figure 4-12).
Audio insights include detecting language, transcribing audio (with the option of using custom language models), creating captions (with translation), detecting sounds like clapping (or silence), detecting emotions, and even identifying who speaks which words and generating statistics for how often each person speaks. You can also clean up noisy audio using Skype filters.
The Video Indexer is one of the more complex offerings in the Cognitive Services. It can require a considerable amount of programming to navigate the index object and extract usable data. You can simplify the process by working in the portal or using Microsoft’s own widgets in your applications.