Natural Language and Search

by Jon Handler, Milind Shyani, Karen Kilroy
April 2024
Intermediate to advanced
48 pages
1h 3m
English
O'Reilly Media, Inc.

Chapter 3. Vectors: Representing Semantic Information

Language comes so naturally to humans that its complexity is hard to appreciate. We go from concept and meaning to spoken or written word (and back), mostly unconsciously. If computers were humans, they could easily communicate in natural language. AI researchers have studied symbolic natural language processing (NLP) in computers for decades with mixed results. The advent of modern machine learning and the age of big data have revolutionized NLP and brought a paradigm shift in our approach, enabling us to encode language as high-dimensional vectors. In this chapter, you will learn how ML systems train, employ, and create vectors to work with natural language.

Vector Basics

Computers and ML models only understand numbers. To work with the information contained in natural language, they need that information in number form. Vectors are that number form.

Vectors for semantic search (called embeddings) represent natural language as a set of values across many dimensions. When people train ML models for use in search engines, the goal is to produce a model that generates vectors that are close together for text that has similar meaning and far apart for text that has different meanings.
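To make "close together" and "far apart" concrete, here is a minimal sketch that compares toy vectors with cosine similarity. Cosine similarity is one common closeness measure and is an assumption here, since the text does not name a metric. The three-dimensional vectors and their labels are hand-made for illustration and do not come from any real embedding model, which would typically produce hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-made "embeddings" (not output from a real model).
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
invoice = [0.1, 0.2, 0.95]

sim_close = cosine_similarity(cat, kitten)   # similar meaning
sim_far = cosine_similarity(cat, invoice)    # different meaning

print(f"cat vs kitten:  {sim_close:.3f}")
print(f"cat vs invoice: {sim_far:.3f}")
```

With these toy values, the pair meant to share a meaning scores higher than the unrelated pair, which is the property a trained embedding model aims for.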

A vector (when centered at the origin) is a value for each of the axes in an n-dimensional space. Figure 3-1 shows the vector (4,6) in two dimensions—X and Y. You visualize this vector by drawing the line from the origin to the (X,Y) point. We ...
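As a small illustration of the two-dimensional example, the sketch below stores the vector (4,6) as one value per axis and computes the length and direction of the line drawn from the origin to that point. The variable names are chosen here for illustration only.

```python
import math

# The vector (4,6): one value for each axis, X and Y.
v = (4, 6)

# Length of the line from the origin (0,0) to the point (4,6).
length = math.hypot(v[0], v[1])

# Angle of that line relative to the X axis, in degrees.
angle_deg = math.degrees(math.atan2(v[1], v[0]))

print(f"length = {length:.3f}, angle = {angle_deg:.1f} degrees")
```

The same representation extends to n dimensions by adding one value per additional axis.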

Publisher Resources

ISBN: 9781098156268