Big Data by Hai Jiang, Laurence T. Yang, Alfredo Cuzzocrea, Kuan-Ching Li

Chapter 1

Scalable Indexing for Big Data Processing

Hisham Mohamed and Stéphane Marchand-Maillet

Abstract

The K-nearest neighbor (K-NN) search problem is that of finding the K objects closest, or most similar, to a given query. It has many applications in information retrieval and visualization, machine learning, and data mining. At the scale of Big Data, approximate solutions become necessary. Permutation-based indexing is one of the most recent techniques for approximate similarity search in large-scale domains. Each data object is represented by a list of references (pivots), ordered by their distance from that object. In this chapter, we show different distributed algorithms for efficient indexing and ...
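
As a rough illustration of the representation described in the abstract, the sketch below orders a small set of pivots by their distance from an object and compares the resulting permutations. The Euclidean distance, the Spearman footrule comparison, and the names `pivot_permutation` and `footrule` are assumptions made for this example; the abstract does not specify them, and this is not the chapter's distributed algorithm.

```python
import math

def pivot_permutation(obj, pivots):
    """Return pivot indices ordered by increasing distance from obj.

    Euclidean distance is an assumption; ties are broken by pivot index.
    """
    return sorted(range(len(pivots)),
                  key=lambda i: (math.dist(obj, pivots[i]), i))

def footrule(perm_a, perm_b):
    """Spearman footrule (assumed comparison): sum over pivots of the
    absolute difference between their ranks in the two permutations."""
    rank_b = {pivot: rank for rank, pivot in enumerate(perm_b)}
    return sum(abs(rank - rank_b[pivot]) for rank, pivot in enumerate(perm_a))

# Hypothetical 2-D data: four pivots at the corners of a square.
pivots = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
query = (1.0, 2.0)
near = (2.0, 1.0)   # close to the query
far = (9.0, 8.0)    # far from the query

perm_query = pivot_permutation(query, pivots)
perm_near = pivot_permutation(near, pivots)
perm_far = pivot_permutation(far, pivots)

# Objects close to the query tend to see the pivots in a similar order,
# so their permutations differ less than those of distant objects.
d_near = footrule(perm_query, perm_near)
d_far = footrule(perm_query, perm_far)
print(perm_query, perm_near, perm_far, d_near, d_far)
```

In this toy setting the nearby object yields a permutation much closer to the query's than the distant one does, which is the intuition that lets permutations stand in for the original distances during approximate search.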
