Chapter 1

Scalable Indexing for Big Data Processing

Hisham Mohamed and Stéphane Marchand-Maillet

Abstract

The K-nearest neighbor (K-NN) search problem consists of finding the K objects that are closest, and thus most similar, to a given query. It has many applications in information retrieval and visualization, machine learning, and data mining. At Big Data scale, approximate solutions become necessary. Permutation-based indexing is one of the most recent techniques for approximate similarity search in large-scale domains. Each data object is represented by a list of reference objects (pivots), ordered by their distance from the object. In this chapter, we show different distributed algorithms for efficient indexing and ...
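To make the pivot-ordering idea concrete, here is a minimal single-machine sketch in Python. It is not the chapter's distributed algorithm. The function names (`permutation`, `footrule`, `knn_approx`), the Euclidean metric, and the use of the Spearman footrule to compare permutations are all assumptions chosen for illustration.

```python
import math


def permutation(obj, pivots):
    """Return pivot indices ordered by increasing distance from obj.

    Assumption: Euclidean distance; ties are broken by pivot index.
    """
    return sorted(range(len(pivots)),
                  key=lambda i: (math.dist(obj, pivots[i]), i))


def footrule(p, q):
    """Spearman footrule: sum of absolute rank differences per pivot."""
    rank_in_q = {pivot: rank for rank, pivot in enumerate(q)}
    return sum(abs(rank - rank_in_q[pivot]) for rank, pivot in enumerate(p))


def knn_approx(query, data, pivots, k):
    """Return indices of the k objects whose permutations best match the query's.

    This ranks by permutation similarity only, so the result is approximate:
    no true object-to-query distance is computed.
    """
    query_perm = permutation(query, pivots)
    ranked = sorted(
        range(len(data)),
        key=lambda i: (footrule(permutation(data[i], pivots), query_perm), i),
    )
    return ranked[:k]


if __name__ == "__main__":
    pivots = [(0, 0), (10, 0), (0, 10)]   # hypothetical reference objects
    data = [(1, 1), (9, 1), (1, 9)]       # hypothetical data objects
    print(knn_approx((8, 2), data, pivots, k=2))
```

In a real index the permutations of the data objects would be computed once and stored, so that only the query's permutation is computed at search time. The sketch recomputes them on every call purely to stay short.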
