Advanced Algorithms and Data Structures

by Marcello La Rocca
July 2021
Intermediate to advanced
768 pages
25h 23m
English
Manning Publications
Content preview from Advanced Algorithms and Data Structures

13 Parallel clustering: MapReduce and canopy clustering

This chapter covers

  • Understanding parallel and distributed computing
  • Canopy clustering
  • Parallelizing k-means by leveraging canopy clustering
  • Using the MapReduce computational model
  • Using MapReduce to write a distributed version of k-means
  • Leveraging MapReduce canopy clustering
  • Working with MR-DBSCAN

In the previous chapter we introduced clustering and described three different approaches to data partitioning: k-means, DBSCAN, and OPTICS.

All these algorithms use a single-thread approach, where all the operations are executed sequentially in the same thread. This is the point where we should question our design: Is it really necessary to run these algorithms sequentially?
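To make the question concrete, here is a minimal sketch (not the book's own listing) of one k-means iteration written in a map/reduce shape. It assumes points are tuples of floats already split into chunks; the names `closest`, `map_chunk`, `merge`, and `kmeans_step` are hypothetical and chosen only for illustration. Each chunk is processed independently, which suggests that the assignment step does not inherently need to run sequentially.

```python
from functools import reduce


def closest(point, centroids):
    """Return the index of the centroid nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_chunk(chunk, centroids):
    """Map step: for one chunk, accumulate per-centroid coordinate sums and counts."""
    partial = {}
    for point in chunk:
        i = closest(point, centroids)
        sums, count = partial.get(i, ([0.0] * len(point), 0))
        partial[i] = ([s + p for s, p in zip(sums, point)], count + 1)
    return partial


def merge(a, b):
    """Reduce step: combine the partial sums and counts of two chunks."""
    merged = dict(a)
    for i, (sums, count) in b.items():
        if i in merged:
            msums, mcount = merged[i]
            merged[i] = ([x + y for x, y in zip(msums, sums)], mcount + count)
        else:
            merged[i] = (sums, count)
    return merged


def kmeans_step(chunks, centroids):
    """One k-means iteration: map every chunk, reduce the partials, recompute centroids."""
    # Each map_chunk call depends only on its own chunk and the shared centroids,
    # so in a real system these calls could run on separate workers (assumption).
    partials = [map_chunk(chunk, centroids) for chunk in chunks]
    totals = reduce(merge, partials, {})
    # A centroid with no assigned points keeps its previous position.
    return [
        [s / totals[i][1] for s in totals[i][0]] if i in totals else list(centroids[i])
        for i in range(len(centroids))
    ]
```

In this sketch the chunks are mapped in a plain list comprehension; the design point is only that the per-chunk results can be merged in any order, so the same functions could be handed to a process pool or a MapReduce framework without changing their logic.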

During the ...

Publisher Resources

ISBN: 9781617295485