How it works...

The index is a Long value that gives each row of the IndexedRowMatrix a meaningful row index. Underneath, the implementation is built on RDDs, which provide all the advantages of a resilient, distributed data structure in a parallel environment from the outset.
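As a minimal sketch of the structure described above, the following builds a small IndexedRowMatrix from an RDD of IndexedRow objects, each pairing a Long index with a vector. The object name IndexedRowMatrixSketch, the sample values, and the local master setting are assumptions made for illustration only; the code also assumes spark-mllib is on the classpath.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name and the data are illustrative only.
object IndexedRowMatrixSketch {

  // Builds a 3 x 3 IndexedRowMatrix and returns
  // (number of rows, number of columns, sorted row indices).
  def run(spark: SparkSession): (Long, Long, Seq[Long]) = {
    // Each IndexedRow carries a Long index next to its row vector.
    val rows = spark.sparkContext.parallelize(Seq(
      IndexedRow(0L, Vectors.dense(1.0, 2.0, 3.0)),
      IndexedRow(1L, Vectors.dense(4.0, 5.0, 6.0)),
      IndexedRow(2L, Vectors.dense(7.0, 8.0, 9.0))
    ))

    // The matrix is a thin wrapper around the RDD[IndexedRow].
    val matrix = new IndexedRowMatrix(rows)

    val indices = matrix.rows.map(_.index).collect().sorted.toSeq
    (matrix.numRows(), matrix.numCols(), indices)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation (an assumption, not a cluster setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("IndexedRowMatrixSketch")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Because the rows live in an ordinary RDD, the usual RDD transformations (map, filter, and so on) remain available through matrix.rows.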

The primary advantage of IndexedRowMatrix is that the index travels with the row data itself inside the underlying RDD. Being able to define the index and carry it along with the data (the actual row of the matrix) is very useful for operations such as join(), which need a key to select a specific row of data.
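The join() use case mentioned above might look like the following sketch. It keys each matrix row by its index and joins it against a second RDD keyed the same way. The labels RDD, the object name IndexedRowJoinSketch, and all sample values are hypothetical and exist only to show the pattern; this is one possible way to do it, not necessarily how the recipe does.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name and the data are illustrative only.
object IndexedRowJoinSketch {

  // Returns (index, label, row values) for every row that has a label.
  def run(spark: SparkSession): Seq[(Long, String, List[Double])] = {
    val sc = spark.sparkContext

    val matrix = new IndexedRowMatrix(sc.parallelize(Seq(
      IndexedRow(0L, Vectors.dense(1.0, 2.0)),
      IndexedRow(1L, Vectors.dense(3.0, 4.0)),
      IndexedRow(2L, Vectors.dense(5.0, 6.0))
    )))

    // Assumed side data keyed by the same row index (row 1 has no label).
    val labels = sc.parallelize(Seq(0L -> "alpha", 2L -> "gamma"))

    // Expose the row index as the key of a pair RDD, then join on it.
    val keyedRows = matrix.rows.map(row => (row.index, row.vector))

    keyedRows.join(labels)
      .map { case (index, (vector, label)) => (index, label, vector.toArray.toList) }
      .collect()
      .sortBy(_._1)
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation (an assumption, not a cluster setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("IndexedRowJoinSketch")
      .getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```

The inner join keeps only the rows whose index appears in both RDDs, so the unlabeled row is dropped from the result.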

The following figure shows a pictorial view of the IndexedRowMatrix which should help clarify the subject:

The definition ...
