June 2017
Bivariate cluster plots are often useful for seeing how the cluster assignments correlate with an x-y plot of two variables. Each cluster is plotted in a different color.
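As a rough illustration of the idea outside Databricks, the following base R sketch colors an x-y plot by cluster assignment. It is only a sketch under stated assumptions: the built-in iris data, the two petal columns, and the choice of three k-means clusters are stand-ins chosen for illustration, not the data or model used in this chapter.

```r
# Hypothetical stand-in data: the built-in iris measurements, not this chapter's dataset.
set.seed(42)                                    # make the k-means starts reproducible
vars <- iris[, c("Petal.Length", "Petal.Width")]
km <- kmeans(vars, centers = 3, nstart = 10)    # assign each row to one of 3 clusters

# x-y plot of the two variables, one color per cluster assignment
plot(vars$Petal.Length, vars$Petal.Width,
     col = km$cluster, pch = 19,
     xlab = "Petal.Length", ylab = "Petal.Width",
     main = "Cluster assignments by two variables")

# legend mapping each color back to its cluster number
ids <- sort(unique(km$cluster))
legend("topleft", legend = ids, col = ids, pch = 19, title = "Cluster")
```

Clusters that separate cleanly on the two plotted variables show up as distinct color regions, while heavily mixed colors suggest that those two variables contribute little to the cluster assignment.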
In Databricks, you can do this easily:
First, run the display command on some of the fitted data. In the code below, I first draw a 1% sample without replacement and then keep at most the first 1,000 rows. You want the sample to be small enough that the points on the plot are not too dense.
tmp <- head(sample(fitted, FALSE, 0.01), 1000)
# show cluster assignment by 2 variable matrix
display(tmp)
Next, switch to the plot display, open the Customize Plot dialog box, and perform the following graph setup: