One way to plot this in Spark is to construct a histogram of the probabilities: the midpoint of each bin (its centroid) goes on the x axis, and the raw count for that bin goes on the y axis.
x = SparkR::histogram(preds_train, preds_train$prediction, nbins = 100)
x$centroids = round(x$centroids, 2)
display(x)
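For intuition about what the centroids and counts columns contain, the same kind of binning can be sketched locally in base R. This is only an illustration under stated assumptions: the scores vector is simulated and hypothetical (it stands in for the prediction column), base R's hist() is used in place of SparkR::histogram, and equal-width bins over the observed range are assumed rather than confirmed as SparkR's exact scheme.

```r
# Hypothetical stand-in for the prediction column: 1,000 simulated scores in [0, 1].
set.seed(42)
scores <- runif(1000)

nbins <- 100
# Equal-width bin edges spanning the observed range (an assumed binning scheme).
edges <- seq(min(scores), max(scores), length.out = nbins + 1)

# With plot = FALSE, hist() only computes the bins; nothing is drawn.
h <- hist(scores, breaks = edges, include.lowest = TRUE, plot = FALSE)

# Local analogue of the table passed to display(): one row per bin.
local_hist <- data.frame(
  centroids = round(h$mids, 2),  # bin midpoints, rounded to two decimals
  counts    = h$counts           # number of scores falling in each bin
)

head(local_hist)
```

Each row pairs one bin midpoint with the number of values that fell into that bin, which is the shape of data the bar chart needs: one key and one value per bar.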
After the display command has run, click the plot icon (the second icon at the bottom left), then click "Plot Options". The Customize Plot screen will then display:
- Set Display type to Grouped Bar Chart using a combination of the drop-down selection and radio buttons.
- Drag counts from All Fields to Values.
- Drag centroids to Keys.
Centroids represent the midpoints of the bars. In the code above, we rounded them ...