July 2017
Intermediate to advanced
796 pages
18h 55m
English
The ntile function is a popular aggregation over a window, commonly used to divide an input dataset into n parts. For example, in predictive analytics, deciles are often used: the data is first grouped, and each group is then divided into 10 parts to get a fair distribution of the data. This kind of split follows naturally from the window function approach, so ntile is a good example of how window functions can help.
For example, if we want to partition statesPopulationDF by State (the window specification was shown previously), order by population, and then divide each partition into two portions, we can use ntile over the windowSpec:
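To make the "divide into n parts" idea concrete, here is a minimal plain-Scala sketch of how ntile-style bucket numbers are typically assigned to rows that are already ordered. It does not use Spark; the object name NtileSketch and the helper ntileBuckets are hypothetical names introduced only for illustration. It assumes the usual SQL ntile convention that bucket sizes differ by at most one and that the earlier buckets receive the extra rows.

```scala
object NtileSketch {
  // Hypothetical helper: returns the 1-based bucket number for each of
  // `rowCount` rows that are assumed to be already sorted.
  // Assumption: the first (rowCount % n) buckets get one extra row,
  // which is the usual SQL ntile convention.
  def ntileBuckets(rowCount: Int, n: Int): Seq[Int] = {
    require(n > 0, "n must be positive")
    require(rowCount >= 0, "rowCount must not be negative")
    val base  = rowCount / n // minimum number of rows per bucket
    val extra = rowCount % n // number of buckets that get one more row
    (1 to n).flatMap { bucket =>
      val size = if (bucket <= extra) base + 1 else base
      Seq.fill(size)(bucket)
    }
  }

  def main(args: Array[String]): Unit = {
    // Seven ordered rows split into two portions
    println(ntileBuckets(7, 2).mkString(", "))
    // Ten ordered rows split into three portions
    println(ntileBuckets(10, 3).mkString(", "))
  }
}
```

With seven rows and two buckets, the first four rows fall into bucket 1 and the remaining three into bucket 2. When there are fewer rows than buckets, each row simply gets its own bucket number and the later buckets stay empty.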
import org.apache.spark.sql.functions._
scala> statesPopulationDF.select(col("State"), col("Year"), ntile(2).over(windowSpec), rank().over(windowSpec)).sort("State", ...
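Because the snippet above relies on a DataFrame and a window specification defined elsewhere, the following is a self-contained sketch of the same idea that can be compiled against Spark SQL. Several things here are assumptions rather than the book's own code: the rows are made-up placeholder values, the column name Population is a guess at the column behind "order by population", the descending sort order is a choice made for illustration, and NtileOverWindow, withNtile, and the output column names ntile and rank are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, ntile, rank}

object NtileOverWindow {
  // Assumed window specification: one partition per State,
  // ordered by Population (largest first).
  private val windowSpec =
    Window.partitionBy("State").orderBy(col("Population").desc)

  // Adds a two-way ntile split and a rank within each State.
  def withNtile(df: DataFrame): DataFrame =
    df.select(
      col("State"),
      col("Year"),
      ntile(2).over(windowSpec).as("ntile"),
      rank().over(windowSpec).as("rank")
    ).sort("State", "Year")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ntile-sketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()
    import spark.implicits._

    // Placeholder rows, not real population figures.
    val statesPopulationDF = Seq(
      ("StateA", 2010, 100L),
      ("StateA", 2011, 110L),
      ("StateA", 2012, 120L),
      ("StateA", 2013, 130L),
      ("StateB", 2010, 50L),
      ("StateB", 2011, 60L)
    ).toDF("State", "Year", "Population")

    withNtile(statesPopulationDF).show()
    spark.stop()
  }
}
```

Within each State, the rows with the larger Population values land in portion 1 and the rest in portion 2, while rank gives each row's position inside the same window. In spark-shell, a session named spark already exists, so only the body of withNtile and the DataFrame definition would be needed.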