July 2017
Intermediate to advanced
796 pages
18h 55m
English
Approximate distinct count estimates the number of distinct records much faster than an exact count, which usually requires a lot of shuffles and other operations. Although the approximate count is not 100% accurate, many use cases perform equally well without an exact count.
The approx_count_distinct API has several overloads, as follows. Which one to use depends on the specific use case.
def approx_count_distinct(columnName: String, rsd: Double): Column
Aggregate function: returns the approximate number of distinct items in a group.

def approx_count_distinct(e: Column, rsd: Double): Column
Aggregate function: returns the approximate number of distinct items in a group.

def approx_count_distinct(columnName: ...
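As a rough illustration of the (columnName: String, rsd: Double) overload, the sketch below computes an exact and an approximate distinct count over the same column. The DataFrame, the userId column name, the object and method names, and the rsd value of 0.05 are all assumptions made for this example and do not come from the text; rsd is the maximum relative standard deviation allowed for the estimate, so the approximate result may differ slightly from the exact one.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{approx_count_distinct, countDistinct}

object ApproxCountDistinctSketch {

  // Returns (exact, approximate) distinct counts for a hypothetical userId column.
  def distinctCounts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Hypothetical data: 100,000 rows containing 1,000 distinct user ids.
    val df = spark.range(0, 100000).select(($"id" % 1000).as("userId"))

    val row = df.agg(
      countDistinct("userId").as("exact"),                 // exact distinct count
      approx_count_distinct("userId", 0.05).as("approx")   // estimate, rsd = 0.05
    ).head()

    (row.getLong(0), row.getLong(1))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would configure this differently.
    val spark = SparkSession.builder()
      .appName("approx-count-distinct-sketch")
      .master("local[*]")
      .getOrCreate()

    val (exact, approx) = distinctCounts(spark)
    println(s"exact=$exact approx=$approx")

    spark.stop()
  }
}
```

A smaller rsd generally yields a tighter estimate at the cost of more memory per group, so the value is worth tuning against the accuracy the use case actually needs.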