O'Reilly logo

Apache Spark for Data Science Cookbook by Padma Priya Chitturi

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Running custom functions

While Spark SQL doesn't support a range of functions as wide as ANSI SQL does, it has an easy and powerful mechanism for registering a normal Scala function and using it inside the SQL context.

Let's say we would like to find out how many profiles fall under each age group. We have a simple function called ageGroup. Given an age, it returns a string representing the age group:

def fnGroupAge(age: Int, bucket:Int=10) = { 
val buckets = Array("0-10", "11-20", "20-30", "31-40", "41-50", "51-60", "61-70", "71-80", "81-90", "91-100", ">100") 
val bucket = buckets((age-1)/10) 
bucket 
} 

Now, in order to register this function to be used inside Spark SQL, all that we need to do is give it a name and call the register method of the ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required