14. Extending transformations with user-defined functions

This chapter covers

  • Extending Spark with user-defined functions
  • Registering a UDF
  • Calling a UDF with the dataframe API and Spark SQL
  • Using UDFs for data quality within Spark
  • Understanding the constraints linked to UDFs

Whether you have patiently read the first 13 chapters of this book or hopped from chapter to chapter using a helicopter reading approach, you are probably convinced by now that Spark is great, but . . . is Spark extensible? You may be asking, “How can I bring my existing libraries into the mix? Do I have to rely solely on the dataframe API and Spark SQL to implement all the transformations I want?”

From the title of this chapter, you can imagine that the answer to the first ...
