O'Reilly logo

Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Summary

This was an interesting chapter. Finally, we got to work with Dataset APIs, using real data. We also got a glimpse of API organization. Datasets and their associated classes have a lot of interesting functions for you to explore. Python APIs are very much similar to Scala APIs and sometimes a little easier. The IPython notebook is available at https://github.com/xsankar/fdps-v3/blob/master/extras/003-DataFrame-For-DS.ipynb. Data wrangling with Python, and especially with Python notebooks, is the preferred way for data scientists.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required