Integrating with Python
Python has slowly established ground as a de-facto tool for data science. It has a command-line interface and decent visualization via matplotlib and ggplot, which is based on R's ggplot2. Recently, Wes McKinney, the creator of Pandas, the time series data-analysis package, has joined Cloudera to pave way for Python in big data.
Setting up Python
Python is usually part of the default installation. Spark requires version 2.7.0+.
If you don't have Python on Mac OS, I recommend installing the Homebrew package manager from http://brew.sh:
[akozlov@Alexanders-MacBook-Pro spark(master)]$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" ==> This script will install: /usr/local/bin/brew /usr/local/Library/... ...
Get Scala:Applied Machine Learning now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.