FP-Growth Basic Sample

Let's start with a very simple dataset of transactions, where each item is a single character:

val transactions = Seq(
  "r z h k p",
  "z y x w v u t s",
  "s x o n r",
  "x z y m t s q e",
  "z",
  "x z y r q t p")
  .map(_.split(" "))

We will find the most frequent items (characters in this case). First, we get the Spark context as follows:

import org.apache.spark.SparkContext

val sc = new SparkContext("local[2]", "Chapter 5 App")

Convert our data into an RDD:

val rdd = sc.parallelize(transactions, 2).cache()

Initialize the FPGrowth instance:

import org.apache.spark.mllib.fpm.FPGrowth

val fpg = new FPGrowth()

FP-Growth can be configured with the following parameters:

  • minSupport: the minimum support for an itemset to be identified as frequent. For example, if an item appears in 3 out of 10 transactions, it has a support of 3/10 = 0.3 (a minimal run with this setting is sketched after this list).
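The remaining steps are not shown in this excerpt. The following is a minimal sketch, assuming the standard org.apache.spark.mllib.fpm.FPGrowth API (setMinSupport, setNumPartitions, run), of how the configured instance could be applied to the RDD created above; the minSupport value of 0.2 is an arbitrary choice for this small dataset:

// A minimal sketch: set the minimum support, run FP-Growth on the RDD,
// and print the frequent itemsets found in the six transactions.
val model = fpg
  .setMinSupport(0.2)
  .setNumPartitions(2)
  .run(rdd)

model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
}

Each FreqItemset in the output pairs a set of items with the number of transactions in which that set appears.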
