July 2018
Intermediate to advanced
474 pages
13h 37m
English
The following section walks through the steps to train the TF-IDF model.
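import pyspark.sql.functions as F
from pyspark.sql.types import FloatType

# Encode the label as 1.0 for 'escalate' and 0.0 for everything else
label = F.udf(lambda x: 1.0 if x == 'escalate' else 0.0, FloatType())
df = df.withColumn('label', label('label'))

import pyspark.ml.feature as feat

# Hash tokens into term-frequency vectors, then rescale them by inverse document frequency
TF_ = feat.HashingTF(inputCol="words without stop", outputCol="rawFeatures", numFeatures=100000)
IDF_ = feat.IDF(inputCol="rawFeatures", outputCol="features")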
pipelineTFIDF = Pipeline(stages=[TF_, ...
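To make the two feature stages less opaque, here is a minimal pure-Python sketch of what a hashing term-frequency step followed by an IDF rescaling computes. It is an illustration under stated assumptions, not Spark's implementation: `zlib.crc32` stands in for the MurmurHash3 function Spark uses, the tiny `NUM_FEATURES` value and the sample `docs` are made up for the example, and the helper names `hashing_tf`, `fit_idf`, and `transform` are hypothetical. The IDF weight follows the smoothed formula documented for Spark MLlib, log((m + 1) / (df + 1)), where m is the number of documents and df is the number of documents containing the term.

```python
import math
import zlib
from collections import Counter

NUM_FEATURES = 16  # assumption: tiny on purpose; the text uses numFeatures=100000


def hashing_tf(words, num_features=NUM_FEATURES):
    """Map each token to a bucket index and count occurrences per bucket."""
    counts = Counter()
    for word in words:
        # crc32 is a stand-in hash for illustration; Spark uses MurmurHash3
        bucket = zlib.crc32(word.encode("utf-8")) % num_features
        counts[bucket] += 1
    return dict(counts)


def fit_idf(tf_vectors):
    """Compute idf = log((m + 1) / (df + 1)) for every bucket seen in the corpus."""
    m = len(tf_vectors)
    doc_freq = Counter()
    for vec in tf_vectors:
        doc_freq.update(vec.keys())  # each document counts once per bucket
    return {bucket: math.log((m + 1) / (df + 1)) for bucket, df in doc_freq.items()}


def transform(tf_vector, idf):
    """Scale raw term frequencies by the fitted IDF weights."""
    return {bucket: tf * idf[bucket] for bucket, tf in tf_vector.items()}


# Hypothetical tokenized documents, standing in for the "words without stop" column
docs = [
    ["printer", "offline", "again"],
    ["printer", "jam"],
    ["password", "reset", "printer"],
]
raw_features = [hashing_tf(d) for d in docs]          # analogous to rawFeatures
idf = fit_idf(raw_features)                           # analogous to fitting IDF
features = [transform(v, idf) for v in raw_features]  # analogous to features
```

In this sketch a token that appears in every document, such as "printer", ends up with a weight of zero, which is the point of the IDF step: terms that do not help distinguish documents are scaled down. With so few buckets, unrelated tokens can also collide into the same index, which is why the original code uses a large `numFeatures`.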