July 2018
Intermediate to advanced
474 pages
13h 37m
English
The following section walks through the steps to profile the text data.
df.groupBy("label") \
    .count() \
    .orderBy("count", ascending=False) \
    .show()
import pyspark.sql.functions as F
df = df.withColumn('word_count', F.size(F.split(F.col('response_text'), ' ')))
df.groupBy('label') \
    .agg(F.avg('word_count').alias('avg_word_count')) \
    .orderBy('avg_word_count', ascending=False) \
    .show()
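To make the arithmetic behind these profiling steps concrete, here is a minimal plain-Python sketch of the same two calculations: rows per label, and the average number of space-separated tokens per label. It does not use Spark. The sample rows, the label values, and the helper names `word_count` and `profile` are assumptions made for illustration, not part of the original dataset or code.

```python
from collections import defaultdict

# Hypothetical sample rows of (label, response_text); not the book's dataset.
rows = [
    ("escalate", "I need help right now"),
    ("escalate", "please call me"),
    ("do_not_escalate", "thanks"),
    ("do_not_escalate", "that works for me"),
]


def word_count(text):
    """Count tokens by splitting on a single space, as the Spark step does."""
    return len(text.split(" "))


def profile(rows):
    """Return (rows per label, average word count per label)."""
    counts = defaultdict(int)   # number of rows seen for each label
    totals = defaultdict(int)   # summed word counts for each label
    for label, text in rows:
        counts[label] += 1
        totals[label] += word_count(text)
    averages = {label: totals[label] / counts[label] for label in counts}
    return dict(counts), averages


label_counts, avg_word_count = profile(rows)

# Print labels ordered by average word count, highest first.
for label in sorted(avg_word_count, key=avg_word_count.get, reverse=True):
    print(label, label_counts[label], avg_word_count[label])
```

Splitting on a single space treats consecutive spaces as separating empty tokens, so text with irregular whitespace can produce inflated counts; the sample rows above avoid that case.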