Using GraphX to analyze Twitter data
GraphX is Spark's API for graphs and graph-parallel computation. In this recipe, we will preview what is possible with the GraphX component in Spark.
How to do it...
Now that we have the Twitter data stored in the ElasticSearch index, we will perform the following tasks on this data using a graph:
- Convert the ElasticSearch data into a Spark Graph.
- Sample vertices, edges, and triplets in the graph.
- Find the largest group of connected hashtags (the largest connected component).
- List all the hashtags in that component.
- Converting the ElasticSearch data into a graph: This involves two steps:
- Converting ElasticSearch data into a DataFrame: This step, as we saw in an earlier recipe, is just a one-liner:
def convertElasticSearchDataToDataFrame(sqlContext: ...
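To make the four tasks listed at the start of this recipe concrete, here is a minimal sketch that runs them against a tiny in-memory graph instead of the ElasticSearch index. It is not the recipe's own code: the sample tweets, the object name `HashtagGraphSketch`, the helpers `vertexId`, `buildGraph`, and `largestComponent`, and the choice of a hashtag's `hashCode` as its vertex ID are all assumptions made for illustration. It also assumes that the "top" component means the one with the most hashtags. Only `Graph`, `Edge`, `triplets`, and `connectedComponents` come from GraphX itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object HashtagGraphSketch {

  // Hypothetical sample data: each inner Seq holds the hashtags of one tweet.
  val tweets: Seq[Seq[String]] = Seq(
    Seq("spark", "scala", "bigdata"),
    Seq("scala", "akka"),
    Seq("coffee", "monday")
  )

  // Assumption: derive the vertex ID from the hashtag's hashCode.
  // Hash collisions are ignored in this sketch.
  def vertexId(tag: String): VertexId = tag.toLowerCase.hashCode.toLong

  // Task 1: build a graph whose vertices are hashtags and whose edges
  // connect hashtags that appear together in the same tweet.
  def buildGraph(sc: SparkContext, tweets: Seq[Seq[String]]): Graph[String, Int] = {
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(tweets.flatten.distinct.map(tag => (vertexId(tag), tag)))
    val edgeSeq: Seq[Edge[Int]] =
      tweets.flatMap(_.distinct.combinations(2).toSeq).map {
        case Seq(a, b) => Edge(vertexId(a), vertexId(b), 1)
      }
    Graph(vertices, sc.parallelize(edgeSeq))
  }

  // Tasks 3 and 4: find the component with the most vertices and
  // return the hashtags in it. Assumes the graph is not empty.
  def largestComponent(graph: Graph[String, Int]): Set[String] = {
    // connectedComponents() labels every vertex with the lowest
    // vertex ID found in its component.
    val componentOf = graph.connectedComponents().vertices
    val (topComponent, _) = componentOf
      .map { case (_, componentId) => (componentId, 1L) }
      .reduceByKey(_ + _)
      .collect()
      .maxBy(_._2)
    graph.vertices
      .join(componentOf)
      .filter { case (_, (_, componentId)) => componentId == topComponent }
      .map { case (_, (tag, _)) => tag }
      .collect()
      .toSet
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HashtagGraphSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val graph = buildGraph(sc, tweets)
      // Task 2: sample vertices, edges, and triplets.
      graph.vertices.take(3).foreach(println)
      graph.edges.take(3).foreach(println)
      graph.triplets.take(3).foreach(t => println(s"${t.srcAttr} -> ${t.dstAttr}"))
      println(largestComponent(graph))
    } finally {
      sc.stop()
    }
  }
}
```

With this sample, the hashtags spark, scala, bigdata, and akka are linked through shared tweets, so they should form the largest component, while coffee and monday form a separate, smaller one. In a real run, stable IDs (for example, from `zipWithUniqueId`) would be a safer choice than `hashCode`.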