We’ve covered several algorithms that learn and update state at each iteration, such as Label Propagation, but until now we’ve emphasized graph algorithms for general analytics. Because graphs are increasingly applied in machine learning (ML), we’ll now look at how graph algorithms can be used to enhance ML workflows.
In this chapter, we focus on the most practical way to start improving ML predictions using graph algorithms: connected feature extraction and its use in predicting relationships. First, we’ll cover some basic ML concepts and the importance of contextual data for better predictions. Then we’ll quickly survey the ways graph features are applied, including spammer detection, fraud detection, and link prediction.
We’ll demonstrate how to create a machine learning pipeline and then train and evaluate a model for link prediction, integrating Neo4j and Spark in our workflow. Our example will be based on the Citation Network Dataset, which contains authors, papers, author relationships, and citation relationships. We’ll use several models to predict whether research authors are likely to collaborate in the future, and show how graph algorithms improve the results.
Machine learning is not artificial intelligence (AI), but a method for achieving AI. ML uses algorithms to train software through specific examples and progressive improvements ...