Chapter 7. Analyzing Co-occurrence Networks with GraphX

It’s a small world. It keeps recrossing itself.

David Mitchell

Data scientists come in all shapes and sizes and from a remarkably diverse set of academic backgrounds. While many have some training in disciplines like computer science, mathematics, and physics, other successful data scientists have studied neuroscience, sociology, and political science. Although these fields study different things (e.g., brains, people, political institutions) and have not traditionally required students to learn how to program, they all share two important characteristics that have made them fertile training grounds for data scientists.

First, all of these fields are interested in understanding relationships between entities, whether between neurons, individuals, or countries, and how these relationships affect the observed behavior of the entities. Second, the explosion of digital data over the past decade gave researchers access to vast quantities of information about these relationships and required them to develop new skills in order to acquire and manage these data sets.

As these researchers began to collaborate with each other and with computer scientists, they also discovered that many of the techniques that they were using to analyze relationships could be applied to problems across domains, and the field of network science was born. Network science applies tools from graph theory, the mathematical discipline that ...
