Applied Network Analysis for Data Scientists: A Tutorial for Pythonistas
Topic: Data
Have you ever wondered how data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, this tutorial is for you. We will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks.
This tutorial is for Pythonistas who want to understand relationship problems, that is, data problems that involve relationships between entities. Participants should already have a grasp of for loops and basic Python data structures (lists, tuples, and dictionaries). By the end of the tutorial, participants will have learned how to use the NetworkX package in the Jupyter environment and will be comfortable visualizing large networks using Circos plots. Other plots will be introduced as well.
What you'll learn and how you can apply it
 Use NetworkX to model network data.
 Compute centrality metrics on a graph.
 Implement arbitrary pathfinding algorithms that operate on graphs.
 Create rational visualizations of graph-structured data.
This training course is for you because...
 This workshop is geared towards data scientists who want to learn about network science and how it can be used to solve data science problems.
 The course material is geared towards intermediate learners. Course participants should be proficient in Python, but need not necessarily know graph theory beforehand.
 Learners will gain knowledge of foundational concepts, with concrete, anchoring examples to aid in recall.
Prerequisites
 Participants in this course should already be familiar with Python programming idioms, including loops and list comprehensions, as well as basic Python data structures, including dictionaries and lists.
 Knowledge of NumPy and Pandas, particularly their respective APIs, will help in Part 2 of this course.
Course Setup:
All setup instructions are available on the GitHub repository:
 https://github.com/ericmjl/Network-Analysis-Made-Simple
 Participants should follow the instructions there before joining the class.
Recommended Preparation:
 Think Complexity (book)
Recommended Followup:
 Think Complexity (book)
About your instructor

Eric is an Investigator at the Novartis Institutes for Biomedical Research, where he solves biological problems using machine learning. He obtained his Doctor of Science (ScD) from the Department of Biological Engineering, MIT, and was an Insight Health Data Fellow in the summer of 2017. He has taught Network Analysis at a variety of data science venues, including PyCon USA, SciPy, PyData and ODSC, and has also co-developed the Python Network Analysis curriculum on DataCamp. As an open source contributor, he has made contributions to PyMC3, matplotlib and bokeh. He has also led the development of the graph visualization package nxviz, and a data cleaning package pyjanitor (a Python port of the R package janitor).
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction (10 min)
 Lecture-style overview of graphs.
 Mini class discussion on graph theory.
Section 1: NetworkX Basics (30 min)
 Hands-on exercises interspersed with lectures.
 Basics of NetworkX API, syntax, plots.
Break (10 min)
Section 2: Hubs and Paths (50 min)
 Two metrics for identifying important nodes.
 Pathfinding algorithms.
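As a rough illustration of the kind of exercise this section covers, here is a minimal sketch. The outline does not name the two node-importance metrics, so degree centrality and betweenness centrality are assumed here, and the toy graph is hypothetical.

```python
import networkx as nx

# Hypothetical toy graph: a path a-b-c-d plus a shortcut edge b-d.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")])

# Assumed metric 1: fraction of the other nodes each node is connected to.
degree = nx.degree_centrality(G)

# Assumed metric 2: fraction of shortest paths that pass through each node.
betweenness = nx.betweenness_centrality(G)

# The node with the highest degree centrality is a candidate "hub".
hub = max(degree, key=degree.get)

# Pathfinding: shortest path by number of edges between two nodes.
path = nx.shortest_path(G, source="a", target="d")

print(hub, path)
```

On this graph both metrics single out the same node, which need not hold on larger networks.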
Break (10 min)
Section 3: Structures (50 min)
 Algorithms for identifying cliques; connected component subgraphs.
 Meta-level topic: Composing NetworkX functions to perform graph queries.
Leftover Q&A (20 min)
Part 2: Additional Topics
Section 1: Graph I/O (30 min)
 Graph data formats on disk.
 Reading and writing pandas DataFrames.
Break (10 min)
Section 2: Bipartite graphs (50 min)
 Representing graphs with more than one node partition: recommender systems.
 Computing projections of a graph onto one node set.
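To make the idea of a projection concrete, here is a minimal sketch using NetworkX's bipartite module. The customer and product names are hypothetical and are not taken from the course datasets.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical bipartite graph: customers on one side, products on the other.
customers = ["alice", "bob", "carol"]
products = ["book", "lamp"]

B = nx.Graph()
B.add_nodes_from(customers, bipartite="customer")
B.add_nodes_from(products, bipartite="product")
B.add_edges_from([
    ("alice", "book"),
    ("bob", "book"),
    ("bob", "lamp"),
    ("carol", "lamp"),
])

# Project onto the customer partition: two customers are linked
# if they share at least one product.
customer_graph = bipartite.projected_graph(B, customers)

print(sorted(customer_graph.edges()))
```

In a recommender setting, the projected edges can be read as "these two customers bought something in common".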
Break (10 min)
Section 3: Network Statistical Inference (30 min)
 Random graphs: a model for how the world works.
 Using statistical inference methods to determine whether a graph came from a particular class of random graphs.
Break (10 min)
Section 4: Matrix Operations (30 min)
 How to represent graphs as matrices.
 Matrix operations on adjacency matrices: non-bipartite and bipartite graphs.
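As a small illustration of the matrix view, the sketch below builds an adjacency matrix for a hypothetical three-node path graph and squares it. Only the non-bipartite case is shown, and the example graph is an assumption, not course material.

```python
import networkx as nx
import numpy as np

# Hypothetical path graph a - b - c.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c")])

# Fix the node order so rows and columns are predictable.
nodes = ["a", "b", "c"]
A = nx.to_numpy_array(G, nodelist=nodes)

# Entry (i, j) of A @ A counts the walks of length 2 from node i to node j.
A2 = A @ A

# For a simple undirected graph, the diagonal of A @ A gives node degrees.
degrees = np.diag(A2)

print(A2)
print(degrees)
```

Here `A2[0, 2]` is 1 because there is exactly one two-step walk from a to c (through b), even though a and c are not directly connected.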