Chapter 16
Link Analysis
Who has friended whom on Facebook? Who calls whom on the telephone? Which physicians prescribe which drugs to which patients? Which pairs of cities generate the most passenger-miles? Which web pages have links that bridge language communities? Who reads which blogs on what topics? These relationships are all visible in data, and they all contain a wealth of information that most data mining techniques are not able to take direct advantage of. In the ever-more-connected world (where, it has been claimed, there are no more than six degrees of separation between any two people on the planet), understanding relationships and connections is critical. Link analysis is the data mining technique that addresses this need.
Link analysis is based on a branch of mathematics called graph theory, which represents objects as nodes and the relationships between them as edges in a graph. Link analysis is not a specific modeling technique, so it can be used for both directed and undirected data mining. In directed data mining, it is often used to create new derived variables for use by other modeling techniques. In undirected data mining, it is used to explore the properties of the graphs themselves.
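As a minimal sketch (an illustration, not a method taken from this chapter), the Python below assumes hypothetical call-detail records given as (caller, callee) pairs. The people become nodes, and each distinct pair becomes a directed edge. Each node's in-degree and out-degree then become derived variables that another modeling technique could use. The names `calls`, `build_graph`, and `degree_features` are illustrative only.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee) pairs. Each distinct pair
# becomes a directed edge; the people are the nodes.
calls = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "alice"),
    ("alice", "bob"),  # repeated call: same edge, stored only once
]

def build_graph(edges):
    """Return an adjacency map: node -> set of nodes it links to."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
        graph.setdefault(dst, set())  # nodes that only receive calls still appear
    return dict(graph)

def degree_features(graph):
    """Derive per-node out-degree and in-degree as inputs for another model."""
    in_deg = {node: 0 for node in graph}
    for targets in graph.values():
        for dst in targets:
            in_deg[dst] += 1
    return {
        node: {"out_degree": len(graph[node]), "in_degree": in_deg[node]}
        for node in graph
    }

features = degree_features(build_graph(calls))
print(features["carol"])  # {'out_degree': 0, 'in_degree': 2}
```

Storing edges in sets means repeated calls between the same two people count once. If call volume mattered, a weighted edge (for example, a count per pair) would be a natural variation.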
Graph theory is not applicable to all types of data, nor can it solve all types of problems. Some areas where it has yielded good results are:
- Identifying authoritative sources of information on the Web by analyzing the links between its pages.
- Analyzing telephone call patterns to find influential ...