Eliminate Duplicates
Many Wikipedia pages exist under two or more names. For example, there are pages about Complex Network and Complex Networks. The latter redirects to the former, but NetworkX does not know about the redirection.
Accurately merging all duplicate nodes requires natural language processing (NLP) tools that are outside the scope of this book. It may suffice to join only those nodes whose names differ by the presence or absence of the letter s at the end or a hyphen in the middle.
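The name-matching heuristic can be sketched on plain strings before touching the graph. This is a minimal illustration, not necessarily the book's code: the function name find_duplicate_candidates and the sample page titles are hypothetical, and it assumes the hyphen-free variant of a title uses a space in place of the hyphen.

```python
def find_duplicate_candidates(names):
    """Return (kept, duplicate) pairs of names that differ only by a
    trailing "s" or by a hyphen written as a space (an assumption)."""
    name_set = set(names)
    pairs = []
    for name in sorted(name_set):
        # Plural variant: "Complex Network" vs. "Complex Networks".
        plural = name + "s"
        if plural in name_set:
            pairs.append((name, plural))
        # Hyphen variant: "Small-world network" vs. "Small world network".
        if "-" in name:
            spaced = name.replace("-", " ")
            if spaced in name_set:
                pairs.append((spaced, name))
    return pairs


# Hypothetical page titles used only for illustration.
pages = [
    "Complex Network",
    "Complex Networks",
    "Small-world network",
    "Small world network",
    "Graph",
]
print(find_duplicate_candidates(pages))
```

A rule this simple will also pair unrelated titles that happen to differ by a final s, so the resulting list is best treated as a set of candidates to review rather than confirmed duplicates.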
Start by removing self-loops (pages that refer to themselves). The loops don’t change the network properties, but they affect the correctness of duplicate node elimination.
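One way to remove self-loops, sketched on a toy graph that stands in for the Wikipedia page network (the graph G and its edges are made up for illustration), is to pass the result of nx.selfloop_edges to remove_edges_from:

```python
import networkx as nx

# Toy directed graph; the real network would come from the crawl.
G = nx.DiGraph()
G.add_edges_from([
    ("Complex Network", "Graph"),
    ("Graph", "Graph"),  # a self-loop
    ("Complex Networks", "Complex Network"),
])

# Build the list first so the graph is not modified while it is iterated.
loops = list(nx.selfloop_edges(G))
G.remove_edges_from(loops)

print(nx.number_of_selfloops(G))
```

Removing an edge leaves its endpoints in place, so the page "Graph" remains a node even after its loop is gone.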
Now, you need a list of at least some duplicate nodes. You can build it by looking at each ...