Mining the Social Web by Matthew A. Russell

To Read This Book?

If you have a basic programming background and are interested in the opportunities that arise from mining and analyzing data from the social web, you’ve come to the right place. We’ll begin getting our hands dirty after just a few more pages of front matter. I’ll be forthright, however, and say up front that one of the chief complaints you’re likely to have about this book is that all of the chapters are far too short. Unfortunately, that’s always the case when trying to capture a space that evolves daily and is so rich with opportunities. That said, I’m a fan of the “80-20 rule”, and I sincerely believe that this book is a reasonable attempt at presenting the most interesting 20 percent of the space that you’d want to explore with 80 percent of your available time.

This book is short, but it does cover a lot of ground. Generally speaking, there’s a little more breadth than depth, although where the subject matter is complex enough to warrant a more detailed discussion, there are a few deep dives into interesting mining and analysis techniques. The book was written so that you could either read it from cover to cover to get a broad primer on working with social web data, or pick and choose the chapters that are of particular interest to you. In other words, each chapter is designed to be bite-sized and fairly standalone, but special care was taken to introduce material in a particular order so that the book as a whole is an enjoyable read.

Social networking websites such as Facebook, Twitter, and LinkedIn have transitioned from fad to mainstream to global phenomenon over the last few years. In the first quarter of 2010, the popular social networking site Facebook surpassed Google for the most page visits,[1] confirming a definite shift in how people are spending their time online. Asserting that this event indicates that the Web has now become more a social milieu than a tool for research and information might be somewhat indefensible; however, this data point undeniably indicates that social networking websites are satisfying some very basic human desires on a massive scale in ways that search engines were never designed to fulfill. Social networks really are changing the way we live our lives on and off the Web,[2] and they are enabling technology to bring out the best (and sometimes the worst) in us. The explosion of social networks is just one of the ways that the gap between the real world and cyberspace continues to narrow.

Generally speaking, each chapter of this book interlaces slivers of the social web with data mining, analysis, and visualization techniques to answer the following kinds of questions:

  • Who knows whom, and what friends do they have in common?

  • How frequently are certain people communicating with one another?

  • How symmetrical is the communication between people?

  • Who are the quietest/chattiest people in a network?

  • Who are the most influential/popular people in a network?

  • What are people chatting about (and is it interesting)?

The answers to these types of questions generally connect two or more people and point back to a context indicating why the connection exists. The work involved in answering these kinds of questions is only the beginning of more complex analytic processes, but you have to start somewhere, and the low-hanging fruit is surprisingly easy to grasp, thanks to well-engineered social networking APIs and open source toolkits.
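To give a feel for how low that fruit hangs, here is a minimal sketch that answers the structural questions in the list above (mutual friends, communication frequency, symmetry, chattiness, popularity) using nothing but the Python standard library. It assumes a hypothetical toy dataset: the `friends` and `messages` structures and the helper names `mutual_friends` and `symmetry` are illustrative only, standing in for data a real social networking API might return. The "chatting about" question needs message text and is left out here.

```python
from collections import Counter

# Hypothetical toy data standing in for what a social API might return:
# an undirected friendship graph as a dict of sets.
friends = {
    "ana": {"bob", "cai", "dee"},
    "bob": {"ana", "cai"},
    "cai": {"ana", "bob", "dee"},
    "dee": {"ana", "cai"},
}

# Hypothetical message log: one (sender, recipient) pair per message.
messages = [
    ("ana", "bob"), ("ana", "bob"), ("bob", "ana"),
    ("ana", "cai"), ("cai", "dee"), ("dee", "cai"),
    ("cai", "dee"), ("ana", "dee"),
]

def mutual_friends(a, b):
    """Who knows whom, and what friends do they have in common?"""
    return friends[a] & friends[b]

# How frequently are certain people communicating with one another?
pair_counts = Counter(messages)

def symmetry(a, b):
    """How symmetrical is the communication between a and b?
    Returns min/max of the two directed counts: 1.0 is perfectly
    balanced, 0.0 means one-way (or no) communication."""
    ab, ba = pair_counts[(a, b)], pair_counts[(b, a)]
    return min(ab, ba) / max(ab, ba) if max(ab, ba) else 0.0

# Who are the quietest/chattiest people? Count messages sent,
# starting everyone at 0 so silent members still appear.
sent = Counter({person: 0 for person in friends})
sent.update(sender for sender, _ in messages)

# Who is most popular? A crude proxy: number of friends (degree).
degree = {person: len(fs) for person, fs in friends.items()}

print(mutual_friends("ana", "cai"))  # the set {'bob', 'dee'} (order may vary)
print(pair_counts[("ana", "bob")])    # 2
print(symmetry("ana", "bob"))         # 0.5
print(sent.most_common(1))            # [('ana', 4)]
print(sorted(degree.items(), key=lambda kv: -kv[1]))
```

Degree is only one of several possible popularity or influence measures; it is used here because it is the simplest to compute, not because it is the best.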

Loosely speaking, this book treats the social web[3] as a graph of people, activities, events, concepts, etc. Industry leaders such as Google and Facebook increasingly push graph-centric terminology rather than web-centric terminology as they simultaneously promote graph-based APIs. In fact, Tim Berners-Lee has suggested that perhaps he should have used the term Giant Global Graph (GGG) instead of World Wide Web (WWW), because the terms “web” and “graph” can be so freely interchanged in the context of defining a topology for the Internet. Whether the fullness of Tim Berners-Lee’s original vision will ever be realized remains to be seen, but the Web as we know it is getting richer and richer with social data all the time. When we look back years from now, it may well seem obvious that the second- and third-level effects created by an inherently social web were necessary enablers for the realization of a truly semantic web. The gap between the two seems to be closing.

[1] See the opening paragraph of Chapter 9.

[2] Mark Zuckerberg, the creator of Facebook, was named Person of the Year for 2010 by Time magazine (http://www.time.com/time/specials/packages/article/0,28804,2036683_2037183_2037185,00.html).

[3] See http://journal.planetwork.net/article.php?lab=reed0704 for another perspective on the social web that focuses on digital identities.