Mining the Social Web, 2nd Edition

Name: Mining the Social Web, 2nd Edition
Author: Matthew A. Russell
ISBN: 9781449367619

October 2013

Beginner to intermediate

444 pages

12h 45m

English

O'Reilly Media, Inc.

Dedication
Preface
README.1st; Managing Your Expectations; Python-Centric Technology; Improvements Specific to the Second Edition; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments for the Second Edition; Acknowledgments from the First Edition
I. A Guided Tour of the Social Web
Prelude
1. Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More
Overview; Why Is Twitter All the Rage?; Exploring Twitter’s API; Fundamental Twitter Terminology; Creating a Twitter API Connection; Exploring Trending Topics; Searching for Tweets; Analyzing the 140 Characters; Extracting Tweet Entities; Analyzing Tweets and Tweet Entities with Frequency Analysis; Computing the Lexical Diversity of Tweets; Examining Patterns in Retweets; Visualizing Frequency Data with Histograms; Closing Remarks; Recommended Exercises; Online Resources
2. Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
Overview; Exploring Facebook’s Social Graph API; Understanding the Social Graph API; Understanding the Open Graph Protocol; Analyzing Social Graph Connections; Analyzing Facebook Pages; Analyzing this book’s Facebook page; Analyzing Coke vs Pepsi Facebook pages; Examining Friendships; Analyzing things your friends “like”; Analyzing mutual friendships with directed graphs; Visualizing directed graphs of mutual friendships; Closing Remarks; Recommended Exercises; Online Resources
3. Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More
Overview; Exploring the LinkedIn API; Making LinkedIn API Requests; Downloading LinkedIn Connections as a CSV File; Crash Course on Clustering Data; Clustering Enhances User Experiences; Normalizing Data to Enable Analysis; Normalizing and counting companies; Normalizing and counting job titles; Normalizing and counting locations; Visualizing locations with cartograms; Measuring Similarity; Clustering Algorithms; Greedy clustering; Runtime analysis; Hierarchical clustering; k-means clustering; Visualizing geographic clusters with Google Earth; Closing Remarks; Recommended Exercises; Online Resources
4. Mining Google+: Computing Document Similarity, Extracting Collocations, and More
Overview; Exploring the Google+ API; Making Google+ API Requests; A Whiz-Bang Introduction to TF-IDF; Term Frequency; Inverse Document Frequency; TF-IDF; Querying Human Language Data with TF-IDF; Introducing the Natural Language Toolkit; Applying TF-IDF to Human Language; Finding Similar Documents; The theory behind vector space models and cosine similarity; Clustering posts with cosine similarity; Visualizing document similarity with a matrix diagram; Analyzing Bigrams in Human Language; Contingency tables and scoring functions; Reflections on Analyzing Human Language Data; Closing Remarks; Recommended Exercises; Online Resources
5. Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
Overview; Scraping, Parsing, and Crawling the Web; Breadth-First Search in Web Crawling; Discovering Semantics by Decoding Syntax; Natural Language Processing Illustrated Step-by-Step; Sentence Detection in Human Language Data; Document Summarization; Analysis of Luhn’s summarization algorithm; Entity-Centric Analysis: A Paradigm Shift; Gisting Human Language Data; Quality of Analytics for Processing Human Language Data; Closing Remarks; Recommended Exercises; Online Resources
6. Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More
Overview; Obtaining and Processing a Mail Corpus; A Primer on Unix Mailboxes; Getting the Enron Data; Converting a Mail Corpus to a Unix Mailbox; Converting Unix Mailboxes to JSON; Importing a JSONified Mail Corpus into MongoDB; The MongoDB shell; Programmatically Accessing MongoDB with Python; Analyzing the Enron Corpus; Querying by Date/Time Range; Analyzing Patterns in Sender/Recipient Communications; Writing Advanced Queries; Searching Emails by Keywords; Discovering and Visualizing Time-Series Trends; Analyzing Your Own Mail Data; Accessing Your Gmail with OAuth; Fetching and Parsing Email Messages with IMAP; Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension; Closing Remarks; Recommended Exercises; Online Resources

7. Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
Overview; Exploring GitHub’s API; Creating a GitHub API Connection; Making GitHub API Requests; Modeling Data with Property Graphs; Analyzing GitHub Interest Graphs; Seeding an Interest Graph; Computing Graph Centrality Measures; Extending the Interest Graph with “Follows” Edges for Users; Application of centrality measures; Adding more repositories to the interest graph; Computational Considerations; Using Nodes as Pivots for More Efficient Queries; Visualizing Interest Graphs; Closing Remarks; Recommended Exercises; Online Resources
8. Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More
Overview; Microformats: Easy-to-Implement Metadata; Geocoordinates: A Common Thread for Just About Anything; Using Recipe Data to Improve Online Matchmaking; Retrieving recipe reviews; Accessing LinkedIn’s 200 Million Online Résumés; From Semantic Markup to Semantic Web: A Brief Interlude; The Semantic Web: An Evolutionary Revolution; Man Cannot Live on Facts Alone; Open-world versus closed-world assumptions; Inferencing About an Open World; Closing Remarks; Recommended Exercises; Online Resources
II. Twitter Cookbook
9. Twitter Cookbook
Accessing Twitter’s API for Development Purposes (Problem, Solution, Discussion); Doing the OAuth Dance to Access Twitter’s API for Production Purposes (Problem, Solution, Discussion); Discovering the Trending Topics (Problem, Solution, Discussion); Searching for Tweets (Problem, Solution, Discussion); Constructing Convenient Function Calls (Problem, Solution, Discussion); Saving and Restoring JSON Data with Text Files (Problem, Solution, Discussion); Saving and Accessing JSON Data with MongoDB (Problem, Solution, Discussion); Sampling the Twitter Firehose with the Streaming API (Problem, Solution, Discussion); Collecting Time-Series Data (Problem, Solution, Discussion); Extracting Tweet Entities (Problem, Solution, Discussion); Finding the Most Popular Tweets in a Collection of Tweets (Problem, Solution, Discussion); Finding the Most Popular Tweet Entities in a Collection of Tweets (Problem, Solution, Discussion); Tabulating Frequency Analysis (Problem, Solution, Discussion); Finding Users Who Have Retweeted a Status (Problem, Solution, Discussion); Extracting a Retweet’s Attribution (Problem, Solution, Discussion); Making Robust Twitter Requests (Problem, Solution, Discussion); Resolving User Profile Information (Problem, Solution, Discussion); Extracting Tweet Entities from Arbitrary Text (Problem, Solution, Discussion); Getting All Friends or Followers for a User (Problem, Solution, Discussion); Analyzing a User’s Friends and Followers (Problem, Solution, Discussion); Harvesting a User’s Tweets (Problem, Solution, Discussion); Crawling a Friendship Graph (Problem, Solution, Discussion); Analyzing Tweet Content (Problem, Solution, Discussion); Summarizing Link Targets (Problem, Solution, Discussion); Analyzing a User’s Favorite Tweets (Problem, Solution, Discussion); Closing Remarks; Recommended Exercises; Online Resources
III. Appendixes
A. Information About This Book’s Virtual Machine Experience
B. OAuth Primer
Overview; OAuth 1.0A; OAuth 2.0
C. Python and IPython Notebook Tips & Tricks
Index
About the Author
Colophon
Copyright

Content preview from Mining the Social Web, 2nd Edition

Chapter 2. Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More

In this chapter, we’ll tap into the Facebook platform through its (Social) Graph API and explore some of the vast possibilities. Facebook is arguably the heart of the social web and is something of an all-in-one wonder, given that more than half of its 1 billion users^[2] are active each day updating statuses, posting photos, exchanging messages, chatting in real time, checking in to physical locales, playing games, shopping, and doing just about anything else you can imagine. From a social web mining standpoint, the wealth of data that Facebook stores about individuals, groups, and products is quite exciting, because Facebook’s clean API presents incredible opportunities to synthesize that data into information (the world’s most precious commodity) and glean valuable insights. On the other hand, this great power commands great responsibility, and Facebook has implemented the most sophisticated set of online privacy controls that the world has ever seen in order to help protect its users from exploitation.

It’s worth noting that although Facebook describes itself as a social graph, it has been steadily transforming into a valuable interest graph as well, because it maintains relationships between people and the things that they’re interested in through its Facebook pages and “Likes” feature. In this regard, you may increasingly hear it framed as a “social interest graph.” For the most part, you can make a case that interest ...


Publisher Resources

ISBN: 9781449368180
