Mining the Social Web, 2nd Edition

Author: Matthew A. Russell
ISBN: 9781449367619

October 2013

Beginner to intermediate

444 pages

12h 45m

English

O'Reilly Media, Inc.

Dedication
Preface
    README.1st
    Managing Your Expectations
    Python-Centric Technology
    Improvements Specific to the Second Edition
    Conventions Used in This Book
    Using Code Examples
    Safari® Books Online
    How to Contact Us
    Acknowledgments for the Second Edition
    Acknowledgments from the First Edition
I. A Guided Tour of the Social Web
Prelude
1. Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More
    Overview
    Why Is Twitter All the Rage?
    Exploring Twitter’s API
    Fundamental Twitter Terminology
    Creating a Twitter API Connection
    Exploring Trending Topics
    Searching for Tweets
    Analyzing the 140 Characters
    Extracting Tweet Entities
    Analyzing Tweets and Tweet Entities with Frequency Analysis
    Computing the Lexical Diversity of Tweets
    Examining Patterns in Retweets
    Visualizing Frequency Data with Histograms
    Closing Remarks
    Recommended Exercises
    Online Resources
2. Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
    Overview
    Exploring Facebook’s Social Graph API
    Understanding the Social Graph API
    Understanding the Open Graph Protocol
    Analyzing Social Graph Connections
    Analyzing Facebook Pages
    Analyzing this book’s Facebook page
    Analyzing Coke vs Pepsi Facebook pages
    Examining Friendships
    Analyzing things your friends “like”
    Analyzing mutual friendships with directed graphs
    Visualizing directed graphs of mutual friendships
    Closing Remarks
    Recommended Exercises
    Online Resources
3. Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More
    Overview
    Exploring the LinkedIn API
    Making LinkedIn API Requests
    Downloading LinkedIn Connections as a CSV File
    Crash Course on Clustering Data
    Clustering Enhances User Experiences
    Normalizing Data to Enable Analysis
    Normalizing and counting companies
    Normalizing and counting job titles
    Normalizing and counting locations
    Visualizing locations with cartograms
    Measuring Similarity
    Clustering Algorithms
    Greedy clustering
    Runtime analysis
    Hierarchical clustering
    k-means clustering
    Visualizing geographic clusters with Google Earth
    Closing Remarks
    Recommended Exercises
    Online Resources
4. Mining Google+: Computing Document Similarity, Extracting Collocations, and More
    Overview
    Exploring the Google+ API
    Making Google+ API Requests
    A Whiz-Bang Introduction to TF-IDF
    Term Frequency
    Inverse Document Frequency
    TF-IDF
    Querying Human Language Data with TF-IDF
    Introducing the Natural Language Toolkit
    Applying TF-IDF to Human Language
    Finding Similar Documents
    The theory behind vector space models and cosine similarity
    Clustering posts with cosine similarity
    Visualizing document similarity with a matrix diagram
    Analyzing Bigrams in Human Language
    Contingency tables and scoring functions
    Reflections on Analyzing Human Language Data
    Closing Remarks
    Recommended Exercises
    Online Resources
5. Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
    Overview
    Scraping, Parsing, and Crawling the Web
    Breadth-First Search in Web Crawling
    Discovering Semantics by Decoding Syntax
    Natural Language Processing Illustrated Step-by-Step
    Sentence Detection in Human Language Data
    Document Summarization
    Analysis of Luhn’s summarization algorithm
    Entity-Centric Analysis: A Paradigm Shift
    Gisting Human Language Data
    Quality of Analytics for Processing Human Language Data
    Closing Remarks
    Recommended Exercises
    Online Resources
6. Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More
    Overview
    Obtaining and Processing a Mail Corpus
    A Primer on Unix Mailboxes
    Getting the Enron Data
    Converting a Mail Corpus to a Unix Mailbox
    Converting Unix Mailboxes to JSON
    Importing a JSONified Mail Corpus into MongoDB
    The MongoDB shell
    Programmatically Accessing MongoDB with Python
    Analyzing the Enron Corpus
    Querying by Date/Time Range
    Analyzing Patterns in Sender/Recipient Communications
    Writing Advanced Queries
    Searching Emails by Keywords
    Discovering and Visualizing Time-Series Trends
    Analyzing Your Own Mail Data
    Accessing Your Gmail with OAuth
    Fetching and Parsing Email Messages with IMAP
    Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
    Closing Remarks
    Recommended Exercises
    Online Resources

7. Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
    Overview
    Exploring GitHub’s API
    Creating a GitHub API Connection
    Making GitHub API Requests
    Modeling Data with Property Graphs
    Analyzing GitHub Interest Graphs
    Seeding an Interest Graph
    Computing Graph Centrality Measures
    Extending the Interest Graph with “Follows” Edges for Users
    Application of centrality measures
    Adding more repositories to the interest graph
    Computational Considerations
    Using Nodes as Pivots for More Efficient Queries
    Visualizing Interest Graphs
    Closing Remarks
    Recommended Exercises
    Online Resources
8. Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More
    Overview
    Microformats: Easy-to-Implement Metadata
    Geocoordinates: A Common Thread for Just About Anything
    Using Recipe Data to Improve Online Matchmaking
    Retrieving recipe reviews
    Accessing LinkedIn’s 200 Million Online Résumés
    From Semantic Markup to Semantic Web: A Brief Interlude
    The Semantic Web: An Evolutionary Revolution
    Man Cannot Live on Facts Alone
    Open-world versus closed-world assumptions
    Inferencing About an Open World
    Closing Remarks
    Recommended Exercises
    Online Resources
II. Twitter Cookbook
9. Twitter Cookbook
    Accessing Twitter’s API for Development Purposes: Problem, Solution, Discussion
    Doing the OAuth Dance to Access Twitter’s API for Production Purposes: Problem, Solution, Discussion
    Discovering the Trending Topics: Problem, Solution, Discussion
    Searching for Tweets: Problem, Solution, Discussion
    Constructing Convenient Function Calls: Problem, Solution, Discussion
    Saving and Restoring JSON Data with Text Files: Problem, Solution, Discussion
    Saving and Accessing JSON Data with MongoDB: Problem, Solution, Discussion
    Sampling the Twitter Firehose with the Streaming API: Problem, Solution, Discussion
    Collecting Time-Series Data: Problem, Solution, Discussion
    Extracting Tweet Entities: Problem, Solution, Discussion
    Finding the Most Popular Tweets in a Collection of Tweets: Problem, Solution, Discussion
    Finding the Most Popular Tweet Entities in a Collection of Tweets: Problem, Solution, Discussion
    Tabulating Frequency Analysis: Problem, Solution, Discussion
    Finding Users Who Have Retweeted a Status: Problem, Solution, Discussion
    Extracting a Retweet’s Attribution: Problem, Solution, Discussion
    Making Robust Twitter Requests: Problem, Solution, Discussion
    Resolving User Profile Information: Problem, Solution, Discussion
    Extracting Tweet Entities from Arbitrary Text: Problem, Solution, Discussion
    Getting All Friends or Followers for a User: Problem, Solution, Discussion
    Analyzing a User’s Friends and Followers: Problem, Solution, Discussion
    Harvesting a User’s Tweets: Problem, Solution, Discussion
    Crawling a Friendship Graph: Problem, Solution, Discussion
    Analyzing Tweet Content: Problem, Solution, Discussion
    Summarizing Link Targets: Problem, Solution, Discussion
    Analyzing a User’s Favorite Tweets: Problem, Solution, Discussion
    Closing Remarks
    Recommended Exercises
    Online Resources
III. Appendixes
A. Information About This Book’s Virtual Machine Experience
B. OAuth Primer
    Overview
    OAuth 1.0A
    OAuth 2.0
C. Python and IPython Notebook Tips & Tricks
Index
About the Author
Colophon
Copyright

Content preview from Mining the Social Web, 2nd Edition

Appendix B. OAuth Primer

Just as each chapter in this book has a corresponding IPython Notebook, so does each appendix. All notebooks, regardless of purpose, are maintained in the book’s GitHub source code repository. The appendix that you are reading here “in print” serves as a cross-reference to the IPython Notebook that provides example code for interactive OAuth flows involving explicit user authorization, which you need if you implement a user-facing application.

The remainder of this appendix provides a terse discussion of OAuth as a basic orientation. The sample code for OAuth flows for popular websites such as Twitter, Facebook, and LinkedIn is in the corresponding IPython Notebook that is available with this book’s source code.

Note

Like the other appendixes, this appendix has a corresponding IPython Notebook entitled Appendix B: OAuth Primer that you can view online.

Overview

OAuth stands for “open authorization” and provides a means for users to authorize an application to access their account data through an API without the users needing to hand over sensitive credentials such as a username and password combination. Although OAuth is presented here in the context of the social web, keep in mind that it’s a specification that has wide applicability in any context in which users would like to authorize an application to take certain actions on their behalf. In general, users can control the level of access for ...
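
To make the idea in the overview concrete, here is a minimal sketch of the first step of an OAuth 2.0 authorization-code flow: the application builds a URL that sends the user to the provider, and the user approves access there instead of giving the application a password. This is not the book's notebook code. The endpoint, client ID, redirect URI, and the `read_profile` scope are hypothetical placeholders, and `build_authorization_url` is a helper name invented for this illustration; the query parameter names are the standard ones for an OAuth 2.0 authorization request.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical values for illustration only. A real provider documents its own
# authorization endpoint and issues a client ID when you register an application.
AUTHORIZE_ENDPOINT = "https://provider.example.com/oauth/authorize"
CLIENT_ID = "my-client-id"
REDIRECT_URI = "https://app.example.com/callback"


def build_authorization_url(scopes, state=None):
    """Return (url, state) for an OAuth 2.0 authorization-code request."""
    # The state value lets the application match the provider's redirect
    # back to the request that it started.
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",      # ask for an authorization code
        "client_id": CLIENT_ID,       # identifies the application, not the user
        "redirect_uri": REDIRECT_URI, # where the provider sends the user afterward
        "scope": " ".join(scopes),    # space-delimited list of requested access
        "state": state,
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params), state


url, state = build_authorization_url(["read_profile"], state="demo-state")
print(url)
```

Note that the request carries no username or password: the application identifies only itself and the access it is asking for. Exchanging the returned code for an access token is a separate step that this sketch does not cover.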

Publisher Resources

ISBN: 9781449368180
