
Mining the Social Web, 3rd Edition

by Matthew A. Russell, Mikhail Klassen

January 2019

Beginner to intermediate

426 pages

11h 4m

English

O'Reilly Media, Inc.

Preface
  A Note from Matthew Russell
  README.1st
  Managing Your Expectations
  Python-Centric Technology
  Improvements to the Third Edition
  The Ethical Use of Data Mining
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments for the Third Edition
  Acknowledgments for the Second Edition
  Acknowledgments from the First Edition
I. A Guided Tour of the Social Web
Prelude
1. Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More
  Overview
  Why Is Twitter All the Rage?
  Exploring Twitter’s API
  Fundamental Twitter Terminology
  Creating a Twitter API Connection
  Exploring Trending Topics
  Searching for Tweets
  Analyzing the 140 (or More) Characters
  Extracting Tweet Entities
  Analyzing Tweets and Tweet Entities with Frequency Analysis
  Computing the Lexical Diversity of Tweets
  Examining Patterns in Retweets
  Visualizing Frequency Data with Histograms
  Closing Remarks
  Recommended Exercises
  Online Resources
2. Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
  Overview
  Exploring Facebook’s Graph API
  Understanding the Graph API
  Understanding the Open Graph Protocol
  Analyzing Social Graph Connections
  Analyzing Facebook Pages
  Manipulating Data Using pandas
  Closing Remarks
  Recommended Exercises
  Online Resources
3. Mining Instagram: Computer Vision, Neural Networks, Object Recognition, and Face Detection
  Overview
  Exploring the Instagram API
  Making Instagram API Requests
  Retrieving Your Own Instagram Feed
  Retrieving Media by Hashtag
  Anatomy of an Instagram Post
  Crash Course on Artificial Neural Networks
  Training a Neural Network to “Look” at Pictures
  Recognizing Handwritten Digits
  Object Recognition Within Photos Using Pretrained Neural Networks
  Applying Neural Networks to Instagram Posts
  Tagging the Contents of an Image
  Detecting Faces in Images
  Closing Remarks
  Recommended Exercises
  Online Resources
4. Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More
  Overview
  Exploring the LinkedIn API
  Making LinkedIn API Requests
  Downloading LinkedIn Connections as a CSV File
  Crash Course on Clustering Data
  Normalizing Data to Enable Analysis
  Measuring Similarity
  Clustering Algorithms
  Closing Remarks
  Recommended Exercises
  Online Resources
5. Mining Text Files: Computing Document Similarity, Extracting Collocations, and More
  Overview
  Text Files
  A Whiz-Bang Introduction to TF-IDF
  Term Frequency
  Inverse Document Frequency
  TF-IDF
  Querying Human Language Data with TF-IDF
  Introducing the Natural Language Toolkit
  Applying TF-IDF to Human Language
  Finding Similar Documents
  Analyzing Bigrams in Human Language
  Reflections on Analyzing Human Language Data
  Closing Remarks
  Recommended Exercises
  Online Resources
6. Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
  Overview
  Scraping, Parsing, and Crawling the Web
  Breadth-First Search in Web Crawling
  Discovering Semantics by Decoding Syntax
  Natural Language Processing Illustrated Step-by-Step
  Sentence Detection in Human Language Data
  Document Summarization
  Entity-Centric Analysis: A Paradigm Shift
  Gisting Human Language Data
  Quality of Analytics for Processing Human Language Data
  Closing Remarks
  Recommended Exercises
  Online Resources
7. Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More
  Overview
  Obtaining and Processing a Mail Corpus
  A Primer on Unix Mailboxes
  Getting the Enron Data
  Converting a Mail Corpus to a Unix Mailbox
  Converting Unix Mailboxes to pandas DataFrames
  Analyzing the Enron Corpus
  Querying by Date/Time Range
  Analyzing Patterns in Sender/Recipient Communications
  Searching Emails by Keywords
  Analyzing Your Own Mail Data
  Accessing Your Gmail with OAuth
  Fetching and Parsing Email Messages
  Visualizing Patterns in Email with Immersion
  Closing Remarks
  Recommended Exercises
  Online Resources

8. Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
  Overview
  Exploring GitHub’s API
  Creating a GitHub API Connection
  Making GitHub API Requests
  Modeling Data with Property Graphs
  Analyzing GitHub Interest Graphs
  Seeding an Interest Graph
  Computing Graph Centrality Measures
  Extending the Interest Graph with “Follows” Edges for Users
  Using Nodes as Pivots for More Efficient Queries
  Visualizing Interest Graphs
  Closing Remarks
  Recommended Exercises
  Online Resources
II. Twitter Cookbook
9. Twitter Cookbook
  Accessing Twitter’s API for Development Purposes: Problem, Solution, Discussion
  Doing the OAuth Dance to Access Twitter’s API for Production Purposes: Problem, Solution, Discussion
  Discovering the Trending Topics: Problem, Solution, Discussion
  Searching for Tweets: Problem, Solution, Discussion
  Constructing Convenient Function Calls: Problem, Solution, Discussion
  Saving and Restoring JSON Data with Text Files: Problem, Solution, Discussion
  Saving and Accessing JSON Data with MongoDB: Problem, Solution, Discussion
  Sampling the Twitter Firehose with the Streaming API: Problem, Solution, Discussion
  Collecting Time-Series Data: Problem, Solution, Discussion
  Extracting Tweet Entities: Problem, Solution, Discussion
  Finding the Most Popular Tweets in a Collection of Tweets: Problem, Solution, Discussion
  Finding the Most Popular Tweet Entities in a Collection of Tweets: Problem, Solution, Discussion
  Tabulating Frequency Analysis: Problem, Solution, Discussion
  Finding Users Who Have Retweeted a Status: Problem, Solution, Discussion
  Extracting a Retweet’s Attribution: Problem, Solution, Discussion
  Making Robust Twitter Requests: Problem, Solution, Discussion
  Resolving User Profile Information: Problem, Solution, Discussion
  Extracting Tweet Entities from Arbitrary Text: Problem, Solution, Discussion
  Getting All Friends or Followers for a User: Problem, Solution, Discussion
  Analyzing a User’s Friends and Followers: Problem, Solution, Discussion
  Harvesting a User’s Tweets: Problem, Solution, Discussion
  Crawling a Friendship Graph: Problem, Solution, Discussion
  Analyzing Tweet Content: Problem, Solution, Discussion
  Summarizing Link Targets: Problem, Solution, Discussion
  Analyzing a User’s Favorite Tweets: Problem, Solution, Discussion
  Closing Remarks
  Recommended Exercises
  Online Resources
III. Appendixes
A. Information About This Book’s Virtual Machine Experience
B. OAuth Primer
  Overview
  OAuth 1.0a
  OAuth 2.0
C. Python and Jupyter Notebook Tips and Tricks
Index

Chapter 7. Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More

Mail archives are arguably the ultimate kind of social web data and the basis of the earliest online social networks. Mail data is ubiquitous, and each message is inherently social, involving conversations and interactions among two or more people. Furthermore, each message consists of inherently expressive human language data and is laced with structured metadata fields (such as the From, To, and Date headers) that anchor that language data to specific points in time and unambiguous identities. Mining mailboxes therefore lets you synthesize the concepts you’ve learned in previous chapters and opens up a wide range of opportunities for discovering valuable insights.
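The structured metadata described above lives in ordinary email headers. As a minimal sketch using only Python's standard `email` package, the following parses a single hypothetical message (the names, addresses, and text are invented) and separates its structured fields (sender, recipients, timestamp) from its free-text body. The `summarize` helper and its dictionary keys are illustrative names chosen here, not code from this book.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical message, invented for illustration only.
RAW_MESSAGE = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>, carol@example.com
Date: Tue, 15 Jan 2019 09:30:00 -0500
Subject: Quarterly numbers

Bob, Carol: the draft is attached.
"""


def summarize(raw):
    """Split one RFC 5322 message into structured metadata and free text."""
    msg = message_from_string(raw)
    # getaddresses() turns "Name <addr>, addr2" header values into (name, addr) pairs.
    sender = getaddresses(msg.get_all("From", []))[0][1]
    recipients = [addr for _, addr in
                  getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))]
    return {
        "sender": sender,
        "recipients": recipients,
        # The Date header carries the sender's UTC offset, so the result is tz-aware.
        "sent_at": parsedate_to_datetime(msg["Date"]),
        "subject": msg.get("Subject", ""),
        # For a single-part message, get_payload() returns the body as a str.
        "body": msg.get_payload(),
    }


info = summarize(RAW_MESSAGE)
print(info["sender"], "->", info["recipients"], "at", info["sent_at"].isoformat())
```

Real archives often contain multipart messages, encoded headers, and malformed dates, so a production parser would need more defensive handling than this sketch shows.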

Whether you are the CIO of a corporation and want to analyze corporate communications for trends and patterns, you have a keen interest in mining online mailing lists for insights, or you’d simply like to explore your own mailbox for patterns as part of quantifying yourself, the following discussion provides a primer to help you get started. This chapter introduces some fundamental tools and techniques for exploring mailboxes to answer questions such as:

Who sends mail to whom (and how much/often)?
Is there a particular time of the day (or day of the week) when the most mail chatter happens?
Which people send the most messages to one another?
What are the subjects of the liveliest discussion threads?
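Several of these questions reduce to counting over header fields. The sketch below assumes messages are already available as `email.message.Message` objects and uses `collections.Counter` to tally sender/recipient pairs, hour of day, and day of week. It approximates "liveliest threads" by grouping on subject lines with Re:/Fw:/Fwd: prefixes stripped, which is a simplifying assumption made here rather than the book's method. The helper names (`tally`, `normalize_subject`) and the three sample messages are hypothetical.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Fixed English names so results don't depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

# Strips any run of leading "Re:", "Fw:" or "Fwd:" prefixes (case-insensitive).
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Crude thread key: the subject line minus reply/forward prefixes."""
    return _REPLY_PREFIX.sub("", subject).strip().lower()


def tally(messages):
    """Count sender->recipient pairs, send hours, weekdays, and thread sizes."""
    pairs, hours, weekdays, threads = Counter(), Counter(), Counter(), Counter()
    for msg in messages:
        senders = getaddresses(msg.get_all("From", []))
        date_header = msg.get("Date")
        if not senders or not date_header:
            continue  # skip messages missing the metadata we need
        try:
            sent = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            continue  # unparseable Date header
        sender = senders[0][1].lower()
        for _, addr in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", [])):
            pairs[(sender, addr.lower())] += 1
        # Hour and weekday are in the sender's own offset from the Date header.
        hours[sent.hour] += 1
        weekdays[DAY_NAMES[sent.weekday()]] += 1
        threads[normalize_subject(msg.get("Subject", ""))] += 1
    return pairs, hours, weekdays, threads


# Three hypothetical messages, invented for illustration only.
SAMPLE = [message_from_string(raw) for raw in (
    "From: alice@example.com\nTo: bob@example.com\nCc: carol@example.com\n"
    "Date: Tue, 15 Jan 2019 09:30:00 -0500\nSubject: Quarterly numbers\n\nDraft attached.\n",
    "From: bob@example.com\nTo: alice@example.com\n"
    "Date: Tue, 15 Jan 2019 10:05:00 -0500\nSubject: Re: Quarterly numbers\n\nLooks good.\n",
    "From: alice@example.com\nTo: bob@example.com\n"
    "Date: Wed, 16 Jan 2019 09:10:00 -0500\nSubject: RE: Re: Quarterly numbers\n\nFinal version.\n",
)]

pairs, hours, weekdays, threads = tally(SAMPLE)
print(pairs.most_common(3))
print(hours.most_common(1), weekdays.most_common(1), threads.most_common(1))
```

For a real Unix mailbox, iterating over the standard library's `mailbox.mbox` yields message objects that subclass `email.message.Message`, so something like `tally(mailbox.mbox("path/to/archive.mbox"))` (a hypothetical path) should work the same way. Grouping by subject is only a heuristic; the In-Reply-To and References headers give a more reliable thread structure when they are present.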

Although social media sites ...

ISBN: 9781491973547