Mining the Social Web, 3rd Edition

Book Description

Mine the rich data tucked away in popular social websites such as Twitter, Facebook, LinkedIn, and Instagram. With the third edition of this popular guide, data scientists, analysts, and programmers will learn how to glean insights from social media—including who’s connecting with whom, what they’re talking about, and where they’re located—using Python code examples, Jupyter notebooks, or Docker containers.

In part one, each standalone chapter focuses on one aspect of the social landscape, including each of the major social sites, as well as web pages, blogs and feeds, mailboxes, GitHub, and a newly added chapter covering Instagram. Part two provides a cookbook with two dozen bite-size recipes for solving particular issues with Twitter.

  • Get a straightforward synopsis of the social web landscape
  • Use Docker to easily run each chapter’s example code, packaged as a Jupyter notebook
  • Adapt and contribute to the code’s open source GitHub repository
  • Learn how to employ best-in-class Python 3 tools to slice and dice the data you collect
  • Apply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, clique detection, and image recognition
  • Build beautiful data visualizations with Python and JavaScript toolkits

Table of Contents

  1. Preface
    1. A Note from Matthew Russell
    2. README.1st
    3. Managing Your Expectations
    4. Python-Centric Technology
    5. Improvements to the Third Edition
    6. The Ethical Use of Data Mining
    7. Conventions Used in This Book
    8. Using Code Examples
    9. O’Reilly Safari
    10. How to Contact Us
    11. Acknowledgments for the Third Edition
    12. Acknowledgments for the Second Edition
    13. Acknowledgments from the First Edition
  2. I. A Guided Tour of the Social Web
  3. Prelude
  4. 1. Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More
    1. Overview
    2. Why Is Twitter All the Rage?
    3. Exploring Twitter’s API
      1. Fundamental Twitter Terminology
      2. Creating a Twitter API Connection
      3. Exploring Trending Topics
      4. Searching for Tweets
    4. Analyzing the 140 (or More) Characters
      1. Extracting Tweet Entities
      2. Analyzing Tweets and Tweet Entities with Frequency Analysis
      3. Computing the Lexical Diversity of Tweets
      4. Examining Patterns in Retweets
      5. Visualizing Frequency Data with Histograms
    5. Closing Remarks
    6. Recommended Exercises
    7. Online Resources
  5. 2. Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
    1. Overview
    2. Exploring Facebook’s Graph API
      1. Understanding the Graph API
      2. Understanding the Open Graph Protocol
    3. Analyzing Social Graph Connections
      1. Analyzing Facebook Pages
      2. Manipulating Data Using pandas
    4. Closing Remarks
    5. Recommended Exercises
    6. Online Resources
  6. 3. Mining Instagram: Computer Vision, Neural Networks, Object Recognition, and Face Detection
    1. Overview
    2. Exploring the Instagram API
      1. Making Instagram API Requests
      2. Retrieving Your Own Instagram Feed
      3. Retrieving Media by Hashtag
    3. Anatomy of an Instagram Post
    4. Crash Course on Artificial Neural Networks
      1. Training a Neural Network to “Look” at Pictures
      2. Recognizing Handwritten Digits
      3. Object Recognition Within Photos Using Pretrained Neural Networks
    5. Applying Neural Networks to Instagram Posts
      1. Tagging the Contents of an Image
      2. Detecting Faces in Images
    6. Closing Remarks
    7. Recommended Exercises
    8. Online Resources
  7. 4. Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More
    1. Overview
    2. Exploring the LinkedIn API
      1. Making LinkedIn API Requests
      2. Downloading LinkedIn Connections as a CSV File
    3. Crash Course on Clustering Data
      1. Normalizing Data to Enable Analysis
      2. Measuring Similarity
      3. Clustering Algorithms
    4. Closing Remarks
    5. Recommended Exercises
    6. Online Resources
  8. 5. Mining Text Files: Computing Document Similarity, Extracting Collocations, and More
    1. Overview
    2. Text Files
    3. A Whiz-Bang Introduction to TF-IDF
      1. Term Frequency
      2. Inverse Document Frequency
      3. TF-IDF
    4. Querying Human Language Data with TF-IDF
      1. Introducing the Natural Language Toolkit
      2. Applying TF-IDF to Human Language
      3. Finding Similar Documents
      4. Analyzing Bigrams in Human Language
      5. Reflections on Analyzing Human Language Data
    5. Closing Remarks
    6. Recommended Exercises
    7. Online Resources
  9. 6. Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
    1. Overview
    2. Scraping, Parsing, and Crawling the Web
      1. Breadth-First Search in Web Crawling
    3. Discovering Semantics by Decoding Syntax
      1. Natural Language Processing Illustrated Step-by-Step
      2. Sentence Detection in Human Language Data
      3. Document Summarization
    4. Entity-Centric Analysis: A Paradigm Shift
      1. Gisting Human Language Data
    5. Quality of Analytics for Processing Human Language Data
    6. Closing Remarks
    7. Recommended Exercises
    8. Online Resources
  10. 7. Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More
    1. Overview
    2. Obtaining and Processing a Mail Corpus
      1. A Primer on Unix Mailboxes
      2. Getting the Enron Data
      3. Converting a Mail Corpus to a Unix Mailbox
      4. Converting Unix Mailboxes to pandas DataFrames
    3. Analyzing the Enron Corpus
      1. Querying by Date/Time Range
      2. Analyzing Patterns in Sender/Recipient Communications
      3. Searching Emails by Keywords
    4. Analyzing Your Own Mail Data
      1. Accessing Your Gmail with OAuth
      2. Fetching and Parsing Email Messages
      3. Visualizing Patterns in Email with Immersion
    5. Closing Remarks
    6. Recommended Exercises
    7. Online Resources
  11. 8. Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
    1. Overview
    2. Exploring GitHub’s API
      1. Creating a GitHub API Connection
      2. Making GitHub API Requests
    3. Modeling Data with Property Graphs
    4. Analyzing GitHub Interest Graphs
      1. Seeding an Interest Graph
      2. Computing Graph Centrality Measures
      3. Extending the Interest Graph with “Follows” Edges for Users
      4. Using Nodes as Pivots for More Efficient Queries
      5. Visualizing Interest Graphs
    5. Closing Remarks
    6. Recommended Exercises
    7. Online Resources
  12. II. Twitter Cookbook
  13. 9. Twitter Cookbook
    1. Accessing Twitter’s API for Development Purposes
      1. Problem
      2. Solution
      3. Discussion
    2. Doing the OAuth Dance to Access Twitter’s API for Production Purposes
      1. Problem
      2. Solution
      3. Discussion
    3. Discovering the Trending Topics
      1. Problem
      2. Solution
      3. Discussion
    4. Searching for Tweets
      1. Problem
      2. Solution
      3. Discussion
    5. Constructing Convenient Function Calls
      1. Problem
      2. Solution
      3. Discussion
    6. Saving and Restoring JSON Data with Text Files
      1. Problem
      2. Solution
      3. Discussion
    7. Saving and Accessing JSON Data with MongoDB
      1. Problem
      2. Solution
      3. Discussion
    8. Sampling the Twitter Firehose with the Streaming API
      1. Problem
      2. Solution
      3. Discussion
    9. Collecting Time-Series Data
      1. Problem
      2. Solution
      3. Discussion
    10. Extracting Tweet Entities
      1. Problem
      2. Solution
      3. Discussion
    11. Finding the Most Popular Tweets in a Collection of Tweets
      1. Problem
      2. Solution
      3. Discussion
    12. Finding the Most Popular Tweet Entities in a Collection of Tweets
      1. Problem
      2. Solution
      3. Discussion
    13. Tabulating Frequency Analysis
      1. Problem
      2. Solution
      3. Discussion
    14. Finding Users Who Have Retweeted a Status
      1. Problem
      2. Solution
      3. Discussion
    15. Extracting a Retweet’s Attribution
      1. Problem
      2. Solution
      3. Discussion
    16. Making Robust Twitter Requests
      1. Problem
      2. Solution
      3. Discussion
    17. Resolving User Profile Information
      1. Problem
      2. Solution
      3. Discussion
    18. Extracting Tweet Entities from Arbitrary Text
      1. Problem
      2. Solution
      3. Discussion
    19. Getting All Friends or Followers for a User
      1. Problem
      2. Solution
      3. Discussion
    20. Analyzing a User’s Friends and Followers
      1. Problem
      2. Solution
      3. Discussion
    21. Harvesting a User’s Tweets
      1. Problem
      2. Solution
      3. Discussion
    22. Crawling a Friendship Graph
      1. Problem
      2. Solution
      3. Discussion
    23. Analyzing Tweet Content
      1. Problem
      2. Solution
      3. Discussion
    24. Summarizing Link Targets
      1. Problem
      2. Solution
      3. Discussion
    25. Analyzing a User’s Favorite Tweets
      1. Problem
      2. Solution
      3. Discussion
    26. Closing Remarks
    27. Recommended Exercises
    28. Online Resources
  14. III. Appendixes
  15. A. Information About This Book’s Virtual Machine Experience
  16. B. OAuth Primer
    1. Overview
      1. OAuth 1.0a
      2. OAuth 2.0
  17. C. Python and Jupyter Notebook Tips and Tricks
  18. Index