Python Social Media Analytics

Book Description

Leverage the power of Python to collect, process, and mine deep insights from social media data

About This Book

  • Acquire data from various social media platforms such as Facebook, Twitter, YouTube, GitHub, and more
  • Analyze and extract actionable insights from your social data using various Python tools
  • A highly practical guide to conducting efficient social media analytics at scale

Who This Book Is For

If you are a programmer or a data analyst familiar with the Python programming language and want to analyze your social data to gain valuable business insights, this book is for you. The book assumes no prior knowledge of any data analysis tool or process.

What You Will Learn

  • Understand the basics of social media mining
  • Use PyMongo to clean, store, and access data in MongoDB
  • Understand user reactions and perform emotion detection on Facebook
  • Perform Twitter sentiment analysis and entity recognition using Python
  • Analyze video and campaign performance on YouTube
  • Mine popular trends on GitHub and predict the next big technology
  • Extract conversational topics on public internet forums
  • Analyze user interests on Pinterest
  • Perform large-scale social media analytics on the cloud

In Detail

Social media platforms such as Facebook, Twitter, Pinterest, YouTube, and internet forums have become a big part of everyday life. However, these complex and noisy data streams make it challenging to harness them properly and benefit from them. This book introduces you to the concept of social media analytics and shows how you can leverage its capabilities to empower your business.

Starting with acquiring data from various social networking sources such as Twitter, Facebook, YouTube, Pinterest, and social forums, you will see how to clean the data and prepare it for analysis using various Python APIs. This book explains how to structure the cleaned data and store it in MongoDB using PyMongo. You will also perform web scraping with Scrapy and Beautiful Soup, and visualize the data.

Finally, you will be introduced to different techniques for analyzing your social data at scale in the cloud, using Python and Spark. By the end of this book, you will be able to use the power of Python to gain valuable insights from social media data and apply them to enhance your business processes.

Style and approach

This book follows a step-by-step approach to teach readers the concepts of social media analytics using the Python programming language. To explain various data analysis processes, real-world datasets are used wherever required.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to obtain the code files.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Errata
      3. Piracy
      4. Questions
  2. Introduction to the Latest Social Media Landscape and Importance
    1. Introducing social graph
      1. Notion of influence
      2. Social impacts
      3. Platforms on platform
    2. Delving into social data
      1. Understanding semantics
      2. Defining the semantic web
      3. Exploring social data applications
    3. Understanding the process
    4. Working environment
      1. Defining Python
      2. Selecting an IDE
      3. Illustrating Git
    5. Getting the data
      1. Defining API
      2. Scraping and crawling
    6. Analyzing the data
      1. Brief introduction to machine learning
      2. Techniques for social media analysis
      3. Setting up data structure libraries
    7. Visualizing the data
    8. Getting started with the toolset
    9. Summary
  3. Harnessing Social Data – Connecting, Capturing, and Cleaning
    1. APIs in a nutshell
      1. Different types of API
        1. RESTful API
        2. Stream API
      2. Advantages of social media APIs
      3. Limitations of social media APIs
      4. Connecting principles of APIs
    2. Introduction to authentication techniques
      1. What is OAuth?
        1. User authentication
        2. Application authentication
      2. Why do we need to use OAuth?
      3. Connecting to social network platforms without OAuth
        1. OAuth1 and OAuth2
      4. Practical usage of OAuth
    3. Parsing API outputs
      1. Twitter
        1. Creating application
        2. Selecting the endpoint
        3. Using requests to connect
      2. Facebook
        1. Creating an app and getting an access token
        2. Selecting the endpoint
        3. Connect to the API
      3. GitHub
        1. Obtaining OAuth tokens programmatically
        2. Selecting the endpoint
        3. Connecting to the API
      4. YouTube
        1. Creating an application and obtaining an access token programmatically
        2. Selecting the endpoint
        3. Connecting to the API
      5. Pinterest
        1. Creating an application
        2. Selecting the endpoint
        3. Connecting to the API
    4. Basic cleaning techniques
      1. Data type and encoding
      2. Structure of data
      3. Pre-processing and text normalization
      4. Duplicate removal
    5. MongoDB to store and access social data
      1. Installing MongoDB
        1. Setting up the environment
        2. Starting MongoDB
    6. MongoDB using Python
    7. Summary
  4. Uncovering Brand Activity, Popularity, and Emotions on Facebook
    1. Facebook brand page
      1. The Facebook API
    2. Project planning
      1. Scope and process
      2. Data type
    3. Analysis
      1. Step 1 – data extraction
      2. Step 2 – data pull
      3. Step 3 – feature extraction
      4. Step 4 – content analysis
    4. Keywords
      1. Extracting verbatims for keywords
        1. User keywords
        2. Brand posts
        3. User hashtags
    5. Noun phrases
      1. Brand posts
      2. User comments
    6. Detecting trends in time series
      1. Maximum shares
        1. Brand posts
        2. User comments
      2. Maximum likes
        1. Brand posts
        2. Comments
    7. Uncovering emotions
      1. How to extract emotions?
        1. Introducing the Alchemy API
        2. Connecting to the Alchemy API
          1. Setting up an application
        3. Applying Alchemy API
    8. How can brands benefit from it?
    9. Summary
  5. Analyzing Twitter Using Sentiment Analysis and Entity Recognition
    1. Scope and process
    2. Getting the data
      1. Getting Twitter API keys
      2. Data extraction
        1. REST API Search endpoint
          1. Rate Limits
        2. Streaming API
      3. Data pull
      4. Data cleaning
    3. Sentiment analysis
    4. Customized sentiment analysis
      1. Labeling the data
        1. Creating the model
        2. Model performance evaluation and cross-validation
          1. Confusion matrix
        3. K-fold cross-validation
    5. Named entity recognition
      1. Installing NER
    6. Combining NER and sentiment analysis
    7. Summary
  6. Campaigns and Consumer Reaction Analytics on YouTube – Structured and Unstructured
    1. Scope and process
    2. Getting the data
      1. How to get a YouTube API key
    3. Data pull
    4. Data processing
    5. Data analysis
      1. Sentiment analysis in time
        1. Sentiment by weekday
      2. Comments in time
        1. Number of comments by weekday
    6. Summary
  7. The Next Great Technology – Trends Mining on GitHub
    1. Scope and process
    2. Getting the data
      1. Rate Limits
      2. Connection to GitHub
    3. Data pull
    4. Data processing
      1. Textual data
      2. Numerical data
    5. Data analysis
      1. Top technologies
      2. Programming languages
      3. Programming languages used in top technologies
      4. Top repositories by technology
      5. Comparison of technologies in terms of forks, open issues, size, and watchers count
        1. Forks versus open issues
        2. Forks versus size
        3. Forks versus watchers
        4. Open issues versus size
        5. Open issues versus watchers
        6. Size versus watchers
    6. Summary
  8. Scraping and Extracting Conversational Topics on Internet Forums
    1. Scope and process
    2. Getting the data
      1. Introduction to scraping
        1. Scrapy framework
        2. How it works
        3. Related tools
        4. Creating a project
        5. Creating spiders
        6. Teamspeed forum spider
    3. Data pull and pre-processing
      1. Data cleaning
      2. Part-of-speech extraction
    4. Data analysis
      1. Introduction to topic models
      2. Latent Dirichlet Allocation
      3. Applying LDA to forum conversations
      4. Topic interpretation
    5. Summary
  9. Demystifying Pinterest through Network Analysis of Users' Interests
    1. Scope and process
    2. Getting the data
      1. Pinterest API
        1. Step 1 – creating an application and obtaining app ID and app secret
        2. Step 2 – getting your authorization code (access code)
        3. Step 3 – exchanging the access code for an access token
        4. Step 4 – testing the connection
        5. Getting Pinterest API data
      2. Scraping Pinterest search results
        1. Building a scraper with Selenium
        2. Scraping time constraints
    3. Data pull and pre-processing
      1. Pinterest API data
        1. Bigram extraction
        2. Building a graph
      2. Pinterest search results data
        1. Bigram extraction
        2. Building a graph
    4. Data analysis
      1. Understanding relationships between our own topics
      2. Finding influencers
        1. Conclusions
      3. Community structure
    5. Summary
  10. Social Data Analytics at Scale – Spark and Amazon Web Services
    1. Different scaling methods and platforms
      1. Parallel computing
      2. Distributed computing with Celery
        1. Celery multiple node deployment
      3. Distributed computing with Spark
        1. Text mining with Spark
    2. Topic models at scale
    3. Spark on the Cloud – Amazon Elastic MapReduce
    4. Summary