© Akshay Kulkarni and Adarsha Shivananda 2019
Akshay Kulkarni and Adarsha Shivananda, Natural Language Processing Recipes, https://doi.org/10.1007/978-1-4842-4267-4_1

1. Extracting the Data

Akshay Kulkarni (1) and Adarsha Shivananda (1)
(1) Bangalore, Karnataka, India
 
This chapter covers the main sources of text data and the ways to extract that data so it can provide information and insights for businesses. It contains the following recipes:
  • Recipe 1. Text data collection using APIs

  • Recipe 2. Reading PDF file in Python

  • Recipe 3. Reading Word document

  • Recipe 4. Reading JSON object

  • Recipe 5. Reading HTML page and HTML parsing

  • Recipe 6. Regular expressions

  • Recipe 7. String handling

  • Recipe 8. Web scraping

Introduction

Before getting into details of the book, let’s see the different possible data sources ...
