© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
A. Kulkarni, A. Shivananda, Natural Language Processing Recipes, https://doi.org/10.1007/978-1-4842-7351-7_1

1. Extracting the Data

Akshay Kulkarni and Adarsha Shivananda
Bangalore, Karnataka, India

This chapter covers various sources of text data and ways to extract data from them. Text data can be a source of information and insight for businesses. The chapter covers the following recipes.

  • Recipe 1. Text data collection using APIs

  • Recipe 2. Reading a PDF file in Python

  • Recipe 3. Reading a Word document

  • Recipe 4. Reading a JSON object

  • Recipe 5. Reading an HTML page and HTML parsing

  • Recipe 6. Regular expressions

  • Recipe 7. String handling

  • Recipe 8. Web scraping

Introduction

Before ...
