Chapter 1. Getting Started
Introduction
This book is about using Spark NLP to build natural language processing (NLP) applications. Spark NLP is an NLP library built on top of Apache Spark. In this book I’ll cover how to use Spark NLP, as well as fundamental natural language processing topics. Hopefully, at the end of this book you’ll have a new software tool for working with natural language and Spark NLP, as well as a suite of techniques and some understanding of why these techniques work.
Let’s begin by talking about the structure of this book. In the first part, we’ll go over the technologies and techniques we’ll be using with Spark NLP throughout this book. After that we’ll talk about the building blocks of NLP. Finally, we’ll talk about NLP applications and systems.
When working on an application that requires NLP, there are three perspectives you should keep in mind: the software developer’s perspective, the linguist’s perspective, and the data scientist’s perspective. The software developer’s perspective focuses on what your application needs to do; this grounds the work in terms of the product you want to create. The linguist’s perspective focuses on what it is in the data that you want to extract. The data scientist’s perspective focuses on how you can extract the information you need from your data.
Following is a more detailed overview of the book.
- Part I, Basics
- Chapter 1 covers setting up your environment so you can follow along with the examples and exercises ...