Natural Language Processing with Spark NLP

by Alex Thomas
June 2020
Content level: Beginner to intermediate
364 pages
8h 58m
English
O'Reilly Media, Inc.

Chapter 3. NLP on Apache Spark

It’s no longer news that there is a data deluge. Every day, people and devices are creating huge amounts of data. Text data is definitely one of the main kinds of data that humans produce. People write millions of comments, product reviews, Reddit messages, and tweets per day. This data is incredibly valuable—for both research and commerce. Because of the scale at which this data is created, our approach to working with it has changed.

Most of the original research in NLP was done on small data sets with hundreds or thousands of documents. You may think that it would be easier to build NLP applications now that we have so much more text data with which to build better models. However, these pieces of text have different pragmatics and are of different varieties, so leveraging them is more complicated from a data-science perspective. From the software engineering perspective, big data introduces many challenges. Structured data has predictable size and organization, which makes it easier to store and distribute efficiently. Text data is much less consistent. This makes parallelizing and distributing work more important and potentially more complex. Distributed computing frameworks like Spark help us manage these challenges and complexities.

In this chapter, we will discuss Apache Spark and Spark NLP. First, we will cover some basic concepts that will help us understand distributed computing. Then, we will talk briefly about the history of distributed ...

Publisher Resources

ISBN: 9781492047759