Natural Language Processing with Spark NLP

by Alex Thomas
June 2020
Beginner to intermediate
364 pages
8h 58m
English
O'Reilly Media, Inc.

Glossary

algorithmic complexity
The complexity of an algorithm is generally measured by the time it takes to run or the space (memory or disk) it needs, usually expressed as a function of the size of its input.
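As a small illustration of the time/space trade-off, here are two ways to check a token list for duplicates. The function names are hypothetical. The first compares every pair, which takes O(n²) time but no extra space. The second remembers what it has seen, which takes O(n) expected time but O(n) extra memory.

```python
def has_duplicate_quadratic(items):
    """O(n^2) time, O(1) extra space: compare every pair of items."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_hashed(items):
    """O(n) expected time, O(n) extra space: remember items already seen."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(has_duplicate_quadratic(tokens), has_duplicate_hashed(tokens))  # True True
```

Both functions give the same answer. They differ only in how their running time and memory grow as the input gets longer.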
annotation
In an NLP context, an annotation is a marking on a segment of text or audio with some extra information. Generally, an annotation will require character indices for the start and end of the annotated segment, as well as an annotation type.
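A minimal sketch of an annotation as a data structure is shown below. The `Annotation` class and its field names are hypothetical, not Spark NLP's own class. It also assumes an exclusive `end` index so that it matches Python slicing; libraries differ on this convention, and some use inclusive end offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation record: a typed marking on a span of text."""
    annotation_type: str                          # e.g. "token" or "entity"
    begin: int                                    # first character index (inclusive)
    end: int                                      # one past the last character (exclusive)
    metadata: dict = field(default_factory=dict)  # any extra information

    def covered_text(self, text: str) -> str:
        """Return the segment of `text` that this annotation marks."""
        return text[self.begin:self.end]


text = "Spark NLP runs on Apache Spark."
entity = Annotation("entity", 18, 30, {"label": "SOFTWARE"})
print(entity.covered_text(text))  # Apache Spark
```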
annotator
An annotator is a function that takes text and produces annotations. It is not uncommon for some annotators to have a dependency on another type of annotator.
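The sketch below shows two hypothetical annotators, `tokenize` and `tag_capitalized`. These are illustrations, not Spark NLP annotators. The second one depends on the token annotations produced by the first, which mirrors the dependency described above.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    annotation_type: str
    begin: int  # inclusive
    end: int    # exclusive


def tokenize(text):
    """Annotator with no dependencies: one 'token' per run of non-whitespace."""
    return [Annotation("token", m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def tag_capitalized(text, tokens):
    """Annotator that depends on 'token' annotations from another annotator."""
    if any(t.annotation_type != "token" for t in tokens):
        raise ValueError("tag_capitalized requires token annotations")
    return [
        Annotation("capitalized", t.begin, t.end)
        for t in tokens
        if text[t.begin:t.end][0].isupper()
    ]


text = "Spark NLP runs on Apache Spark."
tokens = tokenize(text)
caps = tag_capitalized(text, tokens)
print([text[a.begin:a.end] for a in caps])  # ['Spark', 'NLP', 'Apache', 'Spark.']
```

Because `tag_capitalized` consumes token annotations, it must run after `tokenize`. This ordering constraint is what a pipeline of dependent annotators has to respect.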
Apache Hadoop
Hadoop is an open source implementation of the programming model described in Google's MapReduce paper. Initially, Hadoop required that the map, reduce, and any custom format readers be implemented and deployed to the cluster. Eventually, higher-level abstractions such as Apache Hive and Apache Pig were developed.
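The map and reduce steps that Hadoop users originally had to write can be simulated in a single Python process with a word count. This is a sketch of the MapReduce model only, not Hadoop's Java API. The function names are hypothetical, and the `shuffle` step stands in for the grouping the framework performs between the two phases across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (key, value) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


docs = ["Spark runs on Hadoop", "Hadoop implements MapReduce", "spark"]
print(word_count(docs))
# {'spark': 2, 'runs': 1, 'on': 1, 'hadoop': 2, 'implements': 1, 'mapreduce': 1}
```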
Apache Parquet
Parquet is a columnar data format originally created for the Hadoop ecosystem. Storing values column by column allows efficient compression, and Parquet is a popular format in the Spark ecosystem.
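The following toy illustration shows why a columnar layout compresses well. Once records are pivoted into per-column lists, repeated values sit next to each other, and a simple run-length encoding can collapse them. This is not Parquet's file format: Parquet's actual encodings, which include dictionary and run-length encoding, are more involved. The records and helper names are hypothetical.

```python
from itertools import groupby

# Toy records; a row-oriented layout stores each record's fields together.
rows = [
    {"doc_id": 1, "lang": "en", "num_tokens": 12},
    {"doc_id": 2, "lang": "en", "num_tokens": 7},
    {"doc_id": 3, "lang": "en", "num_tokens": 30},
    {"doc_id": 4, "lang": "de", "num_tokens": 9},
    {"doc_id": 5, "lang": "de", "num_tokens": 15},
]


def to_columns(records):
    """Pivot row records into one list of values per column (a columnar layout)."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
print(columns["lang"])                     # ['en', 'en', 'en', 'de', 'de']
print(run_length_encode(columns["lang"]))  # [('en', 3), ('de', 2)]
```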
Apache Spark
Spark is a distributed computing framework with a high-level interface and in-memory processing. Spark was developed in Scala, but there are now APIs for Java, Python, R, and SQL.
application
An application is a program with an end user. Many applications have a graphical user interface (GUI), though this is not necessary. In this book, we also consider programs that do batch data processing as “applications”.
array
An array is ...

ISBN: 9781492047759