Apache Spark Deep Learning Cookbook

by Ahmed Sherif, Amrith Ravindra, Michal Malohlava, Adnan Masood
July 2018
Content level: Intermediate to advanced
474 pages
13h 37m
English
Packt Publishing

How to do it...

Based on our earlier review of the text, the following operations can be used to clean and preprocess the input file. These are only a few of the available preprocessing options; you may want to explore further cleaning operations as an exercise:

  • Replace dashes with whitespace so that words split more cleanly
  • Split words based on whitespaces
  • Remove all punctuation from the input text to reduce the number of unique characters fed into the model (for example, Why? becomes Why)
  • Remove all words that are not alphabetic to remove standalone punctuation tokens and emoticons
  • Convert all words from uppercase to lowercase in order to reduce the size ...
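The steps listed above can be sketched in plain Python as follows. This is a minimal illustration, not the book's own code: the function name clean_text is hypothetical, the dash is assumed to be the ASCII hyphen character, and "punctuation" is assumed to mean the characters in Python's string.punctuation.

```python
import string
from typing import List


def clean_text(raw: str) -> List[str]:
    """Return a list of cleaned, lowercase word tokens (illustrative sketch)."""
    # Replace dashes with spaces so joined words split apart.
    # Assumption: the dash is the ASCII hyphen; runs such as "--" are covered too.
    text = raw.replace("-", " ")

    # Split on any run of whitespace.
    tokens = text.split()

    # Strip punctuation characters from every token ("Why?" becomes "Why").
    # Assumption: punctuation means the ASCII set in string.punctuation.
    table = str.maketrans("", "", string.punctuation)
    tokens = [tok.translate(table) for tok in tokens]

    # Drop tokens that are not purely alphabetic. This removes empty strings
    # left behind by standalone punctuation or emoticons, as well as numbers.
    tokens = [tok for tok in tokens if tok.isalpha()]

    # Convert every remaining token to lowercase.
    return [tok.lower() for tok in tokens]


if __name__ == "__main__":
    print(clean_text("Why? Well-known facts :) -- 42 times!"))
```

The order matters: punctuation is stripped before the alphabetic filter, so a token like Why? survives as why, while a standalone emoticon collapses to an empty string and is discarded.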
Publisher Resources

ISBN: 9781788474221