Hands-On Transfer Learning with Python

by Dipanjan Sarkar, Raghav Bali, Tamoghna Ghosh
August 2018
Intermediate to advanced
438 pages
12h 3m
English
Packt Publishing

Traditional text categorization

Building a text categorization model involves two things: a set of preprocessing steps, and a suitable representation of the textual data as numerical vectors. The general preprocessing steps are as follows:

  1. Sentence splitting: Split a document into a set of sentences.
  2. Tokenization: Split sentences into constituent words.
  3. Stemming or lemmatization: Reduce the word tokens to their base form. For example, words such as playing, played, and plays have one base: play. The base form produced by stemming need not be a word in the dictionary, whereas the root word produced by lemmatization, also known as the lemma, is always present in the dictionary.
  4. Text cleanup: Case conversion, correcting spellings, and removing stopwords ...
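The preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the book's implementation: the function names (split_sentences, tokenize, toy_stem, preprocess), the tiny STOPWORDS set, and the suffix-stripping rule are all assumptions made for this example. A real pipeline would more likely rely on a library such as NLTK or spaCy, and spelling correction is left out here.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "was",
             "in", "on", "of", "to", "he", "she", "it"}


def split_sentences(document):
    """Step 1: naive sentence splitting after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Step 2: split a sentence into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", sentence)


def toy_stem(token):
    """Step 3: crude suffix stripping. This is NOT the Porter algorithm,
    only a stand-in to show that stems need not be dictionary words."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of the original token.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(document):
    """Run all steps and return one list of stems per sentence.

    Case conversion and stopword removal (step 4) are applied before
    stemming here so that stopwords are matched in their original form.
    """
    result = []
    for sentence in split_sentences(document):
        tokens = [t.lower() for t in tokenize(sentence)]
        tokens = [t for t in tokens if t not in STOPWORDS]
        result.append([toy_stem(t) for t in tokens])
    return result


if __name__ == "__main__":
    doc = "She was playing chess. He played yesterday and plays today!"
    print(preprocess(doc))
```

In this sketch, playing, played, and plays all reduce to play, while chess is reduced to ches, which shows why a stemmer's output need not be a dictionary word. A lemmatizer would instead look each token up in a vocabulary and return a valid lemma.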

ISBN: 9781788831307