fastText Quick Start Guide

Book description

Perform fast, efficient text representation and classification with Facebook's fastText library

Key Features

  • Introduction to Facebook's fastText library for NLP
  • Create efficient word and vector representations and perform sentence classification
  • Build better, more scalable solutions for text representation and classification

Book Description

Facebook's fastText library handles text representation and classification, two core tasks in natural language processing (NLP). Most organizations deal with enormous amounts of text data every day, and extracting insights from it efficiently requires powerful NLP tools such as fastText.

This book is your ideal introduction to fastText. You will learn how to create fastText models from the command line, without the need for complicated code. You will explore the algorithms that fastText is built on and how to use them for word representation and text classification.

Next, you will use fastText in conjunction with other popular libraries and frameworks such as Keras, TensorFlow, and PyTorch.

Finally, you will deploy fastText models to mobile devices. By the end of this book, you will have all the required knowledge to use fastText in your own applications at work or in projects.

What you will learn

  • Create models using the default command line options in fastText
  • Understand the algorithms used in fastText to create word vectors
  • Combine command line text transformation capabilities and the fastText library to implement a training, validation, and prediction pipeline
  • Explore word representation and sentence classification using fastText
  • Use Gensim and spaCy to load the vectors, transform and lemmatize text, and perform other NLP tasks efficiently
  • Develop a fastText NLP classifier using popular frameworks, such as Keras, TensorFlow, and PyTorch

Who this book is for

This book is for data analysts, data scientists, and machine learning developers who want to perform efficient word representation and sentence classification using Facebook's fastText library. Basic knowledge of Python programming is required.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. fastText Quick Start Guide
  3. Dedication
  4. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  5. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Conventions used
    4. Get in touch
      1. Reviews
  7. First Steps
  8. Introducing FastText
    1. Introducing fastText
    2. Installing fastText
      1. Prerequisites
        1. Windows
        2. Linux
      2. Installing dependencies on RHEL machines supporting the yum package manager
      3. Installing dependencies on Debian-based machines such as Ubuntu
      4. Installing dependencies on Arch Linux using pacman
    3. Installing dependencies on Mac systems
    4. Installing Python dependencies
      1. Installing fastText on Windows
      2. Installing fastText in Linux and macOS
    5. Using a Docker image for fastText
    6. Summary
  9. Creating Models Using FastText Command Line
    1. Text classification using fastText
      1. Text preprocessing
      2. English text and text using other Roman alphabets
      3. Downloading the data
      4. Preprocessing the Yelp data
      5. Text normalization
        1. Removing stop words
        2. Normalizing
        3. Shuffling all the data
      6. Dividing into training and validation
      7. Model building
        1. Model training
        2. Model testing and evaluation
          1. Precision and recall
          2. Confusion matrix
        3. Hyperparameters
          1. Epoch
          2. Learning rate
          3. N-grams
          4. Start with pretrained word vectors
          5. Finding the best fastText hyperparameters
      8. Model quantization
      9. Understanding the model
    2. FastText word vectors
      1. Creating word vectors
        1. Downloading from Wikipedia
        2. Text normalization
        3. Create word vectors
        4. Model evaluation
          1. Nearest neighbors
          2. Word analogies
        5. Other parameters when training
        6. Out of vocabulary words
      2. Facebook word vectors
      3. Using pretrained word vectors
        1. Machine translation
    3. Summary
  10. The FastText Model
  11. Word Representations in FastText
    1. Word-to-vector representations
      1. Types of word representations
      2. Getting vector representations from text
        1. One-hot encoding
        2. Bag of words
          1. TF-IDF
          2. N-grams
      3. Model architecture in fastText
        1. The unsupervised model
          1. Skipgram
          2. Subword information skipgram
          3. Implementing skipgram
          4. CBOW
          5. CBOW implementation
          6. Comparison between skipgram and CBOW
      4. Loss functions and optimization
        1. Softmax
        2. Hierarchical softmax
        3. Negative sampling
        4. Subsampling of frequent words
      5. Context definitions
    2. Summary
  12. Sentence Classification in FastText
    1. Sentence classification
    2. fastText supervised learning
      1. Architecture
        1. Hierarchical softmax architecture
        2. The n-gram features and the hashing trick
          1. The FNV hash
        3. Word embeddings and their use in sentence classification
    3. fastText model quantization
      1. Compression techniques
        1. Quantization
        2. Vector quantization
          1. Finding the codebook for high-dimensional spaces
        3. Product quantization
        4. Additional steps
    4. Summary
  13. Using FastText in Your Own Models
  14. FastText in Python
    1. FastText official bindings
      1. PyBind
      2. Preprocessing data
      3. Unsupervised learning
        1. Training in fastText
        2. Evaluating the model
          1. Word vectors
          2. Nearest neighbor queries
          3. Word similarity
          4. Model performance
          5. Model visualization
      4. Supervised learning
        1. Data preprocessing and normalization
        2. Training the model
        3. Prediction
        4. Testing the model
        5. Confusion matrix
    2. Gensim
      1. Training a fastText model
        1. Hyperparameters
        2. Model saving and loading
        3. Word vectors
        4. Model Evaluation
          1. Word Mover's Distance
        5. Getting more out of the training process
      2. Machine translation using Gensim
    3. Summary
  15. Machine Learning and Deep Learning Models
    1. Scikit-learn and fastText
      1. Custom classifiers for fastText
      2. Bringing the whole thing together
    2. Embeddings
    3. Keras
      1. Embedding layer in Keras
      2. Convolutional neural networks
    4. TensorFlow
      1. Word embeddings in TensorFlow
      2. RNN architectures
    5. PyTorch
      1. The torchtext library
        1. Data classes in torchtext
        2. Using the iterators
      2. Bringing it all together
    6. Summary
  16. Deploying Models to Web and Mobile
    1. Deploying to the web
      1. Flask
        1. The fastText functions
        2. The flask endpoints
    2. Deploying to smaller devices
      1. Prerequisites – Completing the Google tutorial
      2. App considerations
      3. Adding the fastText model
      4. FastText in Java
      5. Adding the library dependencies to Android
      6. Using library dependencies in Android
      7. Finally the app
    3. Summary
  17. Notes for the Readers
    1. Windows and Linux
    2. Python 2 and Python 3
    3. The fastText command line
      1. The fastText supervised
      2. The fastText skipgram
      3. The fastText cbow
    4. Gensim fastText parameters
  18. References
    1. Chapter 3
    2. Chapter 4
    3. Chapter 5
    4. Chapter 6
    5. Chapter 7
  19. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: fastText Quick Start Guide
  • Author(s): Joydeep Bhattacharjee
  • Release date: July 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789130997