Deep Learning with Hadoop

Book Description

Build, implement and scale distributed deep learning models for large-scale datasets

About This Book

  • Get to grips with deep learning concepts and set up Hadoop to put them to use
  • Implement and parallelize deep learning models on Hadoop's YARN framework
  • Follow a comprehensive tutorial on distributed deep learning with Hadoop

Who This Book Is For

If you are a data scientist who wants to learn how to perform deep learning on Hadoop, this is the book for you. Knowledge of basic machine learning concepts and some understanding of Hadoop are required to make the best use of this book.

What You Will Learn

  • Explore deep learning and the various models associated with it
  • Understand the challenges of implementing distributed deep learning with Hadoop and how to overcome them
  • Implement Convolutional Neural Networks (CNNs) with Deeplearning4j
  • Delve into the implementation of Restricted Boltzmann Machines (RBMs)
  • Understand the mathematics behind implementing Recurrent Neural Networks (RNNs)
  • Get hands-on practice with deep learning and its implementation on Hadoop

In Detail

This book will teach you how to deploy deep neural networks on large-scale datasets with Hadoop for optimal performance.

Starting with what deep learning is and the various models associated with deep neural networks, this book then shows you how to set up the Hadoop environment for deep learning. You will also learn how to overcome the challenges you face when implementing distributed deep learning on large-scale unstructured datasets. The book also shows you how to implement and parallelize widely used deep learning models such as Deep Belief Networks, Convolutional Neural Networks, Recurrent Neural Networks, Restricted Boltzmann Machines and autoencoders using the popular deep learning library Deeplearning4j.

Get in-depth mathematical explanations and visual representations to help you understand the design and implementation of Recurrent Neural Networks and Denoising Autoencoders with Deeplearning4j. To give you a more practical perspective, the book also teaches you how to implement large-scale video processing, image processing and natural language processing on Hadoop.

By the end of this book, you will know how to deploy various deep neural networks in distributed systems using Hadoop.

Style and Approach

This book takes a comprehensive, step-by-step approach to implementing efficient deep learning models on Hadoop. It starts from the basics and builds readers' knowledge as they strengthen their understanding of the concepts. Practical examples are included at every step of the way to supplement the theory.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to obtain the code files.

Table of Contents

  1. Deep Learning with Hadoop
    1. Deep Learning with Hadoop
    2. Credits
    3. About the Author
    4. About the Reviewers
    5. www.PacktPub.com
      1. Why subscribe?
    6. Customer Feedback
    7. Dedication
    8. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    9. 1. Introduction to Deep Learning
      1. Getting started with deep learning
        1. Deep feed-forward networks
        2. Various learning algorithms
          1. Unsupervised learning
          2. Supervised learning
          3. Semi-supervised learning
      2. Deep learning terminologies
      3. Deep learning: A revolution in Artificial Intelligence
        1. Motivations for deep learning
          1. The curse of dimensionality
          2. The vanishing gradient problem
          3. Distributed representation
      4. Classification of deep learning networks
        1. Deep generative or unsupervised models
        2. Deep discriminative models
      5. Summary
    10. 2. Distributed Deep Learning for Large-Scale Data
      1. Deep learning for massive amounts of data
      2. Challenges of deep learning for big data
        1. Challenges of deep learning due to massive volumes of data (first V)
        2. Challenges of deep learning from a high variety of data (second V)
        3. Challenges of deep learning from a high velocity of data (third V)
        4. Challenges of deep learning to maintain the veracity of data (fourth V)
      3. Distributed deep learning and Hadoop
        1. Map-Reduce
        2. Iterative Map-Reduce
        3. Yet Another Resource Negotiator (YARN)
        4. Important characteristics for distributed deep learning design
      4. Deeplearning4j - an open source distributed framework for deep learning
        1. Major features of Deeplearning4j
        2. Summary of functionalities of Deeplearning4j
      5. Setting up Deeplearning4j on Hadoop YARN
        1. Getting familiar with Deeplearning4j
        2. Integration of Hadoop YARN and Spark for distributed deep learning
        3. Rules to configure memory allocation for Spark on Hadoop YARN
      6. Summary
    11. 3. Convolutional Neural Network
      1. Understanding convolution
      2. Background of a CNN
        1. Architecture overview
      3. Basic layers of CNN
        1. Importance of depth in a CNN
        2. Convolutional layer
          1. Sparse connectivity
            1. Improved time complexity
          2. Parameter sharing
            1. Improved space complexity
          3. Equivariant representations
        3. Choosing the hyperparameters for Convolutional layers
          1. Depth
          2. Stride
          3. Zero-padding
          4. Mathematical formulation of hyperparameters
            1. Effect of zero-padding
        4. ReLU (Rectified Linear Units) layers
          1. Advantages of ReLU over the sigmoid function
        5. Pooling layer
          1. Where is it useful, and where is it not?
        6. Fully connected layer
      4. Distributed deep CNN
        1. Most popular aggressive deep neural networks and their configurations
        2. Training time - major challenges associated with deep neural networks
        3. Hadoop for deep CNNs
      5. Convolutional layer using Deeplearning4j
        1. Loading data
        2. Model configuration
        3. Training and evaluation
      6. Summary
    12. 4. Recurrent Neural Network
      1. What makes recurrent networks distinctive from others?
      2. Recurrent neural networks (RNNs)
        1. Unfolding recurrent computations
          1. Advantages of a model unfolded in time
        2. Memory of RNNs
        3. Architecture
      3. Backpropagation through time (BPTT)
        1. Error computation
      4. Long short-term memory
        1. Problem with deep backpropagation through time
        2. Long short-term memory
      5. Bi-directional RNNs
        1. Shortfalls of RNNs
        2. Solutions to overcome
      6. Distributed deep RNNs
      7. RNNs with Deeplearning4j
      8. Summary
    13. 5. Restricted Boltzmann Machines
      1. Energy-based models
      2. Boltzmann machines
        1. How Boltzmann machines learn
        2. Shortfall
      3. Restricted Boltzmann machine
        1. The basic architecture
        2. How RBMs work
      4. Convolutional Restricted Boltzmann machines
        1. Stacked Convolutional Restricted Boltzmann machines
      5. Deep Belief networks
        1. Greedy layer-wise training
      6. Distributed Deep Belief network
        1. Distributed training of Restricted Boltzmann machines
        2. Distributed training of Deep Belief networks
          1. Distributed backpropagation algorithm
          2. Performance evaluation of RBMs and DBNs
            1. Drastic improvement in training time
      7. Implementation using Deeplearning4j
        1. Restricted Boltzmann machines
        2. Deep Belief networks
      8. Summary
    14. 6. Autoencoders
      1. Autoencoder
        1. Regularized autoencoders
      2. Sparse autoencoders
        1. Sparse coding
        2. Sparse autoencoders
          1. The k-Sparse autoencoder
            1. How to select the sparsity level k
            2. Effect of sparsity level
      3. Deep autoencoders
        1. Training of deep autoencoders
        2. Implementation of deep autoencoders using Deeplearning4j
      4. Denoising autoencoder
        1. Architecture of a Denoising autoencoder
        2. Stacked denoising autoencoders
        3. Implementation of a stacked denoising autoencoder using Deeplearning4j
      5. Applications of autoencoders
      6. Summary
    15. 7. Miscellaneous Deep Learning Operations using Hadoop
      1. Distributed video decoding in Hadoop
      2. Large-scale image processing using Hadoop
        1. Application of Map-Reduce jobs
      3. Natural language processing using Hadoop
        1. Web crawler
        2. Extraction of keyword and module for natural language processing
        3. Estimation of relevant keywords from a page
      4. Summary
    16. 1. References