Chapter 9. Using Deep Learning and DL4J on Spark
Ten years on the road, making one night stand
Speeding my young life away
Tell me one more time just so I’ll understand
Are you sure Hank done it this way?
Did old Hank really do it this way?
Waylon Jennings, “Are You Sure Hank Done It This Way”
Introduction to Using DL4J with Spark and Hadoop
Two key datacenter technologies that have emerged in the past decade are Apache Hadoop and Apache Spark. Hadoop in particular has become the epicenter of data warehouse growth and evolution. Spark has succeeded MapReduce to become the mainline execution framework on Hadoop for running parallel iterative algorithms.
DL4J supports scale-out of network training on Spark. Running DL4J on Spark can significantly reduce the time required to train our networks, and it gives us a way to keep training time in check as input size grows.
To the Cloud!
Platforms such as Amazon Web Services (AWS), Google Cloud, and Microsoft Azure make it possible to set up a Spark cluster on demand for just a few dollars. DL4J is able to run on most public cloud infrastructure,1 giving practitioners flexibility in how and where they run their deep learning workflows.
Spark is a general parallel-processing engine that can execute on its own, on an Apache Mesos cluster, or on a Hadoop cluster via the Hadoop YARN (Yet Another Resource Negotiator) framework. It can work with data in the Hadoop Distributed File System ...