Debugging Spark applications on YARN or Mesos clusters

When you run a Spark application on YARN, you can enable remote debugging by adding the following JVM option in yarn-env.sh:

YARN_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=4000 $YARN_OPTS"
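
Briefly, the parts of that JDWP agent string mean:

transport=dt_socket   use a TCP socket for the debugger connection
server=y              the JVM listens for an incoming debugger, so your IDE connects to it
suspend=n             do not pause the JVM while waiting for a debugger to attach
address=4000          the port your IDE's remote debug configuration should connect to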

Now, remote debugging is available through port 4000 from your Eclipse or IntelliJ IDE. The second option is to set SPARK_SUBMIT_OPTS. You can use either Eclipse or IntelliJ to develop Spark applications and then submit them for execution on a remote multinode YARN cluster. What I do is create a Maven project in Eclipse or IntelliJ, package my Java or Scala application as a JAR file, and then submit it as a Spark job. However, in order to attach your IDE such as ...
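
For the second option, a minimal sketch of setting SPARK_SUBMIT_OPTS before submitting the packaged JAR might look like the following; the class name com.example.MyApp, the JAR name myapp.jar, the port 4000, and the choice of suspend=y are illustrative assumptions rather than values from the text:

export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=4000"
spark-submit --class com.example.MyApp --master yarn --deploy-mode client myapp.jar

With suspend=y, spark-submit waits for your IDE's remote debug configuration to connect on port 4000 before the driver starts, so breakpoints can be hit from the first line; in client deploy mode the driver runs inside the spark-submit JVM, which is the process the debugger attaches to.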
