Distributed Computing with Python

by Francesco Pierfederici

April 2016

Intermediate to advanced

170 pages

3h 48m

English

Packt Publishing

Debugging

Everything is great when things work as we expect them to; often, however, we are not so lucky. Distributed applications, and even simple jobs running remotely, are particularly challenging to debug. It is usually hard to know exactly which user account our jobs run under, which environment they are executed in, and where they run; with job schedulers, it is hard even to predict when they will run.
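One way to reduce this uncertainty is to have each job record its own runtime context as soon as it starts. The following is a minimal sketch using only the Python standard library; the function name `describe_runtime` and the choice of fields are illustrative assumptions, not something prescribed by any scheduler.

```python
import datetime
import getpass
import json
import os
import platform
import socket
import sys


def describe_runtime():
    """Collect basic facts about where, when, and as whom this job runs.

    Hypothetical helper: the name and the selected fields are assumptions
    made for illustration.
    """
    try:
        user = getpass.getuser()
    except Exception:
        # Some batch environments have no login name available.
        user = "unknown"
    return {
        "user": user,
        "host": socket.gethostname(),
        "cwd": os.getcwd(),
        "python": sys.executable,
        "platform": platform.platform(),
        "started_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "environment": dict(os.environ),
    }


if __name__ == "__main__":
    # Write to stderr so the report ends up in the job's error/log file
    # rather than mixed in with its regular output.
    json.dump(describe_runtime(), sys.stderr, indent=2, sort_keys=True)
    sys.stderr.write("\n")
```

Calling this at the top of a job script leaves a record of the user account, host, working directory, and environment variables in the job's error stream, which can then be compared with what we see in an interactive session. Note that dumping the full environment may expose secrets, so it may be worth filtering the variables in practice.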

When things do not work as expected, there are a few places to look for hints as to what happened. When working with a job scheduler, the first thing to do is look at any error messages returned by the job submission tool (that is, condor_submit, condor_submit_dag, or qsub). The second place to look for clues is ...
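When jobs are submitted from a script rather than by hand, the messages printed by the submission tool are easy to lose. As a rough sketch, assuming the tool is invoked as an ordinary command-line program, its exit code and output could be captured as follows. The helper `run_submission` and the `SubmissionResult` type are hypothetical names introduced for this example.

```python
import subprocess
import sys
from collections import namedtuple

# Hypothetical container for what the submission tool reported.
SubmissionResult = namedtuple("SubmissionResult", "returncode stdout stderr")


def run_submission(command):
    """Run a submission command (a list of strings) and capture its output.

    Hypothetical helper: it only wraps subprocess.run and knows nothing
    about any particular scheduler.
    """
    try:
        proc = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # The tool is not installed or not on PATH for this user.
        return SubmissionResult(None, "", str(exc))
    return SubmissionResult(proc.returncode, proc.stdout, proc.stderr)


def report(result):
    """Print a short summary of a failed submission to stderr."""
    if result.returncode != 0:
        print("submission failed, exit code:", result.returncode, file=sys.stderr)
        print(result.stderr, file=sys.stderr)
```

For instance, `report(run_submission(["condor_submit", "job.sub"]))` would surface whatever the tool wrote to its error stream; `job.sub` here is a made-up submit file name.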