April 2016
Intermediate to advanced
170 pages
3h 48m
English
Everything is great when things work as we expect them to; oftentimes, however, we are not so lucky. Distributed applications, and even simple jobs running remotely, are particularly challenging to debug. It is usually hard to know exactly which user account our jobs run under, which environment they are executed in, where they run, and, with job schedulers, it is even hard to predict when they will run.
When things do not work as we expect them to, there are a few places where we could get some hints as to what happened. When working with a job scheduler, the first thing to do is look at any error messages returned by the job submission tool (that is, condor_submit, condor_submit_dag, or qsub). The second place to look for clues are ...