[Halld-offline] Slurm jobs
Alexander Austregesilo
aaustreg at jlab.org
Mon Feb 11 15:47:54 EST 2019
Dear Colleagues,
Recent changes in the JLab batch farm system required an update to the
launch scripts which were introduced during the last GlueX software
tutorial:
https://halldweb.jlab.org/doc-private/DocDB/ShowDocument?docid=3658
In particular, you have to svn update the following file:
https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/launch/launch.py
The update makes sure that the directory for the log files exists
before the job is registered with swif, which is a requirement of the new
scheduler, Slurm.
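For anyone maintaining their own launch scripts, the check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual code in launch.py; the helper name ensure_log_dir is hypothetical.

```python
import os


def ensure_log_dir(log_path):
    """Create the parent directory of a log file if it is missing.

    Hypothetical helper (not taken from launch.py): call it for the
    stdout and stderr paths before the job is registered with swif.
    Returns the absolute path of the directory.
    """
    log_dir = os.path.dirname(os.path.abspath(log_path))
    # exist_ok=True makes the call safe to repeat for every job.
    os.makedirs(log_dir, exist_ok=True)
    return log_dir
```

Calling the helper once per log path before registering each job is enough, since repeated calls on an existing directory do nothing.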
Please find more information about the changes below.
Best regards,
Alex
On 2/11/2019 3:44 PM, Ying Chen wrote:
> Dear halld users,
>
> Sorry for the late announcement of this change to our farm cluster.
>
> Since last week, we have quietly moved 20% of swif jobs to our new slurm
> cluster.
> Some of the jobs failed on slurm because the log directory did not exist.
> The error message looks like this:
>
> Status: FAILED (Job failed due to invalid stdour/stderr)
> This is because auger-slurm does not manage (save, copy and trim) the
> stdout and stderr log files for the job. Instead, it passes their
> location to slurm, so slurm writes the messages directly to these files.
> This change has a real benefit: the log files are available to the user
> as soon as the job starts.
>
> By default, the job .out and .err files will be under
> /lustre/expphy/farm_out/<user>,
> with the name patterns JOB_NAME-AUGER_JOB_ID-HOSTNAME.out and
> JOB_NAME-AUGER_JOB_ID-HOSTNAME.err. Auger will create this directory
> for any user who runs a job on slurm for the first time. But if you use
> the <Stdout> and <Stderr> tags to change the log file location, it is
> the user's responsibility to make sure the log directory exists;
> otherwise the job will fail with 'invalid stdour/stderr'.
>
> Please check this document for detailed information on auger-slurm.
>
> https://scicomp.jlab.org/docs/auger_slurm
> Auger-Slurm (Beta test) | JLab Scientific Computing
> <https://scicomp.jlab.org/docs/auger_slurm>
> The auger-slurm commands are installed under
> /site/scicomp/auger-slurm/bin (will be linked to /site/bin in the
> future) and can be accessed from ifarm or any host ...
> scicomp.jlab.org
>
>
> Let me (or Chris) know if you have any questions.
>
> Thanks
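
To make the default naming scheme quoted above concrete, here is a minimal Python sketch that assembles the two default log paths. The function name and the example values are hypothetical; only the base directory and the JOB_NAME-AUGER_JOB_ID-HOSTNAME pattern come from the announcement.

```python
import posixpath

# Default base directory named in the announcement.
DEFAULT_FARM_OUT = "/lustre/expphy/farm_out"


def default_log_paths(user, job_name, auger_job_id, hostname,
                      base=DEFAULT_FARM_OUT):
    """Return the (.out, .err) paths following the announced pattern.

    Hypothetical helper for illustration only; Auger itself chooses
    the job ID and host name at run time.
    """
    stem = "{}-{}-{}".format(job_name, auger_job_id, hostname)
    directory = posixpath.join(base, user)
    return (posixpath.join(directory, stem + ".out"),
            posixpath.join(directory, stem + ".err"))
```

This is mainly useful for locating the log files of a running job; the directory itself only has to be created by hand when the <Stdout> and <Stderr> tags point somewhere else.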