[Halld-offline] Slurm jobs

Alexander Austregesilo aaustreg at jlab.org
Mon Feb 11 15:47:54 EST 2019


Dear Colleagues,


Recent changes in the JLab batch farm system required an update to the 
launch scripts which were introduced during the last GlueX software 
tutorial:


https://halldweb.jlab.org/doc-private/DocDB/ShowDocument?docid=3658


In particular, you have to svn update the following file:


https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/launch/launch.py


The update makes sure that the directory for the log files exists 
before the job is registered with swif, which is a requirement of the 
new scheduler, slurm.
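
For anyone maintaining their own submission script, here is a minimal
sketch of that kind of check. It is not the actual code in launch.py.
The function name ensure_log_dir is hypothetical, and the sketch assumes
the script already knows the full stdout/stderr paths it will hand to
swif.

```python
import os


def ensure_log_dir(log_path):
    """Create the parent directory of log_path if it is missing.

    Hypothetical helper, not taken from launch.py. Returns the absolute
    path of the directory that now exists.
    """
    log_dir = os.path.dirname(os.path.abspath(log_path))
    # exist_ok=True makes the call safe to repeat for every job
    os.makedirs(log_dir, exist_ok=True)
    return log_dir
```

Calling this helper on both the stdout and the stderr path before the
job is registered should avoid the failure described below, provided
the directory is on a file system that the farm nodes can also see.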


Please find more information about the changes below.


Best regards,

Alex



On 2/11/2019 3:44 PM, Ying Chen wrote:
> Dear halld users,
>
> Sorry for the late announcement of this change to our farm cluster.
>
> Since last week, we have quietly moved 20% of swif jobs to our new slurm 
> cluster.
> Some of these jobs failed on slurm because the log directory did not exist.
> The error message looks like this:
>
> Status: FAILED (Job failed due to invalid stdour/stderr)
> This is because auger-slurm does not manage (save, copy and trim) the 
> stdout and stderr log files for the job. Instead, it passes their 
> location to slurm, and slurm writes the messages directly to these 
> files. This change has the benefit that the log files are available to 
> the user as soon as the job starts.
>
> By default, the job .out and .err files are placed under 
> /lustre/expphy/farm_out/<user>,
> with the name patterns JOB_NAME-AUGER_JOB_ID-HOSTNAME.out and
> JOB_NAME-AUGER_JOB_ID-HOSTNAME.err. Auger creates this directory
> for any user who runs a job on slurm for the first time. But if you use 
> the <Stdout> and <Stderr> tags to change the log file location, it is 
> the user's responsibility to make sure the log directory exists; 
> otherwise the job will fail with 'invalid stdour/stderr'.
>
> Please check this document for detailed information on auger-slurm.
>
> https://scicomp.jlab.org/docs/auger_slurm
> The auger-slurm commands are installed under 
> /site/scicomp/auger-slurm/bin (will be linked to /site/bin in the 
> future) and can be accessed from ifarm or any host ...
>
>
> Let me (or Chris) know if you have any questions.
>
> Thanks
