[Jlab-scicomp-briefs] Batch Farm Change: PBS to Slurm on June 3rd
bhess at jlab.org
Mon May 20 16:15:50 EDT 2019
The farm has been gradually migrating to the Slurm workload manager over the past few months, replacing the legacy PBS system. We are planning the final transition of farm nodes for Tuesday, June 3rd. For most users this will not be a significant change because the primary submission mechanism for the farm will continue to be Auger or SWIF.
A few differences in Slurm's behavior surfaced during this transition when some SWIF workflows were redirected to Slurm. Most notably, Slurm writes standard output and standard error to volatile space, where the files are automatically removed after two months. Very large stdout/stderr files may be removed after two weeks. Job stdout/stderr is visible at /farm_out/<username> on hosts that mount Lustre.
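Because output under /farm_out is eventually removed, it may help to check periodically for files approaching the cleanup window and copy anything worth keeping elsewhere. The following is a minimal sketch, not an official tool. It assumes that file age can be judged by modification time (the actual cleanup criterion is not stated here), that your farm username matches your login name, and that 45 days is a reasonable warning threshold; the function name `files_older_than` is hypothetical.

```python
import getpass
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def files_older_than(root, min_age_days, now=None):
    """Return (path, age_in_days) pairs for files under root whose
    modification time is at least min_age_days old, oldest first.

    Assumption: modification time is a useful proxy for cleanup age.
    """
    now = time.time() if now is None else now
    root = Path(root)
    if not root.is_dir():
        # Directory is missing or not mounted on this host.
        return []
    aged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                age_days = (now - path.stat().st_mtime) / SECONDS_PER_DAY
            except OSError:
                # The file may have been removed while we were scanning.
                continue
            if age_days >= min_age_days:
                aged.append((path, age_days))
    aged.sort(key=lambda item: item[1], reverse=True)
    return aged


if __name__ == "__main__":
    # /farm_out/<username> is the location given above; using the login
    # name for <username> is an assumption.
    farm_out = Path("/farm_out") / getpass.getuser()
    for path, age in files_older_than(farm_out, min_age_days=45):
        print(f"{age:6.1f} days  {path}")
```

Run on a host that mounts /farm_out, this lists files older than the chosen threshold; on other hosts it prints nothing.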
The Slurm farm nodes are currently running CentOS 7.2. They will be updated to 7.6 once the Slurm migration has been completed. That update is a separate change and will be announced in advance.
To address questions about this change and the farm, we are scheduling an informal discussion for next Tuesday, May 28th, at 10:00am in CEBAF Center room F224. Incident reports are also welcome for specific issues.
Detailed documentation about submitting to Slurm today is available at https://scicomp.jlab.org/docs/auger_slurm