[Hps] [Jlab-scicomp-briefs] Upcoming Farm configuration changes

Bryan Hess bhess at jlab.org
Mon Jul 27 14:05:46 EDT 2020


Slurm Changes

Tomorrow morning, Tuesday, July 28, two changes will be made to Slurm, the batch farm scheduler:

  *   Memory limits on farm jobs will be enforced using Linux Control Groups (cgroups). This change is being made to improve farm and Lustre stability by killing jobs that exceed their memory request. Jobs killed for exceeding their memory limit will show up in Slurm as "JobState=OUT_OF_MEMORY Reason=OutOfMemory".
  *   Node allocation policy will switch to a method that coalesces contiguous job slots for large multi-threaded jobs. This is a small change to node allocation that helps with some job mixes; it is not a change to priority or fairshare scheduling.
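To check whether a job was stopped by the new memory enforcement, you can look for the state quoted above in the job record. The following is a minimal Python sketch. It assumes the text has already been captured from `scontrol show job <jobid>` (for example via `subprocess`). The helper names `parse_scontrol_job` and `was_killed_for_memory` are hypothetical, and the sample record is illustrative, not real farm output.

```python
from typing import Dict


def parse_scontrol_job(text: str) -> Dict[str, str]:
    """Split `scontrol show job`-style output into a Key -> Value dict.

    Simplification (assumption): values contain no spaces. Values that
    themselves contain '=' (e.g. ReqTRES=cpu=4,mem=8G) stay whole because
    each token is split only on its first '='.
    """
    fields: Dict[str, str] = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not Key=Value pairs
            fields[key] = value
    return fields


def was_killed_for_memory(text: str) -> bool:
    """Return True if the job record shows the out-of-memory kill state."""
    fields = parse_scontrol_job(text)
    return (
        fields.get("JobState") == "OUT_OF_MEMORY"
        or fields.get("Reason") == "OutOfMemory"
    )


# Hypothetical excerpt of `scontrol show job <jobid>` output (not real data).
sample = """JobId=1234 JobName=sim
   JobState=OUT_OF_MEMORY Reason=OutOfMemory Dependency=(null)
   ReqTRES=cpu=4,mem=8G,node=1"""

if __name__ == "__main__":
    print(was_killed_for_memory(sample))  # True
```

If a job is flagged this way, the usual remedy is to resubmit it with a larger memory request (for example, with sbatch's `--mem` option).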


August 18: Farm QCD12S node end of service

The end-of-life qcd12s nodes will be removed from service on August 18, 2020, as part of regular maintenance. Any jobs that still request the "centos7" tag will not run after that date. Please use "centos77" or "general" instead.

_______________________________________________
Jlab-scicomp-briefs mailing list
Jlab-scicomp-briefs at jlab.org
https://mailman.jlab.org/mailman/listinfo/jlab-scicomp-briefs