[Jlab-scicomp-briefs] Upcoming Scientific Computing Maintenance

Bryan Hess bhess at jlab.org
Mon Nov 7 16:24:40 EST 2022


Upcoming Scientific Computing Maintenance - November 2022


Farm Memory Enforcement


On November 15, 2022, the way that Slurm enforces farm memory allocations will change. The Slurm memory specification (example: #SBATCH --mem) allocates real memory. The current configuration allows an overrun of up to 150% of the allocation. Starting on maintenance day, the hard limit will be reduced to the requested allocation, with no overrun allowed.
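To make the arithmetic concrete, here is a minimal sketch of the limit before and after the change. It assumes that "150% of the allocation" means a ceiling of 1.5 times the requested memory, and that the request is given in megabytes (Slurm's default unit for --mem). The function name and the 4096 MB job are hypothetical, chosen only for illustration.

```python
def hard_limit_mb(requested_mb: int, overrun_factor: float) -> int:
    """Return the enforced memory ceiling in MB for a given request.

    overrun_factor is an illustrative parameter: 1.5 models the current
    150% setting (assumed interpretation), 1.0 models no overrun.
    """
    return int(requested_mb * overrun_factor)


# Hypothetical job submitted with: #SBATCH --mem=4096
requested = 4096

old_limit = hard_limit_mb(requested, 1.5)  # current configuration
new_limit = hard_limit_mb(requested, 1.0)  # starting on maintenance day

print(f"requested {requested} MB: old limit {old_limit} MB, new limit {new_limit} MB")
```

Under these assumptions, a job that requested 4096 MB but peaks at 5000 MB fits under the current limit and would exceed the new one, so its request would need to be raised.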


This change is being made to eliminate memory swapping, which severely degrades system performance for all jobs on the associated node. Recent instances of such memory overcommitment have caused jobs to exceed their expected execution times and under-use available CPU resources.


If you are unsure of your job’s memory needs, submit a test job with email notification enabled (#SBATCH --mail-type=END), or run the “seff” command on a recently completed job: seff <Jobid>. The email or seff output will show your job’s peak memory use and help you tune the request.
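As a rough illustration of turning a seff report into a request, the sketch below extracts the peak memory figure and suggests a --mem value. It rests on several assumptions that should be checked against real output on the farm: that the report contains a line of the form "Memory Utilized: 2.50 GB", that the units are binary (1 GB = 1024 MB), and that a 10% margin over the observed peak is reasonable. The margin, the function names, and the sample text are hypothetical and are not JLab recommendations.

```python
import math
import re

# Multipliers from the unit suffix to megabytes (binary units assumed).
_UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1, "GB": 1024, "TB": 1024 * 1024}


def peak_memory_mb(seff_output: str) -> float:
    """Extract the peak memory in MB from a seff-style report (assumed format)."""
    match = re.search(r"Memory Utilized:\s*([\d.]+)\s*(KB|MB|GB|TB)", seff_output)
    if match is None:
        raise ValueError("no 'Memory Utilized' line found")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_MB[unit]


def suggested_mem_mb(peak_mb: float, headroom_percent: int = 10) -> int:
    """Round the peak plus an illustrative safety margin up to a whole MB."""
    return math.ceil(peak_mb * (100 + headroom_percent) / 100)


# Hypothetical excerpt of a seff report, used only for illustration.
sample = """\
Memory Utilized: 2.50 GB
Memory Efficiency: 62.50% of 4.00 GB
"""

peak = peak_memory_mb(sample)
print(f"#SBATCH --mem={suggested_mem_mb(peak)}")
```

The margin is a judgment call: with no overrun allowed, a request equal to the observed peak leaves no room for run-to-run variation, while a large margin works against efficient scheduling.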


Request the smallest memory allocation that still lets your job run. This helps Slurm schedule jobs efficiently and increases throughput. For help and documentation, see the Scientific Computing User Portal at https://jlab.servicenowservices.com/scicomp


Routine Patching


Routine software updates of servers, including file servers, will be applied on maintenance day. Work file servers will have short interruptions for kernel updates. OSG servers and services will receive the latest security patches. No Lustre changes are planned this month.
