[Clas12_software] jlab batch job memory requests

Nathan Baltzell baltzell at jlab.org
Wed Feb 8 21:06:22 EST 2023


Hi Vardan,

The memory footprint of CLAS12 reconstruction jobs can always evolve, e.g., due to software/algorithm changes and new detectors, and the more general systems at JLab react to physical (resident) memory, not virtual.  But both of those have been pretty static for a few years, with no signs of changing any time soon.

The motivation for this email thread was rather to warn our collaborators that they may run into new issues if the resource requests for their particular jobs are deemed inappropriate, and to provide some guidance on how to measure and address that.

-Nathan

On Feb 8, 2023, at 7:59 AM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:

Hi Nathan,

I wanted to know if there were changes in the memory footprint of the reconstruction application and if SLURM is reacting to resident memory and not the virtual memory. I have to say that your previous email answered my questions.

-Vardan

--------------------------------------------------
Vardan H. Gyurjyan, Ph.D.
Staff Scientist
Thomas Jefferson Accelerator Facility
Newport News, VA, 23606
E-mail: gurjyan at jlab.org
757-269-5879 (JLAB)

From: Nathan Baltzell <baltzell at jlab.org>
Date: Tuesday, February 7, 2023 at 9:37 PM
To: Vardan Gyurjyan <gurjyan at jlab.org>
Cc: clas12 software <clas12_software at jlab.org>
Subject: Re: [Clas12_software] jlab batch job memory requests

And for exclusive jobs, the memory request is moot anyway.  What are you really after here?


On Feb 7, 2023, at 11:14 AM, Nathan Baltzell <baltzell at jlab.org> wrote:

Hi Vardan,

What is the "reconstruction application"?  The number I quoted was for standard simulation jobs.  The larger jobs used for real data of course depend on the size of the job, but never more than 1 GB per thread.

-Nathan



On Feb 7, 2023, at 11:11 AM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:

What is the reconstruction application's resident (not virtual) memory usage when it has exclusive use of a node?

--------------------------------------------------
Vardan H. Gyurjyan, Ph.D.
Staff Scientist
Thomas Jefferson Accelerator Facility
Newport News, VA, 23606
E-mail: gurjyan at jlab.org
757-269-5879 (JLAB)

From: Clas12_software <clas12_software-bounces at jlab.org> on behalf of Nathan Baltzell via Clas12_software <clas12_software at jlab.org>
Reply-To: Nathan Baltzell <baltzell at jlab.org>
Date: Tuesday, February 7, 2023 at 11:05 AM
To: clas12 software <clas12_software at jlab.org>
Subject: [Clas12_software] jlab batch job memory requests

FYI Everyone,

We've brought up memory efficiency of batch job requests at CLAS12 collaboration and software meetings in previous years.  When many jobs request much more memory than they actually use, the farm can sit unnecessarily idle and throughput drops significantly for everyone.
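
To illustrate why over-requesting idles the farm, here is a rough sketch in Python. It assumes the scheduler reserves the requested memory for each job, so a node fills up by whichever runs out first, memory or cores. The node size (64 GB, 32 cores) and the 4 GB request are made-up numbers for illustration, not actual farm hardware or a real user's jobs, and the function names are hypothetical.

```python
def jobs_per_node(node_mem_gb, node_cores, request_gb, cores_per_job=1):
    """How many jobs fit on one node, limited by memory or by cores."""
    by_memory = int(node_mem_gb // request_gb)   # limit from requested memory
    by_cores = node_cores // cores_per_job       # limit from available cores
    return min(by_memory, by_cores)


def memory_efficiency(used_gb, requested_gb):
    """Fraction of the requested memory that the job actually used."""
    return used_gb / requested_gb


# Hypothetical node: 64 GB of memory, 32 cores, single-core jobs.
over = jobs_per_node(64, 32, request_gb=4)    # jobs fit when requesting 4 GB each
right = jobs_per_node(64, 32, request_gb=2)   # jobs fit when requesting 2 GB each
print(f"4 GB requests: {over} jobs/node, {32 - over} cores idle")
print(f"2 GB requests: {right} jobs/node, {32 - right} cores idle")
print(f"efficiency at 1.7 GB used of 4 GB requested: {memory_efficiency(1.7, 4):.0%}")
```

With these assumed numbers, the 4 GB requests fill the node's memory after 16 jobs and leave half the cores unused, even though each job touches well under half of what it asked for.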

*** And now Scicomp has a larger initiative to improve farm efficiency, which includes contacting people running memory-inefficient jobs and potentially throttling their jobs if no action is taken.  ***

You can check metrics of your batch jobs at:

https://scicomp.jlab.org

There's a search feature at 'Slurm Jobs' (left sidebar) -> 'Jobs Query' (top), and there are also 'Recent Jobs' (top) and 'Memory Efficiency' (top).

Before launching a large number of jobs of a new type, you can measure how much memory your jobs use, for example by submitting a couple of jobs and checking that website, or by running your job interactively and checking htop, ps, or other system utilities.  Then set your SLURM/SWIF job memory request accordingly.
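
As one possible way to do the interactive measurement, here is a minimal Python sketch that runs a command and reports the peak resident memory of its child processes, using the standard-library resource module. The function names, the 20% safety margin, and the rounding to 100 MB are my own assumptions, not a Scicomp recommendation. The only SLURM-specific piece is the standard sbatch option --mem, which takes a size with a unit suffix; SWIF has its own syntax, which is not shown here.

```python
import math
import resource
import subprocess
import sys


def peak_rss_mb(cmd):
    """Run cmd to completion and return the peak resident memory (MB) of children.

    RUSAGE_CHILDREN reports the largest peak among all child processes this
    Python process has waited for so far, not the sum of processes running
    at the same time, so run one measurement per Python session.
    """
    subprocess.run(cmd, check=True)
    max_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return max_rss / divisor


def suggested_request_mb(peak_mb, margin=0.2, round_to=100):
    """Measured peak plus an assumed safety margin, rounded up to round_to MB."""
    padded = peak_mb * (1 + margin)
    return math.ceil(padded / round_to) * round_to


def sbatch_mem_flag(request_mb):
    """Format a memory request (in MB) as an sbatch --mem option."""
    return f"--mem={request_mb}M"


# Example (hypothetical command; substitute your real job command):
#   peak = peak_rss_mb(["./my_job.sh", "input.hipo"])
#   print(sbatch_mem_flag(suggested_request_mb(peak)))
```

Because the measurement covers only the largest single process, a job that runs several memory-hungry processes concurrently needs their peaks added together before choosing a request.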

Note, standard CLAS12 simulation jobs (gemc plus recon-util) require less than 1.7 GB of memory.

-Nathan
