[Clas12_software] jlab batch job memory requests
Vardan Gyurjyan
gurjyan at jlab.org
Wed Feb 8 07:59:23 EST 2023
Hi Nathan,
I wanted to know if there were changes in the memory footprint of the reconstruction application and if SLURM is reacting to resident memory and not the virtual memory. I have to say that your previous email answered my questions.
-Vardan
--------------------------------------------------
Vardan H. Gyurjyan, Ph.D.
Staff Scientist
Thomas Jefferson Accelerator Facility
Newport News, VA, 23606
E-mail: gurjyan at jlab.org
757-269-5879 (JLAB)
From: Nathan Baltzell <baltzell at jlab.org>
Date: Tuesday, February 7, 2023 at 9:37 PM
To: Vardan Gyurjyan <gurjyan at jlab.org>
Cc: clas12 software <clas12_software at jlab.org>
Subject: Re: [Clas12_software] jlab batch job memory requests
And for exclusive jobs, the memory request is moot anyway. What are you really after here?
On Feb 7, 2023, at 11:14 AM, Nathan Baltzell <baltzell at jlab.org> wrote:
Hi Vardan,
What is the "reconstruction application"? The number I quoted was for standard simulation jobs. The larger jobs used for real data of course depend on the size of the job, but never more than 1 GB per thread.
-Nathan
On Feb 7, 2023, at 11:11 AM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
What is the reconstruction application's resident (not the virtual) memory usage for the exclusive usage of a node?
From: Clas12_software <clas12_software-bounces at jlab.org> on behalf of Nathan Baltzell via Clas12_software <clas12_software at jlab.org>
Reply-To: Nathan Baltzell <baltzell at jlab.org>
Date: Tuesday, February 7, 2023 at 11:05 AM
To: clas12 software <clas12_software at jlab.org>
Subject: [Clas12_software] jlab batch job memory requests
FYI Everyone,
We've brought up memory efficiency of batch job requests at CLAS12 collaboration and software meetings in previous years. Lots of jobs requesting a lot more memory than they actually use can make the farm unnecessarily idle and significantly reduce throughput for everyone.
*** And now Scicomp has a larger initiative to improve farm efficiency, which includes contacting people running memory-inefficient jobs and potentially throttling their jobs if no action is taken. ***
You can check metrics of your batch jobs at:
https://scicomp.jlab.org
There's a search feature at 'Slurm Jobs' (left sidebar) -> 'Jobs Query' (top), and 'Recent Jobs' (top), and also 'Memory Efficiency' (top).
Before launching a large number of new types of jobs, you can measure how much memory your jobs use, for example by submitting a couple of jobs and checking that website, or by running your job interactively and checking in htop, ps, or other system utilities. Then set your SLURM/SWIF job memory request accordingly.
Note, standard CLAS12 simulation jobs (gemc plus recon-util) require less than 1.7 GB of memory.
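As a rough illustration of the interactive measurement suggested above, here is a minimal Python sketch (not part of the original message) that runs a command, reads the peak resident memory of the child process from the operating system, and pads it to a candidate memory request. The function names, the 20% headroom, and the rounding to 100 MB are hypothetical choices for illustration only. It assumes Linux, where ru_maxrss is reported in kilobytes.

```python
import math
import resource
import subprocess
import sys


def peak_rss_mb(cmd):
    """Run cmd to completion and return the peak resident set size in MB.

    RUSAGE_CHILDREN reports the largest single waited-for child of this
    process, so it is a per-process peak, not a sum over a process tree.
    Assumes Linux, where ru_maxrss is in kilobytes (macOS uses bytes).
    """
    subprocess.run(cmd, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_maxrss / 1024.0


def suggested_request_mb(peak_mb, headroom=0.2, round_to=100):
    """Pad the measured peak by a safety margin and round up.

    The default headroom and rounding are illustrative assumptions,
    not a farm policy.
    """
    padded = peak_mb * (1.0 + headroom)
    return int(math.ceil(padded / round_to) * round_to)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python measure_mem.py <your job command and args>
    peak = peak_rss_mb(sys.argv[1:])
    print(f"peak resident memory: {peak:.0f} MB")
    print(f"candidate memory request: {suggested_request_mb(peak)} MB")
```

The resulting number could then be used as a starting point for the job's memory request (for example with sbatch's --mem option), and cross-checked against the metrics reported on the Scicomp website after a couple of test jobs.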
-Nathan