[Hallcsw] [Fwd: solution to batch farm memory problems]
Mark Jones
jones at jlab.org
Thu Jul 23 20:43:46 EDT 2009
Below is a message from Andrew Puckett which explains how to modify
batch farm scripts so that one avoids an "apparent" problem with
memory. Personally, I had jobs fail and then increased the requested memory
allocation in the batch jsub file to get the jobs to run. But this
limited the number of farm machines that one can use. With the
suggestion below, you do not need to request a large memory allocation in
the jsub file and can use all the farm machines.
Mark
-------- Original Message --------
Subject: solution to batch farm memory problems
Date: Thu, 23 Jul 2009 18:00:50 -0400
From: Andrew Puckett <puckett at jlab.org>
To: Mehdi Meziane <mezianem at jlab.org>, Wei Luo <weiluo at jlab.org>,
Edward Brash <brash at jlab.org>, Mark Jones <jones at jlab.org>
Michael Barnes from the CC found the solution to this persistent
problem. It turns out that the crashes of jobs submitted without a huge
memory request were related to the stack size limit. By adding the
following line to the csh script that executes the engine:
limit stacksize unlimited
the jobs will run on the farm without crashing even if no memory request
(i.e., the default of 152 MB) is specified. This means our replay can
run on all the farm nodes and get done much faster.
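A minimal sketch of such a job script follows, assuming the farm job starts the engine from a csh wrapper. The engine command `./replay_engine` and its argument `run_number` are hypothetical placeholders for whatever your replay script actually runs. The `limit` line has to come before the engine is started, because child processes inherit the shell's resource limits. The second `limit stacksize` (with no value) only prints the limit now in effect, so the job log records whether the setting took effect.

```csh
#!/bin/csh
# Sketch of a farm job script (assumption: the engine is launched from csh).

# Remove the soft stack-size limit so the engine does not crash when its
# stack grows beyond the node's default limit.
limit stacksize unlimited

# With no value, "limit stacksize" prints the current limit; this records
# in the job log that the setting took effect.
limit stacksize

# Hypothetical engine invocation: replace with the actual replay command.
# It inherits the stack limit set above.
./replay_engine run_number
```

With this in place, the jsub file can keep the default memory request, so the job is eligible to run on every farm node.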
Andrew
--
Mark Jones
757-269-7733
jones at jlab.org