[Hallcsw] [Fwd: solution to batch farm memory problems]

Thu Jul 23 20:43:46 EDT 2009

Below is a message from Andrew Puckett that explains how to modify
batch farm scripts to avoid an "apparent" memory problem. Personally,
I had jobs fail and then raised the requested memory allocation in the
batch jsub file to get them to run, but this limited the number of farm
machines that could be used. With the suggestion below, you do not need
to request a large amount of memory in the jsub file and can use all
the farm machines.

                              Mark

-------- Original Message --------
Subject: solution to batch farm memory problems
Date: Thu, 23 Jul 2009 18:00:50 -0400
From: Andrew Puckett <puckett at jlab.org>
To: Mehdi Meziane <mezianem at jlab.org>, Wei Luo <weiluo at jlab.org>, 
  Edward Brash <brash at jlab.org>, Mark Jones <jones at jlab.org>

Michael Barnes from the CC found the solution to this persistent
problem. It turns out that the crashes of jobs submitted without a huge
memory request were related to the stack size limit. By adding the
following line to the csh script that executes the engine:

limit stacksize unlimited

the jobs will run on the farm without crashing even if no memory request
(i.e., the default of 152 MB) is specified. This means our replay can
run on all the farm nodes and get done much faster.
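
As a rough sketch of where that line could go, a job script might look
something like the one below. The engine command and its argument are
hypothetical placeholders, not the actual replay setup, and the second
"limit stacksize" call is only an optional check that prints the current
value into the job log.

```csh
#!/bin/csh -f
# Sketch of a farm job script. The engine command and its argument
# are hypothetical placeholders; substitute the real replay command.

# Raise the stack size limit before the engine starts.
limit stacksize unlimited

# Optional: print the current stack size limit to the job log.
limit stacksize

# Hypothetical engine invocation (run number passed as first argument).
./engine $1
```

The limit has to be set in the same script that starts the engine,
before the engine command, since it applies to the shell and to the
processes that shell launches afterwards.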

Andrew

-- 
				Mark Jones
				757-269-7733
				jones at jlab.org