[Clara] Overlimit Error Farm

Justin Ruger jruger at jlab.org
Wed Jan 15 09:47:45 EST 2014


That is the YAML file. Reading and writing are not in the YAML file.


Sent from my Samsung Galaxy S®4

-------- Original message --------
From: Dennis Weygand <weygand at jlab.org> 
Date:01/15/2014  9:39 AM  (GMT-05:00) 
To: clara at jlab.org 
Subject: Re: [Clara] Overlimit Error Farm 

No, there must be at least two other services in the chain: one for reading and one for writing. Can you show the YAML file?
Dennis

On Jan 15, 2014, at 9:22 AM, Justin Ruger wrote:

These are the services I am using:

# Uncomment the line below to set a new container name for all services
# container: Ruger_Reconstruction
services:
 - class: org.jlab.clas12.ec.services.ECReconstruction
   name: ECReconstruction
 - class: org.jlab.clas12.ctof.services.CTOFReconstruction
   name: CTOFReconstruction
 - class: trac.services.centraltracker.BSTTrackCandFinder
   name: BSTTrackCandFinder
 - class: trac.services.forwardtracking.DCTrackCandFinder
   name: DCTrackCandFinder
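
Per Dennis's note above, a complete chain would also need an input (reader) service and an output (writer) service around the reconstruction services. A hedged sketch of what that might look like; the reader/writer class names below are hypothetical placeholders, not actual CLAS12 classes:

```yaml
services:
 - class: org.jlab.clas12.io.services.EventReader      # hypothetical reader
   name: EventReader
 - class: org.jlab.clas12.ec.services.ECReconstruction
   name: ECReconstruction
 - class: org.jlab.clas12.ctof.services.CTOFReconstruction
   name: CTOFReconstruction
 - class: trac.services.centraltracker.BSTTrackCandFinder
   name: BSTTrackCandFinder
 - class: trac.services.forwardtracking.DCTrackCandFinder
   name: DCTrackCandFinder
 - class: org.jlab.clas12.io.services.EventWriter      # hypothetical writer
   name: EventWriter
```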



On Wednesday, January 15, 2014 9:19:07 AM, Vardan Gyurjyan wrote:
This is most likely an error in the service that manages shared-memory file caching, and has nothing to do with the framework. It seems that that service is not removing files from shared memory after the chain is done processing a file.
-vardan


----- Original Message -----
From: "Justin Ruger" <jruger at jlab.org>
To: clara at jlab.org
Sent: Wednesday, January 15, 2014 9:12:43 AM
Subject: [Clara] Overlimit Error Farm

So one of the reasons our farm jobs for clara dpe keep getting canceled
is because of this error:

=>> PBS: job killed: vmem 29235798016 exceeded limit 28991029248

This happens with or without the -l flag.
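
As a stopgap (not a fix for the apparent leak), the job could request a larger virtual-memory allocation, assuming the farm's PBS setup honors per-job resource requests. For scale: the exceeded limit, 28991029248 bytes, is exactly 27 GiB, and the job peaked at about 27.2 GiB, so the figure below is an assumption chosen to sit comfortably above that:

```
#PBS -l vmem=32gb
```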

So how to recreate the error:

I have files of 20k events each, and I wanted to process 50 of them. I wrote a
script that caches the files using jcache and then adds each file name to
input.list, so it is all handled automatically.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++=
#!/bin/bash
# Usage: <script> <number-of-files>
for (( i=1; i<=$1; i++ )); do
    echo "/cache/mss/clas/clas12/clas12-testing/gemc/sidis_$i.ev"
    if [ "$i" -eq 1 ]; then
        echo "sidis_$i.ev" > input.list
    else
        echo "sidis_$i.ev" >> input.list
    fi
    jcache submit default /mss/clas/clas12/clas12-testing/gemc/sidis_$i.ev
done
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Then I ran the farm orchestrator like this:

java -cp "$CLARA_SERVICES/.:$CLARA_SERVICES/lib/*" \
    std.orchestrators.FarmOrchestrator \
    -i /cache/mss/clas/clas12/clas12-testing/gemc \
    -o /w/hallb/clas12/jruger/fiftyNode/output \
    -s /tmp \
    /w/hallb/clas12/jruger/stress_test/services.yaml input.list

It ran fine for 5 files (sidis_1.ev through sidis_5.ev), but the job gets
canceled while processing the 6th file.
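
One way to confirm the per-file growth would be to sample the DPE process's virtual memory while the chain runs. A hedged sketch, assuming the DPE is a single observable process; the PID argument and the sampling interval are my additions, not part of the original report:

```shell
#!/bin/bash
# Sample a process's virtual memory size (kB) until it exits, so growth
# per processed file can be seen. Arguments: <pid> [interval-seconds].
monitor_vmem() {
    local pid="$1" interval="${2:-30}"
    while kill -0 "$pid" 2>/dev/null; do
        ps -o vsz= -p "$pid"      # print current virtual memory in kB
        sleep "$interval"
    done
}

# Demo: watch a short-lived stand-in process at 1-second intervals.
sleep 2 &
monitor_vmem $! 1
```

If the sampled vsz climbs by roughly one file's worth per input and never drops back, that would support the shared-memory caching diagnosis above.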

I think this is something we should figure out how to solve ASAP while
they are allowing us to hold a node for development. If all I can do is
five files at a time, that limits the robustness of Clara, in my opinion.
Let me know if you need any more information.

Justin
_______________________________________________
Clara mailing list
Clara at jlab.org
https://mailman.jlab.org/mailman/listinfo/clara

--
Dennis Weygand
weygand at jlab.org
(757) 269-5926 (office)
(757) 870-4844 (cell)



