[Clara] Overlimit Error Farm

Vardan Gyurjyan gurjyan at jlab.org
Wed Jan 15 10:15:59 EST 2014


The YAML file contains only the physics data processing services. The Reader, Writer, FileManager, etc. services are considered to be normative services and are started by the orchestrator.
-vardan

----- Original Message -----
From: "Justin Ruger" <jruger at jlab.org>
To: "Dennis Weygand" <weygand at jlab.org>, clara at jlab.org
Sent: Wednesday, January 15, 2014 9:47:45 AM
Subject: Re: [Clara] Overlimit Error Farm



That is the YAML file. Reading and writing are not in the YAML.






-------- Original message -------- 
From: Dennis Weygand 
Date:01/15/2014 9:39 AM (GMT-05:00) 
To: clara at jlab.org 
Subject: Re: [Clara] Overlimit Error Farm 

No, there must be at least two other services in the chain: one for reading and one for writing. Can you show the YAML file?
Dennis 



On Jan 15, 2014, at 9:22 AM, Justin Ruger wrote: 



These are the services I am using:

# Uncomment the line below to set a new container name for all services
# container: Ruger_Reconstruction
services:
  - class: org.jlab.clas12.ec.services.ECReconstruction
    name: ECReconstruction
  - class: org.jlab.clas12.ctof.services.CTOFReconstruction
    name: CTOFReconstruction
  - class: trac.services.centraltracker.BSTTrackCandFinder
    name: BSTTrackCandFinder
  - class: trac.services.forwardtracking.DCTrackCandFinder
    name: DCTrackCandFinder



On Wednesday, January 15, 2014 9:19:07 AM, Vardan Gyurjyan wrote: 


This is most likely an error in the service that manages shared-memory file caching, and has nothing to do with the framework. It seems that the service is not removing files from shared memory after the chain is done processing a file.
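
One way to test this hypothesis is to check, between input files, whether staged copies are piling up. The sketch below is only an illustration: the thread does not say where the file-caching service keeps its copies, so the directories (/dev/shm and /tmp) and the file pattern (sidis_*.ev) are assumptions, and staged_bytes is a hypothetical helper name.

```shell
#!/bin/bash
# Sum the sizes (in bytes) of the files in a directory whose names match a glob.
# Prints 0 if the directory is missing or nothing matches.
staged_bytes() {
    local dir=$1 pattern=$2
    # Column 5 of `ls -ln` is the file size in bytes.
    find "$dir" -maxdepth 1 -type f -name "$pattern" -exec ls -ln {} + 2>/dev/null |
        awk '{ total += $5 } END { print total + 0 }'
}

# Assumed locations and pattern; adjust to wherever the service stages files.
for dir in /dev/shm /tmp; do
    echo "$dir: $(staged_bytes "$dir" 'sidis_*.ev') bytes in sidis_*.ev files"
done
```

If the reported total grows by roughly one input file's size each time the chain moves to the next file, that would be consistent with files not being removed.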


-vardan 








----- Original Message -----
From: "Justin Ruger" <jruger at jlab.org>
To: clara at jlab.org
Sent: Wednesday, January 15, 2014 9:12:43 AM
Subject: [Clara] Overlimit Error Farm





One of the reasons our farm jobs for the Clara DPE keep getting canceled is this error:

=>> PBS: job killed: vmem 29235798016 exceeded limit 28991029248

This happens with or without the -l flag.
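
For scale, the two numbers in the PBS message can be converted from bytes, assuming (as the message format suggests) that both are byte counts. This is just arithmetic on the values quoted above; to_gib is a hypothetical helper name.

```shell
#!/bin/bash
# Values copied from the PBS message above, assumed to be in bytes.
used=29235798016
limit=28991029248

# Bash integer arithmetic is 64-bit, so these values fit.
over=$(( used - limit ))

# Convert bytes to GiB with two decimals (awk does the floating-point math).
to_gib() { awk -v b="$1" 'BEGIN { printf "%.2f", b / (1024 ^ 3) }'; }

echo "limit: $(to_gib "$limit") GiB"          # limit: 27.00 GiB
echo "used:  $(to_gib "$used") GiB"           # used:  27.23 GiB
echo "over:  $(( over / 1024 / 1024 )) MiB"   # over:  233 MiB
```

So the limit is exactly 27 GiB of virtual memory, and the job was killed when it was about 233 MiB over it.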





To recreate the error:

I have files of 20k events each, and I set out to process 50 of them. I wrote a script that caches the files using jcache and then adds each file name to input.list so that it is all mandatory.





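+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/bash
# Usage: pass the number of files N as the first argument.
# Requests caching of sidis_1.ev .. sidis_N.ev and writes their names to input.list.

for ((i = 1; i <= $1; i++)); do
    # Path where the cached copy is expected to appear
    echo "/cache/mss/clas/clas12/clas12-testing/gemc/sidis_$i.ev"

    if [ "$i" -eq 1 ]; then
        # First file: create (or overwrite) input.list
        echo "sidis_$i.ev" > input.list
    else
        echo "sidis_$i.ev" >> input.list
    fi

    # Ask jcache to cache the file (path and command kept on one line)
    jcache submit default "/mss/clas/clas12/clas12-testing/gemc/sidis_$i.ev"
done
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++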





Then I ran the farm orchestrator like this:

java -cp "$CLARA_SERVICES/.:$CLARA_SERVICES/lib/*" \
    std.orchestrators.FarmOrchestrator \
    -i /cache/mss/clas/clas12/clas12-testing/gemc \
    -o /w/hallb/clas12/jruger/fiftyNode/output \
    -s /tmp \
    /w/hallb/clas12/jruger/stress_test/services.yaml input.list





It ran fine for five files, sidis_1.ev through sidis_5.ev, but the job was canceled while processing the sixth file.

I think this is something we should figure out how to solve ASAP, while they are allowing us to hold a node for development. If all I can do is five files at a time, this limits the robustness of Clara in my opinion. Let me know if you need any more information.





Justin 


_______________________________________________
Clara mailing list
Clara at jlab.org
https://mailman.jlab.org/mailman/listinfo/clara




-- 
Dennis Weygand 
weygand at jlab.org 
(757) 269-5926 (office) 
(757) 870-4844 (cell) 




