[Clara] Overlimit Error Farm
Dennis Weygand
weygand at jlab.org
Wed Jan 15 09:39:02 EST 2014
No, there must be at least two other services in the chain: one for reading and one for writing. Can you show the YAML file?
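
A rough sketch of what that would look like (the reader and writer class names below are placeholders for illustration, not the actual service classes):

services:
  - class: org.jlab.clas12.io.services.EventReader        # placeholder: reads events from the input file
    name: Reader
  - class: org.jlab.clas12.ec.services.ECReconstruction
    name: ECReconstruction
  - class: org.jlab.clas12.ctof.services.CTOFReconstruction
    name: CTOFReconstruction
  - class: trac.services.centraltracker.BSTTrackCandFinder
    name: BSTTrackCandFinder
  - class: trac.services.forwardtracking.DCTrackCandFinder
    name: DCTrackCandFinder
  - class: org.jlab.clas12.io.services.EventWriter        # placeholder: writes reconstructed events to the output file
    name: Writer
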
Dennis
On Jan 15, 2014, at 9:22 AM, Justin Ruger wrote:
> These are the services I am using:
>
> # Uncomment the line below to set a new container name for all services
> # container: Ruger_Reconstruction
> services:
>   - class: org.jlab.clas12.ec.services.ECReconstruction
>     name: ECReconstruction
>   - class: org.jlab.clas12.ctof.services.CTOFReconstruction
>     name: CTOFReconstruction
>   - class: trac.services.centraltracker.BSTTrackCandFinder
>     name: BSTTrackCandFinder
>   - class: trac.services.forwardtracking.DCTrackCandFinder
>     name: DCTrackCandFinder
>
>
>
> On Wednesday, January 15, 2014 9:19:07 AM, Vardan Gyurjyan wrote:
>> This is most likely an error in the service that manages shared-memory file caching, and has nothing to do with the framework. It seems that service is not removing files from shared memory after the chain is done processing a file.
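>> The numbers in the PBS message are consistent with that: the limit, 28991029248 bytes, is exactly 27 GiB, and if every processed file leaves a few GiB of cached data behind in shared memory, dying partway through the 6th file is roughly what a per-file leak would produce. One way to check, assuming the cache lives under /dev/shm (an assumption; the actual location depends on how the staging service is configured), is to watch it grow between files:
>>
>>     # watch shared-memory usage while the chain runs (path is an assumption)
>>     watch -n 30 'df -h /dev/shm; ls -lh /dev/shm | tail'
>>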
>> -vardan
>>
>>
>> ----- Original Message -----
>> From: "Justin Ruger" <jruger at jlab.org>
>> To: clara at jlab.org
>> Sent: Wednesday, January 15, 2014 9:12:43 AM
>> Subject: [Clara] Overlimit Error Farm
>>
>> One of the reasons our farm jobs for the Clara DPE keep getting canceled is this error:
>>
>> =>> PBS: job killed: vmem 29235798016 exceeded limit 28991029248
>>
>> This happens with or without the -l flag.
>>
>> Here is how to recreate the error:
>>
>> I have files of 20k events each, and I wanted to process 50 of them. I wrote a script that
>> caches the files with jcache and then adds each file name to input.list, so the whole thing is automated.
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> #!/bin/bash
>> # Takes the number of files as $1: builds input.list and submits a jcache
>> # request for sidis_1.ev .. sidis_N.ev
>> for ((i = 1; i <= $1; i += 1)); do
>>     echo "/cache/mss/clas/clas12/clas12-testing/gemc/sidis_$i.ev"
>>     # start input.list fresh on the first file, append afterwards
>>     if [ $i -eq 1 ]; then
>>         echo "sidis_$i.ev" > input.list
>>     else
>>         echo "sidis_$i.ev" >> input.list
>>     fi
>>     # stage the file from tape to the cache disk
>>     jcache submit default /mss/clas/clas12/clas12-testing/gemc/sidis_$i.ev
>> done
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
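>> (For example, passing 50 as the argument writes sidis_1.ev through sidis_50.ev into
>> input.list, one per line, and submits one jcache request per file.)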
>>
>> Then I ran the FarmOrchestrator like this:
>>
>> java -cp "$CLARA_SERVICES/.:$CLARA_SERVICES/lib/*" \
>>     std.orchestrators.FarmOrchestrator \
>>     -i /cache/mss/clas/clas12/clas12-testing/gemc \
>>     -o /w/hallb/clas12/jruger/fiftyNode/output \
>>     -s /tmp \
>>     /w/hallb/clas12/jruger/stress_test/services.yaml input.list
>>
>> It ran fine for 5 files (sidis_1.ev to sidis_5.ev), but the job gets canceled while
>> processing the 6th file.
>>
>> I think this is something we should figure out how to solve ASAP, while they are still
>> allowing us to hold a node for development. If all I can do is five files at a time,
>> this limits the robustness of Clara in my opinion. Let me know if you need any more information.
>>
>> Justin
--
Dennis Weygand
weygand at jlab.org
(757) 269-5926 (office)
(757) 870-4844 (cell)