[Halld-offline] updated branch again: sim-recon-dc-2

Paul Mattione pmatt at jlab.org
Tue Feb 25 14:29:27 EST 2014


Gracefully punting: there is a way in JANA to set status-bit information for each event, which we could eventually save to REST, but it isn't being used yet.
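
For concreteness, here is a rough sketch of what that could look like in a JANA event processor. This is only an illustration: the evnt() signature, the GetJEvent()/SetStatusBit() calls, and the choice of bit number are my assumptions, and nothing downstream (e.g. the REST writer) reads these bits yet.

// Sketch only: the status-bit method names and the evnt() signature are
// assumptions about the JANA API; no downstream code looks at these bits yet.
#include <stdint.h>
#include <exception>
#include <JANA/JEventProcessor.h>
#include <JANA/JEventLoop.h>
using namespace jana;

// Hypothetical bit chosen to mark an event we punted on.
static const uint32_t kStatusBit_ReconPunt = 15;

class JEventProcessor_PuntExample : public JEventProcessor {
  public:
    JEventProcessor_PuntExample(void) : Npunted(0) {}

    jerror_t evnt(JEventLoop *loop, uint64_t eventnumber) {
        try {
            // ... normal per-event reconstruction / REST output calls ...
        } catch(std::exception &e) {
            // Gracefully punt: flag the event instead of letting the whole
            // job die, and keep a count so each job can report what it skipped.
            // (A real plugin would need to lock this counter, since evnt()
            // is called from multiple threads.)
            loop->GetJEvent().SetStatusBit(kStatusBit_ReconPunt);
            Npunted++;
        }
        return NOERROR;
    }

  private:
    unsigned int Npunted;
};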

When you include the EM background, especially at high rates, the jobs run SIGNIFICANTLY slower.  So if you want to keep the job length around 24 hrs with the EM background enabled, it's going to have to be much less than 50k events per job.  For the multiplier on the timing, see slide 2 of Kei's talk:

https://halldweb1.jlab.org/wiki/images/6/6e/2014-02-14-DC2-Moriya.pdf

At CMU, 50k-event jobs were taking over 36 hrs with the following settings (are these correct, Richard?): 

BGRATE 5.50
BGGATE -200. 800

so I just killed the jobs.  I'm going to resubmit with 10k events and see how long that takes.  Is there any way to speed this up?  
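
For what it's worth, the event count per job is just a linear scaling from the timing above; a quick back-of-the-envelope, assuming the time per event stays roughly constant (and remembering that 36 hrs was not enough to finish 50k events, so this is optimistic):

// Back-of-the-envelope only: assumes wall time is linear in the number of
// events, using the (unfinished) 50k-event / 36 hr CMU jobs above as the
// single data point and the 24 hr batch request as the budget.
#include <cstdio>

int main(void) {
    const double events_measured = 50000.0;  // events per killed CMU job
    const double hours_measured  = 36.0;     // lower bound: jobs were killed, not finished
    const double hours_budget    = 24.0;     // target wall-time request

    double sec_per_event = hours_measured*3600.0/events_measured;
    double events_budget = hours_budget*3600.0/sec_per_event;

    std::printf("%.2f s/event -> at most ~%.0f events per 24 hr job\n",
                sec_per_event, events_budget);
    // Prints: 2.59 s/event -> at most ~33333 events per 24 hr job,
    // which is why 10k events per job should leave plenty of headroom.
    return 0;
}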

 - Paul

On Feb 25, 2014, at 9:42 AM, Mark Ito wrote:

> Justin, et al.,
> 
> Overnight Sunday to Monday, using the new tag I saw similar results. 
> 1,000 jobs, 50 k events each, no crashes. This time I did increase the 
> memory request to 6 GB from 3 GB and the time request to 24 hours from 
> 15 hours. Still no EM background. But about 60 of them had short REST 
> files (there are some ambiguous cases very close to the nominal file 
> length). The short REST file thing seems to be the next frontier.
> 
> I am not counting the DTrackFitterKalmanSIMD.cc message as a crash 
> anymore. Simon tells me that is more like a warning.
> 
> One question I have is whether there is some standard way to gracefully 
> punt on an event. We should keep careful accounting of that in each job, 
> of course. On the other hand, there is this: 
> http://grantland.com/features/grantland-channel-coach-never-punts/
> 
> On the length of a job, I would prefer to go to 24-hour jobs (assuming 
> that there is a high probability of them finishing of course). Fewer 
> output files that way. My understanding is that on the grid this is too 
> long when we are in opportunistic mode. The longer you run in that case, 
> the higher your chance of being pre-empted. Perhaps we should have 
> different run times at the different sites.
> 
>   -- Mark
> 
> On 02/25/2014 08:59 AM, Justin Stevens wrote:
>> Hi Data Challengers,
>> 
>> FYI, I used these tagged versions for a test on the MIT Openstack cluster: 100 jobs completed, 25K events each, using the "standard" EM background that Kei and Paul have also used (10^7 g/s and +/- 200 ns gate).  With this configuration I don't see any crashes, but 2 jobs finish with a small REST output file that only contains a fraction of the 25K events.  I think Paul also found this earlier: JANA reports that it processed the full number of events, but the REST file is too small.  As a reminder, these jobs are deployed as 8 parallel processes running on an 8-core blade that share 14GB of memory, so they're not as sensitive to the memory spikes as some of the other tests we've seen.
>> 
>> One other thought on job length.  Since the EM background results in much more time being spent in hdgeant, should we lower the number of events per job to keep the job length from getting too long?  These 25K-event jobs take 6 hours here, but that will only go up if we extend the gate or raise the rate of the EM background.
>> 
>> -Justin
>> 
>> On Feb 23, 2014, at 6:01 PM, Mark Ito wrote:
>> 
>>> correction, see below
>>> On 02/22/2014 09:18 PM, Mark Ito wrote:
>>>> Folks,
>>>> 
>>>> I changed the data challenge 2 sim-recon and hdds branches to get the
>>>> latest versions of everything, this time including src/libraries/TOF.
>>>> This will bring in Paul's recent changes to ameliorate the memory issues
>>>> we have been fighting with. The new hdds is necessary for the new TOF
>>>> geometry.
>>>> 
>>>> New tagged versions reflect the branch a week ago:
>>>> 
>>>> tags/hdds-dc-2.0
>>>> tags/hdds-dc-2.1
>>> that last line should have been: tags/sim-recon-dc-2.1
>>>> As before, the branch has not been tested yet, but the word should go
>>>> out whenever it changes. Let me know if you see weirdness.
>>>> 
>>>>    -- Mark
>>>> 
>>> _______________________________________________
>>> Halld-offline mailing list
>>> Halld-offline at jlab.org
>>> https://mailman.jlab.org/mailman/listinfo/halld-offline
> 
> -- 
> Mark M. Ito, Jefferson Lab, marki at jlab.org, (757)269-5295
> 
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline



