[Halld-offline] updated branch again: sim-recon-dc-2

Sean Dobbs s-dobbs at northwestern.edu
Wed Feb 26 11:56:45 EST 2014


Hi Data Challengers,

For what it's worth, I'm getting job times of roughly 9-12 hours,
with the lowest setting of EM background that Kei tried included.  I'll
work on having some more stats for this Friday's meeting.
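
As a rough illustration of the job-length arithmetic in this thread, here
is a minimal sketch that scales the number of events per job to a target
wall time. It assumes wall time grows linearly with the event count, which
is only an approximation (startup overhead and the EM background settings
will change it). The function name is hypothetical; the reference point of
25K events in about 6 hours is the figure Justin quotes below.

```python
def events_for_target_hours(ref_events, ref_hours, target_hours):
    """Scale an event count linearly to reach a target wall time.

    Assumption: run time is proportional to the number of events.
    """
    if ref_hours <= 0:
        raise ValueError("ref_hours must be positive")
    return int(ref_events * target_hours / ref_hours)


# Reference point from this thread: 25K events in about 6 hours.
print(events_for_target_hours(25_000, 6.0, 12.0))
```

Under that linear assumption, a 12-hour job would hold about twice the
reference number of events.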

The success rate is up, so thanks to Paul, Simon, et al. for reducing
memory usage =)  I'm still having too many jobs crash at the beginning of
REST creation, so I am tracking down the cause of that.
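
On the related short REST file issue discussed below, one possible way to
flag suspect jobs is to compare each output file size against the typical
size for the batch. This is only a sketch: the function name, the use of
the median as the "nominal" size, and the 0.9 cutoff are all assumptions,
and the ambiguous cases close to the nominal length would still need a
look by hand.

```python
import statistics


def flag_short_files(sizes, min_fraction=0.9):
    """Return the names of files smaller than min_fraction of the median.

    sizes: dict mapping file name -> size in bytes (hypothetical input).
    min_fraction: assumed cutoff; tune it for the actual size spread.
    """
    if not sizes:
        return []
    nominal = statistics.median(sizes.values())
    return sorted(
        name for name, size in sizes.items() if size < min_fraction * nominal
    )
```

The sizes could be collected with os.path.getsize over the job output
directories before calling the function.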



Cheers,
Sean



On Tue, Feb 25, 2014 at 8:42 AM, Mark Ito <marki at jlab.org> wrote:

> Justin, et al.,
>
> Overnight Sunday to Monday, using the new tag I saw similar results.
> 1,000 jobs, 50 k events each, no crashes. This time I did increase the
> memory request to 6 GB from 3 GB and the time request to 24 hours from
> 15 hours. Still no EM background. But about 60 of them had short REST
> files (there are some ambiguous cases very close to the nominal file
> length). The short REST file thing seems to be the next frontier.
>
> I am not counting the DTrackFitterKalmanSIMD.cc message as a crash
> anymore. Simon tells me that is more like a warning.
>
> One question I have is if there is some standard way to gracefully punt
> on an event. We should keep careful accounting of that in each job of
> course. On the other hand, there is this:
> http://grantland.com/features/grantland-channel-coach-never-punts/
>
> On the length of a job, I would prefer to go to 24-hour jobs (assuming
> that there is a high probability of them finishing of course). Fewer
> output files that way. My understanding is that on the grid this is too
> long when we are in opportunistic mode. The longer you run in that case,
> the higher your chance of being pre-empted. Perhaps we should have
> different run times at the different sites.
>
>    -- Mark
>
> On 02/25/2014 08:59 AM, Justin Stevens wrote:
> > Hi Data Challengers,
> >
> > FYI, I used these tagged versions for a test on the MIT Openstack
> > cluster of 100 jobs completed with 25K events each using the
> > "standard" EM background that Kei and Paul have also used (10^7 g/s
> > and +/- 200 ns gate).  With this configuration I don't see any
> > crashes, but 2 jobs finish with a small REST output file size which
> > only contains a fraction of the 25K events.  I think Paul also found
> > this earlier where JANA reports that it processed the full number of
> > events, but the REST file size is too small.  As a reminder these
> > jobs are deployed with 8 parallel processes running on an 8-core
> > blade which share 14GB of memory.  So they're not as sensitive to
> > the memory spikes as some of the other tests we've seen.
> >
> > One other thought on job length.  Since the EM background results in
> > much more time spent in hdgeant, should we lower the number of events
> > per job to keep the job length from getting too large?  These 25K
> > event jobs take 6 hours here, but this will only go up if we extend
> > the gate or up the rate for the EM background.
> >
> > -Justin
> >
> > On Feb 23, 2014, at 6:01 PM, Mark Ito wrote:
> >
> >> correction, see below
> >> On 02/22/2014 09:18 PM, Mark Ito wrote:
> >>> Folks,
> >>>
> >>> I changed the data challenge 2 sim-recon and hdds branches to get the
> >>> latest versions of everything, this time including src/libraries/TOF.
> >>> This will bring in Paul's recent changes to ameliorate the memory
> >>> issues we have been fighting with. The new hdds is necessary for
> >>> the new TOF geometry.
> >>>
> >>> New tagged versions reflect the branch a week ago:
> >>>
> >>> tags/hdds-dc-2.0
> >>> tags/hdds-dc-2.1
> >> that last line should have been: tags/sim-recon-dc-2.1
> >>> As before, the branch has not been tested yet, but the word should go
> >>> out whenever it changes. Let me know if you see weirdness.
> >>>
> >>>     -- Mark
> >>>
> >> _______________________________________________
> >> Halld-offline mailing list
> >> Halld-offline at jlab.org
> >> https://mailman.jlab.org/mailman/listinfo/halld-offline
>
> --
> Mark M. Ito, Jefferson Lab, marki at jlab.org, (757)269-5295
>




-- 
Sean Dobbs
Department of Physics & Astronomy
Northwestern University
phone: 847-467-2826