[Halld-offline] memory usage for bggen events

Richard Jones richard.t.jones at uconn.edu
Mon Feb 10 06:01:31 EST 2014


Hello all,

While nobody wants to see the cost for MC generation go up relative to what
we saw in the first data challenge, I want to emphasize that we cannot
simply wish away the background.  It is going to be a feature of the real
data that we need to analyze, and to be ready for that we need to generate
events with realistic background.  I would urge us to do generation this
time around with a realistic gate and bg rate for what we expect to see in
phase-1 physics running, and take whatever hit we need to in terms of the
number of events we can simulate.  I cannot think of any study that is
going to fail because we generated a factor 2-3 fewer events, but there are
plenty of things that can go wrong if we fail to simulate realistic
backgrounds in the detector.  I see no advantage to generating even a part
of the DC sample with less than 10^7 g/s background rates and the full 1us
gate.
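For scale, a quick back-of-the-envelope check of what those numbers imply (a sketch in Python, using only the 10^7 g/s rate and 1 us gate quoted above):

```python
rate_hz = 1.0e7   # EM background photon rate, gammas/s
gate_s = 1.0e-6   # full readout gate, 1 us

# Average number of background photons landing inside one readout gate.
mean_bg_per_gate = rate_hz * gate_s
print(mean_bg_per_gate)  # 10.0
```

So a realistic simulation has to fold in roughly ten background photons per event gate, which is where the extra CPU goes.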

With more time and effort, we could probably come up with a way to generate
a library of background events and overlay random entries from the library
onto the bggen events at the desired rate.  This is not trivial to do
because the superposition must be carried out at the hits accumulation
level inside Geant before hits aggregation, but given the computing cost it
is likely the way to go.  Meanwhile I think we should press ahead with the
DC and take whatever hit we need to in CPU to get where we need to be.
Remember that during DC #1 our processing efficiency was only about 50% on
the grid overall, so if we get that number up to near 95% we can process
with background turned on at about the same rate as we did then.
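To make the overlay idea concrete, here is a minimal sketch of the rate and timing logic, in Python rather than inside Geant. Everything here is illustrative: the flat (time_ns, energy) hit tuples stand in for the real hddm hit structures, and the function names are hypothetical, not an existing API.

```python
import math
import random

GATE_NS = 1000.0    # 1 us readout gate
BG_RATE_HZ = 1.0e7  # 10^7 gammas/s EM background

def poisson(mean, rng):
    # Knuth's multiplication method; adequate for small means (~10 here).
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def overlay(signal_hits, bg_library, rng):
    """Superimpose a Poisson-distributed number of background events,
    drawn at random from a pre-generated library, onto one signal event
    at the hit level, before hits aggregation."""
    mean_bg = BG_RATE_HZ * GATE_NS * 1e-9  # expected bg events per gate
    merged = list(signal_hits)
    for _ in range(poisson(mean_bg, rng)):
        bg_event = rng.choice(bg_library)
        t0 = rng.uniform(0.0, GATE_NS)  # random arrival time in the gate
        merged.extend((t + t0, e) for (t, e) in bg_event)
    merged.sort()  # time-order the combined hits before aggregation
    return merged
```

The real superposition would of course have to merge hits per detector element inside hdgeant; this only shows why the bookkeeping is done at the hits-accumulation stage rather than on finished events.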

-Richard J.


On Sun, Feb 9, 2014 at 11:49 PM, Kei Moriya <kmoriya at indiana.edu> wrote:

>
> To follow up on the discussion at the Data Challenge meeting
> regarding memory usage for bggen jobs, I ran over 10 runs for
> each configuration of EM background level and had the cluster report
> back the maximum vmem used (see attached plot).
>
> Note that the spikes in max vmem happened most frequently for
> standard EM background (the dashed lines are the average for
> each EM background type, but they are influenced strongly by
> the outliers), while the set with high rate and long gate time
> has no large spikes.
>
> The jobs were set up so that for each run, the original thrown hddm
> file was the same for each EM background set. The RNDM variable
> specified in control.in for hdgeant was always the same. Does
> anybody know if the simulation of the detector hits is done
> independently of the EM background, or if the EM background offsets
> the random number sequence for each hit?
>
> I also monitored the time it took for each job type to finish.
> The results are
> type                  time (hrs)
> no EM bg              1 - 1.5
> standard              2 - 3.5
> high rate             5 - 5.5
> long gate             3 - 3.5
> high rate, long gate  10 - 15
>
> Most of the time is being spent on hdgeant. I don't know why
> some jobs take 50% more time than others. The increase in
> processing time is a much larger penalty than the small increase
> we see in the final REST file sizes.
>
> I've also started looking into the number of tracks and track quality,
> and also photon timing information.
>
>         Kei
>
>
>
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline
>