[Halld-offline] Data Challenge Meeting Minutes, February 14, 2014
Mark Ito
marki at jlab.org
Mon Feb 17 12:29:54 EST 2014
Folks,
Please find the minutes below and at
https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_February_14,_2014#Minutes
.
-- Mark
________________________________________________
GlueX Data Challenge Meeting, February 14, 2014
Minutes
Present:
* CMU: Paul Mattione, Curtis Meyer
* FSU: Volker Crede, Aristeidis Tsaris
* IU: Kei Moriya
* JLab: Mark Ito (chair), Chris Larrieu, Simon Taylor
* NWU: Sean Dobbs
Update on phi=0 geometry issues in CDC
Richard checked in a [21]change. Simon and Beni confirmed that this fixes
the issue.
Random number seeds procedure
Still nothing new to report.
Running jobs at CMU
Paul has continued running jobs. Last time he reported a third of the
jobs crashing. Since then he has increased the memory allowed from 2 GB
to 5 GB, and the crash rate is now down to 5%.
Simon has been running Valgrind, but he sees a lot of issues being
flagged in xerces, which are likely not our problem.
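A minimal sketch of one way to keep the xerces reports from drowning out
everything else, using Valgrind's suppression mechanism (the file names
here are hypothetical, not necessarily what Simon used):

    # First pass: have Valgrind print a ready-made suppression block for each report.
    valgrind --leak-check=full --gen-suppressions=all --log-file=vg-first-pass.log hd_root input_smeared.hddm

    # Copy the xerces-related suppression blocks from vg-first-pass.log into
    # xerces.supp, then rerun with those suppressed so only our code is flagged.
    valgrind --leak-check=full --suppressions=xerces.supp hd_root input_smeared.hddm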
Paul has also been running hd_root within the time command and sees
memory usage varying between 3 and 5 GB.
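For reference, a sketch of that kind of measurement (the input file name
is made up for illustration): GNU time, invoked as /usr/bin/time rather
than the shell builtin, reports peak memory alongside the wall time.

    # "Maximum resident set size (kbytes)" in the output gives the peak memory footprint.
    /usr/bin/time -v hd_root input_smeared.hddm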
Running jobs at NU
Sean has been seeing startup problems with hd_root that seem to be
correlated with whether the input and/or output files are on network
disk or on the local machine. In particular, he sees crashes when
reading in the geometry. This has not been seen at other sites, although
the exact configuration that is giving him trouble may not have been
tried much elsewhere. He is tracking down the error.
Running jobs at MIT
Justin has succeeded in running 100 jobs with 10k events each, no EM
background, at MIT on the OpenStack Cluster. He saw only two failures,
both in DTrackFitterKalmanSIMD.
Running jobs at JLab
Mark [22]updated the DC2 tag to bring in Simon's latest changes to
tracking, as well as Richard's fix at φ=0 in the CDC. Mark ran 400
50-k-event jobs with the same configuration as last week, except that
the requested memory was doubled, from 1.5 GB to 3.0 GB. He saw a
[23]much improved success rate: 84%, versus the 26% reported last week.
Much of the improvement, though not all of it, came from a reduced rate
of jobs breaching the memory limit. Two percent of the jobs failed
because they exceeded 15 hours of wall time; at least one successful job
took only 9 hours to finish.
Mark has also started to run very short jobs, to help Simon debug the
tracking crashes. He ran 10,000 jobs of 1000 events each last weekend,
and 4,000 jobs of 500 events each with the new DC2 version. For these,
he is keeping the smeared data that is input to hd_root. Simon was able
to use last weekend's jobs to find the errors discovered during the
week.
Electromagnetic Backgrounds update
Kei gave us an update on his performance comparisons with and without
electromagnetic background. See [24]his slides for details. He compares
memory usage and execution time for various combinations of background
time gate and beam intensity. He also showed the effect on
reconstructed quantities in slides titled as follows:
* Increase in Showers
* FCAL Shower time vs. energy
* Projections of Time and Energy of FCAL Showers
* FCAL Shower Time vs. Distance from Target
* FCAL Energy Showers After Timing Cut
* BCAL
* Generated E, p
* Generated p vs. θ
* Tracks
[Added in press: Kei sent out an [25]email directing us to an
[26]update on this report, answering some questions raised at this
meeting.]
Check on event genealogy
During his studies of EM background, Kei noticed that the summed energy
and momentum of all primary particles (generated E, p, above) are now
much more sensible than before. There are still some low-level tails
that may have to do with particles re-entering the detector from the
calorimeters and getting misclassified.
SRM capability
We started to discuss this issue, but decided to postpone a full
discussion until the collaboration meeting.
Next Meeting
We will not meet next week due to the collaboration meeting, but will
start up again the week after. The main task for us is to continue
tracking down the cause of job crashes.
References
21. https://mailman.jlab.org/pipermail/halld-offline/2014-February/001483.html
22. https://mailman.jlab.org/pipermail/halld-offline/2014-February/001487.html
23. https://halldweb1.jlab.org/wiki/index.php/Quick_Look_at_DC_2
24. https://halldweb1.jlab.org/wiki/images/6/6e/2014-02-14-DC2-Moriya.pdf
25. https://mailman.jlab.org/pipermail/halld-offline/2014-February/001491.html
26. https://halldweb1.jlab.org/wiki/images/1/10/2014-02-17-dc2update.pdf
--
Mark M. Ito, Jefferson Lab, marki at jlab.org, (757)269-5295