[Halld-offline] Data Challenge Meeting Minutes, February 28, 2014

Mark Ito marki at jlab.org
Sun Mar 2 22:02:40 EST 2014


Colleagues,

Please find the minutes below and at

https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_February_28,_2014#Minutes

-- Mark
_________________________________________________________

GlueX Data Challenge Meeting, February 28, 2014
Minutes

Present:
* CMU: Paul Mattione, Curtis Meyer
* FSU: Volker Crede, Priyashree Roy, Aristeidis Tsaris
* IU: Kei Moriya
* JLab: Mark Dalton, Mark Ito (chair), Chris Larrieu, Simon Taylor
* NU: Sean Dobbs
* UConn: Richard Jones

Announcements

* Mark announced an [21]update of the branch. Changes include:
  1. A fix from Simon for single-ended TOF counters.
  2. Improvements from Paul for cutting off processing of
     multi-lap curling tracks.
  3. A change from David Lawrence to [22]allow compression to be
     turned off in producing REST format data.
     o David noticed that all three of the programs hdgeant,
       mcsmear, and DANA produce HDDM-like output, but only
       DANA has compression turned on (REST data in this case).
       This feature will allow us to test whether compression has
       anything to do with short REST files. On a side note, David
       reported that the short-REST-file problem was not
       reproducible. Mark produced some example
       hdgeant_smeared.hddm files that produced short output for
       him to test.

Running Jobs at JLab

Mark has submitted some test jobs against the new branch. [Added in
press: 1,000 50 k-event jobs have been submitted.]

Status of Preparations

Random number seeds procedure

Paul spoke to David about this. It seems that mcsmear is currently
generating its own random number seed. We still have details to fill in
on this story.

Running Jobs at FSU

FSU has started running data challenge test jobs on their cluster.
Aristeidis has started with 50 jobs, but an early look shows problems
with some of them in hd_root. Also there was the GlueX-software-induced
crash of the FSU cluster[?].

Running jobs at CMU

Paul is seeing ZFATAL errors from hdgeant. He will send a bug report to
Richard, who will look into a fix beyond merely increasing ZEBRA memory.

Richard asked about an issue where JANA takes a long time to identify a
CODA file as not an HDDM file. Richard would like to fix the HDDM
parser such that this is not the case. Mark D. will send Richard an
example.

Running Jobs at NU

Sean regaled us with tales of site-specific problems.

Lots of jobs crashed at REST generation. Site configuration changes
helped. But there were still a lot of jobs hanging, usually with new
nodes. Reducing the number of submit slots fixed most of the problems.
Many of the remaining symptoms were jobs hung on the first event when
accessing the magnetic field. Jobs are single-threaded. [23]Some
statistics on the results were presented as well.

Richard remarked that on the OSG, jobs will start much faster if
declared as single-threaded.

Richard proposed the following standards:

* BGRATE 1.1 (equivalent to 10^7)
* BGGATE -800 800 (in ns, time gate for EM background addition)

We agreed on these as standard settings.

Mark proposed the following split of running:

* 15% with no EM background
* 70% with EM background corresponding to 10^7
* 15% with EM background corresponding to 5×10^7

There was general agreement; adjustment may happen in the future.
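
As an illustration of the arithmetic behind the proposed split, the sketch below divides a job count among the three background conditions. The function name, the condition labels, and the total of 1,000 jobs are assumptions made for this example only; the minutes do not fix a total job count or a bookkeeping tool.

```python
# Hypothetical labels for the three running conditions proposed above.
SPLIT_PERCENT = {
    "no_em_background": 15,
    "em_background_1e7": 70,
    "em_background_5e7": 15,
}


def split_jobs(total_jobs, split_percent=SPLIT_PERCENT):
    """Divide total_jobs among conditions according to integer percentages.

    Uses integer arithmetic and hands any leftover jobs to the conditions
    with the largest remainders, so the counts always sum to total_jobs.
    """
    if sum(split_percent.values()) != 100:
        raise ValueError("percentages must sum to 100")

    # Whole jobs per condition, rounded down.
    counts = {name: total_jobs * pct // 100
              for name, pct in split_percent.items()}

    # Jobs lost to rounding down are assigned by largest remainder.
    leftover = total_jobs - sum(counts.values())
    by_remainder = sorted(
        split_percent,
        key=lambda name: (total_jobs * split_percent[name]) % 100,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # Illustrative total only.
    for condition, n_jobs in split_jobs(1000).items():
        print(f"{condition}: {n_jobs} jobs")
```

With an illustrative total of 1,000 jobs this gives 150, 700, and 150 jobs for the three conditions. If the split is adjusted later, only the percentages need to change.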

Running Jobs at MIT

Justin has been running with the dc-2.2 tag. The OpenStack cluster at
MIT has about 180 cores and he has been running jobs for a couple of
days with good success. BGGATE was set at -200 to 200.

Electromagnetic Background

Kei gave us an [24]update on his studies of EM background with hdds-2.0
and sim-recon-dc-2.1. Slides covered:
* Memory Usage
* CPU time
* mcsmear File Sizes
* REST File Sizes
* Another Bad File
* Sum of parentid=0
* Correlation of CDC hits
* Correlation of FDC hits
* pπ^+π^- Events

Proposed Schedule

The schedule has slipped. The new schedule is as follows:
1. Launch of Data Challenge Thursday March 6, 2014 (est.).
2. Test jobs going successfully by Tuesday March 4.
3. Distribution ready by Monday March 3.

Justin pointed out that the short REST file problem might be something
that we could live with for this data challenge.

Richard asked that Mark assign run numbers and run conditions for the
various sites.

Action Items

1. Understand random number seed system.
2. Solve ZFATAL crashes.
3. Make a table of conditions vs. sites where the entries are assigned
file numbers.
4. Report the broken Polycom in L207.


References

21. https://mailman.jlab.org/pipermail/halld-offline/2014-February/001511.html
22. https://mailman.jlab.org/pipermail/halld-offline/2014-February/001512.html
23. https://halldweb1.jlab.org/wiki/images/f/f8/DC2-Meeting-sdobbs-20140228.pdf
24. https://halldweb1.jlab.org/wiki/images/e/ea/2014-02-28-DC2.pdf



