[Halld-offline] Data Challenge Meeting, March 28, 2014

Mark Ito marki at jlab.org
Sun Mar 30 17:20:10 EDT 2014


Colleagues,

Please find the minutes below and at

https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_March_28,_2014#Minutes

-- Mark
___________________________________________________

GlueX Data Challenge Meeting, March 28, 2014
Minutes

Present:
* CMU: Paul Mattione, Curtis Meyer
* FSU: Volker Crede, Aristeidis Tsaris
* JLab: Mark Ito (chair), Sandy Philpott, Simon Taylor
* IU: Kei Moriya
* MIT: Justin Stevens
* NU: Sean Dobbs, Kam Seth

Announcements

Mark recapped the discussion of the optional versions of [35]JANA and
[36]HDDS that took place on the email list earlier in the week. They
are listed on the [37]conditions page.

Status reports from sites

JLab

Mark gave the [38]report. About 1000 cores were added on Monday. Our
share now is roughly 1400 cores.

Sandy asked if we could use more cores and the answer was yes. She also
mentioned that the power outage in the Data Center has been pushed back
to late April and thus will not interfere with the challenge.

Mark discussed [39]recent progress with job tracking at JLab. Chris
Hewitt showed him how to access the Auger database via a web service
and now that information can be downloaded to a GlueX-accessible
database. Mark showed some example tables. There are a lot of
diagnostic statistics that can be derived from this information.
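
As an illustration of the kind of tooling involved, here is a minimal
sketch, assuming a purely hypothetical web-service URL, JSON field
names, and table layout (the actual Auger service and database schema
are documented on the wiki page linked above):

# Sketch only: pull job records from a hypothetical web-service endpoint
# and cache them in a local SQLite table for later diagnostic queries.
# The URL, field names, and schema are illustrative assumptions, not the
# real Auger interface.
import sqlite3
import requests

JOBS_URL = "https://example.jlab.org/auger/jobs"  # hypothetical endpoint

def fetch_jobs(user="gluex"):
    """Download job records as a list of dicts from the web service."""
    response = requests.get(JOBS_URL, params={"user": user}, timeout=60)
    response.raise_for_status()
    return response.json()

def store_jobs(jobs, dbfile="farm_jobs.sqlite"):
    """Load the records into a local table that GlueX tools can query."""
    con = sqlite3.connect(dbfile)
    con.execute("CREATE TABLE IF NOT EXISTS jobs "
                "(job_id INTEGER PRIMARY KEY, state TEXT, cpu_seconds REAL)")
    con.executemany(
        "INSERT OR REPLACE INTO jobs VALUES (:job_id, :state, :cpu_seconds)",
        jobs)
    con.commit()
    con.close()

if __name__ == "__main__":
    store_jobs(fetch_jobs())

Once the records sit in a local table like this, the diagnostic
statistics mentioned above reduce to ordinary SQL queries.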

OSG

Sean reported that many of the OSG sites have a much leaner mix of
installed packages than anticipated, which has slowed progress.
The CERNVM disk export is working and ready to go.

MIT

Justin gave the [40]report.

The FutureGrid component is up to 180 cores now, and more may be
coming.

CMU

Paul gave the [41]report. 384 cores have been reserved for three weeks.

FSU

Aristeidis gave the FSU Data Challenge 2 report. FSU is running on
144 cores.

Totals

Curtis did a quick tally:

  run    EM bkgd. (γ/s in coh. peak)   events (millions)
  9001   1e7                           350
  9002   5e7                           100
  9003   0                             250

Discussion on moving forward

Curtis led the discussion on several topics.

Data Storage Location(s)

Curtis remarked that now that we are producing this data, we need to
decide where to store it so that it can be used effectively. Sean
reminded us that our current plan is to store the files at UConn and
Northwestern with access via the OSG SRM. This plan is acceptable to
all collaborating institutions. Unfortunately, JLab cannot yet
participate in this activity, either by contributing to the data set
or by accessing it, given our current knowledge base, which no doubt
needs expansion.
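
For concreteness, retrieving a single file over SRM might look like the
sketch below; the storage-element host, port, and path are illustrative
placeholders (not the actual UConn or Northwestern endpoints), and it
assumes a standard SRM command-line client such as gfal-copy is
installed:

# Sketch: copy one REST file from a remote storage element via SRM.
# The URLs are placeholders, not real endpoints.
import subprocess

SRC = "srm://se.example.edu:8443/gluex/dc2/rest/run9001_file0001.hddm"
DST = "file:///scratch/dc2/run9001_file0001.hddm"

subprocess.check_call(["gfal-copy", SRC, DST])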

Reduction in the number of files?

We discussed whether we should try to merge job output to reduce the
file-management load. One practical issue is that event numbers are not
unique within a given run; only file-number/event-number combinations
are. There was no consensus on the need to combine, so we may take it
up at a later date.
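
If merging is ever pursued, one simple convention, sketched here with
illustrative names and constants, would be to fold the file number and
per-file event number into a single run-wide identifier:

# Sketch of one possible numbering convention for merged output.
# EVENTS_PER_FILE and the function names are illustrative assumptions;
# the current files carry no such combined identifier.
EVENTS_PER_FILE = 10000000  # must exceed the largest per-file event count

def global_event_id(file_number, event_number):
    """Map a (file number, event number) pair to a single unique integer."""
    return file_number * EVENTS_PER_FILE + event_number

def split_event_id(global_id):
    """Recover the original (file number, event number) pair."""
    return divmod(global_id, EVENTS_PER_FILE)

assert split_event_id(global_event_id(42, 12345)) == (42, 12345)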

Skimming files for physics?

Curtis proposed a skim that would be useful to a lot of people. Paul
noted the difficulty of balancing general usefulness against a
worthwhile reduction in data volume. Justin noted that the speed-up
from not having to re-swim tracks from the REST output changes the
calculation. Kei
noted that we have not settled on what might be most useful. Paul told
us that the functionality in the code to do the skims is already there.
Mark remarked that we might want to wait until a demand arises before
making a decision. Paul mentioned that the EventStore solution might
also be driven by such a demand. Again, we need to discuss this
further.

Future Work, Near-Term and Far

* Weekly data challenges.
* Running a data challenge from the mass storage system at JLab.
* Reduction of processing devoted to electromagnetic background
generation.
* Monitoring quality of the current data challenge.
* File transfers in and out of JLab.

Schedule

Justin asked about the time scale for this data challenge. Another week
is clearly necessary. Curtis pointed out that until the OSG processing
rate is known, it is hard to make a call. Recall that last time (i.e.,
for DC1) the OSG contributed about 80% of the total processing.


References

35. https://mailman.jlab.org/pipermail/halld-offline/2014-March/001629.html
36. https://mailman.jlab.org/pipermail/halld-offline/2014-March/001633.html
37. https://halldweb1.jlab.org/data_challenge/02/conditions/data_challenge_2.html
38. https://halldweb1.jlab.org/wiki/index.php/JLab_Data_Challenge_Status,_March_28,_2014
39. https://halldweb1.jlab.org/wiki/index.php/Farm_Job_Tracking_Database
40. https://halldweb1.jlab.org/wiki/index.php/MIT/FutureGrid_Data_Challenge_2_Production#Update_3.2F28.2F14
41. https://halldweb1.jlab.org/wiki/index.php/CMU_Data_Challenge_2

-- 
Mark M. Ito, Jefferson Lab, marki at jlab.org, (757)269-5295



