[Halld-offline] Offline Software Meeting Minutes, April 16, 2014
Mark Ito
marki at jlab.org
Fri Apr 18 10:18:38 EDT 2014
Folks,
Please find the minutes below and at
https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_April_16,_2014#Minutes
-- Mark
_____________________________________________
GlueX Offline Meeting, April 16, 2014
Minutes
Present:
* CMU: Paul Mattione
* FSU: Volker Crede, Aristeidis Tsaris
* IU: Kei Moriya
* JLab: Mark Ito (chair), David Lawrence, Curtis Meyer, Dmitry
Romanov, Simon Taylor
* MIT: Justin Stevens
* NU: Sean Dobbs
* UConn: Richard Jones [and others?]
Review of Minutes from the Last Meeting
We went over the [34]minutes of the April 2 meeting.
Kei commented on continued work on his study of REST file sizes and
reproducibility. He has sent problem files to Paul and Simon and asked
for feedback. He also indicated that this project may go on the back
burner for now, given the size of the differences seen thus far.
Data Challenge 2
Data Challenge Meeting Report, April 11
Curtis re-capped the meeting.
Things were running well, with a very low failure rate. The OSG was
just starting up; there had been problems with a site in Brazil that
was accepting jobs which then failed immediately. Most of the sites are
finished or winding down (with the exception of the OSG).
Event Tally Board
We took a look at the [35]board. We are now at about 5 billion events.
Note that this is already as many events as we had for data challenge
1, and these events are several times more expensive than those in
terms of CPU time.
Site Status Updates
MIT. Justin is still running on about 300 cores which include the
FutureGrid cores. At some point soon those will have to go back. He has
done about 40 million events over the past few days.
CMU. Paul has summarized results on [36]his wiki page, in section 5. He
had only 3 failures in 7,000 jobs. He catalogs the reasons for those
failures.
JLab. Mark showed the updated [37]plot of running jobs as a function of
time for the entire data challenge period. He also showed a [38]plot
from Sandy Philpott showing all of the jobs on the farm for the past three
months. The steps in job numbers as nodes were switched from LQCD to
the farm are clearly seen. Mark also reviewed [39]his message from
Monday announcing ramp down of the data challenge at JLab and the
return of nodes from the farm to LQCD.
OSG. We looked at [40]recent job history on the Grid. Richard reported
that he got a big batch of jobs through, but that some of his grid
proxies were getting stale, so he recently paused and is now starting
back up. The Purdue site has withdrawn its nodes indefinitely due to
events in the aftermath of the Heartbleed bug. The GlueX sites (UConn
and NU) have been operating at 98 to 99% efficiency
("productive" CPU time as a fraction of wall time). This is to be
contrasted with the 60% seen in DC-1. On some sites there was a problem
where glide-ins were advertised to us, but the jobs were rejected
because the offered proxy had been renamed relative to previous
running. This has been cleared up administratively. Support from the
OSG has been very good. Once jobs start, they generally run to
completion; in general things have been much smoother than last time.
He reported only 2 failures out of 0.5 million jobs. Richard plans to
continue to run until he can get two or three days of steady-state,
problem-free running. He will also try to balance our run mix.
Kinematic Fitter Update
Paul led us through [41]his email announcing changes to his analysis
library and the kinematic fitter in particular. His email has a
complete description of the changes and interested parties should look
there for details.
Kei asked about the recommended procedure for characterizing thrown
particle topology. Paul told us that all of the thrown information is
in the tree, but some navigation by hand is necessary to totally
understand everything in the decay chain.
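As an illustration (not the actual analysis-library interface) of the
kind of by-hand navigation Paul described, the sketch below walks a
thrown decay chain using a parent-index array; the array names and
values here are hypothetical stand-ins.

    # Hypothetical sketch of walking a thrown decay chain "by hand".
    # The arrays are stand-ins for whatever thrown branches the tree provides.
    def decay_chain(pids, parent_index, i):
        """Return the PDG IDs from particle i up to its primary ancestor."""
        chain = [pids[i]]
        while parent_index[i] >= 0:      # assume -1 marks a primary particle
            i = parent_index[i]
            chain.append(pids[i])
        return chain

    # Example: p pi+ pi- pi0 final state, with pi0 -> gamma gamma
    pids         = [2212, 211, -211, 111, 22, 22]
    parent_index = [-1,   -1,  -1,   -1,  3,  3]   # photons point back to the pi0
    print(decay_chain(pids, parent_index, 4))      # -> [22, 111]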
Justin asked about reasonable cuts for matching between charged tracks
and clusters. Paul has a five sigma cut by default. This has not been
extensively studied. These studies will have to be done to optimize the
cut and may depend on what one is trying to do.
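For concreteness, an N-sigma track/cluster matching cut of the kind
mentioned above might look like the following sketch; the positions and
resolutions are hypothetical placeholders, not the analysis library's
actual variables.

    import math

    # Hypothetical sketch of an N-sigma track/cluster matching cut.
    # sigma_x and sigma_y are placeholder resolutions, not measured values.
    def matches(track_xy, cluster_xy, sigma_x=1.0, sigma_y=1.0, n_sigma=5.0):
        """True if the cluster lies within n_sigma of the projected track position."""
        dx = (cluster_xy[0] - track_xy[0]) / sigma_x
        dy = (cluster_xy[1] - track_xy[1]) / sigma_y
        return math.hypot(dx, dy) < n_sigma

    print(matches((10.0, 5.0), (11.5, 4.2)))   # True: well within 5 sigma
    print(matches((10.0, 5.0), (20.0, 4.2)))   # False: about 10 sigma away in x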
Justin also asked for clarification of the matching between the
requested DReaction and the thrown information. For this there is no
dependence on reconstructed information.
Data Distribution
Richard gave us guidance on accessing data challenge data.
* The OSG-generated results from DC2 are being stored at /Gluex/test
on the UConn SRM.
* The location on the Northwestern University SRM is
/mnt/xrootd/gluex/dc2.
* For instructions on how to access files over SRM, see the
[42]appropriate section of the howto.
He emphasized the best practice of never attempting an srmls on a data
directory; instead, one should fetch the .ls-l file in each directory.
He also told us that [43]XRootD is supported as a data transport
protocol on the SRM servers.
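As a rough illustration of the .ls-l practice, the sketch below fetches
the listing file with srmcp instead of listing the directory; it
assumes the standard SRM client tools are installed, and the endpoint
URL is a placeholder (the real URLs and setup are in the howto [42]).

    import os
    import subprocess

    # Placeholder endpoint; see the howto [42] for the real SRM URLs.
    SRM_DIR = "srm://srm.example.edu:8443/Gluex/test/some_directory"

    def fetch_listing(srm_dir, dest=".ls-l"):
        """Copy the directory's .ls-l file locally instead of running srmls on it."""
        subprocess.check_call(["srmcp", srm_dir + "/.ls-l",
                               "file://localhost" + os.path.abspath(dest)])
        with open(dest) as f:
            return f.read()

    print(fetch_listing(SRM_DIR))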
He also proposed that if Globus Online is the only parallel transport
method supported by JLab, then we should deploy it at non-JLab sites,
in the spirit of good collaboration. He thought that Globus Online
Personal would not be compatible with his node at UConn, so a license
would have to be bought to make it a full-fledged endpoint. The cost
appears not to be prohibitive.
We discussed continuing to explore options for transport, principally
Globus Online, the OSG SRM, and raw GridFTP.
Skimming Data Challenge Data
Paul showed us a [44]list of proposed skims that we could perform on
the data challenge events. They basically cover the waterfront. They
are grouped into three broad categories: non-strange meson channels,
strange meson channels, and hyperon channels. With the analysis library
in place, implementing any of these skims is not a big effort. As he
states on his page, the cuts may have to be studied before going into
production.
Given that we have his technology in hand, the discussion revolved
around what we should do with it. The input is obviously the
reconstructed REST data. The two possible approaches are:
1. Do a traditional skim, writing out only selected events to a
smaller output file.
2. Implement the EventStore. Then multiple skims can be supported from
a single set of files.
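To make the distinction concrete, the sketch below contrasts the two
approaches on a generic event stream; the event records and skim names
are hypothetical, and a real implementation would act on REST files
through the analysis library or EventStore tools.

    # Approach 1: a traditional skim copies selected events to a smaller output.
    # Approach 2: an EventStore-style index records only (run, event) pointers per
    #             skim, so many skims can share a single set of REST files.
    def traditional_skim(events, selector):
        return [ev for ev in events if selector(ev)]    # stand-in for a smaller file

    def eventstore_index(events, selectors):
        index = {name: [] for name in selectors}
        for ev in events:
            for name, sel in selectors.items():
                if sel(ev):
                    index[name].append((ev["run"], ev["event"]))
        return index

    events = [{"run": 9001, "event": i, "n_kaons": i % 3} for i in range(6)]
    skims = {"strange":     lambda ev: ev["n_kaons"] > 0,
             "non_strange": lambda ev: ev["n_kaons"] == 0}
    print(len(traditional_skim(events, skims["strange"])), "events copied")
    print(eventstore_index(events, skims)["non_strange"])   # [(9001, 0), (9001, 3)]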
We agreed that centralizing this function would, in the long run,
eliminate duplicated effort and wasted computing resources. We also
noted that some of this processing could conceivably be done at
non-JLab sites, since it does not involve shipping around "raw" data.
We will explore both approaches. Paul will do a few pilot skims as a
demonstration. Sean will look at the EventStore.
Data Challenge Meetings?
We decided to suspend the special data challenge meetings and put
discussion back into the regular bi-weekly offline meeting.
Other Offline Items
David directed our attention to other items that need work now
that the production part of the data challenge is over.
1. Tagger reconstruction is not in HDGeant. Although tagger hits are
there (based on the thrown photon energy, not a detailed particle
swim), the step of turning a hit back into a photon energy has not
been done.
2. The online group is discussing moving online code to a separate
subversion repository. The pluses and minuses of such a move have to
be discussed in both the online and offline working groups.
3. The MediaWiki version behind the GlueX wiki is getting old. We are
running 1.17, from 2011, and the latest is 1.22. David will look
into a version refresh. There is a related issue: changing the
authentication to the standard JLab LDAP scheme, which seems like a
good idea. That move may or may not have an influence on how we
proceed.
References
34. https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_April_2,_2014
35. https://docs.google.com/spreadsheets/d/1qvF9B-76gr8NdsTKsO17jqL0qc5OXqK46JluvXnJ98k/edit?usp=sharing
36. https://halldweb1.jlab.org/wiki/index.php/CMU_Data_Challenge_2
37. https://halldweb1.jlab.org/wiki/images/d/d5/Jobs_gluex_04-16.png
38. https://halldweb1.jlab.org/wiki/images/5/5b/Farm_2014.png
39. https://mailman.jlab.org/pipermail/halld-offline/2014-April/001652.html
40. https://halldweb1.jlab.org/wiki/images/a/a5/Grid_jobs.png
41. https://mailman.jlab.org/pipermail/halld-physics/2014-April/000394.html
42. https://halldweb1.jlab.org/wiki/index.php/Using_the_Grid#Accessing_stored_data_over_SRM
43. http://xrootd.org/
44. https://halldweb1.jlab.org/wiki/index.php/Mattione_Update_04212014
--
Mark M. Ito, Jefferson Lab, marki at jlab.org, (757)269-5295