<html>
<head>
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hello,<br>
<br>
From the minutes of the last meeting:<br>
<br>
<blockquote type="cite">
<pre wrap="">Action Items
1. Make a list of milestones.
2. Do a micro-data challenge with jproj.pl. -> Mark
3. Check in the final REST format. -> Richard</pre>
</blockquote>
The meeting was cut off before I could tell the whole story. The
final REST format was completed, tested, and checked in before
Monday's meeting, but with the following caveats.<br>
<ol>
<li>the final event size is 3.5 kB (2.4 kB on disk thanks to bzip2
compression)</li>
<li>the bzip2 decompression overhead is negligible, but
reference_trajectory reconstruction costs ~2 ms per track (on a
3.5 GHz i7); at an average of 15 tracks per 5pi,p event at 9
GeV, that comes to ~30 ms per event.</li>
<li>when storing DBCALShower objects, there is an ambiguity over
whether to use the KLOE or the default bcal clustering. Right
now I query the DNeutralShower object for its USE_KLOE
parameter, which is switchable using a command-line parameter.
In case you are not familiar with the difference: there is just
one DBCALShower class in dana, but jana objects carry "tags"
which tell something about where they came from. Right now there
is a factory that produces DBCALShower objects with the tag
"KLOE", and another one that produces THE SAME objects using a
different algorithm, with the default tag "" (see the sketch
after this list).<br>
</li>
<li>when reading back DBCALShower objects, I will unpack them
(agnostic as to how they were made) as either KLOE or default
bcal clusters, depending on what the user asks for. This has
the "feature" that the person who wrote the REST file might
write KLOE clusters and the person who reads it might fetch
them as default clusters. This might be the correct
behavior. If not, there is another way to go: create separate
lists in the REST format for each foreseen type of each object,
and have the writer populate only one of them. The overhead for
doing this is negligible, but I think it is ugly. IMO, there is
just one conceptual object known as a DBCALShower, but many
ways to make it. Within the KLOE or default factories there
are many internal settings that one might switch around. Are
we going to try to save all of that information in the output
file on an event-by-event basis? Why not put it in the database
Mark was talking about, where a single entry can cover many
events, even multiple files within a production run?<br>
</li>
</ol>
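For anyone who has not used the tag mechanism, here is a minimal
sketch of what the read-back side looks like. The JEventLoop::Get
call with an optional tag string is standard jana usage; the
function name and header paths here are illustrative only, not the
checked-in REST code.<br>
<pre wrap="">#include &lt;vector>
#include &lt;JANA/JEventLoop.h>
#include "BCAL/DBCALShower.h"

// Illustrative sketch: fetch the same DBCALShower type from two
// different factories, selected by tag.
void fetch_showers(jana::JEventLoop *loop)
{
   // default tag "": the standard bcal clustering
   std::vector&lt;const DBCALShower*> default_showers;
   loop->Get(default_showers, "");

   // tag "KLOE": identical class, produced by the KLOE algorithm
   std::vector&lt;const DBCALShower*> kloe_showers;
   loop->Get(kloe_showers, "KLOE");
}</pre>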
<br>
-Richard J.<br>
<br>
<br>
<br>
On 7/17/2012 5:03 PM, Mark M. Ito wrote:<br>
</div>
<blockquote cite="mid:5005D336.70206@jlab.org" type="cite">
<pre wrap="">Folks,
Find the meeting minutes below and at
<a href="https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_July_16,_2012#Minutes">https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_July_16,_2012#Minutes</a>
-- Mark
_____
GlueX Data Challenge Meeting, July 16, 2012
Minutes
Present:
* CMU: Paul Mattione, Curtis Meyer
* IU: Matt Shepherd
* JLab: Eugene Chudakov, Mark Ito, David Lawrence
* UConn: Richard Jones
Announcements
Mark reported that the Computer Center was given a heads-up that we
will start working on the data challenge. Batch jobs will start to
appear on the JLab farm. We will have a tape volume set that can be
recycled.
Scope of this Challenge
* We agreed to continue using Curtis's document as
a repository of our ideas about the data challenge (DC). Curtis
agreed to convert document 2031 into a Wiki page. The page is now
available as Planning for The Next GlueX Data Challenge.
* Curtis reminded us that we need a robust storage resource manager
(SRM) for the DC.
* Grid and JLab: Mark asked about whether we want to pursue
large-scale production on the Grid, at JLab, or both. We decided
to pursue both.
* Matt thought that a huge sample of Pythia data would be sufficient
to address the main goals of the DC. Physics signals are of a smaller
scale and can be generated in a more ad hoc manner.
* We talked about December or January as a tentative time frame for
doing the first DC. In the future we will have to set up more
formal milestones.
Mini-Data Challenges
It looks like the work management system packages that were
recommended for us may not be appropriate for tracking jobs at JLab.
They are oriented toward a grid-based environment.
Mark described a perl script, jproj.pl, he wrote to manage jobs on the
JLab farm for the PrimEx experiment. It uses a MySQL database to keep
track of jobs and output files and handles multiple job submission. It
is driven by a list of input files to be processed. Multiple
simultaneous projects are supported. Some assumptions about filenames
and processing conventions are made to simplify the script. He will
use a modified version to get started processing multiple jobs at
JLab.
Richard reminded us that his gridmake system offers similar
functionality. Mark agreed to look at it as a possible, more
sophisticated replacement for jproj.pl.
At the last offline meeting Mark described the idea of doing multiple,
scheduled, medium-scale data challenges as a development environment
for the tools needed to do a large-scale DC. The idea is to expand scope as we
go from mini-DC to mini-DC, testing ideas as we go. There was a
consensus around following this approach, at least initially.
Analysis System Design Plan
Paul presented some classes for automating the selection of particle
combinations and doing kinematic fits on those combinations by
specifying the reaction to be studied when constructing the class. See
his wiki page for details. He has other related ideas which he will be
developing in the near future.
Matt proposed that we start doing analysis with simple, individually
developed scripts and incrementally develop system(s) based on that
experience.
Finalization of REST Format: record size and performance numbers
Richard discussed the changes he made to the REST format based on comments
from the last offline meeting and subsequent conversations with Paul.
He made the switch from using DChargedTrackHypothesis to
DTrackTimeBased. He sees a performance hit when the change is made due
to the need to swim a trajectory. The reconstitution rate is 30 Hz for
the events he studied. This compares with a reconstruction rate (from
raw hits) of 1.8 Hz. We agreed that the flexibility in analysis with
the new scheme was worth the extra processing time.
Action Items
1. Make a list of milestones.
2. Do a micro-data challenge with jproj.pl. -> Mark
3. Check in the final REST format. -> Richard
Retrieved from
<a href="https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_July_16,_2012">"https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_July_16,_2012"</a>
_______________________________________________
Halld-offline mailing list
<a href="mailto:Halld-offline@jlab.org">Halld-offline@jlab.org</a>
<a href="https://mailman.jlab.org/mailman/listinfo/halld-offline">https://mailman.jlab.org/mailman/listinfo/halld-offline</a>
</pre>
</blockquote>
<br>
</body>
</html>