<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    Hi Richard,<br>

    <br>

      Thanks for this nice summary. I have a few comments:<br>

    <br>

    1. It should be noted that the DNeutralShower_factory class

    currently implements a duplicate mechanism<br>

    for letting the user specify that the KLOE algorithm be used

    instead of the default. As written, the code<br>

    implements a configuration parameter, BCALRECON:USE_KLOE, which

    defaults to "1". Since it is a <br>

    JANA configuration parameter, the user can change it on the command

    line using:<br>

    <br>

    -PBCALRECON:USE_KLOE=0<br>

    <br>

    However, JANA has a built-in mechanism to do exactly this sort of

    thing via the "DEFTAG:*" configuration<br>

    parameter. Specifically, the user could specify the following:<br>

    <br>

    -PDEFTAG:DBCALShower=KLOE<br>

    <br>

    Then *all* calls like:<br>

    <br>

     loop->Get(locBCALShowers)<br>

    <br>

    will use the KLOE-algorithm-built objects. This is important because

    if another factory or event processor <br>

    wants to use the DBCALShower objects, it does not have to implement

    its own check of a configuration<br>

    parameter in order to get the same objects used everywhere

    else in the code.<br>

    <br>

    <br>

    2. Every data object inherits from JObject, which provides a GetTag()

    method. Calling this for just the<br>

    first object returned in a call to loop->Get(...) will tell you

    the tag of the factory used to make all of the<br>

    objects in the list. (There is one situation where this wouldn't

    be strictly true and you'd need to<br>

    query every object for its maker's tag, but that would be highly

    unusual.)  In other words, you shouldn't<br>

    ever have to query a factory object directly to figure out how an

    object was made.<br>

    <br>

    <br>

    3. I have always envisioned that whenever a processed data file was

    written out, a complete set of<br>

    configuration parameters would be written out with it. That is

    required to reproduce the results<br>

    of a given job. In fact, there is a command-line<br>

    option available in all JANA programs, --dumpconfig, that

    creates a text file with all configuration<br>

    parameters and their settings at the end of processing. The file is

    in a format that can be read back<br>

    in using --config=filename. This file could be put in the header (or

    even better, in the first event).<br>

    It would include all of the DEFTAG:* settings.<br>

    <br>

    <br>

    I also agree with Mark that having this info in both the file AND a

    database is good. One is often<br>

    much more convenient than the other depending on what you're doing.<br>

    <br>

    Regards,<br>

    -David<br>

    <br>

    <br>

    <div class="moz-cite-prefix">On 7/18/12 8:59 AM, Richard Jones

      wrote:<br>

    </div>

    <blockquote cite="mid:5006B325.1030602@uconn.edu" type="cite">

      <div class="moz-cite-prefix">Hello,<br>

        <br>

        From the minutes of the last meeting:<br>

        <br>

        <blockquote type="cite">

          <pre wrap="">Action Items

     1. Make a list of milestones.

     2. Do a micro-data challenge with jproj.pl. -> Mark

     3. Check in the final REST format. -> Richard</pre>

        </blockquote>

        The meeting was cut off before I could tell the whole story.  The

        final REST format was completed, tested, and checked in before

        Monday's meeting, but with some caveats:<br>

        <ol>

          <li>final event size is 3.5 kB (2.4 kB on disk because of bzip2

            compression)</li>

          <li>bzip2 decompression overhead is negligible, but

            reconstructing the reference_trajectory costs ~2 ms per track

            (3.5 GHz i7), with an average of 15 tracks per 5&pi;,p event at 9 GeV.</li>

          <li>when storing DBCALShower objects, there is an ambiguity

            about whether to use KLOE or default BCAL clustering.  Right

            now I query the DNeutralShower factory for its USE_KLOE

            parameter, which can be switched with a command-line

            parameter.  In case you are not familiar with the

            difference, there is just one DBCALShower class in DANA,

            but JANA objects carry "tags" that tell something about

            where they came from.  Right now there is a factory that

            produces DBCALShower objects with the tag "KLOE", and

            another one that produces objects of THE SAME type using a different

            algorithm, with the default tag "".<br>

          </li>

          <li>when reading back DBCALShower objects, I will unpack them

            (agnostic as to how they were made) either as KLOE or

            default BCAL clusters, depending on what the user asks for. 

            This has the "feature" that the person who wrote the REST

            file might write KLOE clusters and the person who reads it

            might fetch them as default clusters.  This might be the

            correct behavior.  If not, there is another way to go:

            create separate lists in the REST format for each foreseen

            variant of each object, and have the writer only

            populate one of them.  The overhead for doing this is

            negligible, but I think it is ugly.  IMO, there is just one

            conceptual object known as a DBCALShower, but many ways to

            make it.  Within the KLOE or default factories there are

            many internal settings that one might switch around.  Are we

            going to try to save all of that information in the output

            file on an event-by-event basis?  Why not put it in the

            database Mark was talking about, where a single entry can

            cover many events, even multiple files within a production

            run?<br>

          </li>

        </ol>

        <br>

        -Richard J.<br>

        <br>

        <br>

        <br>

        On 7/17/2012 5:03 PM, Mark M. Ito wrote:<br>

      </div>

      <blockquote cite="mid:5005D336.70206@jlab.org" type="cite">

        <pre wrap="">Folks,

Find the meeting minutes below and at

<a moz-do-not-send="true" href="https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_July_16,_2012#Minutes">https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_July_16,_2012#Minutes</a>

   -- Mark

_____

GlueX Data Challenge Meeting, July 16, 2012

Minutes

    Present:

      * CMU: Paul Mattione, Curtis Meyer

      * IU: Matt Shepherd

      * JLab: Eugene Chudakov, Mark Ito, David Lawrence

      * UConn: Richard Jones

Announcements

    Mark reported that the Computer Center was given a heads-up that we

    will start working on the data challenge. Batch jobs will start to

    appear on the JLab farm. We will have a tape volume set that can be

    recycled.

Scope of this Challenge

      * We agreed to continue using Curtis's document as a repository of

        our ideas about the data challenge (DC). Curtis agreed to convert

        document 2031 into a Wiki page. The page is now available as

        Planning for The Next GlueX Data Challenge.

      * Curtis reminded us that we need a robust storage resource manager

        (SRM) for the DC.

      * Grid and JLab: Mark asked whether we want to pursue large-scale

        production on the Grid, at JLab, or both. We decided to pursue

        both.

      * Matt thought that a huge sample of Pythia data would be sufficient

        to address the main goals of the DC. Physics signals are of a

        smaller scale and can be generated in a more ad hoc manner.

      * We talked about December or January as a tentative time frame for

        doing the first DC. In the future we will have to set up more

        formal milestones.

Mini-Data Challenges

    It looks like the work management system packages that were

    recommended for us may not be appropriate for tracking jobs at JLab.

    They are oriented toward a grid-based environment.

    Mark described a Perl script, jproj.pl, that he wrote to manage jobs on

    the JLab farm for the PrimEx experiment. It uses a MySQL database to

    keep track of jobs and output files and handles submission of multiple

    jobs. It is driven off a list of input files to be processed. Multiple

    simultaneous projects are supported. Some assumptions about filenames

    and processing conventions are made to simplify the script. He will

    use a modified version to get started processing multiple jobs at

    JLab.

    Richard reminded us that his gridmake system offers similar

    functionality. Mark agreed to look at it as a possible, more

    sophisticated replacement for jproj.pl.

    At the last offline meeting Mark described the idea of doing multiple,

    scheduled, medium-scale data challenges as a development environment

    for the tools needed for a large-scale DC. The idea is to expand scope

    as we go from mini-DC to mini-DC, testing ideas as we go. There was a

    consensus around following this approach, at least initially.

Analysis System Design Plan

    Paul presented some classes for automating the selection of particle

    combinations and doing kinematic fits on those combinations by

    specifying the reaction to be studied when constructing the class. See

    his wiki page for details. He has other related ideas which he will be

    developing in the near future.

    Matt proposed that we start doing analysis with simple, individually

    developed scripts and incrementally develop system(s) based on that

    experience.

Finalization of REST Format: record size and performance numbers

    Richard discussed the changes he made to the REST format based on

    comments from the last offline meeting and subsequent conversations

    with Paul. He made the switch from using DChargedTrackHypothesis to

    DTrackTimeBased. He sees a performance hit from this change due to

    the need to swim a trajectory. The reconstitution rate is 30 Hz for

    the events he studied. This compares with a reconstruction rate (from

    raw hits) of 1.8 Hz. We agreed that the flexibility in analysis with

    the new scheme was worth the extra processing time.

Action Items

     1. Make a list of milestones.

     2. Do a micro-data challenge with jproj.pl. -> Mark

     3. Check in the final REST format. -> Richard


_______________________________________________

Halld-offline mailing list

<a moz-do-not-send="true" href="mailto:Halld-offline@jlab.org">Halld-offline@jlab.org</a>

<a moz-do-not-send="true" href="https://mailman.jlab.org/mailman/listinfo/halld-offline">https://mailman.jlab.org/mailman/listinfo/halld-offline</a>

</pre>

      </blockquote>

      <br>

      <br>


    </blockquote>

    <br>

    <br>

  </body>

</html>