[Halld-offline] Data Challenge Meeting Minutes, July 16, 2012
Mark M. Ito
marki at jlab.org
Tue Jul 17 17:03:50 EDT 2012
Folks,
Find the meeting minutes below and at
https://halldweb1.jlab.org/wiki/index.php/GlueX_Data_Challenge_Meeting,_July_16,_2012#Minutes
-- Mark
_____
GlueX Data Challenge Meeting, July 16, 2012
Minutes
Present:
* CMU: Paul Mattione, Curtis Meyer
* IU: Matt Shepherd
* JLab: Eugene Chudakov, Mark Ito, David Lawrence
* UConn: Richard Jones
Announcements
Mark reported that the Computer Center was given a heads-up that we
will start working on the data challenge. Batch jobs will start to
appear on the JLab farm. We will have a tape volume set that can be
recycled.
Scope of this Challenge
* We agreed to continue on the course of using Curtis's document as
a repository of our ideas about the data challenge (DC). Curtis
agreed to convert document 2031 into a Wiki page. The page is now
available as Planning for The Next GlueX Data Challenge.
* Curtis reminded us that we need a robust storage resource manager
(SRM) for the DC.
* Grid and JLab: Mark asked about whether we want to pursue
large-scale production on the Grid, at JLab, or both. We decided
to pursue both.
* Matt thought that a huge sample of Pythia data would be sufficient
to address the main goals of the DC. Physics signals are of a smaller
scale and can be generated in a more ad hoc manner.
* We talked about December or January as a tentative time frame for
doing the first DC. In the future we will have to set up more
formal milestones.
Mini-Data Challenges
It looks like the workflow management system packages that were
recommended to us may not be appropriate for tracking jobs at JLab.
They are oriented toward a grid-based environment.
Mark described a perl script, jproj.pl, he wrote to manage jobs on the
JLab farm for the PrimEx experiment. It uses a MySQL database to keep
track of jobs and output files and handles multiple job submission. It
is driven off a list of input files to be processed. Multiple,
simultaneous projects are supported. Some assumptions about filenames
and processing conventions are made to simplify the script. He will
use a modified version to get started processing multiple jobs at
JLab.
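The jproj.pl script itself is Perl with a MySQL backend; the sketch below is only a rough illustration of the bookkeeping pattern described above (a table of input files with a status column, driving batch submission). SQLite stands in for MySQL to keep it self-contained, and all table, column, and file names are illustrative assumptions, not the actual PrimEx schema.

```python
import sqlite3

def create_project(conn, project):
    # One table per project keeps multiple simultaneous projects independent.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {project} ("
        "  input_file  TEXT PRIMARY KEY,"
        "  status      TEXT NOT NULL DEFAULT 'pending',"  # pending / submitted / done
        "  output_file TEXT"
        ")"
    )

def add_input_files(conn, project, files):
    # The project is driven off a list of input files to be processed.
    conn.executemany(
        f"INSERT OR IGNORE INTO {project} (input_file) VALUES (?)",
        [(f,) for f in files],
    )

def submit_pending(conn, project, limit):
    # Pick up to `limit` unprocessed files and mark them submitted.
    # A real version would also invoke the farm's job-submission command here.
    rows = conn.execute(
        f"SELECT input_file FROM {project} WHERE status = 'pending' LIMIT ?",
        (limit,),
    ).fetchall()
    for (f,) in rows:
        conn.execute(
            f"UPDATE {project} SET status = 'submitted' WHERE input_file = ?",
            (f,),
        )
    return [f for (f,) in rows]

conn = sqlite3.connect(":memory:")
create_project(conn, "dc_pythia")
add_input_files(conn, "dc_pythia", ["run001.hddm", "run002.hddm", "run003.hddm"])
submitted = submit_pending(conn, "dc_pythia", 2)
print(submitted)  # two files move from pending to submitted
```

Rerunning the driver is idempotent: already-registered files are ignored on insert, and only files still marked pending are picked up, which is what makes multiple-job submission safe to repeat.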
Richard reminded us that his gridmake system offers similar
functionality. Mark agreed to look at it as a possible, more
sophisticated replacement for jproj.pl.
At the last offline meeting Mark described the idea of doing multiple,
scheduled, medium-scale data challenges as a development environment
for the tools to do a large-scale DC. The idea is to expand scope as we
go from mini-DC to mini-DC, testing ideas as we go. There was a
consensus around following this approach, at least initially.
Analysis System Design Plan
Paul presented some classes for automating the selection of particle
combinations and doing kinematic fits on those combinations by
specifying the reaction to be studied when constructing the class. See
his wiki page for details. He has other related ideas which he will be
developing in the near future.
Matt proposed that we start doing analysis with simple, individually
developed scripts and incrementally develop system(s) based on that
experience.
Finalization of REST Format: record size and performance numbers
Richard discussed the changes he made to REST format based on comments
from the last offline meeting and subsequent conversations with Paul.
He made the switch from using DChargedTrackHypothesis to
DTrackTimeBased. He sees a performance hit when the change is made due
to the need to swim a trajectory. The reconstitution rate is 30 Hz for
the events he studied. This compares with a reconstruction rate (from
raw hits) of 1.8 Hz. We agreed that the flexibility in analysis with
the new scheme was worth the extra processing time.
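To put the two quoted rates in perspective, a back-of-the-envelope comparison of reconstitution from REST (30 Hz) against full reconstruction from raw hits (1.8 Hz) follows. The 10-million-event sample size is an illustrative assumption, not a number from the meeting.

```python
rest_rate_hz = 30.0      # reconstitution rate from REST, as quoted above
recon_rate_hz = 1.8      # reconstruction rate from raw hits, as quoted above
n_events = 10_000_000    # assumed sample size for illustration

speedup = rest_rate_hz / recon_rate_hz          # ~16.7x
days_rest = n_events / rest_rate_hz / 86400     # single-core days via REST
days_recon = n_events / recon_rate_hz / 86400   # single-core days from raw hits

print(f"speedup: {speedup:.1f}x")
print(f"REST: {days_rest:.1f} days, raw hits: {days_recon:.1f} days")
```

The roughly 17x gap is why reprocessing analyses from REST files, rather than from raw hits, dominates the trade-off even with the swimming-related performance hit.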
Action Items
1. Make a list of milestones.
2. Do a micro-data challenge with jproj.pl. -> Mark
3. Check in the final REST format. -> Richard