<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi everyone,<br>

    <br>

    I appreciate everyone taking the initiative to lay out these ideas. 

    My plan for Thursday is to go through these carefully and try to

    organize them into a list of issues and preferences, along with what I

    started to present last week.<br>

    <br>

    Best,<br>

    Seamus<br>

    <br>

    <div class="moz-cite-prefix">On 03/31/2015 04:01 PM, Thomas K

      Hemmick wrote:<br>

    </div>

    <blockquote

cite="mid:CAAVEoEdZWH7JzYi5+ZCpVQA9q_Ab9mV+a5wcefxAdqji-vLwYg@mail.gmail.com"

      type="cite">

      <div dir="ltr">Hi everyone

        <div><br>

        </div>

        <div>I like the fact that the scope of the discussion has

          shifted to the over-arching issues.  I agree both with Zhiwen

          that flexibility is critical and with Richard that there is

          such a thing as "enough rope to hang yourself".  We'll need

          judgement to decide the right number of rules since too many

          and too few are both bad.</div>

        <div><br>

        </div>

        <div>Although my favorite means of enforcing rules is via the

          compiler (...you WILL implement ALL the pure virtual

          methods...), there is typically not enough oomph in that area

          to make all our designs self-fulfilling.</div>

        <div><br>

        </div>

        <div>Another way to approach this same discussion is to list the

          things that can (will) go wrong if we don't address them at

          the architecture level.  Here is a short list from me:</div>

        <div><br>

        </div>

        <div>1)  The simulation geometry is different from the real

          geometry.</div>

        <div>2)  The offline code applies corrections that fix real data

          but ruin simulated data.</div>

        <div>3)  I just got a whole set of simulated events from

          someone's directory, but they can't tell me the details on the

          conditions used for the simulations so I cannot be sure if I

          can use them.</div>

        <div>4)  The detector has evolved to use new systems and I need

          different codes to look at simulated (and real) events from

          these two periods.</div>

        <div>5)  I've lost track of how the hits relate to the

          simulation.</div>

        <div>6)  Sometimes the hit list for a reconstructed track

          includes a few noise hits that were pretty much aligned with

          the simulated track.  Do I consider this track as

          reconstructed or not?</div>

        <div>7)  The simulation is too good.  How should I smear out the

          simulated data to match the true detector performance?  Is

          there a standard architecture for accomplishing this?</div>

        <div>8)  My friend wrote some code to apply fine

          corrections/calibrations to the data.  I don't know whether I

          should or should not do this to simulated data.  </div>

        <div>9)  Now that we're a couple of years in, I realize that to

          analyze my old dataset, I need some form of a hybrid of code:

           progression in some places, but old standard stuff in

          others.  What do I do?</div>

        <div>10)  Are we (as a collaboration) convinced that all the

          recent changes are correct?  Do we have a system for tracking

          performance instead of just lines of code changed (e.g.

          benchmark code that double-checks the impact of offline changes on

          simulations via nightly runs that generate performance metrics)?</div>
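        <div><br>
        </div>
        <div>For item 10, a minimal sketch of what a nightly check might
          look like, assuming each run produces a small table of named
          performance metrics.  The function name, the metric names, and
          the 2% tolerance are all made up for illustration; nothing
          here is an existing tool.</div>
<pre>
```python
def regressions(reference, current, tolerance=0.02):
    """Return the names of metrics that moved by more than the tolerance.

    reference and current are hypothetical dicts mapping a metric name
    to a number, e.g. from last week's accepted run and tonight's run.
    """
    flagged = []
    for name in sorted(reference):
        if name not in current:
            # A metric that disappeared is treated as a regression too.
            flagged.append(name)
            continue
        # Relative change, guarded against a zero reference value.
        scale = max(abs(reference[name]), 1e-12)
        if abs(current[name] - reference[name]) / scale > tolerance:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    accepted = {"tracking_efficiency": 0.95, "momentum_resolution": 0.012}
    tonight = {"tracking_efficiency": 0.90, "momentum_resolution": 0.012}
    print(regressions(accepted, tonight))
```
</pre>
        <div>The point is only that the comparison is on physics
          performance, not on which lines of code changed.</div>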

        <div><br>

        </div>

        <div>In my opinion, this long list actually requires only a few

          design paradigms (the fewer the better) to address it and

          avoid these issues by design.  One example of a design

          paradigm is that the simulated output is required

          (administrative rule) to be self describing.  We then define

          what it means to self-describe in terms of the information

          content (e.g. events, background, geometry, calibration

          assumptions, ...) and designate a (1) SIMPLE, (2) Universal,

          (3) Extensible format by which we will incorporate

          self-description into our files.  Another example of a design

          paradigm is portability.  We'll have to define portability

          (even to your laptop?) and then architect a solution.  PHENIX

          did poorly in this by not distinguishing between "core

          routines" (that open ANY file and then understand/unpack its

          standard content) and "all other code" (daq, online

          monitoring, simulation, reconstruction, analysis).  </div>
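        <div><br>
        </div>
        <div>To make the self-describing idea concrete, here is a rough
          sketch, assuming a plain JSON file as the simple, universal,
          extensible container.  The section names and both function
          names are hypothetical; the real list of required content
          would be whatever we agree on.</div>
<pre>
```python
import json

# Hypothetical list of sections a file must carry to count as
# self-describing; extra sections are allowed (extensibility).
REQUIRED_SECTIONS = ("events", "background", "geometry", "calibration")


def _check(sections):
    """Raise if any required section is absent."""
    missing = [name for name in REQUIRED_SECTIONS if name not in sections]
    if missing:
        raise ValueError("not self-describing, missing: " + ", ".join(missing))


def write_self_describing(path, sections):
    """Write all sections to one JSON file, refusing incomplete output."""
    _check(sections)
    with open(path, "w") as handle:
        json.dump(sections, handle, indent=2)


def read_self_describing(path):
    """Read a file back and verify that it still describes itself."""
    with open(path) as handle:
        sections = json.load(handle)
    _check(sections)
    return sections
```
</pre>
        <div>With something like this, the writer enforces the
          administrative rule: a simulation that cannot state its
          geometry and calibration assumptions cannot produce a file.</div>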

        <div><br>

        </div>

        <div>Here is an example set of a few rules that can accomplish a

          lot:</div>

        <div>(1)  Although there are many formats for input information

          and output information, only one "in memory" format can be

          allowed for any single piece of information.  Input translates

          to this and output translates from this.</div>

        <div>(2)  All "in memory" formats for geometry, calibration, and

          the like will be built with streamers so that these have the

          *option* of being included in our output files.  The intention

          is that an output file containing the streamed geometry and

          calibration would be completely self describing and thereby

          require NO reference to external resources (static files,

          databases, etc.) when used by others.</div>

        <div>(3)  The "in memory" formats will explicitly be coded to

          allow for schema evolution so that old files are compatible

          with new code.</div>
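        <div><br>
        </div>
        <div>The three rules above could be sketched roughly as follows.
          This is an illustration only: the Calibration class, its
          fields, the version numbers, and the two input formats are all
          invented for the example, and JSON stands in for whatever
          streaming mechanism we actually choose.</div>
<pre>
```python
import json
from dataclasses import dataclass, asdict

SCHEMA_VERSION = 2  # hypothetical current version of the in-memory layout


@dataclass
class Calibration:
    """Rule 1: the single in-memory form; every input translates to this."""
    detector: str
    gain: float
    pedestal: float = 0.0  # field assumed to have been added in version 2


def upgrade(record):
    """Rule 3: bring an older stored record up to the current schema."""
    record = dict(record)
    version = record.pop("schema_version", 1)
    if version == 1:
        record.setdefault("pedestal", 0.0)  # version 1 had no pedestal
    return record


def from_json(text):
    """Input translator: JSON text to the in-memory form."""
    return Calibration(**upgrade(json.loads(text)))


def from_text(line):
    """Input translator: 'detector gain [pedestal]' to the in-memory form."""
    fields = line.split()
    pedestal = float(fields[2]) if len(fields) == 3 else 0.0
    return Calibration(fields[0], float(fields[1]), pedestal)


def to_json(calibration):
    """Rule 2: the streamer; in-memory form to text for the output file."""
    record = asdict(calibration)
    record["schema_version"] = SCHEMA_VERSION
    return json.dumps(record)
```
</pre>
        <div>An old record such as {"detector": "ec", "gain": 1.5} still
          loads through from_json, and whatever to_json writes can be
          embedded in an output file next to the events.</div>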

        <div><br>

        </div>

        <div>Tools like GEMC are, in my opinion, small internal engines

          that fit within our over-arching framework, as indicated by

          Rakitha.  To me, these are all basically the same and I have

          literally no preference.  The key is how we fit such tools

          into an overall picture of the COMPLETE computing task that

          will lie before us.</div>

        <div><br>

        </div>

        <div>Cheers!</div>

        <div>Tom</div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Tue, Mar 31, 2015 at 2:30 PM,

          Richard S. Holmes <span dir="ltr">&lt;<a

              moz-do-not-send="true" href="mailto:rsholmes@syr.edu"

              target="_blank">rsholmes@syr.edu</a>&gt;</span> wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">

              <div class="gmail_extra"><span class=""><br>

                  <div class="gmail_quote">On Tue, Mar 31, 2015 at 2:04

                    PM, Zhiwen Zhao <span dir="ltr">&lt;<a

                        moz-do-not-send="true"

                        href="mailto:zwzhao@jlab.org" target="_blank">zwzhao@jlab.org</a>&gt;</span>

                    wrote:<br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      <div style="overflow:hidden">For the output tree

                        format, I think we definitely need a tree or a

                        bank containing the event info.<br>

                        But the critical thing is that each sub-detector

                        should be able to define it easily and freely.<br>

                        They can have very different truth info and

                        digitization and the requirements can be

                        different at<br>

                        different stages of study.</div>

                    </blockquote>

                  </div>

                  <br>

                </span>Certainly. But it would be better to define a

                good, general format for detector information which

                different detector groups can use as a baseline and, if

                needed, modify to suit their needs than to have them

                re-invent the wheel for each detector in inconsistent

                ways. (At the moment, for instance, different banks use

                different names to refer to the same quantity.)</div>

              <div class="gmail_extra"><br>

              </div>

              <div class="gmail_extra">Event, primary, and trajectory

                data can, I think, be structured the same for hits in

                all detectors.</div>

              <div class="gmail_extra"><br>

              </div>

              <div class="gmail_extra">This reminds me: I'd like to see

                some degree of standardization of volume IDs. Right now

                it's anarchy.<span class=""><br>

                  <div><br>

                  </div>

                  -- <br>

                  <div>- Richard S. Holmes<br>

                      Physics Department<br>

                      Syracuse University<br>

                      Syracuse, NY 13244<br>

                      <a moz-do-not-send="true" href="tel:315-443-5977"

                      value="+13154435977" target="_blank">315-443-5977</a><br>

                  </div>

                </span></div>

            </div>

            <br>

            _______________________________________________<br>

            Halla12_software mailing list<br>

            <a moz-do-not-send="true"

              href="mailto:Halla12_software@jlab.org">Halla12_software@jlab.org</a><br>

            <a moz-do-not-send="true"

              href="https://mailman.jlab.org/mailman/listinfo/halla12_software"

              target="_blank">https://mailman.jlab.org/mailman/listinfo/halla12_software</a><br>

            <br>

          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Seamus Riordan, Ph.D.                   <a class="moz-txt-link-abbreviated" href="mailto:seamus.riordan@stonybrook.edu">seamus.riordan@stonybrook.edu</a>

Stony Brook University                  Office: (631) 632-4069

Research Assistant Professor            Fax:    (631) 632-8573

A-103 Department of Physics

6999 SUNY

Stony Brook NY 11974-3800

</pre>

  </body>

</html>