[Halla12_software] SoLID Software Meeting on Thursday at 2pm 03/26 Room-B101

Seamus Riordan riordan at jlab.org
Tue Mar 31 16:10:15 EDT 2015


Hi everyone,

I appreciate everyone taking the initiative to lay out these ideas. My 
plan for Thursday is to go through these carefully and try to organize 
them, together with what I started to present last week, into a list of 
issues and preferences.

Best,
Seamus

On 03/31/2015 04:01 PM, Thomas K Hemmick wrote:
> Hi everyone
>
> I like the fact that the scope of the discussion has shifted to the 
> over-arching issues.  I agree with both Zhiwen that flexibility is 
> critical and with Richard that there is such a thing as "enough rope 
> to hang yourself".  We'll need judgement to decide the right number of 
> rules since too many and too few are both bad.
>
> Although my favorite means of enforcing rules is via the compiler 
> (...you WILL implement ALL the purely virtual methods...), there is 
> typically not enough oomph in that area to make all our designs 
> self-enforcing.
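>
> As a concrete sketch of that compiler-enforcement idea (class and 
> method names are invented for illustration, not taken from any 
> existing SoLID code):
>
>   // Any concrete detector MUST implement every purely virtual
>   // method below, or the compiler refuses to build it.
>   class SoLIDDetector {
>   public:
>       virtual ~SoLIDDetector() {}
>       virtual void ConstructGeometry() = 0;
>       virtual void Digitize()          = 0;
>       virtual void Reconstruct()       = 0;
>   };
>
>   class SoLIDGEMTracker : public SoLIDDetector {
>   public:
>       void ConstructGeometry() override { /* build volumes */ }
>       void Digitize()          override { /* hits -> signals */ }
>       void Reconstruct()       override { /* signals -> tracks */ }
>       // Omitting any of the three above is a compile-time error.
>   };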
>
> Another way to approach this same discussion is to list the things 
> that can (will) go wrong if we don't address them at the architecture 
> level.  Here is a short list from me:
>
> 1)  The simulation geometry is different from the real geometry.
> 2)  The offline code applies corrections that fix real data but ruin 
> simulated data.
> 3)  I just got a whole set of simulated events from someone's 
> directory, but they can't tell me the details on the conditions used 
> for the simulations so I cannot be sure if I can use them.
> 4)  The detector has evolved to use new systems and I need different 
> codes to look at simulated (and real) events from these two periods.
> 5)  I've lost track of how the hits relate to the simulation.
> 6)  Sometimes the hit list for a reconstructed track includes a few 
> noise hits that were pretty much aligned with the simulated track.  Do 
> I consider this track as reconstructed or not?
> 7)  The simulation is too good.  How should I smear out the simulated 
> data to match the true detector performance?  Is there a standard 
> architecture for accomplishing this?  (A sketch follows this list.)
> 8)  My friend wrote some code to apply fine corrections/calibrations 
> to the data.  I don't know whether I should or should not do this to 
> simulated data.
> 9)  Now that we're a couple of years in, I realize that to analyze my 
> old dataset, I need some form of hybrid code:  the newest versions in 
> some places, but the old standard stuff in others.  What do I do?
> 10)  Are we (as a collaboration) convinced that all the recent changes 
> are correct?  Do we have a system for tracking performance instead of 
> just lines of code changed (e.g. benchmark code that double-checks the 
> impact of offline changes to simulations via nightly runs that 
> generate performance metrics)?
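>
> For item (7), one common approach (a sketch only, not an agreed SoLID 
> standard) is a Gaussian smearing step applied to the perfect simulated 
> quantities, with the width taken from measured detector performance.  
> The 2% relative resolution below is a placeholder:
>
>   #include "TRandom3.h"
>
>   // Degrade a "too good" simulated energy with a Gaussian
>   // resolution term; the default width is a made-up placeholder.
>   double SmearEnergy(double eTrue, TRandom3& rng,
>                      double relRes = 0.02) {
>       double e = rng.Gaus(eTrue, relRes * eTrue);
>       return (e > 0.0) ? e : 0.0;  // clamp to physical values
>   }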
>
> In my opinion, this long list actually requires only a few design 
> paradigms (the fewer the better) to address it and avoid these issues 
> by design.  One example of a design paradigm is that the simulated 
> output is required (administrative rule) to be self-describing.  We 
> then define what it means to self-describe in terms of the information 
> content (e.g. events, background, geometry, calibration assumptions, 
> ...) and designate a (1) SIMPLE, (2) Universal, (3) Extensible format 
> by which we will incorporate self-description into our files.  Another 
> example of a design paradigm is portability.  We'll have to define 
> portability (even to your laptop?) and then architect a solution.  
> PHENIX did poorly in this by not distinguishing between "core 
> routines" (that open ANY file and then understand/unpack its standard 
> content) and "all other code" (daq, online monitoring, simulation, 
> reconstruction, analysis).
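>
> To make "self-describing" concrete, here is one purely illustrative 
> possibility that is SIMPLE, Universal, and Extensible: write string 
> key/value metadata into the same ROOT file as the events, so the file 
> carries its own conditions.  All keys and values here are invented:
>
>   #include "TFile.h"
>   #include "TNamed.h"
>
>   // Record the production conditions inside the output file itself,
>   // so no external resource is needed to interpret it later.
>   void WriteSelfDescription(TFile& f) {
>       f.cd();
>       TNamed("generator",   "someGenerator v1.2").Write();
>       TNamed("geometryTag", "solid_baseline_2015_03").Write();
>       TNamed("background",  "none").Write();
>       TNamed("calibration", "ideal").Write();
>   }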
>
> Here is an example set of a few rules that can accomplish a lot:
> (1)  Although there are many formats for input information and output 
> information, only one "in memory" format can be allowed for any single 
> piece of information.  Input translates to this and output translates 
> from this.
> (2)  All "in memory" formats for geometry, calibration, and the like 
> will be built with streamers so that these have the *option* of being 
> included in our output files.  The intention is that an output file 
> containing the streamed geometry and calibration would be completely 
> self-describing and thereby require NO reference to external resources 
> (static files, databases, etc...) when used by others.
> (3)  The "in memory" formats will explicitly be coded to allow for 
> schema evolution so that old files are compatible with new code.
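>
> Rules (2) and (3) map naturally onto ROOT's I/O machinery, for 
> example.  A sketch under that assumption (the class and its members 
> are hypothetical, and a ROOT dictionary must still be generated):
>
>   #include "TObject.h"
>
>   // ClassDef gives the class a streamer, so a SoLIDGeometry object
>   // has the *option* of being written into any output file (rule 2).
>   class SoLIDGeometry : public TObject {
>   public:
>       double fTargetZ;     // cm; present since version 1
>       double fFieldScale;  // added later; old files read the default
>
>       SoLIDGeometry() : fTargetZ(0), fFieldScale(1.0) {}
>
>       // Bumping the version number when members change is what lets
>       // ROOT's schema evolution read old files with new code (rule 3).
>       ClassDef(SoLIDGeometry, 2)
>   };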
>
> Tools like GEMC are, in my opinion, small internal engines that fit 
> within our over-arching framework, as indicated by Rakitha.  To me, 
> these are all basically the same and I have literally no preference.  
> The key is how we fit such tools into an overall picture of the 
> COMPLETE computing task that will lie before us.
>
> Cheers!
> Tom
>
> On Tue, Mar 31, 2015 at 2:30 PM, Richard S. Holmes 
> <rsholmes at syr.edu> wrote:
>
>
>     On Tue, Mar 31, 2015 at 2:04 PM, Zhiwen Zhao 
>     <zwzhao at jlab.org> wrote:
>
>         For the output tree format, I think we definitely need a tree
>         or a bank containing the event info.
>         But the critical thing is that each sub-detector should be
>         able to define it easily and freely.
>         They can have very different truth info and digitization, and
>         the requirements can be different at
>         different stages of study.
>
>
>     Certainly. But it would be better to define a good, general format
>     for detector information which different detector groups can use
>     as a baseline and, if needed, modify to suit their needs than to
>     have them re-invent the wheel for each detector in inconsistent
>     ways. (At the moment, for instance, different banks use different
>     names to refer to the same quantity.)
>
>     Event, primary, and trajectory data can, I think, be structured
>     the same for hits in all detectors.
>
>     This reminds me: I'd like to see some degree of standardization of
>     volume IDs. Right now it's anarchy.
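>
>     For concreteness, a baseline hit record plus a packed volume-ID
>     convention might look like the sketch below.  Everything in it
>     (field names, units, the bit layout) is invented for
>     illustration, not an existing convention:
>
>       #include <cstdint>
>
>       // One hit record shared by every detector bank; groups layer
>       // detector-specific fields on top of this common core.
>       struct BaselineHit {
>           int32_t  eventID;   // same meaning in every bank
>           int32_t  trackID;   // link to the generating trajectory
>           uint32_t volumeID;  // packed; see EncodeVolumeID below
>           double   edep;      // MeV, one agreed unit everywhere
>           double   time;      // ns
>       };
>
>       // Pack detector/plane/module/copy (8 bits each) into one
>       // standardized volume ID.
>       inline uint32_t EncodeVolumeID(uint32_t det, uint32_t plane,
>                                      uint32_t module, uint32_t copy) {
>           return (det << 24) | (plane << 16) | (module << 8) | copy;
>       }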
>
>     -- 
>     - Richard S. Holmes
>       Physics Department
>       Syracuse University
>       Syracuse, NY 13244
>     315-443-5977
>

-- 
Seamus Riordan, Ph.D.                   seamus.riordan at stonybrook.edu
Stony Brook University                  Office: (631) 632-4069
Research Assistant Professor            Fax:    (631) 632-8573

A-103 Department of Physics
6999 SUNY
Stony Brook NY 11794-3800
