[Halla12_software] SoLID Software Meeting on Thursday at 2pm 03/26 Room-B101
Thomas K Hemmick
hemmick at skipper.physics.sunysb.edu
Tue Mar 31 16:01:26 EDT 2015
Hi everyone
I like the fact that the scope of the discussion has shifted to the
over-arching issues. I agree both with Zhiwen that flexibility is critical
and with Richard that there is such a thing as "enough rope to hang
yourself". We'll need judgement to decide on the right number of rules,
since too many and too few are both bad.
Although my favorite means of enforcing rules is via the compiler (...you
WILL implement ALL the pure virtual methods...), there is typically not
enough oomph in that area to make all our designs self-fulfilling.
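As a concrete illustration of the compiler-enforcement flavor (a minimal
sketch only; SubsystemModule, GemTracker, and the method names are invented
for illustration, not a proposed SoLID interface), a base class made
entirely of pure virtual methods means a subdetector module that forgets to
implement one simply will not compile:

// Sketch only: these names are hypothetical, not an agreed SoLID interface.
#include <string>

class SubsystemModule {
public:
  virtual ~SubsystemModule() = default;
  virtual void InitRun(int runNumber) = 0;   // pure virtual: no default
  virtual void ProcessEvent()         = 0;   // pure virtual: no default
  virtual std::string Name() const    = 0;   // pure virtual: no default
};

class GemTracker : public SubsystemModule {
public:
  void InitRun(int /*runNumber*/) override { /* load geometry, calibrations */ }
  void ProcessEvent() override { /* digitize or reconstruct one event */ }
  std::string Name() const override { return "GemTracker"; }
};

// If GemTracker omits any of the three overrides it stays abstract, and
// "GemTracker tracker;" fails to compile: the rule enforces itself.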
Another way to approach this same discussion is to list the things that can
(will) go wrong if we don't address them at the architecture level. Here
is a short list from me:
1) The simulation geometry is different from the real geometry.
2) The offline code applies corrections that fix real data but ruin
simulated data.
3) I just got a whole set of simulated events from someone's directory,
but they can't tell me the details on the conditions used for the
simulations so I cannot be sure if I can use them.
4) The detector has evolved to use new systems, and I need different code
to look at simulated (and real) events from these two periods.
5) I've lost track of how the hits relate to the simulation.
6) Sometimes the hit list for a reconstructed track includes a few noise
hits that were pretty much aligned with the simulated track. Do I consider
this track as reconstructed or not?
7) The simulation is too good. How should I smear out the simulated data
to match the true detector performance? Is there a standard architecture
for accomplishing this? (One possible approach is sketched after this list.)
8) My friend wrote some code to apply fine corrections/calibrations to the
data. I don't know whether I should or should not do this to simulated
data.
9) Now that we're a couple of years in, I realize that to analyze my old
dataset I need some form of hybrid code: the current progression in some
places, but the old standard stuff in others. What do I do?
10) Are we (as a collaboration) convinced that all the recent changes are
correct? Do we have a system for tracking performance instead of just
lines of code changed (e.g. benchmark code that double-checks the impact of
offline changes on the simulations via nightly runs that generate
performance metrics)?
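On point 7 above, one possible pattern (a sketch only; the class, member
names, and constants are hypothetical) is to confine all detector-response
smearing to a single, clearly labeled digitization step that acts only on
simulated hits, so truth values and "as-measured" values never get mixed:

// Sketch only: names and numerical constants are illustrative.
#include <random>

struct SimHit { double edep; double time; };   // truth from the simulation
struct RawHit { double adc;  double tdc;  };   // what the DAQ would deliver

class CalorimeterDigitizer {
public:
  explicit CalorimeterDigitizer(double fracResolution)
      : fRes(fracResolution) {}

  // The ONLY place where simulated energies are degraded to match the
  // measured detector performance.
  RawHit Digitize(const SimHit& sim) {
    std::normal_distribution<double> smear(sim.edep, fRes * sim.edep);
    return RawHit{ smear(fRng) / fGeVPerCount, sim.time + fT0 };
  }

private:
  std::mt19937 fRng{4357};
  double fRes;                    // fractional energy resolution
  double fGeVPerCount = 1.0e-3;   // assumed ADC conversion, illustration only
  double fT0 = 0.0;               // assumed timing offset
};

Whether such a step belongs in the simulation chain or in the reconstruction
chain is exactly the kind of architecture-level question this list is meant
to raise.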
In my opinion, this long list actually requires only a few design paradigms
(the fewer the better) to address it and avoid these issues by design. One
example of a design paradigm is that the simulated output is required
(administrative rule) to be self-describing. We then define what it means
to self-describe in terms of the information content (e.g. events,
background, geometry, calibration assumptions, ...) and designate a (1)
SIMPLE, (2) Universal, (3) Extensible format by which we will incorporate
self-description into our files. Another example of a design paradigm is
portability. We'll have to define portability (even to your laptop?) and
then architect a solution. PHENIX did poorly here by not distinguishing
between "core routines" (that open ANY file and then understand/unpack its
standard content) and "all other code" (daq, online monitoring, simulation,
reconstruction, analysis).
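To make the first paradigm concrete (a sketch only; the struct and its field
names are invented for illustration, not a proposed standard), the
information content of a self-describing file might be gathered into a
single provenance record that travels with the events:

// Sketch only: field names are illustrative, not a proposed SoLID standard.
#include <string>
#include <vector>

struct SelfDescription {
  std::string generator;          // e.g. event generator or "GEMC" + version
  std::string geometryTag;        // which detector geometry produced the file
  std::string calibrationTag;     // calibration assumptions applied
  std::string backgroundConfig;   // background mixing, if any
  long long   nEvents = 0;        // number of events in the file
  std::vector<std::string> notes; // free-form provenance (who, when, why)
};

A record like this, written into every output file, is the sort of thing
that would answer items 3 and 4 in the list above.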
Here is an example set of a few rules that can accomplish a lot:
(1) Although there are many formats for input information and output
information, only one "in memory" format can be allowed for any single
piece of information. Input translates to this and output translates from
this.
(2) All "in memory" formats for geometry, calibration, and the like will
be built with streamers so that these have the *option* of being included
in our output files. The intention is that an output file containing the
streamed geometry and calibration would be completely self-describing and
thereby require NO reference to external resources (static files,
databases, etc...) when used by others.
(3) The "in memory" formats will explicitly be coded to allow for schema
evolution so that old files are compatible with new code.
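To show how rules (2) and (3) could look in practice with ROOT streamers
(again, only a sketch; SolidGeometry and its members are hypothetical, and a
real class would also need a generated ROOT dictionary):

// Sketch only: SolidGeometry and its members are illustrative.
#include "TObject.h"
#include "TFile.h"
#include <vector>

class SolidGeometry : public TObject {
public:
  std::vector<double> fPlaneZ;    // detector plane positions (illustrative)
  double fTargetOffset = 0.0;     // member added later: old files still read,
                                  // this field just takes its default value
  ClassDef(SolidGeometry, 3);     // bump the version whenever members change
};

// Usage sketch: streaming the geometry into the same file as the events
// makes the file self-describing (rule 2), and the ClassDef version number
// is what lets ROOT evolve the schema so old files stay readable (rule 3).
//   TFile f("simout.root", "RECREATE");
//   SolidGeometry geo;
//   geo.Write("geometry");
//   f.Close();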
Tools like GEMC are, in my opinion, small internal engines that fit within
our over-arching framework, as indicated by Rakitha. To me, these are all
basically the same and I have literally no preference. The key is how we
fit such tools into an overall picture of the COMPLETE computing task that
will lie before us.
Cheers!
Tom
On Tue, Mar 31, 2015 at 2:30 PM, Richard S. Holmes <rsholmes at syr.edu> wrote:
>
> On Tue, Mar 31, 2015 at 2:04 PM, Zhiwen Zhao <zwzhao at jlab.org> wrote:
>
>> For the output tree format, I think we definitely need a tree or a bank
>> containing the event info. But the critical thing is that each
>> sub-detector should be able to define it easily and freely. They can have
>> very different truth info and digitization, and the requirements can be
>> different at different stages of study.
>>
>
> Certainly. But it would be better to define a good, general format for
> detector information which different detector groups can use as a baseline
> and, if needed, modify to suit their needs than to have them re-invent the
> wheel for each detector in inconsistent ways. (At the moment, for instance,
> different banks use different names to refer to the same quantity.)
>
> Event, primary, and trajectory data can, I think, be structured the same
> for hits in all detectors.
>
> This reminds me: I'd like to see some degree of standardization of volume
> IDs. Right now it's anarchy.
>
> --
> - Richard S. Holmes
> Physics Department
> Syracuse University
> Syracuse, NY 13244
> 315-443-5977
>