[Halla12_software] another point about geometry API.

Mon Apr 6 16:51:21 EDT 2015

Hi Zhiwen

This is starting to look nice and I think we're beginning to converge
conceptually.  I think the reason we will converge quickly on your diagram
is that it leaves enough wiggle room for each of us to either see our own
vision here or to see our fears (think Rorschach Diagram)!  What I see in
your diagram is this:

1)  We define a simple detector definition.
2)  One main goal of the detector definition is that it be maximally
appealing to offline users (I didn't see this, but I assumed it):
a-  It should be compact.
b-  It should be documented.
c-  It should be universal.
d-  It will likely be implemented as a collection of multiple things
(tracker, rich, magnet, ecal, ...)
3)  The detector definition should have capabilities (I see these as gray
arrows in your diagram):
a-  Be inflatable to full GEANT volumes for simulation.
b-  Be able to "un-calibrate" simulations for the digitizer.
c-  Be able to calibrate digital data for analysis.
d-  Deliver reference information to late stage analysis.
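
A minimal Python sketch of capabilities 3b and 3c, for illustration only.
It assumes a hypothetical ChannelCalibration record with a linear gain and
pedestal (neither the name nor the linear model comes from any existing
SoLID code).  The point is only that the digitizer's "un-calibrate" step and
the analysis "calibrate" step both read the same definition, so the two
directions cannot drift apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelCalibration:
    """Hypothetical per-channel calibration held by the detector definition."""
    gain: float      # energy units per ADC count (assumed linear response)
    pedestal: float  # ADC counts recorded at zero signal


def uncalibrate(cal, energy):
    """Digitizer direction: simulated energy -> integer ADC count."""
    return round(energy / cal.gain + cal.pedestal)


def calibrate(cal, adc):
    """Analysis direction: ADC count -> energy, using the same constants."""
    return (adc - cal.pedestal) * cal.gain


# Illustrative numbers only.
cal = ChannelCalibration(gain=0.5, pedestal=100.0)
adc = uncalibrate(cal, 12.0)
energy = calibrate(cal, adc)
```

Because both functions take the same object, changing a constant in the
definition changes simulation and analysis together.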

======================

I would add one capability (just one) which some people assume is trivial,
but I think is important enough to state out loud and to advertise the
impact.  My personal additional rule would be this:

Rule 0:  Every detector definition object should be equipped with a root
streamer.

Root streamers require only TWO extra lines of code (ClassDef in the *.h and
ClassImp in the *.C), but add important capabilities.  With root streamers
the world opens in the following way:

I)  Root streamers mean that with ZERO WORK, we can store/retrieve the
detector definition objects to/from a file.
II)  Root streamers mean that with TINY WORK, we can store/retrieve the
detector definition objects to/from a database.

Item "I" is kind of what root wanted...full objects (members AND methods)
go from memory to file and from file to memory with basically no effort.
Item "II" is tiny work, since PHENIX already solved (purely generic!)
sending root streamers to and from a database (wrapped in a
BLOB, i.e. Binary Large OBject, field).
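
The store/retrieve pattern in items "I" and "II" can be sketched in a few
lines of Python.  This is an analogy under stated assumptions, not the PHENIX
code and not ROOT: JSON serialization stands in for the root streamer, an
in-memory SQLite database stands in for the real database, and the
DetectorDefinition class, table name, and field names are all hypothetical.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class DetectorDefinition:
    """Hypothetical detector definition; field names are illustrative."""
    name: str
    geometry: dict = field(default_factory=dict)
    calibration: dict = field(default_factory=dict)


def to_blob(definition):
    """Serialize the object to bytes (JSON stands in for a root streamer)."""
    return json.dumps(asdict(definition), sort_keys=True).encode("utf-8")


def from_blob(blob):
    """Rebuild the object from the bytes produced by to_blob."""
    return DetectorDefinition(**json.loads(blob.decode("utf-8")))


def store_definition(conn, definition):
    """Write the serialized object into a BLOB column; return the record id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detector_definition "
        "(id INTEGER PRIMARY KEY, name TEXT, payload BLOB)"
    )
    cur = conn.execute(
        "INSERT INTO detector_definition (name, payload) VALUES (?, ?)",
        (definition.name, to_blob(definition)),
    )
    conn.commit()
    return cur.lastrowid


def load_definition(conn, record_id):
    """Fetch the BLOB for one record and turn it back into an object."""
    row = conn.execute(
        "SELECT payload FROM detector_definition WHERE id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    return from_blob(row[0])


# Illustrative values only.
conn = sqlite3.connect(":memory:")
gem = DetectorDefinition("gem", {"n_sectors": 30}, {"gain": 1.0})
record_id = store_definition(conn, gem)
restored = load_definition(conn, record_id)
```

The database code never looks inside the payload, which is what makes the
approach generic: any object that can serialize itself can be stored the
same way.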

I believe that item "II" can (after a little bit of copy & paste coding)
save us HUGE effort, and meet many of the criteria that each of us asked
for at various points in this thread.  Let's take the seemingly "at-odds"
requests from Richard/Seamus and from Zhiwen/Ole:

Richard:  I would like simulation output to be portable for off-site usage.
Seamus:  I would like all files to be completely self-describing.
Zhiwen:  We must keep extremely close and careful track of the detector
definitions.
Ole:  We must make efficient and robust use of our resources (e.g.
database).

If you stream the detector definition objects into the simulation output
file, it becomes self-describing and thereby portable (Richard/Seamus
happy), meticulously and accurately summarized (Zhiwen happy), but possibly
huge (Ole unhappy).  If you stream detector definition objects to the
database (Zhiwen happy) and stream meta data into the simulation file (e.g.
a locator to the database record) then you make Ole happy as well.  In both
of these cases you will be keeping meticulous track of the detector
definition (geometry, calibrations, etc...) used during simulation and find
it seemingly impossible to use conflicting assumptions when acting upon the
data later.  Furthermore, even a file with only a meta link could be made
self-describing and transported off site via one extra step:  prepare a
self-describing personal copy of the file at J-Lab using the meta-data to
retrieve the exact detector definition used when the file was made and
stream this along with the events into a now fully self-describing portable
copy that you take off site.
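
The "meta link plus one extra step" idea can be sketched as follows.  This
is a minimal Python illustration under assumptions: a simulation file is
modeled as a plain dictionary, the database as a dictionary keyed by a
locator, and every name here (write_simulation_output, make_portable_copy,
the "def-0001" locator) is hypothetical.

```python
import copy


def write_simulation_output(events, definition_id):
    """Compact output: events plus only a locator to the stored definition."""
    return {"events": list(events), "detector_definition_id": definition_id}


def make_portable_copy(sim_file, definition_store):
    """Extra step before going off site: embed the exact definition used."""
    portable = copy.deepcopy(sim_file)
    definition_id = portable["detector_definition_id"]
    # A missing record raises KeyError rather than silently guessing.
    portable["detector_definition"] = copy.deepcopy(
        definition_store[definition_id]
    )
    return portable


# Illustrative stand-in for the database and for one simulation file.
definition_store = {"def-0001": {"gem": {"n_sectors": 30}}}
sim_file = write_simulation_output([{"event": 1}, {"event": 2}], "def-0001")
portable = make_portable_copy(sim_file, definition_store)
```

The on-site file stays small, while the portable copy carries both the
events and the definition they were generated with.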

======================

This seems like a reasonably robust design principle for minimizing
mis-matches between geant and offline-reconstruction both at the geometry
and calibration level.  I would love to hear your thoughts on making the
propagation of truth information simple and robust.

All the best,
Tom

On Fri, Apr 3, 2015 at 5:20 PM, Zhiwen Zhao <zwzhao at jlab.org> wrote:

> Dear Tom and Rich
>
> To summarize the ideas I made a diagram on the last slide of
> http://hallaweb.jlab.org/12GeV/SoLID/download/software/
> talk/solid_software_zwzhao_20150402.pdf
> I call it "detector definition" instead of core geometry or geometry
> API because it's more than just geometry.
> I also give the name "detector creator" to the code that produces the
> detector definition.
>
> The detector creator is the set of perl scripts used by GEMC, and of
> course it can be in any other code format.
> As another example, "fairroot" and its relatives "pandaroot", "cbmroot",
> and "eicroot" all use a root virtual object to store the detector
> definition, then call a simulation engine like geant4, geant3, or fluka
> to do the simulation.
> They use some root scripts as their detector creator.
> The real challenge is to make the format of the detector definition work
> not just for simulation, but also for digitization, reconstruction, and
> analysis.
> The detector creator can be any tool that produces the detector
> definition, as long as it's easy to use and can track changes.
>
> What I mean by not hard coding geometry into source is actually not
> putting the detector creator into the simulation source code.
> That would create endless variations of the simulation binary code.
> We'd better make the detector creator truly independent.
>
> thanks
>
> Zhiwen
>
>
> On 4/3/2015 1:25 PM, Thomas K Hemmick wrote:
>
>> Hi Richard
>>
>> I agree very much with the point of view you articulated.  What you
>> describe as what I'll call a "core" geometry description is, I think, an
>> example of the concept that Ole referred to as an "API".
>>
>> Let me see if I can try to make this more explicit and ask you guys to
>> tell me if this is consistent with your view.  I think you are indicating
>> a paradigm that begins with defining a "core" geometry representation.
>> That might be kind of like this:
>>
>> i-  For a given element (detector, magnet, baffle, ...) we define a
>> simple, logical, and convenient "core-geometry" description.
>> ii- We administratively require that the various "projections" of this
>> information into some other basis (you mention simulation, digitization,
>> and reconstruction) all be formulated using the "core-geometry" as their
>> source.
>>
>> I'm inferring from your note that you believe that keeping the "core"
>> definition aligned closely with the offline/reconstruction world is the
>> best way to ensure that the onslaught of users we would get after real
>> data arrives would not immediately break the design construct.  I agree
>> with this very much.  Making things most closely resemble the world of
>> the coders who will never fully learn/appreciate the overall design
>> philosophy is the best defense.
>>
>> I would add that the concept of calibrations has a similar multi-forked
>> utility (digitization and reconstruction) and would be amenable to the
>> very same "core description" paradigm.
>>
>> My interpretation of what Ole was suggesting (design a smart/clean API)
>> is that the collection of these various "core" representations and the
>> requirement that all codes use core descriptions as their basis allows
>> offline development of alignments, calibrations, corrections, dead
>> channel maps, etc. (components of the "core modules") to propagate
>> immediately from the real data world back to the simulation in an
>> intrinsically self-consistent manner.
>>
>> Sounds like a good start to an API to me...
>> Tom
>>
>> On Thu, Apr 2, 2015 at 11:36 PM, Richard S. Holmes <rsholmes at syr.edu> wrote:
>>
>>     The whole hard coded geometry vs. not question strikes me as the wrong
>>     question. There really is not much difference, at least for users,
>>     between a geometry defined in a Perl script and a geometry defined in
>>     source code.
>>
>>     Rather, the question should be: How to parameterize the geometry?
>>
>>     Keep in mind that the simulation, the digitization, and the
>>     reconstruction all need to know about geometry, but need to represent
>>     it in different forms. The simulation needs to represent it as GEANT
>>     geometry objects. The digitization and reconstruction will probably
>>     have their own ways of representing the apparatus. Whatever approach
>>     we take, we have to guarantee that all these representations are
>>     consistent and compatible, and that whenever you change the geometry,
>>     you change it for all parts of the software automatically.
>>
>>     Describing a set of GEMs by describing the 23 (or however many) layers
>>     of the first segment of the first detector, then the 23 layers of the
>>     second segment of the first detector, then [dot dot dot] and then the
>>     23 layers of the 30th segment of the fifth detector is a poor
>>     parameterization. Digitization and reconstruction don't even need to
>>     know about all 23 layers, and simulation doesn't need to know about
>>     strip pitches and angles. Instead there should be a generic
>>     description of a GEM sector (or I guess potentially three generic
>>     descriptions, one each for simulation, digitization, and
>>     reconstruction) embodied somewhere in the code (in compiled C++ or
>>     interpreted Perl or Python or whatever) and a small set of parameters
>>     describing the number, sizes, positions, and orientations (etc.) of
>>     the sectors. Also in the code/scripts would be methods to obtain the
>>     parameter set and build the specific internal representations needed
>>     by putting the generics together according to the recipe specified by
>>     the parameter set. If anyone needs to study, say, a GEM design with a
>>     different number of layers, then they'll have to get into the nitty
>>     gritty of modifying the code/scripts that contain the generic
>>     descriptions; but most of the time, if GEMs need to be changed at all,
>>     it's changes that can be described just by modifying that small set of
>>     parameters. And for each output file, whether in the file or in an
>>     immutable database table pointed to by the file, the parameter set is
>>     what we store for retrospection (along with, say, the version control
>>     tag for the source codes / scripts used with those parameters to build
>>     the geometry).
>>
>>     So it's not a matter of "which should users have to modify, C++ code
>>     or Perl scripts?" In all but rare instances it should be neither, but
>>     a small and easily understood parameter set.
>>
>>     Now, given that, it does seem to me that hard coding the generic
>>     descriptions in the C++ would be easier and less error prone than
>>     doing it in scripts, because in the latter case you have to define
>>     file formats for the script outputs, you have to implement writing to
>>     those formats in the scripts and reading from them in the C++, and you
>>     have to make sure those files are created and stored in such a way
>>     that there's no possibility they'll get out of synch with each other
>>     (if the simulation/digitization/reconstruction representations are
>>     stored separately) or with the original parameter set that's supposed
>>     to describe them. Less work for us and less potential for confusion if
>>     the parameter set is read directly by the C++ code and the internal
>>     representations are built by the classes that use them.
>>
>>     But for the users, I don't see it as mattering much which way it's
>>     done, as long as they never (hardly ever) have to look inside either
>>     the C++ code or a script, or anything else that represents the
>>     geometry in microscopic detail, to make most changes.
>>
>>
>>
>>     On Thu, Apr 2, 2015 at 4:11 PM, Zhiwen Zhao <zwzhao at jlab.org> wrote:
>>
>>         Dear All
>>
>>         Another point about geometry API.
>>         Our simulation needs to support both individual detector R&D and
>>         the whole SoLID simulation.
>>         And it needs fast simulation besides full simulation in the same
>>         framework.
>>         So having a geometry API, no matter what format it is, instead of
>>         hard coding many things into the source code is essential to make
>>         it effortless to unify different types of simulation in one single
>>         framework.
>>         Other collaborations have done something similar, and one thing we
>>         should try to understand is how they did it.
>>
>>         Zhiwen
>>         _______________________________________________
>>         Halla12_software mailing list
>>         Halla12_software at jlab.org
>>         https://mailman.jlab.org/mailman/listinfo/halla12_software
>>
>>
>>
>>
>>     --
>>     - Richard S. Holmes
>>        Physics Department
>>        Syracuse University
>>        Syracuse, NY 13244
>>     315-443-5977