[Halld-offline] Data Challenge Meeting Minutes, July 16, 2012
David Lawrence
davidl at jlab.org
Wed Jul 18 15:34:04 EDT 2012
Hi Richard,
I think storing the config info as a string, either in the first event
or elsewhere in the header of the file using the existing mechanisms
already built into HDDM, is still a good idea, and not just for
redundancy with the database. One could imagine having a file
and wanting to extract the configuration used to make it in order to
view or use it. I see this as being very analogous to how the HDDM
schema is stored in the front of the HDDM file and there is a tool to
pull it out if needed. So to answer your devil's advocate question with
another: Why not put the hddm schema in a database and not keep it in
the file?
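As a conceptual sketch of this point (plain Python, and emphatically NOT the real HDDM binary layout; all names here are invented for illustration): the configuration string travels at the front of the file, and a small tool can pull it back out without ever reading an event payload, just as the HDDM schema can be extracted today.

```python
# Illustrative only -- not HDDM's actual format.  A length-prefixed config
# string sits at the front of the file, followed by the events.
import io
import struct

def write_file(buf, config, events):
    """Write a length-prefixed config string, then the event payloads."""
    hdr = config.encode()
    buf.write(struct.pack(">I", len(hdr)))
    buf.write(hdr)
    for ev in events:
        data = ev.encode()
        buf.write(struct.pack(">I", len(data)))
        buf.write(data)

def extract_config(buf):
    """Read back only the header; the event payloads are never touched."""
    buf.seek(0)
    (n,) = struct.unpack(">I", buf.read(4))
    return buf.read(n).decode()

buf = io.BytesIO()
write_file(buf, "hdgeant: BFIELD=solenoid RNDM=12345", ["ev1", "ev2"])
print(extract_config(buf))  # prints: hdgeant: BFIELD=solenoid RNDM=12345
```

The point is only that "config at the front of the file" and "tool to pull it out" cost nothing at analysis time.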
Regards,
-David
On 7/18/12 3:09 PM, Richard Jones wrote:
> Hello,
>
> I have no objection to storing a string tag for each object,
> representing the GetTag() string from jana. That can be done either
> on an event-by-event basis or globally. Event-by-event should only be
> adopted if the analysis can handle the situation where tags switch
> dynamically within a job, or we want to store more than one tag (say
> both default and "KLOE" bcal clusters) and let the user decide which
> to use. That would require changes to the current
> DEventSourceREST.cc, but would be easy to do. If tags are stored
> globally, then the hddm system will ensure automatically that only
> streams with the same tag strings get merged together as a result of a
> skim or by hddm-cat. It would also provide a better way for the
> danarest plugin to decide which tag to use for each output object,
> instead of the provisional way I am handling it right now for
> DBCALShower objects, which David points out is incorrect in some cases.
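As a toy model of the globally-stored-tag rule described above (plain Python, not jana or hddm-cat code; the dict layout is invented for illustration), a merge would simply refuse streams whose tag metadata disagrees:

```python
# Toy model: each "stream" carries one global tag string; merging (the
# hddm-cat analogue) succeeds only when all tags agree.
def merge_streams(streams):
    """Concatenate events from streams that share a single tag string."""
    tags = {s["tag"] for s in streams}
    if len(tags) != 1:
        raise ValueError("refusing to merge streams with tags %s" % sorted(tags))
    return {"tag": tags.pop(),
            "events": [ev for s in streams for ev in s["events"]]}

a = {"tag": "KLOE", "events": [1, 2]}
b = {"tag": "KLOE", "events": [3]}
print(merge_streams([a, b])["events"])  # [1, 2, 3]
```

In the real system this check would live in the stream header handling, so user code never sees an inconsistently tagged file.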
>
> As to the idea of flooding the REST file header with analysis
> qualifiers, that is not something that hddm can do right now. I could
> add the capability, but I question why. The only function of the hddm
> header, as currently conceived, is to document to the hddm toolkit how
> to unpack the event data and what their meaning and relationships are.
> That is all it does. It is not a place to record random comments like
> the name of the application that wrote the file, or the command line
> switches. User code does not normally even access the header, it is
> just handled by the hddm library. So at present, storing
> runconfig-type information would require adding special events to the
> stream, AND the huge change of making hddm streams stateful....
>
> Just like root trees, hddm streams are designed to be stateless. This is
> an important design feature that I am not eager to concede. Think
> about trying to stick config-type information into a root tree, and
> then analyze it with a TSelector on PROOF. You are going to have to
> do major gymnastics to get that information to every analysis session
> that gets started to run your job. Building single-threaded concepts
> like this into the analysis sounds like we are still working like we
> did 20 years ago.
>
> It was not my original intent to embed metadata about the conditions
> of the production inside the file, because I want later to be able to
> string these events together and create skims. In general I want to
> avoid "stateful" streams in hddm, relying instead on the global keys
> like runnumber,eventnumber to reference database records for this
> information, similar to how root trees work. By keeping the streams
> stateless I avoid all kinds of ordering and synchronization issues. A
> related issue is the "skip to event NNN" action, which is very fast in
> hddm because you don't have to read in every event. Imagine a sparse
> skim, which in the limit would consist of one state record for every
> event. Do I stop and check every time I hit a state record, do a
> bit-by-bit comparison with the current state, and throw an exception
> on incompatible changes? It would be much better to check that
> compatibility ahead of time, with a database lookup, wouldn't it?
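A hedged sketch of the stateless-skip argument (plain Python with an invented length-prefixed layout, not HDDM's real encoding): when every event is self-contained, "skip to event NNN" only hops over length words and never decodes a payload, whereas interleaved state records would force a compatibility check at every hop.

```python
# Illustrative only: self-contained, length-prefixed events allow fast
# skipping because only the 4-byte length words are ever read.
import io
import struct

def write_events(buf, payloads):
    """Write each payload with a 4-byte big-endian length prefix."""
    for p in payloads:
        data = p.encode()
        buf.write(struct.pack(">I", len(data)))
        buf.write(data)

def skip_to(buf, n):
    """Position the stream at event n, reading only the length words."""
    buf.seek(0)
    for _ in range(n):
        (size,) = struct.unpack(">I", buf.read(4))
        buf.seek(size, 1)  # hop over the payload without decoding it

def read_event(buf):
    (size,) = struct.unpack(">I", buf.read(4))
    return buf.read(size).decode()

buf = io.BytesIO()
write_events(buf, ["event-%d" % i for i in range(1000)])
skip_to(buf, 750)
print(read_event(buf))  # prints: event-750
```

With state records in the stream, `skip_to` could no longer blindly seek; each record would have to be read and compared against the current state, which is exactly the cost the stateless design avoids.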
>
> If we want to make sure we don't lose the metadata, why not store
> it in two separate databases, or promote the database to a higher
> data-security level? We are not going to be able to analyze a REST
> file without access to a database (required to re-swim reference
> trajectories). Playing devil's advocate, why are we not storing the
> magnetic field map in the event file? Without the magnetic field we
> cannot get back the original DTrackTimeBased objects.
>
> -Richard J.
>
>
>
>
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline