[BDXlist] BDX software: event builder, DSTs. First ideas and thoughts

Wed Jul 20 14:09:47 EDT 2016

Hello Andrea,

Sorry to only now respond to your email about the BDX code.

I don’t quite agree with your idea of making the “TEvent” flexible by storing collections of collections (or vector of vectors). Here is why:

1) The ROOT format is already fully flexible and allows for “schema evolution” without too much difficulty. In other words, when you decide to add more data to your TEvent class, you modify the class and increase the class version number by one. You almost never need to read old data that was produced with the previous version of the class, since your new format presumably is “better”. If you absolutely do need to read old data for backward compatibility, it is quite easy to customize the class streamer to handle the old format, which is detected by the class version number.
2) The vector<TClonesArray *> adds some serious complications to your code and to how it is used. It makes TTree::Draw() pretty much unusable. It also makes it confusing what is actually available when you use the DST, despite the reflection in ROOT. I think the simplest solution for a DST is usually the best solution.
3) When you need to add additional data to the DST, you also need to add new code to the DST analysis code to make use of it. So the flexibility gained by having vectors of vectors is mostly negated.
4) If you make your RawCalorimeterHit, and other “RawHit” classes, sufficiently broad from the start, there should be little need to change them later.
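
The version-number idea in the first point can be illustrated outside of ROOT. The following is only a minimal sketch in plain C++, not ROOT's streamer API: the struct EventV, its fields, and the text record layout are hypothetical. In ROOT itself the version number is the second argument of the ClassDef macro. The reader looks at the stored version first and fills fields that did not exist in the old format with defaults.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical event payload; field names are illustrative only.
struct EventV {
    static constexpr int kCurrentVersion = 2;
    int runN = 0;
    int eventN = 0;
    double eventTime = 0.0;  // assumed to have been added in version 2
};

// Always write the current layout, with the version number first.
void writeEvent(std::ostream &out, const EventV &ev) {
    out << EventV::kCurrentVersion << ' ' << ev.runN << ' '
        << ev.eventN << ' ' << ev.eventTime << '\n';
}

// Read any known layout by branching on the stored version number.
EventV readEvent(std::istream &in) {
    int version = 0;
    if (!(in >> version)) throw std::runtime_error("missing version number");
    EventV ev;
    switch (version) {
    case 1:  // old layout: eventTime did not exist, keep its default
        in >> ev.runN >> ev.eventN;
        break;
    case 2:  // current layout
        in >> ev.runN >> ev.eventN >> ev.eventTime;
        break;
    default:
        throw std::runtime_error("unknown class version");
    }
    if (!in) throw std::runtime_error("truncated record");
    return ev;
}
```

Only the reader has to know about old versions; the writer and the analysis code see a single, current event layout.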

So, I would favor your first implementation over your second one.
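
To make the comparison concrete, here is a rough sketch of how the same analysis step looks with each layout. It is plain C++ under simplifying assumptions: FixedEvent, FlexibleEvent, and totalEnergy are hypothetical names, the CalorimeterHit fields are invented for illustration, and a std::map of named vectors stands in for the vector<TClonesArray*>, which in reality would also require casting from TObject*.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical hit; the fields are assumptions for this sketch.
struct CalorimeterHit {
    int channel;
    double energy;
    double time;
};

// First approach: the content of the event is fixed and visible in the class.
struct FixedEvent {
    std::vector<CalorimeterHit> theCalorimeterHits;
};

// Second approach, simplified: collections are looked up by name at run time.
struct FlexibleEvent {
    std::map<std::string, std::vector<CalorimeterHit>> collections;
    bool hasCollection(const std::string &name) const {
        return collections.count(name) != 0;
    }
};

// With the fixed layout the analysis code just loops over the member.
double totalEnergy(const FixedEvent &ev) {
    double sum = 0.0;
    for (const auto &hit : ev.theCalorimeterHits) sum += hit.energy;
    return sum;
}

// With the flexible layout every use needs a name lookup and a fallback
// for the case where the collection is absent (here: return 0).
double totalEnergy(const FlexibleEvent &ev, const std::string &name) {
    auto it = ev.collections.find(name);
    if (it == ev.collections.end()) return 0.0;
    double sum = 0.0;
    for (const auto &hit : it->second) sum += hit.energy;
    return sum;
}
```

In both cases new analysis code has to be written when a new collection appears; the flexible layout only moves the check from compile time to run time.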

Best,
	Maurik

> On Jul 13, 2016, at 11:33 AM, Andrea Celentano <andrea.celentano at ge.infn.it> wrote:
> 
> Dear all,
> this (quite long) e-mail is a first attempt to discuss how to proceed with the BDX software, and specifically how to define an "event", and how to produce from "events" an "output" (DST) that can be analyzed by people not directly involved in the JANA framework development.
> 
> The following points are well defined:
> 
> 1) Whatever the DAQ system will be, we can always talk about an "event" as a single "entity" that is written to file. Events contain multiple pieces of information from different detector elements, and are independent. Events should be self-consistent, and should not depend on other events.
> 
> 1A) In a "fixed DAQ window" scheme, this means that all the acquisition windows should be long enough to contain data arriving at different times, but physically related. Example: mu+ with e+ emission, e+ impinging on a crystal. We can expect to have a hit in a VETO (due to the entering mu+) ~ 2 us before the hit in the crystal. These two hits must be in the same "event", i.e. in the same piece of information written to the file.
> 
> 1B) In a sophisticated trigger-less system, like the one we foresee, the concept of event is much more dynamic and versatile. There will be one (or multiple) online trigger algorithms looking at the full data-stream from the full detector, identifying proper combinations of hits - where a "combination" is defined by the physics - and writing each combination as an "event".
> 
> 1C) In the MC, this kind of implementation is very natural: each "event" is associated with a single primary particle. All the hits resulting from the evolution of this particle - and of its daughters - form an event.
> 
> 
> 
> 2) Reconstruction is performed using the JANA framework. By "reconstruction" I mean the algorithms that, starting from the RAW evio files - or whatever format it will be - produce calibrated information for each hit in each sub-detector. The reconstruction also produces elaborated quantities - basically clusters in the calorimeter, both "single-module" clusters and "multi-module" clusters.
> 
> 2A) In the reconstruction, each event - as discussed in the previous point - is absolutely independent from the others. It is not foreseen at all to use information from event A when reconstructing event B.
> 
> 
> 3) DSTs will be ROOT files, with a ROOT tree, where a C++ object - the "event" - is saved. In other words, "events" are C++ objects. This gives maximum flexibility in defining what to write in an event.
> 
> Here is what I think we should write in an event - and how to do this.
> 
> A) General data:
> 
> RunN
> EventN
> EventType (real, MC, ...)
> Absolute event time (for real data)
> ... others ...
> 
> This should go in an "event header", which is itself a C++ object. The event has a pointer to this object.
> 
> B) Trigger data
> 
> This will really depend on the type of DAQ we will use, but information about which trigger selected the event should be there.
> 
> C) MC-truth data
> 
> For MC-only: which kind of primary event was simulated, resulting in this event? Again, a C++ object, the event has a pointer to this (for MC only)
> 
> D) Low-level calibrated data
> 
> For each sub-detector, all the hits, calibrated in energy and time. Note that a given sub-detector element can have more than one hit. Example: mu+ entering ext. veto, int. veto, and stopping in the crystal. The mu+ then decays to e+; the e+ exits the crystal and hits the int. veto again, in the same element as before. All this information should be in the same "event".
> 
> Each hit is a C++ object. Hits are contained in proper C++ collections - can be simple vectors, or more powerful and performant collections, such as ROOT TClonesArrays.
> 
> E) Elaborated data
> 
> I think that here we want to start with calorimeter clusters.
> 
> * Should we do clustering in the full detector, or start with clusters in each calorimeter module, and then combine these?
> * What other kinds of data do we need here?
> 
> --------------------------------------------------------
> 
> Finally, a more "technical" question that I think is worth discussing.
> 
> The structure of the event can be something very specific, i.e. a class with specific data in it, already predefined, and not changeable, like:
> 
> class TEvent{
> ...
> ...
> vector<CalorimeterHit> theCalorimeterHits;
> CalorimeterCluster theCluster;
> ...
> ...
> }
> 
> or something more versatile, like:
> 
> class TEvent{
> ...
> ...
> vector<TClonesArray*> theRawHitCollections;
> vector<TClonesArray*> theReconstructedObjectsCollections;
> vector<TObject*> theObjectsInThisEvent;
> ...
> ...
> }
> 
> The advantage of the first approach is that it is easier to use in an analysis (you need the calorimeter hits: you have them in a vector), and is self-explanatory at the code level. However, the structure of an event is fixed, and can't change... (for example, if you later want to add another vector to the event, you need to modify the JANA code that is producing the event, and the event class itself, which is potentially not backward-compatible).
> 
> The advantage of the second approach (that I prefer :) ) is that the structure of an event is really versatile. Each specific event (thanks to ROOT reflection) is self-describing, and one can do something like (please note this is just pseudo-code):
> 
> TEvent *event; // pointer to an event, obtained from the DST
> event->ListRawHitCollections(); // which raw hit collections are in this event?
> ...
> if (event->hasCollection("CalorimeterHit")) {
> ... loop on the CalorimeterHits ...
> }
> 
> For a different project, I already developed an analysis code like this, that I'll be happy to show at one of the next BDX meetings. The disadvantage of this approach is a steeper learning curve for using the DSTs.
> 
> 
> Best,
> Andrea