[BDXlist] Fwd: BDX software: event builder, DSTs. First ideas and thoughts

Tue Dec 27 12:11:14 EST 2016

Dear all,
I am writing again about the BDX DSTs, following up on my previous email 
(included below).

The first version of the BDX DSTs has been implemented, and is already 
working for the data now being taken in Catania with the second 
version of the prototype. The strategy that I followed - after some 
discussion with other people - is the second one outlined in my previous 
email.
Basically:

1) Each event is saved in a ROOT TTree as a TEvent object.
1A) The concept of "event" is clear for the current version of the 
prototype (1 event == 1 trigger), and for simulations (1 event == 1 
primary particle). It may be different if we implement a 
triggerless, pipelined DAQ system, but for the moment I neglect this and 
focus on the two previous cases.

2) The TEvent is, basically, a collection of information (it is very 
similar to the event structure we have in HPS - I borrowed it from 
there). You can find the relevant code in the BDX distribution, in 
src/libraries/EventBuilder/TEvent.h. The relevant members are:

....

private:

     TEventHeader    *m_eventHeader;
     vector<TClonesArray*> m_collections;
....

In other words, the TEvent contains:

-> The Event Header (run number, event number, event type, event time, 
trigger words)
-> A vector of TClonesArray: each TClonesArray is a collection of 
"objects" (or "hits") of a given type. This means that the event itself 
is "system-independent": the same structure works for real data 
(prototype), simulations, etc., provided that the event builder saves 
the relevant collections in the TEvent.
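
As an illustration of this "header plus named, typed collections" 
structure, here is a minimal stand-alone sketch in plain C++. It is NOT 
the BDX TEvent: it uses std::map and std::type_index in place of 
TClonesArray and TClass, and the names SketchEvent, CaloHitSketch, 
VetoHitSketch, addCollection and totalCaloEnergy are assumptions made up 
for this example (hasCollection/getCollection only mirror the names used 
in the attached selector, not their real signatures).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical hit types, standing in for the real BDX hit classes.
struct CaloHitSketch { int sector, x, y; double E; };
struct VetoHitSketch { int component; double E; };

// Sketch of an event holding named collections of arbitrary hit types.
// A collection is identified by (element type, name), so a reader can ask
// what an event contains before trying to read it.
class SketchEvent {
public:
  template <class T>
  void addCollection(const std::string &name, std::vector<T> hits) {
    auto key = std::make_pair(std::type_index(typeid(T)), name);
    m_collections[key] = std::make_shared<std::vector<T>>(std::move(hits));
  }

  template <class T>
  bool hasCollection(const std::string &name) const {
    return m_collections.count(std::make_pair(std::type_index(typeid(T)), name)) > 0;
  }

  // Returns nullptr when no collection with this type and name exists.
  template <class T>
  const std::vector<T> *getCollection(const std::string &name) const {
    auto it = m_collections.find(std::make_pair(std::type_index(typeid(T)), name));
    if (it == m_collections.end()) return nullptr;
    return static_cast<const std::vector<T> *>(it->second.get());
  }

private:
  // shared_ptr<void> keeps the correct deleter for each stored vector type.
  std::map<std::pair<std::type_index, std::string>, std::shared_ptr<void>> m_collections;
};

// Reader side: check for the collection first, then loop over its hits.
inline double totalCaloEnergy(const SketchEvent &ev) {
  double sum = 0;
  if (ev.hasCollection<CaloHitSketch>("CalorimeterHits")) {
    const auto *hits = ev.getCollection<CaloHitSketch>("CalorimeterHits");
    assert(hits != nullptr);  // guaranteed by the hasCollection check above
    for (const auto &h : *hits) sum += h.E;
  }
  return sum;
}
```

The point of the sketch is only the access pattern: the event class never 
needs to change when a new kind of collection is added, and the analysis 
code checks for a collection before using it.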

I am sending you an example macro (and a data file) showing how to open this 
kind of DST, find out what is inside the TEvent, and read it.
Clearly, there are many other ways to read the DSTs. Here, I am using a 
ROOT TSelector and a ROOT macro. One can also write standalone C++ code and 
use a plain loop over the tree entries instead of a TSelector.

Prerequisites:

1) BDX software installed on a non-Mac machine (I still need to figure 
out how to build ROOT dictionaries on a Mac)
1A) The env var BDXRECO_ROOT pointing to the installation folder
2) ROOT 6 installed (JLab common software, version 2.0, now uses ROOT 
6. This is why I am using it for DSTs).

Files:
1) A root file (out.root) containing the TTree with events (EventDST tree).
1A) The ROOT file has been created with 4000 events from prototype data. 
The collections saved are calibrated hits from the calorimeter, IntVeto, and 
ExtVeto.
1B) There are also two other TTrees (RunInfo and EventHeader). Ignore them.
2) Two files containing the TSelector-derived class (BDXDSTSelector.cc 
and BDXDSTSelector.h)
3) A macro file to run the analysis (doAna.C)

Instructions:
1) Save all files to a folder of your choice.
2) Make sure BDXRECO_ROOT env var is set, pointing to the folder where 
the BDX software is installed
3) Open root and issue these commands:
* .L doAna.C
* doAna("out.root")

The macro is quite simple. It checks that the event has a 
collection named "CalorimeterHits", made of objects of class 
"CalorimeterHit". If so, it gets this collection and loops over the 
CalorimeterHit objects. If it finds a hit with sector=0, X=1, Y=1, it 
fills a histogram with the hit energy.

What still has to be done (roughly in order of priority):

1) Have ROOT dictionaries also built on a Mac (I need access to a Mac 
machine with ROOT 6 installed).
2) Decide what to save in DSTs for Catania measurements. Currently, for 
each event 3 collections are saved: Calibrated hits for calorimeter, 
IntVeto, ExtVeto.
2A) Document the Event structure for Catania measurements once defined.
3) Write the event builder for simulations (different collections will 
be saved in this case), both for prototype simulations and full detector 
simulations.

Technical details:
1) The DST scheme shown here makes heavy use of ROOT reflection features.
2) The DST reader, as shown here, is PROOF-compatible (provided one 
takes care to initialize to 0, in the selector constructor, all the 
pointers to objects that will be created in the SlaveBegin method).
3) Although it is not easy (and absolutely not recommended), one can 
also interact with ROOT DSTs from the ROOT command line. The following set 
of instructions should produce the same histogram as the attached macro:

gSystem->Load("$BDXRECO_ROOT/lib/libbdxReco.so")
TFile f("out.root")
TH1D hSimple("hSimple","hSimple",400,-0.1,400)
EventDST->Scan("m_collections.GetName()")
   -> From this command I see that the collection with calorimeter hits 
      is always the first one (instance == 0)
EventDST->Draw("((CalorimeterHit*)(m_collections[0].At(2)))->E>>hSimple")
   -> Hit #2 is the one I am interested in
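
On technical detail 2: the reason every histogram is created in 
SlaveBegin and added to fOutput is that, under PROOF, each worker fills 
its own copy and the copies are summed at the end. The stand-alone C++ 
sketch below (no ROOT involved; HistSketch and fillPerWorkerAndMerge are 
hypothetical names invented for this example, and the "workers" are 
emulated sequentially) illustrates that filling per-worker histograms and 
merging them bin by bin gives the same result as a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-binning 1D histogram, standing in for a TH1D.
struct HistSketch {
  double lo, hi;
  std::vector<long> counts;  // one entry per bin; out-of-range values are dropped

  HistSketch(std::size_t nbins, double lo_, double hi_)
      : lo(lo_), hi(hi_), counts(nbins, 0) {
    assert(nbins > 0 && hi_ > lo_);
  }

  void Fill(double x) {
    if (x < lo || x >= hi) return;  // ignore values outside the axis range
    std::size_t bin = static_cast<std::size_t>((x - lo) / (hi - lo) * counts.size());
    if (bin >= counts.size()) bin = counts.size() - 1;  // guard against rounding
    ++counts[bin];
  }

  // Bin-by-bin sum: the role played by the merge of the per-worker copies.
  void Add(const HistSketch &other) {
    assert(other.counts.size() == counts.size());
    for (std::size_t i = 0; i < counts.size(); ++i) counts[i] += other.counts[i];
  }
};

// Distribute the entries round-robin over nWorkers private histograms
// (emulated sequentially here), then merge them into a single histogram.
HistSketch fillPerWorkerAndMerge(const std::vector<double> &energies,
                                 std::size_t nWorkers, std::size_t nbins,
                                 double lo, double hi) {
  assert(nWorkers > 0);
  std::vector<HistSketch> perWorker(nWorkers, HistSketch(nbins, lo, hi));
  for (std::size_t i = 0; i < energies.size(); ++i)
    perWorker[i % nWorkers].Fill(energies[i]);

  HistSketch merged(nbins, lo, hi);
  for (const HistSketch &h : perWorker) merged.Add(h);
  return merged;
}
```

Because the merge is a plain sum, the result does not depend on how the 
entries are split among the workers; this is what makes it safe to let 
PROOF decide the splitting.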

Bests

Andrea

>
> -------- Forwarded Message --------
> Subject: 	[BDXlist] BDX software: event builder, DSTs. First ideas and 
> thoughts
> Date: 	Wed, 13 Jul 2016 17:33:15 +0200
> From: 	Andrea Celentano <Andrea.Celentano at ge.infn.it>
> To: 	bdxlist at jlab.org
>
>
>
> Dear all,
> this (quite long) e-mail is a first attempt to discuss how to proceed
> with the BDX software, and specifically how to define an "event", and
> how to produce from "events" an "output" (DST) that can be analyzed by
> people not directly involved in the JANA framework development.
>
> The following points are well defined:
>
> 1) Whatever the DAQ system will be, we can always talk about an "event"
> as a single "entity" that is written to file. Events contain several
> pieces of information from different detector elements, and are
> independent: each event should be self-consistent and must not depend
> on other events.
>
> 1A) In a "fixed DAQ window" scheme, this means that all the acquisition
> windows should be long enough to contain data coming at different times,
> but physically related. Example: a mu+ decaying with e+ emission, with
> the e+ impinging on a crystal. We can expect to have a hit in a veto
> (due to the entering mu+) ~ 2 us before the hit in the crystal. These two
> hits must be in the same "event", i.e. in the same piece of information
> written to the file.
>
> 1B) In a sophisticated trigger-less system, such as the one we foresee, the
> concept of event is much more dynamic and versatile. There will be one
> (or multiple) online trigger algorithms looking at the full data stream
> from the full detector, identifying proper combinations of hits - where a
> "combination" is defined by the physics - and writing each combination as
> an "event".
>
> 1C) In the MC, this kind of implementation is very natural: each "event"
> is associated with a single primary particle. All the hits resulting from
> the evolution of this particle - and of its daughters - form an event.
>
>
>
> 2) Reconstruction is performed using the JANA framework. By
> "reconstruction" I mean the algorithms that, from the RAW evio files -
> or whatever format it will be - produce calibrated information
> for each hit in each sub-detector. The reconstruction also produces
> higher-level quantities - basically clusters in the
> calorimeter, both "single-module" clusters and "multi-module" clusters.
>
> 2A) In the reconstruction, each event - as discussed in the previous
> point - is absolutely independent from the others. It is not foreseen at
> all to use information from event A when reconstructing event B.
>
>
> 3) DSTs will be ROOT files, with a ROOT tree, where a C++ object - the
> "event" is saved. In other words, "events" are C++ objects. This gives
> maximum flexibility in defining what to write in an event.
>
> Here is what I think we should write in an event - and how to do this.
>
> A) General data:
>
> RunN
> EventN
> EventType (real, MC, ...)
> Absolute event time (for real data)
> ... others ...
>
> This should go in an "event header", that is by itself a C++ object. The
> event has a pointer to this object.
>
> B) Trigger data
>
> This will really depend on the type of DAQ we will use, but information
> about which trigger selected the event should be there.
>
> C) MC-truth data
>
> For MC-only: which kind of primary event was simulated, resulting in
> this event? Again, a C++ object, the event has a pointer to this (for MC
> only)
>
> D) Low-level calibrated data
>
> For each sub-detector, all the hits, calibrated in energy and time. Note
> that a given sub-detector element can have more than one hit. Example:
> a mu+ enters the ext veto and the int veto, and stops in a crystal. The mu+
> then decays to an e+; the e+ exits the crystal and hits the int veto again,
> in the same element as before. All this information should be in the same
> "event".
>
> Each hit is a C++ object. Hits are contained in proper C++ collections -
> these can be simple vectors, or more powerful and performant collections,
> such as ROOT TClonesArrays.
>
> E) Elaborated data
>
> I think that here we want to start with calorimeter clusters.
>
> * Should we do clustering in the full detector, or start with clusters
> in each calorimeter module, and then combine these?
> * What other kinds of data do we need here?
>
> --------------------------------------------------------
>
> Finally, a more "technical" question that I think is worth discussing.
>
> The structure of the event can be something very specific, i.e. a class
> with specific data in it, already predefined, and not changeable, like:
>
> class TEvent{
> ...
> ...
> vector<CalorimeterHit> theCalorimeterHits;
> CalorimeterCluster theCluster;
> ...
> ...
> }
>
> or something more versatile, like:
>
> class TEvent{
> ...
> ...
> vector<TClonesArray*> theRawHitCollections;
> vector<TClonesArray*> theReconstructedObjectsCollections;
> vector<TObject*> theObjectsInThisEvent;
> ...
> ...
> }
>
> The advantage of the first approach is that it is easier to use in an
> analysis (you need the calorimeter hits: you have them in a vector), and
> is self-explanatory at the code level. However, the structure of an
> event is fixed, and can't change (for example, if later you want to
> add another vector to the event, you need to modify the JANA code that
> produces the event, and the event class itself, which is potentially
> not backward-compatible).
>
> The advantage of the second approach (that I prefer :) ) is that the
> structure of an event is really versatile. Each specific event (thanks
> to ROOT reflection) is self-describing, and one can do something like
> (please note this is just pseudo-code):
>
> TEvent *event; //pointer to an event, get it from the DST
> event->ListRawHitCollections(); //which are the raw hits in this event?
> ...
> if (event->hasCollection("CalorimeterHit")){
> ... loop on the CalorimeterHits ...
> }
>
> For a different project, I already developed an analysis code like this,
> that I'll be happy to show at one of the next BDX meetings. The
> disadvantage of this approach is a longer learning curve to use DSTs.
>
>
> Bests,
> Andrea

-------------- next part --------------
#define BDXDSTSelector_cxx
// The class definition in BDXDSTSelector.h has been generated automatically
// by the ROOT utility TTree::MakeSelector(). This class is derived
// from the ROOT class TSelector. For more information on the TSelector
// framework see $ROOTSYS/README/README.SELECTOR or the ROOT User Manual.

// The following methods are defined in this file:
//    Begin():        called every time a loop on the tree starts,
//                    a convenient place to create your histograms.
//    SlaveBegin():   called after Begin(), when on PROOF called only on the
//                    slave servers.
//    Process():      called for each event, in this function you decide what
//                    to read and fill your histograms.
//    SlaveTerminate: called at the end of the loop on the tree, when on PROOF
//                    called only on the slave servers.
//    Terminate():    called at the end of the loop on the tree,
//                    a convenient place to draw/fit your histograms.
//
// To use this file, try the following session on your Tree T:
//
// root> T->Process("BDXDSTSelector.cc")
// root> T->Process("BDXDSTSelector.cc","some options")
// root> T->Process("BDXDSTSelector.cc+")
//

#include "BDXDSTSelector.h"
#include <TH2.h>
#include <TStyle.h>

void BDXDSTSelector::Begin(TTree * /*tree*/)
{
   // The Begin() function is called at the start of the query.
   // When running with PROOF Begin() is only called on the client.
   // The tree argument is deprecated (on PROOF 0 is passed).

   TString option = GetOption();
}

void BDXDSTSelector::SlaveBegin(TTree * /*tree*/)
{
   // The SlaveBegin() function is called after the Begin() function.
   // When running with PROOF SlaveBegin() is called on each slave server.
   // The tree argument is deprecated (on PROOF 0 is passed).

   TString option = GetOption();

   /*Create the histograms here. All of them MUST be added to the fOutput list for merging*/
   hSimple=new TH1D("hSimple","hSimple",400,0.1,400);
   fOutput->Add(hSimple);

}

Bool_t BDXDSTSelector::Process(Long64_t entry)
{
   // The Process() function is called for each entry in the tree (or possibly
   // keyed object in the case of PROOF) to be processed. The entry argument
   // specifies which entry in the currently loaded tree is to be processed.
   // When processing keyed objects with PROOF, the object is already loaded
   // and is available via the fObject pointer.
   //
   // This function should contain the "body" of the analysis. It can contain
   // simple or elaborate selection criteria, run algorithms on the data
   // of the event and typically fill histograms.
   //
   // The processing can be stopped by calling Abort().
   //
   // Use fStatus to set the return value of TTree::Process().
   //
   // The return value is currently not used.

   fReader.SetEntry(entry);

   /*Pointer to the objects retrieved below*/
   CalorimeterHit *hit=nullptr;

   /*Check if the event has a collection named CalorimeterHits, and the corresponding objects are CalorimeterHit objects*/
   if (event->hasCollection(CalorimeterHit::Class(),"CalorimeterHits")){
     TIter CaloHitsIter(event->getCollection(CalorimeterHit::Class(),"CalorimeterHits"));
     //Need to cast to the proper object; the extra parentheses mark the assignment as intended
     while ((hit = (CalorimeterHit*)CaloHitsIter.Next())){
       if ((hit->m_channel.sector==0)&&(hit->m_channel.x==1)&&(hit->m_channel.y==1)){
         hSimple->Fill(hit->E);
       }
     }
   }
   else{
     Info("Process","No collection named CalorimeterHits with objects of class CalorimeterHit has been found");
   }

   return kTRUE;
}

void BDXDSTSelector::SlaveTerminate()
{
   // The SlaveTerminate() function is called after all entries or objects
   // have been processed. When running with PROOF SlaveTerminate() is called
   // on each slave server.

}

void BDXDSTSelector::Terminate()
{
   // The Terminate() function is the last function to be called during
   // a query. It always runs on the client, it can be used to present
   // the results graphically or save the results to file.

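  // Retrieve the histogram from the output list: with PROOF, hSimple is
  // created on the workers, so the client-side pointer is still 0 here.
  hSimple=dynamic_cast<TH1D*>(fOutput->FindObject("hSimple"));
  if (hSimple){
    TCanvas *c=new TCanvas("c","c");
    c->cd();
    hSimple->Draw();
  }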

}
-------------- next part --------------
//////////////////////////////////////////////////////////
// This class has been automatically generated on
// Tue Dec 27 16:37:57 2016 by ROOT version 6.08/00
// from TTree EventDST/EventDST
// found on file: outTest.root
//////////////////////////////////////////////////////////

#ifndef BDXDSTSelector_h
#define BDXDSTSelector_h

#include <TROOT.h>
#include <TChain.h>
#include <TFile.h>
#include <TSelector.h>
#include <TTreeReader.h>
#include <TTreeReaderValue.h>
#include <TTreeReaderArray.h>
#include <TClonesArray.h>
#include <TCollection.h>

#include <TH1D.h>
#include <TH2D.h>
#include <TAxis.h>
#include <TCanvas.h>

// Headers needed by this particular selector
#include "EventBuilder/TEvent.h"
#include "EventBuilder/TEventHeader.h"

#include "Calorimeter/CalorimeterHit.h"

class BDXDSTSelector : public TSelector {
public :
   TTreeReader     fReader;      //!the tree reader
   TTree          *fChain = 0;   //!pointer to the analyzed TTree or TChain

   // Readers to access the data 
   TTreeReaderValue<TEvent> event = {fReader, "Event"};

   BDXDSTSelector(TTree * /*tree*/ =0) {
     /*All the histogram pointers MUST be initialized to 0 here*/
     hSimple=0;

   }
   virtual ~BDXDSTSelector() { }
   virtual Int_t   Version() const { return 2; }
   virtual void    Begin(TTree *tree);
   virtual void    SlaveBegin(TTree *tree);
   virtual void    Init(TTree *tree);
   virtual Bool_t  Notify();
   virtual Bool_t  Process(Long64_t entry);
   virtual Int_t   GetEntry(Long64_t entry, Int_t getall = 0) { return fChain ? fChain->GetTree()->GetEntry(entry, getall) : 0; }
   virtual void    SetOption(const char *option) { fOption = option; }
   virtual void    SetObject(TObject *obj) { fObject = obj; }
   virtual void    SetInputList(TList *input) { fInput = input; }
   virtual TList  *GetOutputList() const { return fOutput; }
   virtual void    SlaveTerminate();
   virtual void    Terminate();

   /*Histogram pointers. Note that EACH histogram pointer MUST be initialized to 0 in the TSelector constructor*/
   TH1D *hSimple;

   ClassDef(BDXDSTSelector,0);

};

#endif

#ifdef BDXDSTSelector_cxx
void BDXDSTSelector::Init(TTree *tree)
{
   // The Init() function is called when the selector needs to initialize
   // a new tree or chain. Typically here the reader is initialized.
   // It is normally not necessary to make changes to the generated
   // code, but the routine can be extended by the user if needed.
   // Init() will be called many times when running on PROOF
   // (once per file to be processed).

   fReader.SetTree(tree);
}

Bool_t BDXDSTSelector::Notify()
{
   // The Notify() function is called when a new file is opened. This
   // can be either for a new TTree in a TChain or when a new TTree
   // is started when using PROOF. It is normally not necessary to make changes
   // to the generated code, but the routine can be extended by the
   // user if needed. The return value is currently not used.

   return kTRUE;
}

#endif // #ifdef BDXDSTSelector_cxx
-------------- next part --------------
#include <string>

#include "TProof.h"
#include "TChain.h"
#include "TSystem.h"

void doAna(std::string fname){

  int doProof=1;  //use 1 if you want PROOF
  int nWorkers=4; //how many PROOF workers?

  //add BDX include paths
  gSystem->AddIncludePath("-I${BDXRECO_ROOT}/src/libraries");
  gSystem->AddIncludePath("-I./");
  //load the library with dictionaries
  gSystem->Load("$BDXRECO_ROOT/lib/libbdxReco.so");

  /*Open the TFile, get the DST ttree, process it. Use a TChain since:
    1) It is PROOF-compatible
    2) This macro can be used with multiple files (using wildcards for fname)*/
  TChain *m_chain=new TChain("EventDST");
  m_chain->Add(fname.c_str());

  /*Set up PROOF if requested*/
  TProof *m_proof=nullptr;
  if (doProof){
    m_proof=TProof::Open(Form("workers=%i",nWorkers));
    if (m_proof){ //if the PROOF session could not be opened, process locally
      m_proof->Exec("gSystem->Load(\"${BDXRECO_ROOT}/lib/libbdxReco.so\")");
      m_proof->SetLogLevel(1, TProofDebug::kPacketizer);
      m_proof->SetParameter("PROOF_Packetizer", "TPacketizer");
      m_chain->SetProof(); //process the chain through PROOF
    }
  }

  m_chain->Process("BDXDSTSelector.cc++");

}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: out.root
Type: application/octet-stream
Size: 3445978 bytes
Desc: not available
URL: <https://mailman.jlab.org/pipermail/bdxlist/attachments/20161227/5fbb46af/attachment-0001.obj>