[Halld-offline] Overhaul ROOT TTree Format
Paul Mattione
pmatt at jlab.org
Fri Jul 24 11:29:07 EDT 2015
Hey guys, I think I need to change the fundamental design of our GlueX ROOT TTree format, but I wanted to run this by everyone first to get comments and feedback.
Right now, each particle combo is a separate entry in the TTree, which makes it easy to write code that acts only on combos (e.g. cuts). However, a problem arises when you want to fill histograms: to prevent double-counting, you have to look at the other combos to know whether you’ve already filled a histogram with that combination of particles. You can kludge this by just “keeping track” of what you did on previous tree entries, but this runs into a wall with PROOF.
With PROOF, you can analyze your tree in parallel: ROOT splits up your TTree entries and sends them off to different slave nodes/cores. This is a huge speed-up for TTrees as large as the ones we’ll have with GlueX. However, it’s highly likely that combos from one event will get split up among different nodes/cores, and I don’t think there’s any way for them to communicate with each other in real time. Even if they could, the combos aren’t analyzed at the same time on the different nodes/cores, so you’d have to keep track of the filling history for the entire length of the job.
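To make the double-counting problem concrete, here is a minimal C++ sketch of the “keeping track” kludge in the current one-combo-per-entry layout. It assumes a reaction like γp → π+π−p, where combos can differ only in the chosen beam photon, so a π+π− mass histogram should be filled once per (π+, π−) track pair. The struct, class, and function names are hypothetical, and a plain std::vector stands in for reading a real TTree.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// One TTree entry in the current one-combo-per-entry layout.
// Field names are hypothetical, for illustration only.
struct ComboEntry {
    std::uint64_t event;  // event number the combo came from
    int beamID;           // beam photon chosen for this combo
    int piPlusID;         // track used as the pi+
    int piMinusID;        // track used as the pi-
    double pipiMass;      // pi+ pi- invariant mass (GeV), the histogrammed value
};

// The "keeping track" kludge: remember which (pi+, pi-) track pairs have
// already been histogrammed in the current event, and forget them when the
// event number changes. Only correct if one object sees every combo of an
// event, in order.
class PairMassFiller {
public:
    // Returns true if this entry's pipiMass should be filled.
    bool ShouldFill(const ComboEntry& entry) {
        assert(entry.piPlusID >= 0 && entry.piMinusID >= 0);
        if (!started_ || entry.event != currentEvent_) {
            started_ = true;
            currentEvent_ = entry.event;
            usedPairs_.clear();
        }
        // insert(...).second is false if this pair was already filled.
        return usedPairs_.insert({entry.piPlusID, entry.piMinusID}).second;
    }

private:
    bool started_ = false;
    std::uint64_t currentEvent_ = 0;
    std::set<std::pair<int, int>> usedPairs_;
};

// Number of histogram fills when one filler processes `entries` in order.
int CountFills(const std::vector<ComboEntry>& entries) {
    PairMassFiller filler;
    int nFills = 0;
    for (const ComboEntry& entry : entries)
        if (filler.ShouldFill(entry)) ++nFills;
    return nFills;
}
```

The bookkeeping lives in the filler object, not in the tree. If the first combo of an event goes to one filler and the rest go to another, as can happen when PROOF splits entries across workers, the second filler starts with an empty history and fills a duplicate pair a second time.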
So my new proposal is for each TTree entry to contain all of the information needed for a given event in the following way:
1) For all “good” reconstructed particle hypotheses, the data is stored in a set of “Hypothesis” arrays (e.g. the Hypothesis__P4 array contains 4-momenta, Hypothesis__X4 contains spacetime 4-positions, Hypothesis__CDCdEdx contains the CDC dE/dx, etc.). So the data for the first particle is at index 0 in each array, the data for the second particle is at index 1, etc. Note that this is independent of your reaction, combos, etc.
2) Store combo-specific information in a separate set of arrays. E.g. PiMinus1__ObjectID corresponds to the first pi- in your particle combo (as it does now). However, it would now be an array, with each index in the array corresponding to a different combo. The value would be the index in the Hypothesis__ arrays at which you can find that particle’s data. Note that combo-specific data such as kin-fit p4, kin-fit FOM, PID FOM, etc. would be stored like this.
3) Thrown information stays as it is now (stored in arrays similar to the Hypothesis__ arrays).
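A rough sketch of what one entry might look like under this proposal, written as a plain C++ struct rather than actual ROOT branches. The member names, the FourVector alias, and the ComboP4 and PiPiMass helpers are hypothetical; the comments map members to the branch names above. The key point is the indirection: a combo stores only indices, and the particle data is looked up in the hypothesis arrays.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// (px, py, pz, E) in GeV; a hypothetical stand-in for TLorentzVector.
using FourVector = std::array<double, 4>;

// One TTree entry (one whole event) in the proposed layout. Member names are
// hypothetical; comments give the corresponding branch names. Double
// underscores are avoided because C++ reserves identifiers containing them.
struct EventEntry {
    // 1) Reaction-independent hypotheses: element i of every array describes
    //    hypothesis i (first particle at index 0, second at index 1, ...).
    std::vector<FourVector> hypothesisP4;   // Hypothesis__P4
    std::vector<FourVector> hypothesisX4;   // Hypothesis__X4 (x, y, z, t)
    std::vector<double> hypothesisCDCdEdx;  // Hypothesis__CDCdEdx

    // 2) Combo-specific arrays: element j of every array describes combo j.
    //    ObjectID values are indices into the hypothesis arrays above.
    std::vector<int> piPlus1ObjectID;   // PiPlus1__ObjectID
    std::vector<int> piMinus1ObjectID;  // PiMinus1__ObjectID
    std::vector<double> kinFitFOM;      // kinematic-fit FOM of combo j

    // 3) Thrown (MC) particles, stored like the hypothesis arrays.
    std::vector<FourVector> thrownP4;

    std::size_t NumCombos() const { return piMinus1ObjectID.size(); }
};

// Follows one combo's ObjectID into the hypothesis arrays.
const FourVector& ComboP4(const EventEntry& ev, const std::vector<int>& objectIDs,
                          std::size_t combo) {
    assert(combo < objectIDs.size());
    const int hyp = objectIDs[combo];
    assert(hyp >= 0 && static_cast<std::size_t>(hyp) < ev.hypothesisP4.size());
    return ev.hypothesisP4[static_cast<std::size_t>(hyp)];
}

// Invariant mass of the pi+ pi- pair used by combo `combo`.
double PiPiMass(const EventEntry& ev, std::size_t combo) {
    const FourVector& a = ComboP4(ev, ev.piPlus1ObjectID, combo);
    const FourVector& b = ComboP4(ev, ev.piMinus1ObjectID, combo);
    const double px = a[0] + b[0], py = a[1] + b[1], pz = a[2] + b[2];
    const double e = a[3] + b[3];
    const double m2 = e * e - px * px - py * py - pz * pz;
    return m2 > 0.0 ? std::sqrt(m2) : 0.0;  // guard against rounding below 0
}
```

Because two combos that share a track point at the same hypothesis index, that track’s data is stored once per event no matter how many combos use it.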
This proposal makes it a little harder to analyze data: within each entry you now have to loop over combos, and cutting combos and saving the reduced (skimmed) TTree is now much harder (you must re-arrange the combo arrays). However, with this format all of the information you need for an event is within a single entry, so we can use PROOF on the GlueX data.
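Both costs of the new layout can be sketched in hypothetical C++, with plain std::vector members standing in for the combo branches. CombosToFill does the duplicate check within a single entry, so no state is carried between entries, which is what would make the layout PROOF-friendly. SkimCombos shows the re-arrangement a skim needs: every combo array is compacted with the same index mapping, while the Hypothesis__ arrays are left alone so the surviving ObjectID values stay valid. The kinematic-fit FOM threshold is only an example cut.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// The combo-specific arrays of one event in the proposed layout. Names are
// hypothetical stand-ins for branches like PiPlus1__ObjectID; each ObjectID
// value is an index into the event's Hypothesis__ arrays.
struct ComboArrays {
    std::vector<int> piPlus1ObjectID;
    std::vector<int> piMinus1ObjectID;
    std::vector<double> kinFitFOM;  // one kinematic-fit FOM per combo

    std::size_t Size() const {
        assert(piPlus1ObjectID.size() == piMinus1ObjectID.size() &&
               piMinus1ObjectID.size() == kinFitFOM.size());
        return kinFitFOM.size();
    }
};

// Indices of combos whose (pi+, pi-) pair was not seen earlier in this event,
// i.e. the combos that should fill a pi+ pi- histogram. All bookkeeping is
// local to the entry, so entries can be processed in any order, anywhere.
std::vector<std::size_t> CombosToFill(const ComboArrays& c) {
    std::set<std::pair<int, int>> usedPairs;
    std::vector<std::size_t> toFill;
    for (std::size_t i = 0; i < c.Size(); ++i)
        if (usedPairs.insert({c.piPlus1ObjectID[i], c.piMinus1ObjectID[i]}).second)
            toFill.push_back(i);
    return toFill;
}

// Skim: keep only combos with kinFitFOM >= minFOM, compacting every combo
// array in the same way. Hypothesis__ arrays are not touched.
void SkimCombos(ComboArrays& c, double minFOM) {
    const std::size_t n = c.Size();
    std::size_t out = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (c.kinFitFOM[i] < minFOM) continue;  // combo fails the cut
        // Move combo i into slot `out` of every combo array (out <= i).
        c.piPlus1ObjectID[out] = c.piPlus1ObjectID[i];
        c.piMinus1ObjectID[out] = c.piMinus1ObjectID[i];
        c.kinFitFOM[out] = c.kinFitFOM[i];
        ++out;
    }
    c.piPlus1ObjectID.resize(out);
    c.piMinus1ObjectID.resize(out);
    c.kinFitFOM.resize(out);
}
```

Every new combo-specific branch has to be added to the compaction step, which is part of why skimming gets harder with this layout.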
Sorry for the long email, but this is pretty complicated. Any thoughts? I’m not going to have time to start on this for a while, but I want to get it working sometime within the next few months or so.
To make things a little easier, I have ideas about how to (optionally) make new “TParticleCombo” objects in the TSelector, but that’s a little further off.
- Paul