[Halld-offline] REST Data Format

Mon Jul 2 17:05:35 EDT 2012

Matt Shepherd and I have had a lot more discussion about the data included in the REST format, and we think that the results currently planned for storage are too sparse.  We think that we should include all of the relevant reconstructed information from the different detector systems (not just BCAL & FCAL, but also TOF and SC), along with the results from the time-based tracking.  This doesn't include hit-level ADCs and TDCs, just things like the final reconstructed hit positions, times, and uncertainties in the different systems (excluding the CDC and FDC hits, of course).  We think that this would be the highest level of analysis-independent information that can be provided.  It would allow us to later study and tweak the matching between the charged tracks and the hits in the different systems, which otherwise couldn't be done without reconstructing all of the data again.  

The question is then whether or not we also store the results of the matching between the charged tracks and the hits in the different systems, e.g. the DNeutralShower and DChargedTrack objects (DChargedTrack contains things like the TOF hit time projected to the beamline).  From an analysis-tools point of view it wouldn't matter: you ask JANA for the DChargedTrack objects, and it would return them whether they were saved directly in REST or had to be built from the saved DTrackTimeBased (and other) objects.  
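
To illustrate why the choice is invisible to analysis code, here is a minimal, self-contained C++ sketch.  It does not use the real JANA or DANA classes: RestEventSketch, TimeBasedTrackSketch, ChargedTrackSketch and getChargedTracks are all hypothetical names, and the "projection" is just a stand-in subtraction, not the actual reconstruction.  

```cpp
#include <vector>

// Hypothetical stand-in for the saved time-based track quantities.
struct TimeBasedTrackSketch {
    double tofHitTime;      // ns, time of the matched TOF hit
    double flightTimeToTof; // ns, flight time from the beamline to the TOF
};

// Hypothetical stand-in for a matched charged-track object.
struct ChargedTrackSketch {
    double tofTimeAtBeamline; // ns, TOF hit time projected to the beamline
};

// Hypothetical event record: matching results may or may not have been saved.
struct RestEventSketch {
    bool hasChargedTracks = false;
    std::vector<ChargedTrackSketch> chargedTracks;
    std::vector<TimeBasedTrackSketch> timeBasedTracks;
};

// The caller always asks for charged tracks the same way.  If they were
// saved they are returned directly; otherwise they are rebuilt from the
// saved time-based tracks (placeholder arithmetic, assumed for the sketch).
std::vector<ChargedTrackSketch> getChargedTracks(const RestEventSketch& event)
{
    if (event.hasChargedTracks) {
        return event.chargedTracks;
    }
    std::vector<ChargedTrackSketch> rebuilt;
    for (const TimeBasedTrackSketch& track : event.timeBasedTracks) {
        ChargedTrackSketch charged;
        charged.tofTimeAtBeamline = track.tofHitTime - track.flightTimeToTof;
        rebuilt.push_back(charged);
    }
    return rebuilt;
}
```

Either way the caller only ever calls getChargedTracks(event); the difference is disk space in one case and CPU time in the other.  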

Not storing the track-system matching results would save disk space (e.g. the matching between charged tracks and calorimeter showers is not 1-to-1, but many-to-many), but increase CPU time.  To match the charged tracks to the detector system hits we have to swim the track out from the beamline through GlueX.  The full results from this swim are far too large to save in REST, but we could save the intersection of each track (in space and time) with each detector system (BCAL, FCAL, etc.) in REST to save time on this step.  The track could always be re-swum if necessary, but it shouldn't need to be.  
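
As a rough sketch of how saved intersections could be used, the C++ below redoes the matching from stored intersections and stored hits, without swimming any track.  All of the names (DetectorSystem, TrackIntersection, StoredTrack, StoredHit, matchTracksToHits) and the simple distance and time cuts are assumptions made for illustration; they are not the existing GlueX classes or matching criteria.  

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical detector labels (not the real GlueX enum).
enum class DetectorSystem { BCAL, FCAL, TOF, SC };

// Where and when a swum track crosses one detector system.
struct TrackIntersection {
    DetectorSystem system;
    double x, y, z; // cm
    double t;       // ns
};

// Hypothetical trimmed-down track record: one intersection per system crossed.
struct StoredTrack {
    std::vector<TrackIntersection> intersections;
};

// Hypothetical reconstructed hit or shower in one detector system.
struct StoredHit {
    DetectorSystem system;
    double x, y, z; // cm
    double t;       // ns
};

// Return every (track index, hit index) pair that passes the cuts.
// The result is many-to-many: a track may match several hits and a hit
// may match several tracks.  The cuts can be changed and the matching
// rerun without swimming the tracks again.
std::vector<std::pair<std::size_t, std::size_t>>
matchTracksToHits(const std::vector<StoredTrack>& tracks,
                  const std::vector<StoredHit>& hits,
                  double maxDistance, double maxDeltaT)
{
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        for (const TrackIntersection& hitPoint : tracks[i].intersections) {
            for (std::size_t j = 0; j < hits.size(); ++j) {
                const StoredHit& hit = hits[j];
                if (hit.system != hitPoint.system) continue;
                const double dx = hit.x - hitPoint.x;
                const double dy = hit.y - hitPoint.y;
                const double dz = hit.z - hitPoint.z;
                const double distance = std::sqrt(dx * dx + dy * dy + dz * dz);
                const double deltaT = std::fabs(hit.t - hitPoint.t);
                if (distance <= maxDistance && deltaT <= maxDeltaT) {
                    matches.push_back(std::make_pair(i, j));
                }
            }
        }
    }
    return matches;
}
```

Only the list of index pairs depends on the cuts, so tightening or loosening maxDistance and maxDeltaT later is a cheap loop over the stored quantities rather than a full re-reconstruction.  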

In summary, I (and perhaps Matt) propose that we store the following objects in REST:

DTrackTimeBased: only the relevant quantities (no DReferenceTrajectory, vector<DStartTime_t>, tracking pulls, t0, t1, etc.), plus the intersection with the detector elements (BCAL, TOF, etc.)
DBCALShower
DFCALShower
DTOFPoint
DSCHit

This may not be necessary for the current data challenge, but I think it's best for our long-term analysis needs.  What do you guys think?  

 - Paul