[Halld-offline] REST Data Format

Tue Jul 3 20:19:59 EDT 2012

Mark and all,

This is a good discussion, and should continue.  I suggest that we bring it up at the next offline meeting.  As for the REST version 0.5, I wrote those classes last week after we settled (so I thought) on an initial consensus for the data model.  I am currently testing it with Igor's analysis workflow.  Will report on results at the next offline meeting.

Question: do you want me to check any of this stuff in?  To the trunk, or to a separate branch?  If the latter, Mark, can you create the branch you want me to populate?  The changes so far are:

  * new hddm template rest.xml in the library/HDDM dir (what I showed at the last meeting);
  * new classes DEventSourceREST and DEventSourceRESTGenerator (clones of DEventSourceHDDM and DEventSourceHDDMGenerator) in the library/HDDM dir;
  * edits to DApplication.cc to add the DEventSourceREST as a supported event source within the dana framework;
  * modification of CheckOpenable method of DEventSourceHDDMGenerator so that it can tell the difference between a hitsView hddm file and a REST hddm file;
  * addition of a new plugin danarest (clone of danahddm) to write REST files;
  * changes to existing hddm c++ library to better match the dana framework;
  * minor fixes to BMS to support building applications that use the hddm c++ classes.
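
The CheckOpenable change amounts to telling two kinds of HDDM file apart before deciding which event source should open them. Below is a minimal sketch, assuming (not confirmed in this thread) that every HDDM file begins with a plain-text XML header line whose `class` attribute is "s" for hitsView files and "r" for REST files. The names `HddmKind`, `ClassifyHddmHeader` and `CheckOpenableSketch` are hypothetical, and the 0-to-1 likelihood return value only mimics the JANA generator convention; this is not the actual DEventSourceHDDMGenerator code.

```cpp
#include <fstream>
#include <string>

// Hypothetical classification of an HDDM file by its header line.
// ASSUMPTION: hitsView files start with <HDDM class="s" ...> and
// REST files start with <HDDM class="r" ...>.
enum class HddmKind { NotHddm, HitsView, Rest };

HddmKind ClassifyHddmHeader(const std::string& firstLine)
{
    if (firstLine.find("<HDDM") == std::string::npos) return HddmKind::NotHddm;
    if (firstLine.find("class=\"s\"") != std::string::npos) return HddmKind::HitsView;
    if (firstLine.find("class=\"r\"") != std::string::npos) return HddmKind::Rest;
    return HddmKind::NotHddm;  // an HDDM file of some other class
}

// Generator-style check: returns a likelihood in [0,1] that a source of the
// wanted kind can open the file (JANA-like convention, assumed here).
double CheckOpenableSketch(const std::string& source, HddmKind wanted)
{
    std::ifstream ifs(source);
    if (!ifs) return 0.0;                       // missing or unreadable file
    std::string line;
    if (!std::getline(ifs, line)) return 0.0;   // empty file
    return ClassifyHddmHeader(line) == wanted ? 0.99 : 0.0;
}
```

In this sketch the REST generator would call `CheckOpenableSketch(path, HddmKind::Rest)` and the hitsView generator the same with `HddmKind::HitsView`, so a given file is claimed by only one of the two sources.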

-Richard Jones

On 7/3/2012 10:21 AM, Mark M. Ito wrote:
> Paul,
>
> I think expanding the list of data is probably a good idea to make the
> output more useful for analysis. The idea would be that decisions that
> are appropriate at the analysis level can be made at the analysis
> level. But each byte needs to be examined critically, as we are doing.
>
> I'm not sure about the detector plane intersection points though, since
> those are derivable by swimming. I think the CPU time is cheap on the
> scale of things and the resulting information from swimming is
> potentially much richer.
>
>     -- Mark
>
>
> On 07/02/2012 05:05 PM, Paul Mattione wrote:
>> Matt Shepherd and I have had a lot more discussion about the data included in the REST format, and we think that the quantities currently planned for storage are too sparse.  We think that we should include all of the relevant reconstructed information from the different detector systems (not just BCAL & FCAL, but also TOF and SC), along with the results from the time-based tracking.  This doesn't include hit-level ADCs and TDCs, just things like the final reconstructed hit position, time, and uncertainties in the different systems (not the CDC and FDC hits, of course).  We think that this would be the highest level of analysis-independent information that can be provided.  This would allow us to later study and tweak the matching between the charged tracks and the hits in the different systems, which you wouldn't be able to do otherwise without reconstructing all of the data again.
>>
>> The question is then whether or not we also store the results of the matching between the charged tracks and the hits in the different systems, e.g. the DNeutralShower and DChargedTrack objects (DChargedTrack contains things like the TOF hit time projected to the beamline).  From an analysis-tools point of view it wouldn't matter: you request the DChargedTrack objects from JANA, and it returns them whether they were saved directly in REST or had to be built from the saved DTrackTimeBased (and other) objects.
>>
>> If we don't store the track-system matching results, it would save disk space (e.g. the matching matrix between charged tracks and calorimeter showers is not 1-to-1, but many-to-many), but increase CPU time.  To match the charged tracks to the detector system hits we have to swim each track out from the beamline through GlueX.  The full results from this swim are far too large to save in REST, but we could save the intersection of each track (in space and time) with each detector system (BCAL, FCAL, etc.) in REST to save time on this step.  The track could always be re-swum if necessary, but it shouldn't need to be.
>>
>> In summary, I (and perhaps Matt) propose that we store the results of the following objects in REST:
>>
>> DTrackTimeBased: only the relevant quantities (no DReferenceTrajectory, vector<DStartTime_t>, tracking pulls, t0, t1, etc.), plus the intersection with each detector element (BCAL, TOF, etc.)
>> DBCALShower
>> DFCALShower
>> DTOFPoint
>> DSCHit
>>
>> This may not be necessary for the current data challenge, but I think it's best for our long-term analysis needs.  What do you guys think?
>>
>>    - Paul
>>
>>
>> _______________________________________________
>> Halld-offline mailing list
>> Halld-offline at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/halld-offline
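
Paul's proposal (store each time-based track's intersections with BCAL, FCAL, TOF, etc., and redo the many-to-many track/shower matching at analysis time) can be illustrated with a minimal sketch. The struct and function names are hypothetical, not the actual DANA classes, and the simple distance cut is only a stand-in for whatever matching criteria the real reconstruction applies.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reduced records; field names are illustrative and are NOT the
// actual DTrackTimeBased / DBCALShower / DFCALShower members.
struct Intersection {
    std::string detector;  // e.g. "BCAL", "FCAL", "TOF"
    double x, y, z;        // where the swum track crosses the detector (cm)
    double t;              // time at that point (ns)
};

struct RestTrack {
    double px, py, pz;                        // momentum (GeV/c)
    std::vector<Intersection> intersections;  // one entry per detector reached
};

struct RestShower {
    std::string detector;
    double x, y, z, t;
};

// Many-to-many matching from the stored intersections alone (no re-swim):
// every (track, shower) pair in the same detector closer than maxDist is kept,
// so one shower may match several tracks and one track several showers.
// The pure distance cut is a placeholder for the real matching criteria.
std::vector<std::pair<std::size_t, std::size_t>>
MatchTracksToShowers(const std::vector<RestTrack>& tracks,
                     const std::vector<RestShower>& showers,
                     double maxDist)
{
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        for (std::size_t j = 0; j < showers.size(); ++j) {
            for (const Intersection& p : tracks[i].intersections) {
                if (p.detector != showers[j].detector) continue;
                const double dx = p.x - showers[j].x;
                const double dy = p.y - showers[j].y;
                const double dz = p.z - showers[j].z;
                if (std::sqrt(dx * dx + dy * dy + dz * dz) < maxDist) {
                    matches.emplace_back(i, j);
                    break;  // record each (track, shower) pair at most once
                }
            }
        }
    }
    return matches;
}
```

Because the output is a list of index pairs rather than a one-to-one assignment, it naturally represents the many-to-many matrix mentioned in the thread, and the cut can be retuned at analysis time without reconstructing the data again.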

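Paul's point that an analysis asking JANA for DChargedTrack objects should not care whether they were read from REST or rebuilt from the stored DTrackTimeBased objects is essentially a "use stored results if present, otherwise recompute" lookup. A minimal standalone C++17 sketch of that idea follows; the types `TrackTimeBased`, `ChargedTrack` and `RestEvent` and the function `GetChargedTracks` are hypothetical, the rebuild step is a placeholder for the real track/detector matching, and none of it uses the JANA factory API.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-ins for the objects discussed; not the real DANA classes.
struct TrackTimeBased { double p; };
struct ChargedTrack   { double p; bool rebuilt; };

// An event that always carries time-based tracks and may or may not carry
// pre-matched charged tracks, depending on what was written to the file.
struct RestEvent {
    std::vector<TrackTimeBased> tracks;
    std::optional<std::vector<ChargedTrack>> charged;
};

// The caller just asks for charged tracks; whether they come from the file
// or are rebuilt is invisible to it.
std::vector<ChargedTrack> GetChargedTracks(const RestEvent& ev)
{
    if (ev.charged) return *ev.charged;  // stored matching results: no CPU cost
    std::vector<ChargedTrack> out;
    out.reserve(ev.tracks.size());
    for (const TrackTimeBased& t : ev.tracks)
        out.push_back({t.p, true});      // placeholder for the real matching step
    return out;
}
```

The trade-off is the one debated in the thread: writing the optional block costs disk space, while leaving it out costs the CPU time of the rebuild path.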