[Halld-offline] REST Data Format
Mark M. Ito
marki at jlab.org
Thu Jul 5 09:12:14 EDT 2012
Richard,
I think that the question of whether to put things on a branch or on the
trunk depends on whether the new stuff is likely to break the trunk. We
should not break the trunk on purpose. Functional but not yet perfect is
OK on the trunk. The branch is good for checking in intermediate forms
of the code in a place that will not bother anyone else.
Making the branch is pretty easy; I can do that. The list of directories
under sim-recon/src seems to me to be:
libraries
HDDM
programs
Utilities
plugins
hddm
BMS
Let me know whether we want to go branch or trunk.
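
For the record, creating the branch is just a single server-side copy,
something along these lines (the repository URLs and the branch name here
are only illustrative):

    svn copy https://halldsvn.jlab.org/repos/trunk/sim-recon \
             https://halldsvn.jlab.org/repos/branches/sim-recon-rest \
             -m "Branch for REST event source development"
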
-- Mark
On 07/03/2012 08:19 PM, Richard Jones wrote:
> Mark and all,
>
> This is a good discussion, and should continue. I suggest that we
> bring it up at the next offline meeting. As for the REST version 0.5,
> I wrote those classes last week after we settled (so I thought) on an
> initial consensus for the data model. I am currently testing it with
> Igor's analysis workflow. Will report on results at the next offline
> meeting.
>
> Question: do you want me to check any of this stuff in? trunk?
> separate branch? If the latter, Mark, can you create the branch you
> want me to populate? The changes so far are:
>
> * new hddm template rest.xml in the libraries/HDDM dir (what I showed
> at the last meeting);
> * new classes DEventSourceREST and DEventSourceRESTGenerator (clones
> of DEventSourceHDDM and DEventSourceHDDMGenerator) in the
> libraries/HDDM dir;
> * edits to DApplication.cc to add the DEventSourceREST as a
> supported event source within the dana framework;
> * modification of the CheckOpenable method of DEventSourceHDDMGenerator
> so that it can tell the difference between a hitsView hddm file
> and a REST hddm file (see the sketch after this list);
> * addition of a new plugin danarest (clone of danahddm) to write
> REST files;
> * changes to existing hddm c++ library to better match the dana
> framework;
> * minor fixes to BMS to support building applications that use the
> hddm c++ classes.
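>
> In case it helps to picture the CheckOpenable change, it amounts to
> peeking at the plain-text header that every hddm file begins with. A
> rough sketch is below; the class-attribute test is only a placeholder
> for whatever the rest.xml template actually puts in its header:
>
>     #include <fstream>
>     #include <string>
>
>     // Sketch only: return a nonzero likelihood just for REST-flavored
>     // hddm files, so the hits-view generator and the REST generator do
>     // not both claim the same file.
>     static double CheckOpenableREST(std::string source)
>     {
>        std::ifstream ifs(source.c_str());
>        if (!ifs.is_open())
>           return 0.0;                  // cannot read it, so not ours
>        char buf[256] = {0};
>        ifs.read(buf, sizeof(buf) - 1); // hddm files start with a text header
>        std::string header(buf);
>        if (header.find("<HDDM") == std::string::npos)
>           return 0.0;                  // not an hddm file at all
>        // placeholder: assume rest.xml marks its header with a distinct
>        // class attribute; the real test may differ
>        return (header.find("class=\"r\"") != std::string::npos) ? 0.9 : 0.0;
>     }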
>
> -Richard Jones
>
>
> On 7/3/2012 10:21 AM, Mark M. Ito wrote:
>> Paul,
>>
>> I think expanding the list of data is probably a good idea to make the
>> output more useful for analysis. The idea would be that decisions that
>> are appropriate at the analysis level can actually be made at the
>> analysis level. But each byte needs to be examined critically, as we
>> are doing.
>>
>> I'm not sure about the detector plane intersection points though, since
>> those are derivable by swimming. I think the CPU time is cheap on the
>> scale of things and the resulting information from swimming is
>> potentially much richer.
>>
>> -- Mark
>>
>>
>> On 07/02/2012 05:05 PM, Paul Mattione wrote:
>>> Matt Shepherd and I have had a lot more discussion about the data included in the REST format, and we think that the results currently planned for storage are too sparse. We think that we should include all of the relevant reconstructed information from the different detector systems (not just BCAL & FCAL, but also TOF and SC), along with the results from the time-based tracking. This doesn't include hit-level ADCs and TDCs, just things like the final reconstructed hit position, time, and uncertainties in the different systems (not the CDC and FDC hits, of course). We think that this would be the highest level of analysis-independent information that can be provided. This would allow us to later study and tweak the matching between the charged tracks and the hits in the different systems, which you wouldn't be able to do otherwise without reconstructing all of the data again.
>>>
>>> The question is then whether or not we also store the results of the matching between the charged tracks and the hits in the different systems (e.g. the DNeutralShower and DChargedTrack objects; DChargedTrack contains things like the TOF hit time projected to the beamline). From an analysis tools point of view it wouldn't matter: you request the DChargedTrack objects from JANA, and it would return them whether they were directly saved in REST or had to be built from the saved DTrackTimeBased (and other) objects.
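>>>
>>> (To illustrate that point: from the analysis side the call is identical either way. A minimal sketch, with the include paths assumed:)
>>>
>>>     #include <vector>
>>>     #include <JANA/JEventLoop.h>
>>>     #include <PID/DChargedTrack.h>
>>>
>>>     // Illustrative only: loop is the JEventLoop that JANA hands to an
>>>     // event processor's evnt() method.
>>>     static size_t CountChargedTracks(jana::JEventLoop *loop)
>>>     {
>>>        std::vector<const DChargedTrack*> tracks;
>>>        loop->Get(tracks);   // the same call either way: the objects may
>>>                             // come straight out of the REST file or be
>>>                             // rebuilt by the factory from the saved
>>>                             // DTrackTimeBased (and other) objects
>>>        return tracks.size();
>>>     }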
>>>
>>> If we don't store the track-system matching results it would save disk space (e.g. the matching matrix between charged tracks and calorimeter showers is not 1-to-1, but many-to-many), but it would increase CPU time. To match the charged tracks to the detector system hits we have to swim the track out from the beamline through GlueX. The full results from this swim are far too large to save in REST, but we could save the intersection of each track (in space and time) with each detector system (BCAL, FCAL, etc.) in REST to save time on this step. The track could always be re-swum if necessary, but it shouldn't need to be.
>>>
>>> In summary, I (and perhaps Matt) propose that we store the results of the following objects in REST (a rough sketch follows the list):
>>>
>>> DTrackTimeBased (only the relevant quantities (no DReferenceTrajectory, vector<DStartTime_t>, tracking pulls, t0, t1, etc.), plus the intersection with the detector elements (BCAL, TOF, etc.))
>>> DBCALShower
>>> DFCALShower
>>> DTOFPoint
>>> DSCHit
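>>>
>>> (To make "only the relevant quantities" a bit more concrete, here is roughly the kind of per-track record I have in mind; the names, members, and units are only illustrative, not a proposal for the actual rest.xml layout:)
>>>
>>>     #include <vector>
>>>
>>>     // One intersection of a time-based track with a detector element
>>>     // (BCAL, FCAL, TOF, SC, ...), in space and time.
>>>     struct rest_intersection_t {
>>>        int system;         // which detector was crossed
>>>        double x, y, z;     // projected position at that detector
>>>        double t;           // projected time at that detector
>>>     };
>>>
>>>     // Trimmed time-based track: fit results only -- no
>>>     // DReferenceTrajectory, no vector<DStartTime_t>, no tracking
>>>     // pulls, no t0/t1 bookkeeping.
>>>     struct rest_track_t {
>>>        double px, py, pz;  // fitted momentum
>>>        double x, y, z;     // fitted position
>>>        double q;           // charge
>>>        double chisq;       // fit chi-square
>>>        int ndof;           // and number of degrees of freedom
>>>        std::vector<rest_intersection_t> intersections;
>>>     };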
>>>
>>> This may not be necessary for the current data challenge, but I think it's best for our long-term analysis needs. What do you guys think?
>>>
>>> - Paul
>>>
>>>
--
Mark M. Ito
Jefferson Lab (www.jlab.org)
(757)269-5295