[Halld-offline] Storage and Analysis of Reconstructed Events
Matthew Shepherd
mashephe at indiana.edu
Mon Feb 13 10:30:28 EST 2012
Eugene,
On Feb 13, 2012, at 10:17 AM, Eugene Chudakov wrote:
>> This high level skim included all events that looked like they had some
>> hadrons in the final state (not cosmic rays, Bhabha scattering, etc.). It
>> was a substantial fraction of all events recorded and pretty much any
>> event that you would want for a physics analysis. All of this was on
>> disk -- we had several large RAID systems.
>
> And you wrote in the previous mail:
>
>>>> The framework could simultaneously deliver the remainder of the data
>>>> that was in the high level reconstruction skim with each event. It's
>
> Was this "high level reconstruction skim" kept in sequential files,
> such that one needed to read all of them to extract the data for the
> selected events?
Yes, sequential files were used, with some supplemental indexing files. This is discussed in detail in this paper:
http://www.lepp.cornell.edu/~cdj/publications/conferences/CHEP04/EventStore.pdf
The performance of sequential files, where all of the data are read, compared with that of the indexed access appears in Figs. 2 and 3. Figure 2 shows that you pay a small overhead penalty if you are interested in every event, but as soon as you want only a subset of events (Fig. 3) there are large performance gains.
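To make the indexing idea concrete, here is a minimal sketch (in Python, purely for illustration -- not CLEO's actual C++ EventStore code) of how a small supplemental index file gives random access into a sequential event file. The toy record layout and function names are my own assumptions, not something taken from the paper.

    # Toy sketch: a sequential event file plus a supplemental index file.
    # The index maps event numbers to byte offsets, so a job interested in
    # a subset of events can seek to them instead of reading everything.
    import struct

    def build_index(data_path, index_path):
        """Scan the sequential file once, recording (event_id, offset) pairs.
        Assumes a toy record layout: an 8-byte header of (event_id, size)."""
        with open(data_path, "rb") as data, open(index_path, "wb") as index:
            offset = 0
            while True:
                header = data.read(8)
                if len(header) < 8:
                    break
                event_id, size = struct.unpack("<II", header)
                index.write(struct.pack("<IQ", event_id, offset))
                data.seek(size, 1)   # skip over the event payload
                offset += 8 + size

    def read_selected(data_path, index_path, wanted_ids):
        """Random access: yield only the events whose IDs are in wanted_ids."""
        lookup = {}
        with open(index_path, "rb") as index:
            while True:
                rec = index.read(12)
                if len(rec) < 12:
                    break
                event_id, offset = struct.unpack("<IQ", rec)
                lookup[event_id] = offset
        with open(data_path, "rb") as data:
            for event_id in sorted(wanted_ids):
                data.seek(lookup[event_id])
                _, size = struct.unpack("<II", data.read(8))
                yield event_id, data.read(size)

With something like this, a skim only has to carry the list of event IDs it selects; reading those events back costs one seek per event rather than a scan over the whole file, which is exactly the behavior being compared in Figs. 2 and 3.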
> If GlueX has 100 times more events - is it conceivable to keep
> the "high level reconstruction skim" on disks?
I think the question here is probably not related to the number of events (100 times more) but to the actual size of the data set. We should do a careful calculation. It is then probably a question of computing resources. Maybe for the initial data taking we put everything on disk. We can then use these data and the results of the initial analyses to develop event selection criteria and choose to retain on disk only a fraction of the subsequent high-luminosity runs.
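Just to show the shape of that calculation, here is a tiny sketch; every number in it is a placeholder I picked for illustration, not an actual GlueX trigger rate or event size.

    # Back-of-the-envelope disk footprint for a high-level skim.
    # All inputs are placeholder values, not real GlueX parameters.
    def skim_footprint_tb(event_rate_hz, live_seconds, skim_fraction,
                          bytes_per_event):
        n_events = event_rate_hz * live_seconds * skim_fraction
        return n_events * bytes_per_event / 1e12   # terabytes

    # Hypothetical inputs: 20 kHz recorded, 1e7 live seconds, 30% of events
    # kept in the skim, 10 kB per reconstructed event -> about 600 TB.
    print(skim_footprint_tb(2.0e4, 1.0e7, 0.3, 1.0e4))

Plugging in real numbers for the trigger rate, running time, skim fraction, and reconstructed event size would tell us whether keeping everything on disk is realistic or whether we need the retention scheme described above.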
Matt
>
> Eugene
>
>
> On Mon, 13 Feb 2012, Matthew Shepherd wrote:
>
>>
>> Eugene,
>>
>> On Feb 12, 2012, at 2:22 PM, Eugene Chudakov wrote:
>>
>>> You wrote:
>>>> The framework could simultaneously deliver the remainder of the data
>>>> that was in the high level reconstruction skim with each event. It's
>>>> nice because you get a reduced number of events with enhanced
>>>> information without the cost of duplicating the initial skim output.
>>>
>>> I am not sure I get it. Does the "high level reconstruction skim"
>>> sample include all the reconstructed events or a subsample of them? In
>>> the former case they can hardly be stored on disk with quick
>>> access. In the latter case, how many such "skims" were kept on disk?
>>
>> This high level skim included all events that looked like they had some hadrons in the final state (not cosmic rays, Bhabha scattering, etc.). It was a substantial fraction of all events recorded and pretty much any event that you would want for a physics analysis. All of this was on disk -- we had several large RAID systems.
>>
>> The subsequent skims, the "D tag" skim for example, selected a subset of these events but did not copy them to an additional file. (The only additional file that was created contained specialized information about D reconstruction.) The data access system in CLEO allowed random access to events in the original sample, so it was efficient to run over a subsample of events. There was also little disk penalty for creating additional skims, since you weren't copying events each time.
>>
>> Using a CLEO-like system in GlueX, one could have many useful skims, e.g., a Primakoff skim requiring high energy deposition in the FCAL, or a skim in which some particle (eta', omega, etc.) was reconstructed. These would make a great starting place for analyses, rather than going back and running through all events each time.
>>
>>> It would be useful to see how many events CLEO wrote out, and in
>>> which format, for permanent storage, for a relatively long term
>>> (months), for a short term (days), etc. Of course, the storage term
>>> is associated with a certain analysis.
>>
>> I can try to find more details on this.
>>
>> Matt
>>