[Halld-offline] Gluex event sizes in pythia MC

Thu Oct 28 01:06:39 EDT 2010

Hi Richard,

     I can certainly see the benefits of the clean separation that you 
have advocated for some time. I even tried my best to make the arguments 
on your behalf when the issue came up at offline meetings over the last 
year or so. The consensus, however, was to make the truth information 
associated with each hit a little more accessible, hence the 
contamination of the hits tree in the data model.

     Given the state of things as they are, I think the best estimates 
for event size from the hddm data would have to be derived by looking 
only at the number of hits in each detector. The raw data will likely 
have header words with the crate and slot number encoded followed by 
data words containing the digitized hit value in the lower bits and 
channel info in the upper bits. If it's easy to make plots of just the 
number of cathode hits and the number of anode hits, then we can 
calculate the bytes in the raw data file assuming two 4-byte words for  
each cathode hit (time and amplitude) and one 4-byte word for each anode 
hit (time only). Headers will have to be estimated in some way. Sascha 
has done this before and so may have some comment.
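
     As a rough sketch of that arithmetic (not an actual tool), something 
like the following could be used. The function name and the header_words 
parameter are hypothetical; the header contribution is left as a 
placeholder since it still has to be estimated separately.

```python
# Rough FDC raw-data size estimate from hit counts alone.
# Assumptions taken from the text above: 4-byte words, two words per
# cathode hit (time and amplitude), one word per anode hit (time only).
# header_words is a hypothetical placeholder; the real header overhead
# is not known here and must be estimated separately.

WORD_BYTES = 4
CATHODE_WORDS_PER_HIT = 2  # time + amplitude
ANODE_WORDS_PER_HIT = 1    # time only


def estimate_fdc_raw_bytes(n_cathode_hits, n_anode_hits, header_words=0):
    """Return the estimated number of raw-data bytes for one event."""
    words = (CATHODE_WORDS_PER_HIT * n_cathode_hits
             + ANODE_WORDS_PER_HIT * n_anode_hits
             + header_words)
    return WORD_BYTES * words


if __name__ == "__main__":
    # Example with made-up hit counts, purely for illustration.
    print(estimate_fdc_raw_bytes(n_cathode_hits=100, n_anode_hits=40))
```

     The hit counts in the example are invented; real values would come 
from the per-event hit multiplicity plots.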

     Note that I'm hoping to start looking at the details of the raw 
data format in a couple of weeks. We will need an hddm to raw-data-evio 
converter so we can test out the online systems prior to first beam. 
This will hopefully give us a very accurate idea of the data volumes we 
can expect as well.

Regards,
-David

On 10/27/10 5:14 PM, Richard Jones wrote:
> David,
>
> Right.  Besides the hits themselves, and the truth points where each 
> track intersects a sense wire plane, there is now a hybrid tag called 
> "truth hits".  I guess the people working on the tracking (you? 
> Simon?) put those in, so only you know what they are there for.  When 
> I created the original hddm structures I thought that I kept the hits 
> information in a different subtree from the truth, so that you are not 
> tempted to make a one-to-one correspondence in the tracking 
> reconstruction.  This segregation also makes it easy to separate the 
> byte counts for different types of information.  This intertwining of 
> hits and truth information in the case of the FDC is why I wasn't able 
> to make the factorization for the FDC between hits and truth the way I 
> did for the other detectors.  How would you suggest that I do the 
> accounting for the FDC so that it segregates contributions from (a) hits, 
> (b) truth points, and (c) truth hits?
>
> -Richard J.
>
>
>
> On 10/27/2010 2:04 PM, David Lawrence wrote:
>> Hi Richard,
>>
>>       Do your numbers for the FDC filter out the "truth" values buried in
>> the forwardDC structure? When I look at the XML definition for the HDDM
>> file I see this:
>>
>> <forwardDC minOccurs="0">
>> <fdcChamber layer="int" maxOccurs="unbounded" module="int">
>> <fdcAnodeWire maxOccurs="unbounded" minOccurs="0" wire="int">
>> <fdcAnodeHit dE="float" maxOccurs="unbounded" t="float" itrack="int"
>> ptype="int"/>
>> <fdcAnodeTruthHit dE="float" maxOccurs="unbounded" t="float" d="float"
>> itrack="int" ptype="int"/>
>> </fdcAnodeWire>
>> <fdcCathodeStrip maxOccurs="unbounded" minOccurs="0" plane="int"
>> strip="int">
>> <fdcCathodeHit maxOccurs="unbounded" q="float" t="float" itrack="int"
>> ptype="int"/>
>> <fdcCathodeTruthHit maxOccurs="unbounded" q="float" t="float"
>> itrack="int" ptype="int"/>
>> </fdcCathodeStrip>
>> <fdcTruthPoint E="float" dEdx="float" dradius="float"
>> maxOccurs="unbounded" minOccurs="0" primary="boolean" ptype="int"
>> px="float" py="float" pz="float" t="float" track="int" x="float"
>> y="float" z="float"/>
>> </fdcChamber>
>> </forwardDC>
>>
>> The fdcAnodeTruthHit and fdcCathodeTruthHit will not be in the raw data
>> set. Also, if you used a more recent version of the data model (post
>> Sept. 28) then the itrack and ptype fields in the fdcAnodeHit and
>> fdcCathodeHit structures are also there which won't be in the raw data.
>> If both of these are the case, then you're counting 12 words per hit as
>> opposed to 4 for the raw data. If only the first is true, then you may
>> be counting 8 words as opposed to 4 for the raw data. I only ask because
>> you break down the CDC into hits, truth hits and truth points, but I
>> don't see that for the FDC.
>>
>> Regards,
>> -Dave
>>
>> On 10/27/10 1:00 PM, Heyes Graham wrote:
>>> Richard,
>>>     thanks for the quick reply. I have had the response from various 
>>> people at various times that trigger thresholds would be set to keep 
>>> event sizes and rates within limits. What isn't clear is if the 
>>> rates and sizes have changed significantly since they were 
>>> guesstimated several years ago.
>>>
>>> The last sentence of your email is a question aimed at Matt who 
>>> isn't on the recipients list of the email.
>>>
>>>     Regards,
>>>         Graham
>>>
>>> On Oct 27, 2010, at 11:09 am, Richard Jones wrote:
>>>
>>>> Graham and all,
>>>>
>>>> We do not have a number on the encoding ratio between coda and 
>>>> hddm, but I believe that the two should be within 30% of each other 
>>>> for the identical set of hits being recorded.  The big effect -- we 
>>>> are looking at a factor 3.5 -- lies in the thresholds that are used 
>>>> to define what gets called a hit.  Matt points out that these 
>>>> thresholds are deliberately set artificially low in Monte Carlo so 
>>>> that we can add fluctuations later.  A better estimate for event 
>>>> sizes might be obtained by looking at the events coming out of 
>>>> mcsmear, after applying more realistic hit thresholds to those 
>>>> events to see what we get.
>>>>
>>>> Matt, do we have the necessary information about what those 
>>>> thresholds should be in order to allow me to carry out this exercise?
>>>>
>>>> -Richard J.
>>>>
>
>