[g13] tarring skim files

Thu Jul 22 17:11:41 EDT 2010

Well, you can get the expansion factor from the first pass of the
cooking, and it is probably quite similar to what the second pass will
give. But if you want to know what it is really going to be, you will
have to wait until we have some of the new files cooked. From what I
recall, though, it was roughly a factor of three.

     Pawel
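Dennis's definition (bytes of output per byte of raw input) can be measured directly once some files are cooked. This is a minimal sketch, assuming the raw file and one job's output files are on local disk. Which files count as "output" (the cooked file itself, skims, dsts, ntuples, ...) is left to the caller, and the file names and sizes in the demo are made up purely to exercise the function:

```python
import tempfile
from pathlib import Path


def expansion_factor(raw_file: Path, output_files: list[Path]) -> float:
    """Bytes of output written per byte of raw input, for one cooking job."""
    raw_bytes = raw_file.stat().st_size
    if raw_bytes == 0:
        raise ValueError(f"{raw_file} is empty")
    out_bytes = sum(p.stat().st_size for p in output_files)
    return out_bytes / raw_bytes


if __name__ == "__main__":
    # Stand-in files with made-up sizes (hypothetical names), not real data.
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw_file"
        raw.write_bytes(b"\0" * 1000)
        outputs = []
        for name, size in (("cooked", 1200), ("skim", 800), ("ntuple", 500)):
            p = Path(tmp) / name
            p.write_bytes(b"\0" * size)
            outputs.append(p)
        print(expansion_factor(raw, outputs))  # 2500 / 1000 -> 2.5
```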

On 7/22/2010 4:49 PM, Dennis Weygand wrote:
> But from today's discussion, it appears that you must already have some idea of what the expansion factor is...
> That's all I'm asking.
> Dennis
>
> On Jul 22, 2010, at 4:41 PM, Pawel Nadel-Turonski wrote:
>
>> Dear Dennis,
>>
>> As I mentioned before, I will provide you with this information when the cooking actually starts and we have it. The current discussion is only aimed at optimizing the packaging of the files for more efficient tape usage.
>>
>>     Pawel
>>
>> On 7/22/2010 4:35 PM, Dennis Weygand wrote:
>>> Can you tell me what the data expansion factor is? For each input byte, how many bytes of output are there?
>>> Dennis
>>>
>>> On Jul 22, 2010, at 3:57 PM, Paul Mattione wrote:
>>>
>>>> Each cooking job cooks a single raw file, and the resulting cooked
>>>> file is then skimmed.  This produces a lot of output files for that
>>>> single job (skims, dsts, ntuples, anahists, gflux, etc.).  It's easier
>>>> just to group all of these files together, since they're all
>>>> produced by the same job, than to try to combine things across 50+ jobs.
>>>>
>>>>   - Paul
>>>>
>>>> On Jul 22, 2010, at 3:48 PM, slava at jlab.org wrote:
>>>>
>>>> You lost me at "From the standpoint...". Why is it easiest?
>>>>
>>>> Slava
>>>>
>>>>> From the standpoint of cooking, I think the easiest way of writing
>>>>> our skims, dsts, monitoring histograms, etc. to tape is to combine them
>>>>> all into a single tar file for each cooked file.
>>>>>
>>>>> However, this means that in order to extract your skim you have to
>>>>> cache much more of the data from tape (all of it) than if there were
>>>>> a single tar file for a given skim for a given run. But if we
>>>>> combine things by run, then whenever anything needs to be re-cooked
>>>>> (e.g. a cooking job fails), all of the tar files associated with that
>>>>> run have to be re-made.
>>>>>
>>>>> What do you guys think we should do?
>>>>>
>>>>> - Paul
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> g13 mailing list
>>>>> g13 at jlab.org
>>>>> https://mailman.jlab.org/mailman/listinfo/g13
>>>>>
>>> --
>>> Dennis Weygand
>>> weygand at jlab.org
>>> (757) 269-5926
>>>
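Paul's proposal (one tar file per cooked file, holding everything that job wrote) and its cost for skim users can be sketched with Python's standard `tarfile` module. This is a minimal illustration under stated assumptions, not the actual cooking scripts; the helper names `tar_job_outputs` and `read_skim` and the directory and file names are hypothetical:

```python
import tarfile
import tempfile
from pathlib import Path


def tar_job_outputs(job_dir: Path, tar_path: Path) -> list[str]:
    """Pack every file one cooking job produced (skims, dsts, ntuples,
    histograms, ...) into a single tar file: one tar per cooked file."""
    names = sorted(p.name for p in job_dir.iterdir() if p.is_file())
    with tarfile.open(tar_path, "w") as tar:
        for name in names:
            # arcname drops the directory prefix so members have flat names
            tar.add(job_dir / name, arcname=name)
    return names


def read_skim(tar_path: Path, skim_name: str) -> bytes:
    """Read one skim back out. Only one member is read, but the whole
    tar file must already be on disk (i.e. cached from tape) first."""
    with tarfile.open(tar_path, "r") as tar:
        member = tar.extractfile(skim_name)  # KeyError if the name is absent
        if member is None:
            raise ValueError(f"{skim_name} is not a regular file")
        return member.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        job_dir = Path(tmp) / "job_outputs"  # hypothetical layout
        job_dir.mkdir()
        for name in ("skim_a.dat", "skim_b.dat", "dst.dat", "anahist.dat"):
            (job_dir / name).write_bytes(name.encode())
        tar_path = Path(tmp) / "cooked_file.tar"
        print(tar_job_outputs(job_dir, tar_path))
        print(read_skim(tar_path, "skim_a.dat"))
```

`read_skim` touches a single member, yet the whole archive has to be staged from tape before it can be opened; that is the extra caching Paul mentions. Packing per skim per run instead would shrink what a skim user has to stage, at the cost of rebuilding a run's tar files whenever one of its jobs is re-cooked.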