[g13] tarring skim files

Pawel Nadel-Turonski turonski at jlab.org
Thu Jul 22 17:08:58 EDT 2010


Paul, Slava,

The main reason for tarring is that the minimum space allocation on a 
tape is very large, and the extraction time from the tape depends very 
little on the size of the file. Having fewer, larger files is thus much 
more efficient. Paul's points about which packaging is better for 
analysis and for cooking, respectively, are also very good. However, the 
idea of reorganizing the tarring at the end of cooking would greatly 
benefit from having the tarred files written to a different physical 
tape than the untarred ones. Perhaps we can check with Sandy tomorrow 
whether this is possible.
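
To make the efficiency argument concrete, here is a rough sketch of the 
arithmetic. The cost model (each file occupies at least a minimum 
allocation, and each retrieval pays a fixed overhead regardless of size) 
and every number in it are assumptions for illustration only, not 
measured values for our tape silo.

```python
def tape_cost(file_sizes_mb, min_alloc_mb, per_file_overhead_s, read_rate_mb_s):
    """Return (allocated MB, total retrieval seconds) for a set of files.

    Assumed model: every file takes up at least min_alloc_mb on tape, and
    every retrieval costs a fixed overhead plus the time to stream the data.
    """
    allocated = sum(max(size, min_alloc_mb) for size in file_sizes_mb)
    seconds = sum(per_file_overhead_s + size / read_rate_mb_s
                  for size in file_sizes_mb)
    return allocated, seconds


# Hypothetical numbers, chosen only to show the shape of the effect.
small_files = [20.0] * 50        # fifty 20 MB outputs written one by one
one_tar = [sum(small_files)]     # the same data packed into a single tar file

loose = tape_cost(small_files, min_alloc_mb=100.0,
                  per_file_overhead_s=60.0, read_rate_mb_s=100.0)
tarred = tape_cost(one_tar, min_alloc_mb=100.0,
                   per_file_overhead_s=60.0, read_rate_mb_s=100.0)

print("loose files:", loose)
print("one tar:    ", tarred)
```

With these made-up inputs the loose files are dominated by the minimum 
allocation and the per-file overhead, while the single tar file is 
dominated by the data itself.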

     Pawel

On 7/22/2010 4:20 PM, Paul Mattione wrote:
> I'm not quite sure what you mean, but I believe one of the reasons why
> we tar things together is because we had too many jput commands
> otherwise and the tape silo could not keep up (or something to that
> effect, someone please correct me if I'm wrong).
>
>    - Paul
>
>
> On Jul 22, 2010, at 4:17 PM, slava at jlab.org wrote:
>
> jput to proper directories. Is there anything else to the aforementioned
> combining?
>
> Slava
>
>    
>> Each cooking job cooks a single raw file, and the resulting cooked
>> file is then skimmed.  This produces a lot of output files for the
>> single job (skims, dsts, ntuples, anahists, gflux, etc).  It's easier
>> to just group all of these files together, since they're all
>> produced by the same job, than to try to combine things across 50+
>> jobs.
>>
>>   - Paul
>>
>> On Jul 22, 2010, at 3:48 PM, slava at jlab.org wrote:
>>
>> You lost me at "From the standpoint...". Why is it easiest?
>>
>> Slava
>>
>>      
>>> From the standpoint of cooking, I think the easiest way of writing
>>> our skims, dsts, monitoring histograms, etc. to tape is to combine
>>> them all into a single tar file for each cooked file.
>>>
>>> However, this means that in order to extract your skim you have to
>>> cache much more (all) of the data from the tapes than if you just had
>>> a single tar file for a given skim for a given run.  But if you
>>> combine things by run, if anything needs to be re-cooked (e.g. a
>>> cooking job fails), you have to re-make all of the tar files
>>> associated with that run.
>>>
>>> How do you guys think we should do it?
>>>
>>> - Paul
>>>
>>>
>>> _______________________________________________
>>> g13 mailing list
>>> g13 at jlab.org
>>> https://mailman.jlab.org/mailman/listinfo/g13
>>>
>>>        
>>
>>      
>
>    
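
The trade-off discussed in the thread above (one tar file per cooking 
job versus one tar file per output type per run) can be sketched as 
follows. The tuple layout, the tar-file naming scheme, and the example 
run/job labels are all hypothetical; this is not the actual g13 cooking 
script.

```python
import os
import tarfile
from collections import defaultdict


def tar_name(run, job, kind, per_run):
    """Hypothetical naming: per run and output kind, or per cooking job."""
    return f"{run}_{kind}.tar" if per_run else f"{run}_{job}.tar"


def group_outputs(outputs, per_run):
    """Map tar-file name -> list of output paths.

    outputs: iterable of (run, job, kind, path) tuples.
    per_run=False: one tar per cooking job, holding all of its output kinds.
    per_run=True:  one tar per run and output kind, spanning all jobs.
    """
    groups = defaultdict(list)
    for run, job, kind, path in outputs:
        groups[tar_name(run, job, kind, per_run)].append(path)
    return dict(groups)


def tars_touched_by_job(outputs, per_run, run, job):
    """Names of the tar files that must be remade if one job is re-cooked."""
    return sorted({tar_name(r, j, k, per_run)
                   for r, j, k, _ in outputs if r == run and j == job})


def write_tar(name, paths):
    """Pack the given files into an uncompressed tar archive."""
    with tarfile.open(name, "w") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))


# Hypothetical example: one run, three cooking jobs, two output kinds each.
outputs = [("run1", f"job{j}", kind, f"run1_job{j}.{kind}")
           for j in range(3) for kind in ("skim", "dst")]

by_job = group_outputs(outputs, per_run=False)
by_run = group_outputs(outputs, per_run=True)
```

In this toy case, re-cooking `job1` touches a single tar file under the 
per-job grouping but every per-run tar file under the per-run grouping, 
whereas pulling out just the skims requires reading every per-job tar 
file but only one per-run tar file.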

