[Halld-offline] cleaning small files from the cache disk

Mark Ito marki at jlab.org
Tue Jun 7 12:17:55 EDT 2016


Folks,

Some time in the near-ish future the computer center will change their 
algorithm so that all files on the cache disk are written to tape. For 
files that originated from tape, which for years was the only kind we 
had on the cache disk, this changes nothing. On the other hand, since 
the write-through feature was introduced, a significant number of files 
have been written directly to the cache disk by users, and we might not 
want all of those to show up on tape. And for large collections of small 
files, even if we do want them on tape, writing them one file at a time 
is inefficient. In that case it is better to collect them in a large 
container file (tar them up).
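
For anyone who wants a starting point, here is a minimal sketch of the 
tar-them-up approach. It assumes GNU find and GNU tar; the function name 
and its arguments are made up for illustration, and it leaves the 
original files in place so you can check the tar file before removing 
anything.

```shell
# Hypothetical helper: bundle every regular file under a directory that is
# smaller than a size threshold into one tar file. Assumes GNU find and tar.
tar_small_files() {
    local src_dir="$1"                # directory to scan
    local tar_file="$2"               # tar file to create; use an absolute
                                      # path located outside src_dir
    local max_bytes="${3:-1000000}"   # size threshold in bytes (default 1 MB)

    # List the small files relative to src_dir, NUL-separated so that names
    # containing spaces survive, and let tar read that list from stdin.
    ( cd "$src_dir" && find . -type f -size "-${max_bytes}c" -print0 ) \
        | tar --create --file "$tar_file" --null --no-recursion \
              --directory "$src_dir" --files-from -
}
```

Something like "tar_small_files /path/to/many/small/files /path/to/bundle.tar" 
would then produce one container file (both paths are placeholders). 
Listing it with "tar --list --file /path/to/bundle.tar" is a cheap sanity 
check.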

Before the change comes we should try to do a clean-up, as we discussed 
at the last offline meeting (https://goo.gl/kIzp9r). At this writing 
(pre-change) files smaller than 1 MB are not written to tape at all. I 
would like to see us review our usage of small files, defined here as 
those smaller than 1 MB, and either tar them up and leave the tar files 
on the cache disk to be archived to tape, or move them to the work disk.

Right now Hall D has 300,000 small files (under /cache/halld), although 
some of them are surely already on tape and therefore do not present an 
additional tape-use problem. The command I used to do the count was:

   lfs find --type f --size -1000000 /cache/halld | wc -l
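
To see where the small files are concentrated, the same listing can be 
grouped by leading directory instead of just counted. This is only a 
sketch: the function name and the depth argument are hypothetical, and it 
assumes that file names contain no newlines.

```shell
# Hypothetical helper: read file paths on stdin and print how many paths
# fall under each leading directory, largest count first.
count_by_dir() {
    local depth="${1:-3}"   # number of leading path components to keep

    # An absolute path starts with "/", so field 1 of the split is empty
    # and the first "depth" components are fields 2 through depth+1.
    # A file that sits exactly at that depth is reported under its own name.
    cut -d/ -f "1-$((depth + 1))" | sort | uniq -c | sort -rn
}
```

Piping the find command above into "count_by_dir 3" (in place of 
"wc -l") would give one line per directory directly under /cache/halld.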

The 1 MB number is clearly a soft limit. If you have a few 10 kB files 
that need to be written to tape, that is not a problem.  If you have 
thousands of 1.5 MB files out there, you might want to consider tar'ing 
them up.
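
Since the limit is soft, it may help to look at both the number of files 
and their total size under a threshold of your choosing before deciding 
whether tar'ing is worth it. A rough sketch, assuming GNU find (for 
-printf) and awk; the function name is invented for this example.

```shell
# Hypothetical helper: report how many regular files under a directory are
# smaller than a threshold, and how many bytes they occupy in total.
small_file_summary() {
    local dir="$1"                    # directory to scan
    local max_bytes="${2:-1000000}"   # size threshold in bytes (default 1 MB)

    # GNU find prints one size in bytes per matching file; awk counts the
    # lines and sums the sizes.
    find "$dir" -type f -size "-${max_bytes}c" -printf '%s\n' \
        | awk '{ n++; total += $1 } END { printf "%d files, %d bytes\n", n, total }'
}
```

Running it with a second argument of 2000000, for example, would also 
pick up those 1.5 MB files.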

   -- Mark

-- 
marki at jlab.org, (757)269-5295


