[Halld-offline] cleaning small files from the cache disk
Mark Ito
marki at jlab.org
Tue Jun 7 12:17:55 EDT 2016
Folks,
Some time in the near-ish future the computer center will modify their
algorithm to write all files on the cache disk to tape. Since for years
our cache disk has held files that originate from tape, this does not
by itself represent a change in our long-standing habits.
On the other hand, since the write-through feature was introduced, a
significant number of cache files have been written to disk directly by
users, and we might not want all of them to show up on tape. And for
large collections of small files, even if we do want them on tape,
writing them one file at a time is inefficient. In that case it is
better to collect them in a large container file (tar them up).
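For those who have not bundled files this way before, here is a minimal sketch of the tar-it-up step. It assumes GNU tar and GNU find; the function name bundle_dir and any paths you hand it are hypothetical, not a computer-center tool. It writes one tar file and then checks that the archive holds as many entries as the source directory, but it deliberately does not delete the originals.

```shell
# bundle_dir: hypothetical helper (not a JLab tool); assumes GNU tar and find.
# Packs one directory of small files into a single tar file, then checks that
# the archive holds as many non-directory entries as the directory on disk.
# It never deletes the originals; do that yourself once you are satisfied.
bundle_dir() {
  local src_dir=${1%/}   # directory of small files (trailing slash removed)
  local tarball=$2       # output tar file; must be OUTSIDE src_dir
  local n_src n_tar
  # -C switches to the parent directory so archive members get relative paths
  tar -cf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
  # Count non-directories on disk and archive members without a trailing slash
  n_src=$(find "$src_dir" ! -type d | wc -l)
  n_tar=$(tar -tf "$tarball" | grep -vc '/$')
  if [ "$n_src" -ne "$n_tar" ]; then
    echo "bundle_dir: $n_src entries on disk but $n_tar in $tarball" >&2
    return 1
  fi
  echo "$tarball: $n_tar files"
}
```

Usage would be something like `bundle_dir /path/to/small_stuff /path/to/small_stuff.tar`, with placeholder paths of your own choosing. Writing the tar file next to the directory rather than inside it keeps tar from trying to archive its own output.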
Before the change takes effect, we should try to do a clean-up, as we
discussed at the last offline meeting (https://goo.gl/kIzp9r). As of this
writing (before the change), files of less than 1 MB are not written to
tape at all. I would like us to review our usage of small files, defined
here as those less than 1 MB, and either tar them up, leaving the tar
files on the cache disk to be archived to tape, or move them to the work
disk.
Right now Hall D has 300,000 small files under /cache/halld, although
some of them are surely already on tape and therefore do not add to the
tape-use problem. The command I used for the count was:
lfs find --type f --size -1000000 /cache/halld | wc -l
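To see where the small files are concentrated, the count can be broken down by subdirectory. The sketch below uses GNU find rather than lfs find so that it runs on any Linux file system; on the Lustre cache disk the lfs find form is likely to be faster. The function name small_files_by_dir is hypothetical, and files sitting directly in the top directory (not in a subdirectory) are not counted.

```shell
# small_files_by_dir: hypothetical helper; assumes GNU find and sort.
# Prints "count<TAB>directory" for each immediate subdirectory of a tree,
# counting regular files smaller than a byte limit, largest count first.
small_files_by_dir() {
  local top=${1%/}
  local limit=${2:-1000000}   # bytes; default is the 1 MB soft limit
  find "$top" -mindepth 1 -maxdepth 1 -type d -print0 |
    while IFS= read -r -d '' dir; do
      # -size -Nc means "strictly less than N bytes" in GNU find
      printf '%d\t%s\n' "$(find "$dir" -type f -size -"${limit}"c | wc -l)" "$dir"
    done | sort -rn
}
```

For example, `small_files_by_dir /cache/halld | head` would list the ten subdirectories holding the most files under 1 MB.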
The 1 MB number is clearly a soft limit. If you have a few 10 kB files
that need to be written to tape, that is not a problem. If you have
thousands of 1.5 MB files out there, you might want to consider tarring
them up.
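To check whether you have many files just above the 1 MB line, a sketch like the following reports how many files fall in a size band and how much space they take. It assumes GNU find (for -printf) and awk; band_summary is a hypothetical name, and the band includes its lower bound and excludes its upper bound.

```shell
# band_summary: hypothetical helper; assumes GNU find and awk.
# Counts regular files with lo <= size < hi (bytes) and sums their sizes.
band_summary() {
  local top=$1 lo=$2 hi=$3
  # +(lo-1)c means size > lo-1 bytes, i.e. size >= lo; -${hi}c means size < hi
  find "$top" -type f -size +"$((lo - 1))"c -size -"${hi}"c -printf '%s\n' |
    awk '{ n++; bytes += $1 }
         END { printf "%d files, %.0f bytes\n", n, bytes }'
}
```

Something like `band_summary /path/to/your/area 1000000 2000000` (placeholder path) would report the 1 to 2 MB files in that area; a count in the thousands is a hint that tarring them up is worthwhile.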
-- Mark
--
marki at jlab.org, (757)269-5295