[Halld-offline] ENP consumption of disk space under /work
Mark Ito
marki at jlab.org
Wed Jun 7 17:30:05 EDT 2017
Summarizing Hall D work disk usage (/work/halld only):
o using du, today (2017-06-06): 59 TB
o from our disk-management database, a couple of days ago (2017-06-04): 86 TB
I also know that one of our students got rid of about 20 TB of unneeded
files yesterday. That accounts for part of the drop.
We produce a report from that database
<https://halldweb.jlab.org/disk_management/work_report.html> that is
updated every few days.
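For the record, the du number above comes from a plain recursive scan;
something like the following reproduces it and flags the largest
holdings (it assumes GNU du/sort and that per-user directories sit
directly under /work/halld):

    # Total usage of the Hall D work area ...
    du -sh /work/halld
    # ... and the 20 largest top-level directories within it.
    du -h --max-depth=1 /work/halld | sort -rh | head -20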
From the SciComp pages, Hall D is using 287 TB on cache and 21 TB on
volatile.
My view is that this level of work disk usage is more or less as
expected, consistent with our previous estimates, and not particularly
abusive. That having been said, I am sure there is a lot that can be
cleaned up. But as Ole pointed out, disk usage grows naturally and we
were not aware that this was a problem. I seem to recall that we agreed
to respond to emails that would be sent when we reached 90% of too much,
no? Was the email sent out?
One mystery: when I ask Lustre what we are using, I get:
ifarm1402:marki:marki> lfs quota -gh halld /lustre
Disk quotas for group halld (gid 267):
Filesystem    used   quota   limit   grace      files   quota   limit   grace
   /lustre    290T    470T    500T       -   15106047       0       0       -
which is less than cache plus volatile alone (287 TB + 21 TB = 308 TB),
not to mention work on top of that. I thought that, to a good
approximation, this 290 TB should be the sum of all three. What am I
missing?
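A direct cross-check would be to sum the three areas and set the total
against the group quota. A minimal sketch, assuming the usual /cache,
/volatile, and /work layout for halld:

    # Per-area totals (the halld paths are assumptions about the layout) ...
    for d in /cache/halld /volatile/halld /work/halld; do
        du -sh "$d"
    done
    # ... versus the Lustre group accounting shown above.
    lfs quota -gh halld /lustre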
On 05/31/2017 10:35 AM, Chip Watson wrote:
> All,
>
> As I have started on the procurement of the new /work file server, I
> have discovered that Physics' use of /work has grown unrestrained over
> the last year or two.
>
> "Unrestrained" because there is no way under Lustre to restrain it
> except via a very unfriendly Lustre quota system. As we leave some
> quota headroom to accommodate large swings in usage for each hall for
> cache and volatile, then /work continues to grow.
>
> Total /work has now reached 260 TB, several times larger than I was
> anticipating. This constitutes more than 25% of Physics' share of
> Lustre, compared to LQCD, which uses less than 5% of its disk space
> on the un-managed /work.
>
> It would cost Physics an extra $25K (total $35K - $40K) to treat the
> 260 TB as a requirement.
>
> There are 3 paths forward:
>
> (1) Physics cuts its use of /work by a factor of 4-5.
> (2) Physics increases funding to $40K
> (3) We pull a server out of Lustre, decreasing Physics' share of the
> system, and use it as half of the new active-active pair, beefing it
> up with SSDs and perhaps additional memory; this would actually shrink
> Physics' near-term costs, but would put higher pressure on the file
> system for the farm.
>
> The decision is clearly Physics', but I do need a VERY FAST response
> to this question, as I need to move quickly now for LQCD's needs.
>
> Hall D + GlueX, 96 TB
> CLAS + CLAS12, 98 TB
> Hall C, 35 TB
> Hall A <unknown, still scanning>
>
> Email, call (x7101), or drop by today 1:30-3:00 p.m. for discussion.
>
> thanks,
> Chip
>
--
Mark Ito, marki at jlab.org, (757)269-5295