[Halld-offline] ENP consumption of disk space under /work

Mark Ito marki at jlab.org
Wed Jun 7 17:30:05 EDT 2017


Summarizing Hall D work disk usage (/work/halld only):

o using du, as of 2017-06-06: 59 TB (invocation sketched below)

o from our disk-management database, a couple of days ago, 2017-06-04: 86 TB

I also know that one of our students got rid of about 20 TB of unneeded 
files yesterday. That accounts for part of the drop.
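For reference, the 59 TB figure came from a plain du scan of the group
work area; I am not reproducing the exact invocation here, but it was
something along these lines:

ifarm1402:marki:marki> du -sh /work/halld   # summarize total usage; reported roughly 59 TB on 2017-06-06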

We produce a report from that database 
<https://halldweb.jlab.org/disk_management/work_report.html> that is 
updated every few days.

From the SciComp pages, Hall D is using 287 TB on cache and 21 TB on 
volatile.

My view is that this level of work disk usage is more or less as 
expected, consistent with our previous estimates, and not particularly 
abusive. That having been said, I am sure there is a lot that can be 
cleaned up. But as Ole pointed out, disk usage grows naturally and we 
were not aware that this was a problem. I seem to recall that we agreed 
to respond to emails that would be sent when we reached 90% of too much, 
no? Was the email sent out?

One mystery: when I ask Lustre what we are using I get:

ifarm1402:marki:marki> lfs quota -gh halld /lustre
Disk quotas for group halld (gid 267):
      Filesystem    used   quota   limit   grace   files   quota   limit   grace
         /lustre    290T    470T    500T       - 15106047       0       0       -

which is less than cache + volatile, not to mention work. I thought that 
to a good approximation this 290 TB should be the sum of all three. What 
am I missing?
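
To spell the arithmetic out, using only the figures quoted above (which
come from different days, so this is rough):

ifarm1402:marki:marki> echo $((287 + 21 + 59))   # cache + volatile + work (du figure), in TB
367
ifarm1402:marki:marki> echo $((287 + 21 + 86))   # cache + volatile + work (database figure), in TB
394

Either way the sum is well above the 290 TB that lfs quota reports for 
gid 267.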

On 05/31/2017 10:35 AM, Chip Watson wrote:
> All,
>
> As I have started on the procurement of the new /work file server, I 
> have discovered that Physics' use of /work has grown unrestrained over 
> the last year or two.
>
> "Unrestrained" because there is no way under Lustre to restrain it 
> except via a very unfriendly Lustre quota system.  As we leave some 
> quota headroom to accommodate large swings in usage for each hall for 
> cache and volatile, then /work continues to grow.
>
> Total /work has now reached 260 TB, several times larger than I was 
> anticipating.  This constitutes more than 25% of Physics' share of 
> Lustre, compared to LQCD, which uses less than 5% of its disk space on 
> the un-managed /work.
>
> It would cost Physics an extra $25K (total $35K - $40K) to treat the 
> 260 TB as a requirement.
>
> There are 3 paths forward:
>
> (1) Physics cuts its use of /work by a factor of 4-5.
> (2) Physics increases funding to $40K
> (3) We pull a server out of Lustre, decreasing Physics' share of the 
> system, and use that as half of the new active-active pair, beefing it 
> up with SSDs and perhaps additional memory; this would actually shrink 
> Physics' near-term costs, but would put higher pressure on the file 
> system for the farm.
>
> The decision is clearly Physics', but I do need a VERY FAST response 
> to this question, as I need to move quickly now for LQCD's needs.
>
> Hall D + GlueX:  96 TB
> CLAS + CLAS12:   98 TB
> Hall C:          35 TB
> Hall A:          <unknown, still scanning>
>
> Email, call (x7101), or drop by today 1:30-3:00 p.m. for discussion.
>
> thanks,
> Chip
>

-- 
Mark Ito, marki at jlab.org, (757)269-5295
