<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Summarizing Hall D work disk usage (/work/halld only):</p>
<p>o using du, today 2017-06-06, 59 TB (command sketched below)<br>
</p>
<p>o from our disk-management database, a couple of days ago,
2017-06-04, 86 TB</p>
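<p>(For reference, the du figure above comes from something along these
lines; the exact flags are my guess:)</p>
<pre># summarize the Hall D work area in human-readable units;
# a full pass over a tree this size takes a while
du -sh /work/halld</pre>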
<p>I also know that one of our students got rid of about 20 TB of
unneeded files yesterday. That accounts for part of the drop.<br>
</p>
<p>We produce a <a
href="https://halldweb.jlab.org/disk_management/work_report.html">report
from that database</a> that is updated every few days.</p>
<p>From the SciComp pages, Hall D is using 287 TB on cache and 21 TB
on volatile.</p>
<p>My view is that this level of work disk usage is more or less as
expected, consistent with our previous estimates, and not
particularly abusive. That having been said, I am sure there is a
lot that can be cleaned up. But as Ole pointed out, disk usage
grows naturally and we were not aware that this was a problem. I
seem to recall that we agreed to respond to emails that would be
sent when we reached 90% of too much, no? Was the email sent out?<br>
</p>
<p>One mystery: when I ask Lustre what we are using I get:</p>
<pre>ifarm1402:marki:marki> lfs quota -gh halld /lustre
Disk quotas for group halld (gid 267):
     Filesystem    used   quota   limit   grace    files    quota   limit   grace
        /lustre    290T    470T    500T       -  15106047       0       0       -</pre>
<p>which is less than cache + volatile alone (287 TB + 21 TB = 308 TB), not
to mention work. I thought that to a good approximation this 290 TB should
be the sum of all three. What am I missing?<br>
</p>
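<p>One way to chase this down would be to put the Lustre group number next
to straight directory totals for the three areas; a minimal sketch, assuming
the standard /cache/halld and /volatile/halld paths alongside /work/halld
(the du pass over trees this size is slow):</p>
<pre># group-level accounting Lustre reports for halld (same command as above)
lfs quota -gh halld /lustre

# per-area totals taken from the directory trees themselves; these count
# whatever is on disk under each path, regardless of the owning group
du -sh /cache/halld /volatile/halld /work/halld</pre>
<p>If the per-area sum still comes out above the group figure, the
difference presumably sits in files under those trees not owned by group
halld, or in the different times at which the various numbers were
collected.</p>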
<br>
<div class="moz-cite-prefix">On 05/31/2017 10:35 AM, Chip Watson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:53cbb78f-3f2c-5fe0-c925-96e90b733bf1@jlab.org">All,
<br>
<br>
As I have started on the procurement of the new /work file server,
I have discovered that Physics' use of /work has grown
unrestrained over the last year or two.
<br>
<br>
"Unrestrained" because there is no way under Lustre to restrain it
except via a very unfriendly Lustre quota system. As we leave
some quota headroom to accommodate large swings in usage for each
hall for cache and volatile, then /work continues to grow.
<br>
<br>
Total /work has now reached 260 TB, several times larger than I
was anticipating. This constitutes more than 25% of Physics'
share of Lustre, compared to LQCD, which uses less than 5% of its
disk space on the un-managed /work.
<br>
<br>
It would cost Physics an extra $25K (total $35K - $40K) to treat
the 260 TB as a requirement.
<br>
<br>
There are 3 paths forward:
<br>
<br>
(1) Physics cuts its use of /work by a factor of 4-5.
<br>
(2) Physics increases funding to $40K.
<br>
(3) We pull a server out of Lustre, decreasing Physics' share of
the system, and use that as half of the new active-active pair,
beefing it up with SSDs and perhaps additional memory; this would
actually shrink Physics' near-term costs, but puts higher pressure
on the file system for the farm.
<br>
<br>
The decision is clearly Physics', but I do need a VERY FAST
response to this question, as I need to move quickly now for
LQCD's needs.
<br>
<br>
Hall D + GlueX, 96 TB
<br>
CLAS + CLAS12, 98 TB
<br>
Hall C, 35 TB
<br>
Hall A &lt;unknown, still scanning&gt;
<br>
<br>
Email, call (x7101), or drop by today 1:30-3:00 p.m. for
discussion.
<br>
<br>
thanks,
<br>
Chip
<br>
<br>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Mark Ito, <a class="moz-txt-link-abbreviated" href="mailto:marki@jlab.org">marki@jlab.org</a>, (757)269-5295
</pre>
</body>
</html>