<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>In my previous estimate, of the cache portion (287 TB), only 105
TB is pinned. The unpinned part is presumably old files that
should be gone but have not been deleted, since there happens to
be no demand for the space. If we use the 105 TB of pinned cache
as our cache usage, then re-doing your estimate gives 555 TB,
which means that in 9 months we would have 270 TB of unused space.
That would mean we have room to increase our usage without buying
anything!<br>
</p>
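<p>To spell out the arithmetic behind those numbers, here is a rough
sketch (presumably using the 96 TB /work figure from your May 31
list; the 825 TB owned is just your 925 TB projection minus the 100
TB overshoot):</p>
<pre>
# GlueX usage = work + pinned cache + volatile, then your x2.5 scaling
# (Hall B needs the same, A+C half as much)
gluex=$((96 + 105 + 21))          # 222 TB
projected=$((gluex * 5 / 2))      # 555 TB projected for all of Physics
owned=$((925 - 100))              # 825 TB that Physics owns
echo "unused in 9 months: $((owned - projected)) TB"   # 270 TB
</pre>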
<br>
<div class="moz-cite-prefix">On 06/07/2017 05:54 PM, Chip Watson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:542f1fe5-c1ae-864d-a4db-191f059590d8@jlab.org">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<p>Mark,</p>
<p>I still need you to answer the question of how to further
reduce usage and how you want it configured. Your usage, as you
report it, is about 370 TB. Assuming that Hall B needs the same
within 9 months, and that A+C need half as much, that leads to a
total of 925 TB, which is more than Physics owns by 100 TB (NOT
CURRENT USAGE, JUST A PROJECTION BASED ON GLUEX USAGE).<br>
</p>
<p>There is also the question of how to split the storage budget.
Within budget, you can have half of a new JBOD: 21 disks configured
as 3 RAID-Z2 stripes of 5+2 disks at 8 TB each, thus 120 TB of raw
data capacity, 108 TB in a file system, and 86 TB usable at 80%
full -- for all of GlueX, CLAS-12, A, and C. If GlueX is 40% of the
total, that makes 35 TB, and you are still high by 70%.</p>
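<p>To make the capacity arithmetic explicit, a rough back-of-the-envelope
sketch (the roughly 10% file-system overhead is an approximation):</p>
<pre>
# half a JBOD: 3 RAID-Z2 stripes of 5 data + 2 parity disks, 8 TB each
raw=$((3 * 5 * 8))          # 120 TB of raw data capacity
fs=$((raw * 9 / 10))        # ~108 TB once in a file system (~10% overhead)
usable=$((fs * 8 / 10))     # ~86 TB if kept at 80% full
gluex=$((usable * 4 / 10))  # ~34-35 TB if GlueX is 40% of the total
echo "$raw $fs $usable $gluex TB"
</pre>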
<p>The other low-cost option is to re-purpose a 2016 Lustre node
so that /work is twice this size (one full JBOD), and GlueX can
use 70 TB as /work. But then you must reduce /cache + /volatile
by a comparable amount, since we have to pull a node out of
production. And this still isn't free, since we'll need a total
of 4 RAID cards instead of 2 to provide adequate performance, and
we'll need to add SSDs to the mix.<br>
</p>
<p>So, in the absence of money (which clearly seems to be the
case), do you choose (a) to reduce your use of /work by 1.7x, or
(b) to reduce your use of /cache + /volatile by 25%? There is no
middle case.<br>
</p>
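<p>For the record, those two factors come straight from the numbers
in this thread; a quick check (your 59 TB du figure against the ~35
TB GlueX share of half a JBOD, and, presumably, the 70 TB pulled out
of Lustre against your 287 + 21 TB of cache + volatile):</p>
<pre>
echo "scale=3; 59 / 35" | bc          # ~1.7x reduction of /work for option (a)
echo "scale=3; 70 / (287 + 21)" | bc  # ~0.23, i.e. roughly 25% of cache + volatile, for (b)
</pre>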
<p>thanks,</p>
<p>Chip<br>
</p>
<br>
<div class="moz-cite-prefix">On 6/7/17 5:30 PM, Mark Ito wrote:<br>
</div>
<blockquote type="cite"
cite="mid:33a10c5e-d80f-cd7b-6761-04a35a45729d@jlab.org">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<p>Summarizing Hall D work disk usage (/work/halld only):</p>
<p>o using du, today 2017-06-06, 59 TB<br>
</p>
<p>o from our disk-management database, a couple of days ago,
2017-06-04, 86 TB</p>
<p>I also know that one of our students got rid of about 20 TB
of unneeded files yesterday. That accounts for part of the
drop.<br>
</p>
<p>We produce a <a moz-do-not-send="true"
href="https://halldweb.jlab.org/disk_management/work_report.html">report
from that database</a> that is updated every few days.</p>
<p>From the SciComp pages, Hall D is using 287 TB on cache and
21 TB on volatile.</p>
<p>My view is that this level of work disk usage is more or less
as expected, consistent with our previous estimates, and not
particularly abusive. That having been said, I am sure there
is a lot that can be cleaned up. But as Ole pointed out, disk
usage grows naturally and we were not aware that this was a
problem. I seem to recall that we agreed to respond to emails
that would be sent when we reached 90% of too much, no? Was
the email sent out?<br>
</p>
<p>One mystery: when I ask Lustre what we are using I get:</p>
<pre>ifarm1402:marki:marki> lfs quota -gh halld /lustre
Disk quotas for group halld (gid 267):
     Filesystem    used   quota   limit   grace     files   quota   limit   grace
        /lustre    290T    470T    500T       -  15106047       0       0       -</pre>
which is less than cache + volatile, not to mention work. I
thought that to a good approximation this 290 TB should be the
sum of all three. What am I missing?<br>
<br>
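<p>In case it helps anyone reproduce the comparison, this is roughly
the check I am doing (the /cache and /volatile paths are assumed by
analogy with /work/halld, and du on trees this size is slow):</p>
<pre>
lfs quota -gh halld /lustre       # group usage as Lustre accounts it: 290T
for d in /work/halld /cache/halld /volatile/halld; do
    du -sh "$d"                   # per-area walk of the tree, for comparison
done
</pre>
<br>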
<div class="moz-cite-prefix">On 05/31/2017 10:35 AM, Chip Watson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:53cbb78f-3f2c-5fe0-c925-96e90b733bf1@jlab.org">All,
<br>
<br>
As I have started on the procurement of the new /work file
server, I have discovered that Physics' use of /work has grown
unrestrained over the last year or two. <br>
<br>
"Unrestrained" because there is no way under Lustre to
restrain it except via a very unfriendly Lustre quota system.
As we leave some quota headroom to accommodate large swings in
usage for each hall for cache and volatile, then /work
continues to grow. <br>
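<br>
For reference, restraining a group that way would look something
like the following (the limits shown are placeholders, not a
proposal, and the size-suffix handling depends on the Lustre
version in use): <br>
<pre>
# hypothetical soft/hard block limits for the halld group on /lustre
lfs setquota -g halld -b 400T -B 470T /lustre
lfs quota -gh halld /lustre       # verify what is now enforced
</pre>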
<br>
Total /work has now reached 260 TB, several times larger than
I was anticipating. This constitutes more than 25% of
Physics' share of Lustre, compared to LQCD, which uses less
than 5% of its disk space on the un-managed /work. <br>
<br>
It would cost Physics an extra $25K (total $35K - $40K) to
treat the 260 TB as a requirement. <br>
<br>
There are 3 paths forward: <br>
<br>
(1) Physics cuts its use of /work by a factor of 4-5. <br>
(2) Physics increases funding to $40K <br>
(3) We pull a server out of Lustre, decreasing Physics' share
of the system, and use that as half of the new active-active
pair, beefing it up with SSDs and perhaps additional memory;
this would actually shrink Physics' near-term costs, but would
put higher pressure on the file system for the farm <br>
<br>
The decision is clearly Physics', but I do need a VERY FAST
response to this question, as I need to move quickly now for
LQCD's needs. <br>
<br>
Hall D + GlueX, 96 TB <br>
CLAS + CLAS12, 98 TB <br>
Hall C, 35 TB <br>
Hall A &lt;unknown, still scanning&gt; <br>
<br>
Email, call (x7101), or drop by today 1:30-3:00 p.m. for
discussion. <br>
<br>
thanks, <br>
Chip <br>
<br>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Mark Ito, <a class="moz-txt-link-abbreviated" href="mailto:marki@jlab.org" moz-do-not-send="true">marki@jlab.org</a>, (757)269-5295
</pre>
</blockquote>
<br>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Mark Ito, <a class="moz-txt-link-abbreviated" href="mailto:marki@jlab.org">marki@jlab.org</a>, (757)269-5295
</pre>
</body>
</html>