<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Chip,<br>

    </p>

    <p>Indeed we do have to choose! You are quite right. Here are my

      notes and conclusion.<br>

    </p>

    <p>On 5/31 the choices were:<br>

    </p>

    <pre>(1) Physics cuts its use of /work by a factor of 4-5.

</pre>

    <pre>(2) Physics increases funding to $40K

</pre>

    <pre>(3) We pull a server out of Lustre, decreasing Physics' share of the 

system, and use that as half of the new active-active pair, beefing it 

up with SSDs and perhaps additional memory; this would actually shrink 

Physics near term costs, but puts higher pressure on the file system for 

the farm </pre>

    <p>On 6/1:</p>

    <pre>    Option 3: shrink /work 2x (to 172 TB), freeze cache & volatile, ~$5K

                    (performance hit on the farm)

                    new system has 2 JBODS, 2 controllers per head

                    might need to temporarily shrink space by 190 TB

                    on top of the 2x to drain a node to re-purpose it

                    if LQCD needs its disk space back

    Option 1: shrink /work 4x and increase cache and volatile by 25%, ~$15K

                    (Chip's recommendation)

                    new system has 1 JBOD and can be expanded in the future

    Option 2: /work as is (340 TB), increase cache & volatile by 25%, ~$40K

                    (over budget)

                     needs 3 JBODs for the new system, 2 cascaded for ENP 

</pre>

    <br>

    and also on 6/1:<br>

    <pre>We have NO intention of keeping /work on Lustre.</pre>

    <p>On 6/7:</p>

    <pre>So, in the absence of money (which clearly seems to be the case),

      do you choose (a) reduce your use of work by 1.7x, or (b) reduce

      your use of cache + volatile by 25%.  There is no middle case.</pre>

    <p>And today, 6/8: <br>

    </p>

    <pre>I'll offer one more option: we can use 10TB drives instead of

      8TB.</pre>

    <p>and also today:<br>

    </p>

    <pre>Default choice if no one else votes is now this: new /work server

      is procured with 44 drives, 10 TB HGST He10 enterprise drives (top

      rated, new big brother to our He8 drives).</pre>

    <p>So I am confused....</p>

    <p>...but I think I like this last default choice. Let's see if I

      understand it:</p>

    <p>a) /work _not_ on Lustre<br>

    </p>

    <p>b) 44 TB total /work for Hall D</p>

    <p>c) Hall D /cache and /volatile about the same as now or they grow

      a bit<br>

    </p>

    <p>d) we can afford this with this year's money<br>

    </p>

    <p>If I have that right[?!], then the default is fine. SSDs would

      be nice of course, but I don't think they are critical since we

      are compute bound. The big advance for us would be a more reliable

      work disk at the cost of slightly less of it.</p>

    <p>And don't get me wrong, I do appreciate the effort in coming up

      with a solution. <span class="moz-smiley-s1"><span>:-)</span></span><br>

    </p>

      -- Mark<br>

    <br>

    <div class="moz-cite-prefix">On 06/08/2017 11:42 AM, Chip Watson

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:9cfdc073-d1bd-258c-96cb-478592dcb969@jlab.org">


      <p>Mark,</p>

      <p>Sorry, you don't get off that easy :P.  You have to choose, and

        not choosing means you choose what follows.<br>

      </p>

      <p>I'll offer one more option: we can use 10TB drives instead of

        8TB.  Probably slightly longer rebuild times, but actually

        higher streaming bandwidth.  Cost will go up, maybe a few $K.  I

        think we can find a way to squeeze that out of the system.<br>

      </p>

      <p>Default choice if no one else votes is now this: new /work

        server is procured with 44 drives, 10 TB HGST He10 enterprise

        drives (top rated, new big brother to our He8 drives).  If

        Physics is poor, we can defer buying them the read cache SSD to

        compensate for the higher price of 10TB (Graham, can you swing

        an extra $3K?).<br>

      </p>

      <p>        21 drives for ENP, 21 for LQCD, 2 hot spares in a 44

        drive enclosure (SSDs in hosts)<br>

      </p>

      <p>        ENP and LQCD each get total quota (enforced) of 108 TB</p>

      <p>        GlueX and CLAS-12 each get 44 TB, A and C each get 10 TB

        (enforced)<br>

      </p>

      <p>CLAS-12 says they can live within this, and C believes they are

        already there.  GlueX and A will need to figure out what can be

        moved into /volatile.<br>

      </p>

      <p>If requested, we can give you a modest lifetime "pin" for

        /volatile so some large data files that should not go to tape

        can have longer lifetimes in Lustre /volatile than they can

        today.</p>

      <p>Next year, for about $10K we can add a second cascaded JBOD

        with one more RAID stripe for ENP.  We'll also need to

        spend $25K to replace the Lustre storage that ages out in FY18

        (or suffer the reduction).<br>

      </p>

      <p>regards,        <br>

      </p>

      <p>Chip<br>

      </p>

      <br>

      <div class="moz-cite-prefix">On 6/8/17 9:40 AM, Mark Ito wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:2c0abdf3-94b8-733d-77c0-873535e48349@jlab.org">


        <p>That was just an observation. I would not interpret it as a

          vote for anything... <span class="moz-smiley-s1"><span>:-)</span></span><br>

        </p>

        <br>

        <div class="moz-cite-prefix">On 06/08/2017 09:33 AM, Chip Watson

          wrote:<br>

        </div>

        <blockquote type="cite"

          cite="mid:64d33ed4-467e-9be4-5535-d6df22845ff8@jlab.org">


          <p>I'll take this as a vote by GlueX to have more work and

            reduce cache.</p>

          <p>Do A,B,C concur?<br>

          </p>

          <br>

          <div class="moz-cite-prefix">On 6/8/17 9:28 AM, Mark Ito

            wrote:<br>

          </div>

          <blockquote type="cite"

            cite="mid:86bf5471-d966-acec-a420-14a5e0ced17d@jlab.org">


            <p>Of the cache portion in my previous estimate, 278 TB,

              only 105 TB is pinned. The unpinned part is

              presumably old files that should be gone, but have not

              been deleted since there happens to be no demand for the

              space. If we use 105 TB as our cache usage then re-doing

              your estimate gives 555 TB, which means in 9 months we

              will have 270 TB of unused space. Which would mean that we

              have room to increase our usage without buying anything!<br>

            </p>

            <br>

            <div class="moz-cite-prefix">On 06/07/2017 05:54 PM, Chip

              Watson wrote:<br>

            </div>

            <blockquote type="cite"

              cite="mid:542f1fe5-c1ae-864d-a4db-191f059590d8@jlab.org">


              <p>Mark,</p>

              <p>I still need you to answer the question of how to

                further reduce usage and how to configure.  Your usage

                as you report it is about 370 TB.  Assuming that Hall B

                needs the same within 9 months, and that A+C need half

                as much, then that leads to a total of 925TB which is

                more than Physics owns, by 100 TB (NOT CURRENT USAGE,

                JUST PROJECTION BASED ON GLUEX USAGE).<br>

              </p>
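              <p>As a rough check of the projection in the paragraph
                above, here is a minimal Python sketch. Only the quoted
                figures (370 TB, "the same", "half as much", 100 TB over)
                come from the message; the function and variable names
                are hypothetical, and the "owned" figure is merely what
                those numbers imply.</p>

<pre>
```python
def projected_total_tb(gluex_tb, hall_b_factor=1.0, halls_a_c_factor=0.5):
    """Scale GlueX usage to all halls using the ratios quoted in the message."""
    return gluex_tb * (1 + hall_b_factor + halls_a_c_factor)

gluex_usage_tb = 370      # GlueX usage as reported, in TB
overshoot_tb = 100        # quoted excess over what Physics owns

projected_tb = projected_total_tb(gluex_usage_tb)
implied_owned_tb = projected_tb - overshoot_tb   # inferred, not stated directly

print(projected_tb, implied_owned_tb)
```
</pre>

              <p>Changing <code>gluex_usage_tb</code> shows how sensitive
                the projected total is to the GlueX starting figure, since
                every other hall is scaled from it.</p>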

              <p>There is also the question of how to split the storage

                budget.  Within budget, you can have half of a new JBOD: 21

                disks configured as 3 RAID-Z2 stripes of 5+2 disks, 8 TB each,

                thus 120 TB of raw data, 108 TB in a file system, and 86 TB at

                80% -- for all of GlueX, CLAS-12, A and C.  If GlueX is

                40% of the total, that makes 35TB, and you are still

                high by 70%.</p>
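              <p>The capacity arithmetic above can be sketched as
                follows. This is an illustration under stated assumptions,
                not the actual sizing procedure: the 0.9 file-system
                factor is inferred from the quoted step from 120 TB to
                108 TB, and all names are hypothetical.</p>

<pre>
```python
def raidz2_capacity_tb(stripes, data_disks, disk_tb,
                       fs_factor=0.9, fill_target=0.8):
    """Return (raw data TB, TB in the file system, TB at the fill target)."""
    raw_data = stripes * data_disks * disk_tb   # parity disks excluded
    in_fs = raw_data * fs_factor                # assumed file-system overhead
    at_target = in_fs * fill_target             # keep the pool 80% full
    return raw_data, in_fs, at_target

# 21 disks = 3 stripes of 5 data + 2 parity, with 8 TB drives
raw_tb, fs_tb, usable_tb = raidz2_capacity_tb(3, 5, 8)
gluex_tb = 0.40 * usable_tb   # GlueX share if it is 40% of the total

print(raw_tb, round(fs_tb), round(usable_tb), round(gluex_tb))

# Same layout with 10 TB drives instead of 8 TB
_, _, usable_10tb = raidz2_capacity_tb(3, 5, 10)
print(round(usable_10tb))
```
</pre>

              <p>Under the assumed 0.9 factor, the same layout with 10 TB
                drives comes out near 108 TB at 80% fill, which is
                consistent with the 108 TB quota quoted in the 6/8
                message.</p>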

              <p>The other low cost option is to re-purpose a 2016

                Lustre node so that /work is twice this size (one full

                JBOD), and GlueX can use 70TB as /work.  But then you

                must reduce /cache + /volatile by a comparable amount

                since we have to pull a node out of production.  And

                this still isn't free since we'll need a total of 4 RAID

                cards instead of 2 to provide correct performance, and

                we'll need to add SSDs to the mix.<br>

              </p>

              <p>So, in the absence of money (which clearly seems to be

                the case), do you choose (a) reduce your use of work by

                1.7x, or (b) reduce your use of cache + volatile by

                25%.  There is no middle case.<br>

              </p>

              <p>thanks,</p>

              <p>Chip<br>

              </p>

              <br>

              <div class="moz-cite-prefix">On 6/7/17 5:30 PM, Mark Ito

                wrote:<br>

              </div>

              <blockquote type="cite"

                cite="mid:33a10c5e-d80f-cd7b-6761-04a35a45729d@jlab.org">


                <p>Summarizing Hall D work disk usage (/work/halld

                  only):</p>

                <p>o using du, today 2017-06-06, 59 TB<br>

                </p>

                <p>o from our disk-management database, a couple of days

                  ago, 2017-06-04, 86 TB</p>

                <p>I also know that one of our students got rid of about

                  20 TB of unneeded files yesterday. That accounts for

                  part of the drop.<br>

                </p>

                <p>We produce a <a moz-do-not-send="true"

                    href="https://halldweb.jlab.org/disk_management/work_report.html">report

                    from that database</a> that is updated every few

                  days.</p>

                <p>From the SciComp pages, Hall D is using 287 TB on

                  cache and 21 TB on volatile.</p>

                <p>My view is that this level of work disk usage is more

                  or less as expected, consistent with our previous

                  estimates, and not particularly abusive. That having

                  been said, I am sure there is a lot that can be

                  cleaned up. But as Ole pointed out, disk usage grows

                  naturally and we were not aware that this was a

                  problem. I seem to recall that we agreed to respond to

                  emails that would be sent when we reached 90% of too

                  much, no? Was the email sent out?<br>

                </p>

                <p>One mystery: when I ask Lustre what we are using I

                  get:</p>

                <pre>ifarm1402:marki:marki&gt;   lfs quota -gh halld /lustre
Disk quotas for group halld (gid 267):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
        /lustre    290T    470T    500T       - 15106047       0       0       -</pre>

                which is less than cache + volatile, not to mention

                work. I thought that to a good approximation this 290 TB

                should be the sum of all three. What am I missing?<br>

                <br>

                <div class="moz-cite-prefix">On 05/31/2017 10:35 AM,

                  Chip Watson wrote:<br>

                </div>

                <blockquote type="cite"

                  cite="mid:53cbb78f-3f2c-5fe0-c925-96e90b733bf1@jlab.org">All,

                  <br>

                  <br>

                  As I have started on the procurement of the new /work

                  file server, I have discovered that Physics' use of

                  /work has grown unrestrained over the last year or

                  two. <br>

                  <br>

                  "Unrestrained" because there is no way under Lustre to

                  restrain it except via a very unfriendly Lustre quota

                  system.  As we leave some quota headroom to

                  accommodate large swings in usage for each hall for

                  cache and volatile, /work continues to grow. <br>

                  <br>

                  Total /work has now reached 260 TB, several times

                  larger than I was anticipating.  This constitutes more

                  than 25% of Physics' share of Lustre, compared to LQCD

                  which uses less than 5% of its disk space on the

                  un-managed /work. <br>

                  <br>

                  It would cost Physics an extra $25K (total $35K -

                  $40K) to treat the 260 TB as a requirement. <br>

                  <br>

                  There are 3 paths forward: <br>

                  <br>

                  (1) Physics cuts its use of /work by a factor of 4-5.

                  <br>

                  (2) Physics increases funding to $40K <br>

                  (3) We pull a server out of Lustre, decreasing

                  Physics' share of the system, and use that as half of

                  the new active-active pair, beefing it up with SSDs

                  and perhaps additional memory; this would actually

                  shrink Physics near term costs, but puts higher

                  pressure on the file system for the farm <br>

                  <br>

                  The decision is clearly Physics', but I do need a VERY

                  FAST response to this question, as I need to move

                  quickly now for LQCD's needs. <br>

                  <br>

                  Hall D + GlueX,  96 TB <br>

                  CLAS + CLAS12, 98 TB <br>

                  Hall C,                35 TB <br>

                  Hall A &lt;unknown, still scanning&gt; <br>

                  <br>

                  Email, call (x7101), or drop by today 1:30-3:00 p.m.

                  for discussion. <br>

                  <br>

                  thanks, <br>

                  Chip <br>

                  <br>

                </blockquote>

                <br>

                <pre class="moz-signature" cols="72">-- 

Mark Ito, <a class="moz-txt-link-abbreviated" href="mailto:marki@jlab.org" moz-do-not-send="true">marki@jlab.org</a>, (757)269-5295

</pre>

              </blockquote>

              <br>

            </blockquote>

            <br>

            <pre class="moz-signature" cols="72">-- 

Mark Ito, <a class="moz-txt-link-abbreviated" href="mailto:marki@jlab.org" moz-do-not-send="true">marki@jlab.org</a>, (757)269-5295

</pre>

          </blockquote>

          <br>

        </blockquote>

        <br>

        <pre class="moz-signature" cols="72">-- 

Mark Ito, <a class="moz-txt-link-abbreviated" href="mailto:marki@jlab.org" moz-do-not-send="true">marki@jlab.org</a>, (757)269-5295

</pre>

      </blockquote>

      <br>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Mark Ito, <a class="moz-txt-link-abbreviated" href="mailto:marki@jlab.org">marki@jlab.org</a>, (757)269-5295

</pre>

  </body>

</html>