<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>from Graham Heyes, explaining the recent history of disk use and

      disk procurements...<br>

    </p>

    <div class="moz-forward-container"><br>

      <br>

      -------- Forwarded Message --------

      <table class="moz-email-headers-table" border="0" cellspacing="0"

        cellpadding="0">

        <tbody>

          <tr>

            <th valign="BASELINE" align="RIGHT" nowrap="nowrap">Subject:

            </th>

            <td>FYI disk fileservers</td>

          </tr>

          <tr>

            <th valign="BASELINE" align="RIGHT" nowrap="nowrap">Date: </th>

            <td>Thu, 2 Nov 2017 12:53:04 -0400</td>

          </tr>

          <tr>

            <th valign="BASELINE" align="RIGHT" nowrap="nowrap">From: </th>

            <td>Graham Heyes <a class="moz-txt-link-rfc2396E" href="mailto:heyes@jlab.org">&lt;heyes@jlab.org&gt;</a></td>

          </tr>

          <tr>

            <th valign="BASELINE" align="RIGHT" nowrap="nowrap">To: </th>

            <td>Mark Ito <a class="moz-txt-link-rfc2396E" href="mailto:marki@jlab.org">&lt;marki@jlab.org&gt;</a>, Ole Hansen

              <a class="moz-txt-link-rfc2396E" href="mailto:ole@jlab.org">&lt;ole@jlab.org&gt;</a>, Brad Sawatzky

              <a class="moz-txt-link-rfc2396E" href="mailto:brads@jlab.org">&lt;brads@jlab.org&gt;</a>, Harut Avakian

              <a class="moz-txt-link-rfc2396E" href="mailto:avakian@jlab.org">&lt;avakian@jlab.org&gt;</a></td>

          </tr>

        </tbody>

      </table>

      <br>

      <br>


      Someone outside yesterday's meeting had a question about disk

      space so I wrote down the story so far. I thought it was a useful

      enough summary that I would send it to you in case you would like

      to share it with your people too.

      <div class=""><br class="">

      </div>

      <div class="">

        <div class="" style="font-family: AvenirNext-Regular;">Last year

          we identified the “work” filesystem as an area in need of

          improvement. By its very nature a “work” filesystem can

          contain a large number of small files that churn as people run

          jobs, compile code, etc. Unfortunately the Lustre-based system

          is not very efficient in this mode. What users see as the

          “work” filesystem is a virtual space carved out of a larger

          filesystem that also provides /cache and /volatile for the

          farm and space paid for by LQCD. The non-work areas are

          machine managed using algorithms that automatically free space

          by migrating old or little-used files to tape. Conversely,

          /work is “human managed” using quotas. A problem with this is

          that growth in the work area is at the expense of /cache and

          /volatile, which gradually shrink. Also, accessing many small

          files simultaneously in /work impacts performance for /cache,

          /volatile and the LQCD users.</div>

        <div class="" style="font-family: AvenirNext-Regular;"><br

            class="">

        </div>

        <div class="" style="font-family: AvenirNext-Regular;">The

          solution to the problem was to buy a new file server

          specifically designed for use as “work”. Due to funding

          constraints in FY17 we bought a system that was not fully

          populated with drives and controllers with the plan to expand

          at a later date. This minimal system was designed to meet the

          needs of FY18 as understood at the time. Halls A/C offered to

          add funds of their own to aid the procurement in return for a

          “larger slice of the pie”. Normally I buy disk out of my

          budget but I was glad of the one-time help. I spoke with Rolf

          and we agreed that, for many reasons, we do not want the halls

          buying their own disk to add to the farm. The model is that

          the halls provide requirements and Rolf pays, via me, for IT

          to procure something to meet the requirements. One reason for

          this is that typically we add disk space in large chunks to

          keep the cost per terabyte down so we don’t want to buy

          piecemeal.</div>

        <div class="" style="font-family: AvenirNext-Regular;"><br

            class="">

        </div>

        <div class="" style="font-family: AvenirNext-Regular;">Here is a

          summary of where we are:</div>

        <div class="" style="font-family: AvenirNext-Regular;"><br

            class="">

        </div>

        <blockquote class="" style="font-family: AvenirNext-Regular;

          margin: 0px 0px 0px 40px; border: none; padding: 0px;">

          <div class="">Currently on Lustre: work = 170 TB, cache = 400 TB,

            volatile = 165 TB, for a total of 735 TB.</div>

          <div class=""><br class="">

          </div>

          <div class="">Note that ENP only bought 690 TB and the 45 TB

            difference is “on loan” from LQCD.</div>

          <div class=""><br class="">

          </div>

          <div class="">The FY17 procurement is a high-performance

            server optimized to be /work, with 144 TB usable.</div>

          <div class="">We asked all the halls to temporarily reduce

            their use of /work to fit in the new server. After some

            pushback we are allowing some rarely used data to stay on

            Lustre and are commencing the move to the new server.</div>

          <div class=""><br class="">

          </div>

          <div class="">The FY18 procurement, which will be installed in

            January, adds 216 TB.</div>

          <div class=""><br class="">

          </div>

          <div class="">So, after January we will have 360 TB of

            high-performance /work on the new server, compared with 170 TB

            of /work now under Lustre: roughly double the space, on

            higher-performance hardware.</div>

        </blockquote>

        <div class="" style="font-family: AvenirNext-Regular;"><br

            class="">

        </div>

        <div class="" style="font-family: AvenirNext-Regular;">The space

          freed up on Lustre will be added to /cache and /volatile. As

          data processing ramps up, whether on or off site, we will need

          more /cache to stage the data being processed. Based on

          requirement projections I expect this to exceed what we own so

          I have asked Chip to prepare another REQ to add 250 TB to

          /cache and /volatile, which will expand them to almost 1 PB. </div>

        <div class="" style="font-family: AvenirNext-Regular;"><br

            class="">

        </div>

        <div class="" style="font-family: AvenirNext-Regular;">I am

          looking closely at the disk requirements that we had in the

          last computing review and am asking everyone to update them in

          light of additional experience in recent months. In particular,

          I want to be sure that we do need to add the 250 TB of cache

          and, if we do, whether it will be enough for the long term or

          whether we will need more.</div>

      </div>

      <div class="" style="font-family: AvenirNext-Regular;"><br

          class="">

      </div>

      <div class="" style="font-family: AvenirNext-Regular;">I hope that

        this is useful!</div>

      <div class="" style="font-family: AvenirNext-Regular;"><br

          class="">

      </div>

      <div class="" style="font-family: AvenirNext-Regular;"><span class="Apple-tab-span" style="white-space:pre">  </span>Regards,</div>

      <div class="" style="font-family: AvenirNext-Regular;"><span class="Apple-tab-span" style="white-space:pre">          </span>Graham</div>

      <div class="" style="font-family: AvenirNext-Regular;"><br

          class="">

      </div>

      <div class=""><br class="">

      </div>

    </div>

  </body>

</html>