<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>If any of you are interested in participating, please let me or
David know.<br>
</p>
<div class="moz-forward-container"><br>
<br>
-------- Forwarded Message --------
<table class="moz-email-headers-table" cellpadding="0"
cellspacing="0" border="0">
<tbody>
<tr>
<th align="RIGHT" nowrap="nowrap" valign="BASELINE">Subject:
</th>
<td>Near Term Evolution of JLab Scientific Computing</td>
</tr>
<tr>
<th align="RIGHT" nowrap="nowrap" valign="BASELINE">Date: </th>
<td>Mon, 8 May 2017 13:56:59 -0400</td>
</tr>
<tr>
<th align="RIGHT" nowrap="nowrap" valign="BASELINE">From: </th>
<td>Chip Watson <a class="moz-txt-link-rfc2396E" href="mailto:watson@jlab.org">&lt;watson@jlab.org&gt;</a></td>
</tr>
<tr>
<th align="RIGHT" nowrap="nowrap" valign="BASELINE">To: </th>
<td>Mark Ito <a class="moz-txt-link-rfc2396E" href="mailto:marki@jlab.org">&lt;marki@jlab.org&gt;</a>, David Lawrence
<a class="moz-txt-link-rfc2396E" href="mailto:davidl@jlab.org">&lt;davidl@jlab.org&gt;</a>, Ole Hansen <a class="moz-txt-link-rfc2396E" href="mailto:ole@jlab.org">&lt;ole@jlab.org&gt;</a>,
Harut Avakian <a class="moz-txt-link-rfc2396E" href="mailto:avakian@jlab.org">&lt;avakian@jlab.org&gt;</a>, Brad Sawatzky
<a class="moz-txt-link-rfc2396E" href="mailto:brads@jlab.org">&lt;brads@jlab.org&gt;</a>, Paul Mattione
<a class="moz-txt-link-rfc2396E" href="mailto:pmatt@jlab.org">&lt;pmatt@jlab.org&gt;</a>, Markus Diefenthaler
<a class="moz-txt-link-rfc2396E" href="mailto:mdiefent@jlab.org">&lt;mdiefent@jlab.org&gt;</a>, Graham Heyes
<a class="moz-txt-link-rfc2396E" href="mailto:heyes@jlab.org">&lt;heyes@jlab.org&gt;</a>, Sandy Philpott
<a class="moz-txt-link-rfc2396E" href="mailto:philpott@jlab.org">&lt;philpott@jlab.org&gt;</a></td>
</tr>
</tbody>
</table>
<br>
<br>
<p>All,</p>
<p>It was enjoyable to listen to the talks at the "Future Trends
in NP Computing" workshop last week. Some of the more amazing
talks showed what can be done with large investments from
talented people plus ample funding for R&amp;D and deployment.
I particularly enjoyed hearing how machine learning was used to
train the trigger for an LHC experiment, replacing fast, simple
trigger logic with better-performing (if more opaque) software.<br>
</p>
<p>I would like to turn our thoughts toward the immediate future,
and start a new discussion of the "Near Term Evolution of JLab
Scientific Computing". Some of you on this list are regular
attendees of the Scientific Computing / Physics meetings, and so
are included in this email. Others are people who have been
active in computing here at the lab. This list is NOT carefully
thought out, but does include people from all halls, with an
emphasis on GlueX, since I receive more of their emails and can
better guess who might be interested. I invite all of you to
forward this to others whom you think would like to join this
effort.</p>
<p><b>Scope: </b>Figure out the best path forward for JLab
computing for the next 1-2 years, with a particular focus on
meeting the needs of Experimental Physics in FY18.</p>
<p>Our baseline plan has been to satisfy all computing
requirements in-house. This approach is certainly the most
cost-effective way to deploy integrated Tflops or SpecIntRate
capacity with good bandwidth to all the data. But it has known
weaknesses: it works best when the load is (mostly) constant
throughout the year, whereas physicists work best when there is
infinite capacity on demand. More capacity that someone other
than DOE NP pays for might be available, and already is for
GlueX through its use of OSG resources. Somewhere there may be
an optimum where physicists perceive reasonable turnaround on
real loads, and money is being spent in an effective way.</p>
<p><b>Process:</b></p>
<p>1. put together a knowledgeable and interested group of experts<br>
(in requirements, in technology, etc.)<br>
</p>
<p>2. update hall requirements for FY18, with particular emphasis
on<br>
a. I/O per SpecInt job requirements<br>
(roughly how much bandwidth is needed to support a job of N
cores; a rough worked example follows below)<br>
b. real, necessary load fluctuations<br>
(i.e. what peaks do we need to support, and which of them
are worth spending money on)</p>
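<p>To make (a) concrete, here is a minimal back-of-envelope
sketch of how per-core I/O translates into aggregate farm
bandwidth; all numbers in it are illustrative placeholders, not
measured JLab values:</p>
<pre>
# Back-of-envelope I/O estimate; every number below is an
# illustrative placeholder, not a measured JLab value.

def farm_bandwidth_gbps(mb_per_s_per_core, cores_per_job, jobs):
    """Aggregate I/O bandwidth (Gb/s) for `jobs` concurrent
    jobs of `cores_per_job` cores, each core streaming at
    `mb_per_s_per_core` megabytes per second."""
    mb_per_s = mb_per_s_per_core * cores_per_job * jobs
    return mb_per_s * 8 / 1000.0  # MB/s -> Gb/s

# e.g. 1000 concurrent 8-core jobs at 2 MB/s per core (assumed)
print(farm_bandwidth_gbps(2.0, 8, 1000))  # -> 128.0 Gb/s
</pre>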
<p>3. roughly evaluate the possible ways to satisfy these
requirements</p>
<p> a. in-house baseline, assuming GlueX can offload work to
the grid</p>
<p> b. pushing work to a DOE center (NERSC, ORNL, ...) for free</p>
<p> c. paying to use a Cloud for important peaks</p>
<p> d. expanding use of OSG to more halls</p>
<p> Each of these approaches has different costs. For example,
as more work is pushed offsite, we will need to upgrade the
lab's wide area network from 10g to 40g (at least). Use of the
grid has more of an impact on users than in-house or cloud
resources do (cloud infrastructure-as-a-service can appear to
be part of our Auger / PBS system). All solutions that double
performance but assume storage stays at JLab will require
upgrades to the local disk and tape sub-systems, even if the
farm remains fixed in size.</p>
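<p>The network point follows from the same arithmetic as the
sketch above: if even a modest fraction of farm I/O moves
offsite, the WAN has to carry it (again, the fraction here is
an assumption, not a plan):</p>
<pre>
# Illustrative only: WAN load if part of the farm I/O estimated
# above moves offsite; both numbers are assumptions.
farm_io_gbps = 128.0     # aggregate farm I/O from the sketch above
offsite_fraction = 0.25  # assume a quarter of the work runs off-lab
wan_gbps = farm_io_gbps * offsite_fraction
print(wan_gbps)          # -> 32.0 Gb/s: well past 10g, near 40g
</pre>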
<p>4. technology pilots for the best-looking alternatives</p>
<p>5. evaluations of third-party software products, or<br>
development of in-house products, to make this<br>
possibly more complex ecosystem more transparent<br>
to the user</p>
<p><b>Time Commitment</b></p>
<p>Initially, I would like us to meet weekly, with gaps when too
many people are unavailable. As work becomes better defined,
this can switch to every 2-3 weeks, with people doing
"homework" between meetings and reporting back.</p>
<p>I would like preliminary decisions reached by mid-August, so
that if there is any end-of-year funding available, we can put
forward a proposal for investments (in farm, disk, tape, or wide
area networking). I also see this helping to shape the FY18
budgets for the affected divisions and groups. So, possibly 8
meetings.<br>
</p>
</p>
<p>Due to the HUGE uncertainty in budgets for FY18, we will plan
against two scenarios with different numbers of weeks of running
(which drives all of JLab's experimental physics computing).</p>
<p>Anticipated evolution of code performance is an important
topic.<br>
</p>
<p><b>Your Next Steps</b></p>
<p>1. Forward this to key people omitted from this sparse list
(I'm looking for a total of 7-12 people, including myself,
Graham, and Sandy)<br>
</p>
<p>2. Reply to me to let me know if you would like to be a
participant (everyone will get a report in late summer).</p>
<p>3. Help me in setting an initial meeting date: reply with what
hours you could be available on each of the following dates:<br>
</p>
<p> Thursday May 11</p>
<p> Thursday May 18</p>
<p> Monday May 22<br>
</p>
<p> Tuesday May 23</p>
<p> Wednesday May 24</p>
<p>I will get back to each of you who says "yes" as soon as it is
clear when we can reach a decent quorum. Remote participation
can be supported for working team members (no "listeners only"
please).</p>
<p>regards,</p>
<p>Chip</p>
<p><br>
</p>
</div>
</body>
</html>