[Hallcsw] GLUEX Farm allocation

Ole Hansen ole at jlab.org
Wed Mar 9 14:19:59 EST 2016


Given that LQCD owes ENP over 5M CPU hours as a result of the "dynamic
load balancing" Chip has pushed so hard, it's not immediately obvious
why ENP should buy any new hardware with FY16 funds. Instead, why isn't
the temporarily increased demand from Hall D met with LQCD resources,
which was the advertised advantage of the load balancing? Chip even indicated
at the last SciComp meeting that now would be an excellent time for ENP
to put more load on the farm to use some of that CPU hour credit.

To raise your farm allocation by 10 percentage points (from 50% to 60%),
you need about 500 extra cores. Even if you borrow 1000 cores from LQCD,
the 5M hours correspond to 208 days of 24/7 processing. I see absolutely no
need to buy new hardware at this time. That is, unless LQCD becomes a
deadbeat on that core hour loan ...
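
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python. The function and constant names are made up for illustration. The
4400-core farm size is taken from Graham's message quoted below (3400 ENP
cores plus 1000 borrowed from LQCD); treating the allocation as a simple
fraction of that total is my assumption, and "about 500" above is a rounded
version of the result.

```python
# Back-of-envelope check of the figures in this message.
# All names here are hypothetical; the inputs come from the thread.

FARM_CORES = 4400              # 3400 ENP + 1000 borrowed LQCD cores (quoted below)
CREDIT_CORE_HOURS = 5_000_000  # core hours LQCD owes ENP ("over 5M")
BORROWED_CORES = 1000          # cores on loan from LQCD


def extra_cores(farm_cores, old_pct, new_pct):
    """Cores needed to move an allocation from old_pct to new_pct of the farm."""
    return farm_cores * (new_pct - old_pct) / 100


def days_of_credit(core_hours, cores):
    """Days of 24/7 running that a core-hour credit covers on `cores` cores."""
    return core_hours / cores / 24


# 50% -> 60% of 4400 cores: 440 cores, i.e. "about 500"
print(extra_cores(FARM_CORES, 50, 60))

# 5M core hours spread over 1000 cores: roughly 208 days
print(round(days_of_credit(CREDIT_CORE_HOURS, BORROWED_CORES)))
```

If only the roughly 500 cores needed for the Hall D bump were drawn against
the credit, the same calculation gives about twice as many days.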

What am I missing?

Ole

On 03/09/2016 02:05 PM, Heyes Graham wrote:
> Dear Curtis,
> I have put all of this in an email and rolled the ball back to Rolf. I
> am recommending that you get 60% then drop back to 50% at the end of the
> run. Based on the updated computing requirements from all of the halls,
> Chip and I are also recommending that we buy approximately 1000 cores (or
> whatever Rolf can afford, which may be zero) using FY16 funds. That
> boost should appear in summer. When that happens we will renormalize the
> GLUEX allocation in an upward direction.
> 
> Regards,
> Graham
> 
> 
>> On Mar 9, 2016, at 1:55 pm, Curtis A. Meyer <cmeyer at cmu.edu> wrote:
>>
>> Dear Graham,
>>
>> we had a bit of a discussion on this at the start of our calibration
>> meeting today. Both Eugene and I felt it was pretty critical during
>> this run that we get as fast a turnaround as possible on many of our
>> jobs, and both felt that for the next month or so (till the end of the
>> run), 60% was a better number than 50%, but that after that, we should
>> fall back to a level that the lab feels is more reasonable. That said,
>> I think that Rolf's numbers are reasonable as an average, but I would
>> push for a bit more at the moment: 60%.
>>
>> Curtis
>>
>>
>> ---------
>> Curtis A. Meyer
>> MCS Associate Dean for Faculty and Graduate Affairs
>> Professor of Physics
>> Carnegie Mellon University
>> Pittsburgh, PA 15213
>> Wean:    (412) 268-2745
>> Doherty: (412) 268-3090
>> Fax:     (412) 681-0648
>> curtis.meyer at cmu.edu
>> http://www.curtismeyer.com/
>>
>>
>>
>>> On Mar 9, 2016, at 1:31 PM, Heyes Graham <heyes at jlab.org> wrote:
>>>
>>> Dear Curtis and Eugene,
>>> after hearing about the farm resource allocation issue (yes, we should
>>> have thought about this before the run started), I made a hand-waving
>>> argument, based on the numbers given to me for the review last year,
>>> that the true GLUEX allocation should be 60% of the nodes currently
>>> available. This includes ENP and some LQCD systems.
>>>
>>> I ran these numbers past Rolf and his take is that GLUEX should have
>>> at minimum 50%. He proposed, for the next year until CLAS12 comes in,
>>> "A and C together 20%, B 25-30%, D 50-55%, or something like that."
>>> (his words, not mine). I would go to the high end of that range, 55% for
>>> GLUEX, remembering that we will have to renormalize when we buy more
>>> machines and your percentage will increase then. This may happen in
>>> late summer but certainly in the fall. You should also know that
>>> currently we are borrowing 1000 cores from LQCD to bump our existing
>>> farm from 3400 to 4400 cores. That loan is a repayment of a loan of
>>> our nodes made over the last twelve months and will not last forever.
>>>
>>> Rolf asked for input from Patrizia and Javier. Patrizia agrees with
>>> the 55% number.
>>>
>>> I spoke to Mark a few minutes ago and he told me he had spoken to
>>> both of you about the 60% number. I could ask for 60% until the end
>>> of the run and drop back to 50% afterwards, until we get new nodes. Of
>>> course this also adds pressure to the drive to get funding for more
>>> farm nodes. What is your feeling?
>>>
>>> Regards,
>>> Graham
>>>
>>
> 
> 
> 
> _______________________________________________
> Hallcsw mailing list
> Hallcsw at jlab.org
> https://mailman.jlab.org/mailman/listinfo/hallcsw
> 
