[Halld-offline] expected bandwidth offsite
Chip Watson
watson at jlab.org
Tue May 17 11:28:23 EDT 2016
Matt,
Our provisioning is lean and balanced, meaning that we have provisioned
tape and disk bandwidth based upon the size of the farm: now 3K+ cores,
moving this summer to 6K-7K cores (2014-generation cores; older cores
are as much as 2x slower). Another upgrade comes in the fall, to reach
~8K-10K cores (same units). Upgrades are based upon the halls'
long-range plans, and they do come with upgrades in disk and tape
bandwidth to keep the system balanced.
"Free" cycles (i.e. not funded by DOE) would require some infrastructure
provisioning to make sure that these free cycles don't slow down our
other systems (i.e. make sure that there is a real gain). Small
contributions can be absorbed as being in the noise, but larger
contributions will require that we do some type of QOS so that we don't
starve local resources. (Example, these recent file transfers of many
terabytes did need to tie up a noticeable portion of GlueX's /cache or
/volatile quotas.)
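To make "some type of QoS" concrete -- as a minimal sketch only, not a
description of any mechanism actually deployed here -- a cap on offsite
transfer bandwidth could be as simple as a token bucket; the rate and
burst numbers below are made up:

    import time

    class TokenBucket:
        """Minimal token-bucket throttle: one illustrative form a QoS
        cap on "free cycle" transfers could take."""

        def __init__(self, rate_mb_per_s, burst_mb):
            self.rate = rate_mb_per_s      # sustained cap, MB/s
            self.capacity = burst_mb       # maximum burst, MB
            self.tokens = burst_mb
            self.last = time.monotonic()

        def consume(self, mb):
            # Refill by elapsed time, then block until the requested
            # amount fits under the cap.
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= mb:
                    self.tokens -= mb
                    return
                time.sleep((mb - self.tokens) / self.rate)

    # E.g., cap offsite transfers at 100 MB/s with a 1 GB burst, and
    # call bucket.consume(64) before shipping each 64 MB chunk:
    bucket = TokenBucket(rate_mb_per_s=100, burst_mb=1000)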
All this is to say: you can't really double the performance of the farm
for I/O-intensive activities without real investments in I/O. We are
lean enough that a 10% addition to the farm would be noticed during
busy periods.
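For a sense of the arithmetic, here is a back-of-envelope balance check;
every number in it is an illustrative placeholder, not one of our actual
figures:

    # Back-of-envelope farm I/O balance (all numbers assumed).
    cores = 6000                  # assumed post-upgrade farm size
    io_per_core = 0.5             # assumed MB/s drawn per I/O-bound core
    provisioned = 4000            # assumed aggregate disk+tape MB/s

    demand = cores * io_per_core  # 3000 MB/s of steady demand
    wan_stream = 500              # one offsite transfer at the hoped-for rate
    print(f"headroom: {provisioned - demand:.0f} MB/s; "
          f"with the WAN stream: {provisioned - demand - wan_stream:.0f} MB/s")

With these made-up numbers the farm runs close to its provisioned
bandwidth even before the WAN stream, which is the sense in which a 10%
addition is noticeable.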
We do have some headroom -- which is why we can move GlueX DAQ from 400
to 800 on very short notice. But we have priorities in place: (1) DAQ,
(2) high utilization of LQCD/HPC, (3) high utilization of the farm, (4)
critical WAN transfers (e.g. LQCD data products coming in from
supercomputing centers for onsite analysis), (5) interactive analysis,
(6) everything else. Your file transfers fall under (6), as they are
outside the reviewed computing plans.
We are always open to adjusting everything to maximize science (e.g.
rapidly doubling DAQ), and the Physics Division is the one that would
be engaged (as the funding source) in these optimizations.
regards,
Chip
On 5/17/16 7:56 AM, Shepherd, Matthew wrote:
> Chip,
>
> This was my initial question: what is a reasonable rate to expect?
>
> I'm not displeased with 50 MB/s, but I'd like to know whether there are mechanisms to gain another factor of 2 or 4. About a year ago I managed 130 MB/s averaged over an hour.
>
> I'm basically trying to explore options for moving analysis offsite to alleviate load on the JLab farm. I think we may have some significant resources here at the university level to tap into. I can imagine things like hosting a data mirror here if your offsite pipeline is not too big. Local IT folks seem to enjoy pushing limits and are always looking for customers. Our flagship research machine here, which is getting old by local standards, is a ~1000-node Cray with 22,000 cores and a 5 PB high-throughput work disk. Accounts are free to any researcher at the university -- it is provided as a service. I'll certainly only be able to get a fraction of this, but a fraction may be significant on the scale of the problems we're working with.
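>
> As a purely illustrative sketch, a recurring mirror sync via the Globus Python SDK might look like the following; the endpoint UUIDs, paths, and token step are placeholders, since I haven't set any of this up:
>
>     import globus_sdk
>
>     # Placeholders: real endpoint IDs are UUIDs from the Globus web
>     # interface, and the token comes from the usual Globus OAuth2
>     # native-app flow (omitted here).
>     TRANSFER_TOKEN = "<token from the Globus OAuth2 flow>"
>     JLAB_EP = "<jlab-endpoint-uuid>"
>     MIRROR_EP = "<university-mirror-endpoint-uuid>"
>
>     tc = globus_sdk.TransferClient(
>         authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN))
>     tdata = globus_sdk.TransferData(
>         tc, JLAB_EP, MIRROR_EP, label="GlueX mirror sync",
>         sync_level="checksum")  # re-copy only files that changed
>     tdata.add_item("/cache/halld/", "/mirror/halld/", recursive=True)
>     task = tc.submit_transfer(tdata)
>     print("submitted transfer task:", task["task_id"])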
>
> Matt
>
> ---------------------------------------------------------------------
> Matthew Shepherd, Associate Professor
> Department of Physics, Indiana University, Swain West 265
> 727 East Third Street, Bloomington, IN 47405
>
> Office Phone: +1 812 856 5808
>
>> On May 16, 2016, at 10:34 PM, Chip Watson <watson at jlab.org> wrote:
>>
>> All,
>>
>> Keep in mind that Scientific Computing is not provisioned to deliver that kind of bandwidth (500 MB/s) to you. If you actually succeeded, we'd probably see a negative impact on operations and have to kill the connection.
>>
>> Chip
>>
>> On 5/16/16 8:45 PM, Shepherd, Matthew wrote:
>>>> On May 16, 2016, at 5:32 PM, Richard Jones <richard.t.jones at uconn.edu> wrote:
>>>> What did you use for the number of parallel streams / job, and number of simultaneous jobs to get the performance you saw?
>>> I simply used the Globus web client and created a personal endpoint using Globus Connect Personal running on Linux. I didn't modify anything from the defaults, and that single request was the only one I submitted during the time frame.
>>>
>>> I'm not disappointed with 50 MB/s averaged over 2.5 days, but in principle, I have ten times that bandwidth going into the machine.
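>>>
>>> (For scale: 50 MB/s sustained for 2.5 days is 50 MB/s x 216,000 s, about 10.8 TB moved; ten times that, 500 MB/s, would be 4 Gb/s on the wire.)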
>>>
>>> Matt
>>>
>>> ---------------------------------------------------------------------
>>> Matthew Shepherd, Associate Professor
>>> Department of Physics, Indiana University, Swain West 265
>>> 727 East Third Street, Bloomington, IN 47405
>>>
>>> Office Phone: +1 812 856 5808
>>>
>>>
>>> _______________________________________________
>>> Halld-offline mailing list
>>> Halld-offline at jlab.org
>>> https://mailman.jlab.org/mailman/listinfo/halld-offline