[Halld-offline] expected bandwidth offsite
Chip Watson
watson at jlab.org
Tue May 17 11:28:23 EDT 2016
Matt,
Our provisioning is lean and balanced, meaning that we have provisioned
tape and disk bandwidth based upon the size of the farm: now 3K+ cores,
moving this summer to 6K-7K cores (2014-generation cores; older cores
are as much as 2x slower). Another upgrade comes in the fall, to reach
~8K-10K cores (same units). Upgrades are based upon the halls'
long-range plans, and they do come with upgrades in disk and tape
bandwidth to keep the system balanced.
"Free" cycles (i.e. not funded by DOE) would require some infrastructure
provisioning to make sure that these free cycles don't slow down our
other systems (i.e. make sure that there is a real gain). Small
contributions can be absorbed as being in the noise, but larger
contributions will require that we do some type of QOS so that we don't
starve local resources. (Example, these recent file transfers of many
terabytes did need to tie up a noticeable portion of GlueX's /cache or
/volatile quotas.)
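To make "some type of QoS" concrete -- as a minimal sketch only, not a
description of any mechanism actually deployed here -- a cap on offsite
transfer bandwidth could be as simple as a token bucket; the rate and
burst numbers below are made up:

    import time

    class TokenBucket:
        """Minimal token-bucket throttle: one illustrative form a QoS
        cap on "free cycle" transfers could take."""

        def __init__(self, rate_mb_per_s, burst_mb):
            self.rate = rate_mb_per_s      # sustained cap, MB/s
            self.capacity = burst_mb       # maximum burst, MB
            self.tokens = burst_mb
            self.last = time.monotonic()

        def consume(self, mb):
            # Refill by elapsed time, then block until the requested
            # amount fits under the cap.
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= mb:
                    self.tokens -= mb
                    return
                time.sleep((mb - self.tokens) / self.rate)

    # E.g., cap offsite transfers at 100 MB/s with a 1 GB burst, and
    # call bucket.consume(64) before shipping each 64 MB chunk:
    bucket = TokenBucket(rate_mb_per_s=100, burst_mb=1000)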
All this is to say: you can't really double the performance of the farm
for I/O-intensive activities without real investments in I/O. We are
lean enough that a 10% addition to the farm would be noticed during
busy periods.
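For a sense of the arithmetic, here is a back-of-envelope balance check;
every number in it is an illustrative placeholder, not one of our actual
figures:

    # Back-of-envelope farm I/O balance (all numbers assumed).
    cores = 6000                  # assumed post-upgrade farm size
    io_per_core = 0.5             # assumed MB/s drawn per I/O-bound core
    provisioned = 4000            # assumed aggregate disk+tape MB/s

    demand = cores * io_per_core  # 3000 MB/s of steady demand
    wan_stream = 500              # one offsite transfer at the hoped-for rate
    print(f"headroom: {provisioned - demand:.0f} MB/s; "
          f"with the WAN stream: {provisioned - demand - wan_stream:.0f} MB/s")

With these made-up numbers the farm runs close to its provisioned
bandwidth even before the WAN stream, which is the sense in which a 10%
addition is noticeable.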
We do have some headroom -- which is why we can move GlueX DAQ from 400
to 800 on very short notice. But we have priorities in place: (1) DAQ,
(2) high utilization of LQCD/HPC, (3) high utilization of the farm, (4)
critical WAN transfers (e.g. LQCD data products coming in from
supercomputing centers for onsite analysis), (5) interactive analysis,
(6) everything else. Your file transfers fall under (6), as they are
outside the reviewed computing plans.
We are always open to adjusting everything to maximize science (e.g.
rapidly doubling DAQ), and the Physics Division is the one that would
be engaged (as the funding source) in these optimizations.
regards,
Chip
On 5/17/16 7:56 AM, Shepherd, Matthew wrote:
> Chip,
>
> This was my initial question: what is a reasonable rate to expect?
>
> I'm not displeased with 50 MB/s, but I'd like to know whether there are mechanisms to gain another factor of 2 or 4. About a year ago I managed 130 MB/s averaged over an hour.
>
> I'm basically trying to explore options for moving analysis offsite to alleviate load on the JLab farm. I think we may have some significant resources here at the university level to tap into. I can imagine things like hosting a data mirror here if your offsite pipeline is not too big. Local IT folks seem to enjoy pushing limits and are always looking for customers. Our flagship research machine here, which is getting old by local standards, is a ~1000-node Cray with 22,000 cores and a 5 PB high-throughput work disk. Accounts are free to any researcher at the university -- it is provided as a service. I'll certainly only be able to get a fraction of this, but a fraction may be significant on the scale of the problems we're working with.
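>
> As a purely illustrative sketch, a recurring mirror sync via the Globus Python SDK might look like the following; the endpoint UUIDs, paths, and token step are placeholders, since I haven't set any of this up:
>
>     import globus_sdk
>
>     # Placeholders: real endpoint IDs are UUIDs from the Globus web
>     # interface, and the token comes from the usual Globus OAuth2
>     # native-app flow (omitted here).
>     TRANSFER_TOKEN = "<token from the Globus OAuth2 flow>"
>     JLAB_EP = "<jlab-endpoint-uuid>"
>     MIRROR_EP = "<university-mirror-endpoint-uuid>"
>
>     tc = globus_sdk.TransferClient(
>         authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN))
>     tdata = globus_sdk.TransferData(
>         tc, JLAB_EP, MIRROR_EP, label="GlueX mirror sync",
>         sync_level="checksum")  # re-copy only files that changed
>     tdata.add_item("/cache/halld/", "/mirror/halld/", recursive=True)
>     task = tc.submit_transfer(tdata)
>     print("submitted transfer task:", task["task_id"])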
>
> Matt
>
> ---------------------------------------------------------------------
> Matthew Shepherd, Associate Professor
> Department of Physics, Indiana University, Swain West 265
> 727 East Third Street, Bloomington, IN 47405
>
> Office Phone: +1 812 856 5808
>
>> On May 16, 2016, at 10:34 PM, Chip Watson <watson at jlab.org> wrote:
>>
>> All,
>>
>> Keep in mind that Scientific Computing is not provisioned to deliver that kind of bandwidth (500 MB/s) to you. If you actually succeeded, we'd probably see a negative impact on operations and have to kill the connection.
>>
>> Chip
>>
>> On 5/16/16 8:45 PM, Shepherd, Matthew wrote:
>>>> On May 16, 2016, at 5:32 PM, Richard Jones <richard.t.jones at uconn.edu> wrote:
>>>> What did you use for the number of parallel streams / job, and number of simultaneous jobs to get the performance you saw?
>>> I simply used the Globus web client and created a personal endpoint using Globus Connect Personal running on Linux. I didn't modify anything from the defaults, and that single request was the only one I submitted during the time frame.
>>>
>>> I'm not disappointed with 50 MB/s averaged over 2.5 days, but in principle, I have ten times that bandwidth going into the machine.
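>>>
>>> (For scale: 50 MB/s sustained for 2.5 days is 50 MB/s x 216,000 s, about 10.8 TB moved; ten times that, 500 MB/s, would be 4 Gb/s on the wire.)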
>>>
>>> Matt
>>>
>>> ---------------------------------------------------------------------
>>> Matthew Shepherd, Associate Professor
>>> Department of Physics, Indiana University, Swain West 265
>>> 727 East Third Street, Bloomington, IN 47405
>>>
>>> Office Phone: +1 812 856 5808
>>>
>>>
>>> _______________________________________________
>>> Halld-offline mailing list
>>> Halld-offline at jlab.org
>>> https://mailman.jlab.org/mailman/listinfo/halld-offline