<div dir="ltr">Hello Chip,<div><br></div><div>I agree what I was doing does not scale at all, but might you be misinterpreting the speeds we are getting? I was saying that I was seeing 100-200 Mbit per second, not megabytes per second. I think you were interpreting it as megabytes/s, which explains your using 1Gb/s in the conversation. We are an order of magnitude below that. Not sure what the reason is, but that is what I was seeing. Look at Mike Staig's screenshot a few messages back, where he is showing globusonline numbers. There it clearly says 289Mbit/s. It is difficult to see how speeds like this can clog anything, at least until a dozen of us try to do it at once.</div><div><br></div><div>-Richard Jones</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 27, 2015 at 8:43 PM, Chip Watson <span dir="ltr"><<a href="mailto:watson@jlab.org" target="_blank">watson@jlab.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<font size="+1">Richard,<br>
<br>
This would </font><font size="+1">of course have
      the same bottleneck at the WAN level, and once the counting house
      RAID goes active (acquiring data and copying it to the computer
      center for archiving in the tape library and caching on disk),
      I'll bet your transfer rate from the counting house will also
      drop. But since you are only seeing 1 Gb/s, you might not notice
      it; still, watch to make sure you don't slow down the transfers
      to tape, as I'm pretty sure there is no throttling of anything.
      If you tried running multiple transfers, you might cut into
      their DAQ rates. We get >400 MB/s from the counting house to
      CEBAF Center (we could easily get more since it is a 10 Gb/s
      link, but that is above the requested rate, so we stopped
      there), so perhaps you are only a 20% perturbation (if only one
      person does it).<br>
<br>
You don't actually have a separate path, since your data is
      traversing the same 10 Gb link from the counting house to CEBAF
      Center, and from the switch/router where it lands it can leave
      the site through one of several nodes/services (including login
      nodes) or land on our 1 PB disk. We'll have 1 PB of disk
      provisioned with multiple GB/s of bandwidth for GlueX, including
      > 1 GByte/s to/from tape for batch jobs this year (and growing
      as needed).<br>
<br>
When CLAS-12 goes online I hope we'll have a second 10 Gb/s link
      to the outside world; otherwise your bandwidth will fall when
      they start.<span class="HOEnZb"><font color="#888888"><br>
<br>
Chip<br>
<br>
</font></span></font><div><div class="h5">
<div>On 1/27/15 8:26 PM, Richard Jones
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hello Matt and all,
<div><br>
</div>
<div>Matt, I have been using rsync, mainly because it works
        without user interaction. I was getting 100-150 MB/s coming
        straight off the RAID disk in the counting house and going
        over ssh pipes to UConn. I liked seeing the files at UConn as
        soon as they appeared on the gluonraid disks, but as soon as
        the running became anything close to stable it would quickly
        fall far behind. As a long-term solution I think it is
        hopeless.</div>
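      <div><br>
      </div>
      <div>For what it's worth, what I have been running is roughly
        along these lines; the hostname, paths, and bandwidth cap
        below are placeholders rather than the real gluonraid
        setup:<br>
        <br>
        # pull raw data from the counting house over ssh, resuming<br>
        # partial files and capping bandwidth so the copy to the tape<br>
        # library is not starved (host and paths are placeholders)<br>
        rsync -av --partial --bwlimit=20000 -e ssh \<br>
        &nbsp;&nbsp;gluex@gluonraid.jlab.org:/raid/rawdata/ /data/rawdata/<br>
      </div>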
<div><br>
</div>
<div>It might make sense for one of our offsite institutions (IU
might be a good place, with good I2 connections) to set up a
data transfer node for all offsite institutions. You might
even convince the guys in the IU computer center to give you
some space on an existing data transfer device connected to
I2. We can easily sustain an order of magnitude higher speed
than what Mike Staib is seeing from jlab/globusonline, and it
would avoid the pitfall of multiplying the load on Chip's
uplink, which seems like it might be a bottleneck if a bunch
of us start doing the same thing. For it to be useful, we
would probably need to agree on a useful subset of the data
(no more than a few hundred TB to keep the cost down) that all
of us would like to have available offsite.</div>
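      <div><br>
      </div>
      <div>As a sketch of how redistribution from such a node could
        work, a gridftp copy with parallel streams looks something
        like the following; both endpoint names and paths here are
        invented just for illustration:<br>
        <br>
        # third-party gridftp copy with 8 parallel TCP streams;<br>
        # endpoint names and paths are hypothetical<br>
        globus-url-copy -vb -fast -p 8 \<br>
        &nbsp;&nbsp;gsiftp://dtn.jlab.org/cache/rawdata/run002001.evio \<br>
        &nbsp;&nbsp;gsiftp://dtn.example.edu/gluex/rawdata/run002001.evio<br>
      </div>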
<div><br>
</div>
<div>Another choice might be to slice up the data into run
        segments, and assign each of the 5 or so offsite institutions
        with local storage resources the task of maintaining a
        high-throughput access node for their slice, serving the rest
        of the offsite institutions. That way no one site would need
        to devote more than maybe 50-80 TB to the shared filesystem,
        and everyone would have ready access to it as soon as it
        could be squeezed out from jlab/globusonline.</div>
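      <div><br>
      </div>
      <div>Keeping a slice current could then be as simple as an
        rsync filter on the run number, something like this; the file
        naming pattern, host, and paths are hypothetical:<br>
        <br>
        # mirror only this site's assigned runs (here 002000-002999);<br>
        # naming pattern, host, and paths are hypothetical<br>
        rsync -av --include='*/' --include='hd_rawdata_002*.evio' \<br>
        &nbsp;&nbsp;--exclude='*' gluex@dtn.example.edu:/gluex/rawdata/ /slice/rawdata/<br>
      </div>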
<div><br>
</div>
<div>-Richard Jones</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Jan 27, 2015 at 2:23 PM,
Shepherd, Matthew <span dir="ltr"><<a href="mailto:mashephe@indiana.edu" target="_blank">mashephe@indiana.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Chip and Mike,<br>
<br>
Thanks for your replies. I'll try the globus online<br>
            method to do transfers. I was getting only about<br>
            100-200 Mb/s using scp and was trying to find the bottleneck.<br>
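            <br>
            If anyone does stick with scp, the multiple-instance approach<br>
            Chip mentions below would presumably look something like this<br>
            (the file list, host, and destination are placeholders):<br>
            <br>
            # run 4 concurrent scp copies over a list of remote paths;<br>
            # filelist.txt and the destination directory are placeholders<br>
            cat filelist.txt | xargs -P 4 -I{} scp globus.jlab.org:{} /local/rawdata/<br>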
<span><br>
Matt<br>
<br>
---------------------------------------------------------------------<br>
Matthew Shepherd, Associate Professor<br>
Department of Physics, Indiana University, Swain West 265<br>
727 East Third Street, Bloomington, IN 47405<br>
<br>
Office Phone: <a href="tel:%2B1%20812%20856%205808" value="+18128565808" target="_blank">+1
812 856 5808</a><br>
<br>
</span>
<div>
<div>> On Jan 27, 2015, at 2:10 PM, Chip
Watson <<a href="mailto:watson@jlab.org" target="_blank">watson@jlab.org</a>>
wrote:<br>
><br>
> Matt,<br>
><br>
> The gateway node is intentionally not tuned for best
TCP streaming so that we don't consume more than 50% of
the lab's bandwidth. Further, JLab is on a MAN shared
with ODU and W&M, with a single uplink to D.C.
People generally get 2-5 Gbps using globus online to do
the transfers (gridftp under the hood).<br>
><br>
> You also need to keep in mind that the JLab file
systems are all pretty busy serving the batch system.
It is not at all unusual for a single file stream to
only achieve 1 Gbps. If you want to use scp, you will
need to use multiple instances of it to match what
globus and gridftp can achieve.<br>
><br>
> Chip<br>
><br>
> On 1/27/15 2:01 PM, Shepherd, Matthew wrote:<br>
>> Hi all,<br>
>><br>
>> Does anyone have suggestions for doing high
speed<br>
>> file copies offsite?<br>
>><br>
>> I consulted this page:<br>
>><br>
>><br>
>> <a href="https://scicomp.jlab.org/docs/node/11" target="_blank">https://scicomp.jlab.org/docs/node/11</a><br>
>><br>
>><br>
>> It suggests using the node: <a href="http://globus.jlab.org" target="_blank">globus.jlab.org</a><br>
>><br>
>> This node has really limited functionality. I
can't get rsync to<br>
>> work and there is no tar to use with ssh.<br>
>><br>
>> I tried just:<br>
>><br>
>> scp -r<br>
>><br>
>> This works, but I'd like to know if I can go
faster.<br>
>> I'm getting speeds that are orders of magnitude
below<br>
>> the 10 Gbps pipe that is at JLab and also at my<br>
>> end. I'm sure there is other competing
traffic, but<br>
>> I think part of the problem might be that scp is
                  poorly<br>
                  >> optimized for copying many small files.<br>
>><br>
>> Any experience with this in the offline group?<br>
>><br>
>> Matt<br>
>><br>
>>
---------------------------------------------------------------------<br>
>> Matthew Shepherd, Associate Professor<br>
>> Department of Physics, Indiana University,
Swain West 265<br>
>> 727 East Third Street, Bloomington, IN 47405<br>
>><br>
>> Office Phone: <a href="tel:%2B1%20812%20856%205808" value="+18128565808" target="_blank">+1 812 856 5808</a><br>
>><br>
>><br>
><br>
<br>
<br>
_______________________________________________<br>
Halld-offline mailing list<br>
<a href="mailto:Halld-offline@jlab.org" target="_blank">Halld-offline@jlab.org</a><br>
<a href="https://mailman.jlab.org/mailman/listinfo/halld-offline" target="_blank">https://mailman.jlab.org/mailman/listinfo/halld-offline</a><br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div>