<div dir="ltr">Hello Matt and all,<div><br></div><div>Matt, I have been using rsync, mainly because it works without user interaction. I was getting 100-150MB/s coming straight off the raid disk in the counting house and going over ssh pipes to UConn. I liked seeing the files at UConn as soon as they appeared on the gluonraid disks, but as soon as the running became anything close to stable it would quickly fall far behind. As a long term solution I think it is hopeless.</div><div><br></div><div>It might make sense for one of our offsite institutions (IU might be a good place, with good I2 connections) to set up a data transfer node for all offsite institutions. You might even convince the guys in the IU computer center to give you some space on an existing data transfer device connected to I2. We can easily sustain an order of magnitude higher speed than what Mike Staib is seeing from jlab/globusonline, and it would avoid the pitfall of multiplying the load on Chip's uplink, which seems like it might be a bottleneck if a bunch of us start doing the same thing. For it to be useful, we would probably need to agree on a useful subset of the data (no more than a few hundred TB to keep the cost down) that all of us would like to have available offsite.</div><div><br></div><div>Another choice might be to slice up the data into run segments, and assign each of the 5 or so offsite institutions with local storage resources the task of maintaining a high-throughput access node for their slice to the rest of the offsite institutions. That way no one site would need to devote more than maybe 50-80TB to the shared filesystem, and everyone would have ready access to it as soon as it could be squeezed out from jlab/globusonline.</div><div><br></div><div>-Richard Jones</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 27, 2015 at 2:23 PM, Shepherd, Matthew <span dir="ltr"><<a href="mailto:mashephe@indiana.edu" target="_blank">mashephe@indiana.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Chip and Mike,

Thanks for your replies. I'll try the globus online
method to do transfers. I was getting only about
100-200 Mbps using scp and was trying to find the bottleneck.

Matt

---------------------------------------------------------------------
Matthew Shepherd, Associate Professor
Department of Physics, Indiana University, Swain West 265
727 East Third Street, Bloomington, IN 47405

Office Phone: +1 812 856 5808

</span><div class="HOEnZb"><div class="h5">> On Jan 27, 2015, at 2:10 PM, Chip Watson <<a href="mailto:watson@jlab.org">watson@jlab.org</a>> wrote:<br>
>
> Matt,
>
> The gateway node is intentionally not tuned for best TCP streaming, so that we don't consume more than 50% of the lab's bandwidth. Further, JLab is on a MAN shared with ODU and W&M, with a single uplink to D.C. People generally get 2-5 Gbps using globus online to do the transfers (gridftp under the hood).
>
> You also need to keep in mind that the JLab file systems are all pretty busy serving the batch system. It is not at all unusual for a single file stream to achieve only 1 Gbps. If you want to use scp, you will need to use multiple instances of it to match what globus and gridftp can achieve. [Sketches of both approaches follow the quoted thread below.]
>
> Chip
>
> On 1/27/15 2:01 PM, Shepherd, Matthew wrote:
>> Hi all,
>>
>> Does anyone have suggestions for doing high-speed
>> file copies offsite?
>>
>> I consulted this page:
>>
>> https://scicomp.jlab.org/docs/node/11
>>
>> which suggests using the node globus.jlab.org.
>>
>> This node has really limited functionality. I can't get rsync to
>> work, and there is no tar to use with ssh.
>>
>> I tried just:
>>
>> scp -r
>>
>> This works, but I'd like to know if I can go faster.
>> I'm getting speeds that are orders of magnitude below
>> the 10 Gbps pipe that is at JLab and also at my
>> end. I'm sure there is other competing traffic, but
>> I think part of the problem might be that scp is poorly
>> optimized for copying many small files.
>>
>> Any experience with this in the offline group?
>>
>> Matt
>>
>> ---------------------------------------------------------------------
>> Matthew Shepherd, Associate Professor
>> Department of Physics, Indiana University, Swain West 265
>> 727 East Third Street, Bloomington, IN 47405
>>
>> Office Phone: +1 812 856 5808
>>
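
And a rough sketch of Chip's other suggestion, running several scp instances at once so the aggregate rate approaches what the parallel gridftp streams give. The host, destination, and file list are placeholders; filelist.txt is assumed to hold one remote path per line.

    REMOTE="globus.jlab.org"     # placeholder transfer host
    DEST="/data/gluex/rawdata"   # placeholder local destination
    NSTREAMS=8                   # number of concurrent scp processes

    # GNU xargs -P keeps up to NSTREAMS copies in flight at a time.
    # One ssh connection per file is wasteful when there are many small
    # files, so in practice it pays to group files into larger chunks
    # (or pull whole directories) per scp invocation.
    xargs -a filelist.txt -P "${NSTREAMS}" -I{} scp -q "${REMOTE}:{}" "${DEST}/"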

_______________________________________________
Halld-offline mailing list
Halld-offline@jlab.org
https://mailman.jlab.org/mailman/listinfo/halld-offline