[Halld-offline] advice on high speed file copies offsite

Richard Jones richard.t.jones at uconn.edu
Tue Jan 27 20:26:41 EST 2015


Hello Matt and all,

Matt, I have been using rsync, mainly because it works without user
interaction. I was getting 100-150 MB/s coming straight off the RAID disk in
the counting house and going over ssh pipes to UConn. I liked seeing the
files at UConn as soon as they appeared on the gluonraid disks, but as soon
as the running became anything close to stable, rsync would quickly fall far
behind. As a long-term solution I think it is hopeless.

It might make sense for one of our offsite institutions (IU might be a good
place, with good Internet2 connections) to set up a data transfer node for
all offsite institutions. You might even convince the guys in the IU
computer center to give you some space on an existing data transfer device
connected to Internet2. We can easily sustain an order of magnitude higher
speed than what Mike Staib is seeing from jlab/globusonline, and it would
avoid the pitfall of multiplying the load on Chip's uplink, which seems like
it might become a bottleneck if a bunch of us start doing the same thing.
For this to work, we would probably need to agree on a useful subset of the
data (no more than a few hundred TB to keep the cost down) that all of us
would like to have available offsite.
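
To put rough numbers on the rates mentioned in this thread, here is a
minimal sketch of the arithmetic in Python. It is illustrative only: the
300 TB figure is an assumed stand-in for "a few hundred TB", the rates are
the ones quoted in the thread taken at their midpoints and treated as
sustained averages, and units are decimal (1 TB = 10^12 bytes). The function
names are my own.

```python
# Rough transfer-time estimates for an offsite data subset.
# ASSUMPTIONS: 300 TB stands in for "a few hundred TB"; rates are the
# figures quoted in this thread, treated as sustained averages.

SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, rate_gbps: float) -> float:
    """Days needed to move size_tb terabytes at rate_gbps gigabits/s."""
    size_bits = size_tb * 1e12 * 8   # decimal TB -> bits
    rate_bps = rate_gbps * 1e9       # Gbit/s -> bit/s
    return size_bits / rate_bps / SECONDS_PER_DAY


def mbytes_to_gbps(rate_mbytes_per_s: float) -> float:
    """Convert MB/s (megabytes per second) to Gbit/s."""
    return rate_mbytes_per_s * 8 / 1000


RATES_GBPS = {
    "scp, 100-200 Mbps (midpoint)": 0.15,
    "rsync over ssh, 100-150 MB/s (midpoint)": mbytes_to_gbps(125),
    "globus online, 2-5 Gbps (midpoint)": 3.5,
}

if __name__ == "__main__":
    for label, gbps in RATES_GBPS.items():
        print(f"{label}: {transfer_days(300, gbps):.0f} days for 300 TB")
```

Note the bits-versus-bytes distinction: 125 MB/s over rsync is about 1 Gbps,
so it sits between the scp and globus figures rather than below them.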

Another choice might be to slice up the data into run segments, and assign
each of the 5 or so offsite institutions with local storage resources the
task of maintaining a high-throughput access node for their slice to the
rest of the offsite institutions. That way no one site would need to devote
more than maybe 50-80TB to the shared filesystem, and everyone would have
ready access to it as soon as it could be squeezed out from
jlab/globusonline.
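
As a sketch of what the slicing could look like, the Python below splits a
run range into contiguous segments, one per hosting site, and looks up which
site holds a given run. Everything specific here is an assumption for
illustration: the run numbers are made up, and the site names other than
UConn and IU are placeholders.

```python
# Split a run range into contiguous slices, one per hosting site.
# ASSUMPTIONS: site names beyond UConn and IU are placeholders, and the
# run numbers are made up for illustration.


def slice_runs(first_run, last_run, sites):
    """Return {site: (lo, hi)} covering first_run..last_run inclusive."""
    n_runs = last_run - first_run + 1
    base, extra = divmod(n_runs, len(sites))
    slices = {}
    lo = first_run
    for i, site in enumerate(sites):
        count = base + (1 if i < extra else 0)  # spread the remainder
        slices[site] = (lo, lo + count - 1)
        lo += count
    return slices


def site_for_run(run, slices):
    """Look up which site hosts a given run; None if out of range."""
    for site, (lo, hi) in slices.items():
        if lo <= run <= hi:
            return site
    return None


if __name__ == "__main__":
    sites = ["UConn", "IU", "site3", "site4", "site5"]
    for site, (lo, hi) in slice_runs(1000, 1999, sites).items():
        print(f"{site}: runs {lo}-{hi}")
```

Equal run counts do not mean equal data volume, so a real assignment would
probably need to weight the slices by the size of each run on disk.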

-Richard Jones

On Tue, Jan 27, 2015 at 2:23 PM, Shepherd, Matthew <mashephe at indiana.edu>
wrote:

>
> Chip and Mike,
>
> Thanks for your replies.  I'll try the globus online
> method to do transfers.  I was getting only about
> 100-200 Mbps using scp and was trying to find the bottleneck.
>
> Matt
>
> ---------------------------------------------------------------------
> Matthew Shepherd, Associate Professor
> Department of Physics, Indiana University, Swain West 265
> 727 East Third Street, Bloomington, IN 47405
>
> Office Phone:  +1 812 856 5808
>
> > On Jan 27, 2015, at 2:10 PM, Chip Watson <watson at jlab.org> wrote:
> >
> > Matt,
> >
> > The gateway node is intentionally not tuned for best TCP streaming so
> that we don't consume more than 50% of the lab's bandwidth.  Further, JLab
> is on a MAN shared with ODU and W&M, with a single uplink to D.C.  People
> generally get 2-5 Gbps using globus online to do the transfers (gridftp
> under the hood).
> >
> > You also need to keep in mind that the JLab file systems are all pretty
> busy serving the batch system.  It is not at all unusual for a single file
> stream to only achieve 1 Gbps.  If you want to use scp, you will need to
> use multiple instances of it to match what globus and gridftp can achieve.
> >
> > Chip
> >
> > On 1/27/15 2:01 PM, Shepherd, Matthew wrote:
> >> Hi all,
> >>
> >> Does anyone have suggestions for doing high speed
> >> file copies offsite?
> >>
> >> I consulted this page:
> >>
> >>
> >> https://scicomp.jlab.org/docs/node/11
> >>
> >>
> >> Which suggests to use the node:  globus.jlab.org
> >>
> >> This node has very limited functionality.  I can't get rsync to
> >> work and there is no tar to use with ssh.
> >>
> >> I tried just:
> >>
> >> scp -r
> >>
> >> This works, but I'd like to know if I can go faster.
> >> I'm getting speeds that are orders of magnitude below
> >> the 10 Gbps pipe that is at JLab and also at my
> >> end.  I'm sure there is other competing traffic, but
> >> I think part of the problem might be that scp is poorly
> >> optimized for copying many small files.
> >>
> >> Any experience with this in the offline group?
> >>
> >> Matt
> >>
> >>
> >> _______________________________________________
> >> Halld-offline mailing list
> >>
> >> Halld-offline at jlab.org
> >> https://mailman.jlab.org/mailman/listinfo/halld-offline
> >
>
>
>