[Halld-offline] advice on high speed file copies offsite

Richard Jones richard.t.jones at uconn.edu
Tue Jan 27 21:36:25 EST 2015


Hello Chip,

I agree that what I was doing does not scale at all, but might you be
misinterpreting the speeds we are getting? I was saying that I was seeing
100-200 Mbit per second, not megabytes per second. I think you read it as
megabytes/s, which would explain the 1 Gb/s figure in your reply. We are an
order of magnitude below that. I am not sure what the reason is, but that
is what I was seeing. Look at Mike Staib's screenshot a few messages back,
where he shows the globusonline numbers: it clearly says 289 Mbit/s. It is
difficult to see how speeds like this can clog anything, at least until a
dozen of us try to do it at once.

-Richard Jones

On Tue, Jan 27, 2015 at 8:43 PM, Chip Watson <watson at jlab.org> wrote:

>  Richard,
>
> This would of course have the same bottleneck at the WAN level, and once
> the counting house RAID goes active (acquiring data and copying it to the
> computer center for archiving in the tape library and caching on disk),
> I'll bet your transfer rate from the counting house will also drop.  Since
> you are only seeing 1 Gb you might not notice, but watch to make sure you
> don't slow down the transfers to tape, as I'm pretty sure there is no
> throttling of anything.  If you tried running multiple transfers, you might
> cut into their DAQ rates.  We get >400 MB/s from the counting house to
> CEBAF Center (we could easily get more since it is a 10 Gb/s link, but that
> is above the requested rate, so we stopped there), so perhaps you are only
> a 20% perturbation (if only one person does it).
>
> You don't actually have a separate path, since your data is traversing the
> same 10 Gb link from the counting house to CEBAF Center, and from the
> switch/router where it lands, it can leave the site through one of several
> nodes/services (including login nodes) or land on our 1 PB disk.  We'll
> have that 1 PB of disk provisioned with multiple GB/s of bandwidth for
> GlueX, including > 1 GByte/s to/from tape for batch jobs this year (and
> growing as needed).
>
> When CLAS-12 goes online I hope we'll have a second 10 Gb/s link to the
> outside world; otherwise your bandwidth will fall when they start.
>
> Chip
>
>  On 1/27/15 8:26 PM, Richard Jones wrote:
>
> Hello Matt and all,
>
>  Matt, I have been using rsync, mainly because it works without user
> interaction. I was getting 100-150 MB/s coming straight off the raid disk
> in the counting house and going over ssh pipes to UConn. I liked seeing the
> files at UConn as soon as they appeared on the gluonraid disks, but as soon
> as the running became anything close to stable it would quickly fall far
> behind. As a long-term solution I think it is hopeless.
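>
>  To be concrete, the transfer loop I had running was roughly the
> following (the host name and directory layout here are placeholders, not
> the actual gluonraid paths):
>
>     # pull new raw data files from the counting house RAID to UConn over
>     # a single ssh stream, resuming partial files on restart
>     rsync -avP -e ssh \
>         user@gluonraid.jlab.org:/raid/rawdata/ \
>         /data/gluex/rawdata/
>
>  It runs unattended from cron, but it is still one ssh stream, which is
> part of why it cannot keep up once the DAQ is running steadily.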
>
>  It might make sense for one of our offsite institutions (IU might be a
> good place, with good I2 connections) to set up a data transfer node for
> all offsite institutions. You might even convince the guys in the IU
> computer center to give you some space on an existing data transfer device
> connected to I2. We can easily sustain an order of magnitude higher speed
> than what Mike Staib is seeing from jlab/globusonline, and it would avoid
> the pitfall of multiplying the load on Chip's uplink, which seems like it
> might be a bottleneck if a bunch of us start doing the same thing. For it
> to be useful, we would probably need to agree on a common subset of the
> data (no more than a few hundred TB to keep the cost down) that all of us
> would like to have available offsite.
>
>  Another choice might be to slice up the data into run segments and assign
> each of the 5 or so offsite institutions that have local storage resources
> the task of maintaining a high-throughput access node serving its slice to
> the rest of the offsite institutions. That way no one site would need to
> devote more than maybe 50-80 TB to the shared filesystem, and everyone
> would have ready access to it as soon as it could be squeezed out from
> jlab/globusonline.
>
>  -Richard Jones
>
> On Tue, Jan 27, 2015 at 2:23 PM, Shepherd, Matthew <mashephe at indiana.edu>
> wrote:
>
>>
>> Chip and Mike,
>>
>> Thanks for your replies.  I'll try the globus online
>> method to do transfers.  I was getting only about
>> 100-200 Mb/s using scp and was trying to find the bottleneck.
>>
>> Matt
>>
>> ---------------------------------------------------------------------
>> Matthew Shepherd, Associate Professor
>> Department of Physics, Indiana University, Swain West 265
>> 727 East Third Street, Bloomington, IN 47405
>>
>> Office Phone:  +1 812 856 5808
>>
>>  > On Jan 27, 2015, at 2:10 PM, Chip Watson <watson at jlab.org> wrote:
>> >
>> > Matt,
>> >
>> > The gateway node is intentionally not tuned for best TCP streaming, so
>> > that we don't consume more than 50% of the lab's bandwidth.  Further,
>> > JLab is on a MAN shared with ODU and W&M, with a single uplink to D.C.
>> > People generally get 2-5 Gbps using globus online to do the transfers
>> > (gridftp under the hood).
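>> >
>> > For what it's worth, the command-line equivalent is globus-url-copy with
>> > several parallel TCP streams, roughly like the sketch below (the gsiftp
>> > path is only an example, not a real file):
>> >
>> >     # gridftp transfer with 8 parallel TCP streams, the same mechanism
>> >     # globus online drives under the hood
>> >     globus-url-copy -p 8 -fast \
>> >         gsiftp://globus.jlab.org/cache/halld/rawdata/run1234.evio \
>> >         file:///data/gluex/rawdata/run1234.evio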
>> >
>> > You also need to keep in mind that the JLab file systems are all pretty
>> > busy serving the batch system.  It is not at all unusual for a single
>> > file stream to achieve only 1 Gbps.  If you want to use scp, you will
>> > need to use multiple instances of it to match what globus and gridftp
>> > can achieve.
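>> >
>> > By multiple instances I mean nothing fancier than a few concurrent
>> > copies over disjoint chunks of the file list (directory names below are
>> > illustrative):
>> >
>> >     # four scp streams running at once over separate subdirectories
>> >     for i in 0 1 2 3; do
>> >         scp -r user@globus.jlab.org:/cache/halld/rawdata/part$i \
>> >             /data/gluex/rawdata/ &
>> >     done
>> >     wait    # block until all four copies have finished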
>> >
>> > Chip
>> >
>> > On 1/27/15 2:01 PM, Shepherd, Matthew wrote:
>> >> Hi all,
>> >>
>> >> Does anyone have suggestions for doing high speed
>> >> file copies offsite?
>> >>
>> >> I consulted this page:
>> >>
>> >>
>> >> https://scicomp.jlab.org/docs/node/11
>> >>
>> >>
>> >> It suggests using the node globus.jlab.org.
>> >>
>> >> This node has really limited functionality.  I can't get rsync to
>> >> work, and there is no tar to use with ssh.
>> >>
>> >> I tried just:
>> >>
>> >> scp -r
>> >>
>> >> This works, but I'd like to know if I can go faster.
>> >> I'm getting speeds that are orders of magnitude below
>> >> the 10 Gbps pipe that is at JLab and also at my
>> >> end.  I'm sure there is other competing traffic, but
>> >> I think part of the problem might be that scp is poorly
>> >> optimized for copying many small files.
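>> >>
>> >> On a host where tar is available I would normally bundle the small
>> >> files into a single stream, something like the following (paths made
>> >> up), but as noted above there is no tar on globus.jlab.org:
>> >>
>> >>     # stream many small files through one ssh pipe instead of paying
>> >>     # scp's per-file overhead
>> >>     ssh user@host.jlab.org 'tar -C /cache/halld/rawdata -cf - .' \
>> >>         | tar -C /data/rawdata -xf -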
>> >>
>> >> Any experience with this in the offline group?
>> >>
>> >> Matt
>> >>
>> >> ---------------------------------------------------------------------
>> >> Matthew Shepherd, Associate Professor
>> >> Department of Physics, Indiana University, Swain West 265
>> >> 727 East Third Street, Bloomington, IN 47405
>> >>
>> >> Office Phone:  +1 812 856 5808
>> >>