[Halld-offline] advice on high speed file copies offsite
Shepherd, Matthew
mashephe at indiana.edu
Thu Jan 29 07:55:03 EST 2015
Hi all,
I had a chance to do a little testing this morning using
the globus web interface to transfer files.
I started a copy job of the REST skim from volatile. This
is 120 GB total but spread over 1500 directories and 6700 files.
I was able to get an average rate of 405 Mbit/s for 40 minutes
between 6:08EST and 6:48EST.
Some time through the job I decided to start another copy,
but this time from cache. My thinking was that volatile is probably
accessed by many jobs simultaneously (writing output) and cache
access is likely only at the start of a job. (Maybe my picture is wrong!)
Anyway, I started copying a hunk of raw data from cache.
This started at 6:27EST (while the other job was still going) and
completed at 7:21EST. During the initial stages, the
instantaneous speed on the globus web page showed spurts
of 1.5 Gbit/s at the same time the volatile copy job was getting
400 Mbit/s. Overall the job copied 395 GB which was 22 files
over 52 directories at an average rate of 1.04 Gbit/s.
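As a rough cross-check of the two averages quoted above, the arithmetic can be sketched as below. The function name and the bytes_per_gb parameter are illustrative assumptions, not anything globus reports; the 54 minute duration is simply 6:27EST to 7:21EST, and it is not known whether the globus page counts a GB as 10^9 or 2^30 bytes, so the result is only approximate.

```python
def average_rate_mbit_s(gigabytes, minutes, bytes_per_gb=1e9):
    """Average transfer rate in Mbit/s for a given volume and duration.

    bytes_per_gb is an assumption: pass 2**30 if the reported size is
    in binary gigabytes rather than decimal ones.
    """
    bits = gigabytes * bytes_per_gb * 8
    return bits / (minutes * 60) / 1e6


# Volatile copy: 120 GB over the 40 minute window.
volatile = average_rate_mbit_s(120, 40)

# Cache copy: 395 GB from 6:27EST to 7:21EST, i.e. 54 minutes.
cache_decimal = average_rate_mbit_s(395, 54)
cache_binary = average_rate_mbit_s(395, 54, bytes_per_gb=2**30)

print(f"volatile: {volatile:.0f} Mbit/s")
print(f"cache:    {cache_decimal:.0f} to {cache_binary:.0f} Mbit/s")
```

Both results land close to the reported averages; the remaining difference is within what the unit convention and rounding of the start and end times could explain.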
I don't know whether the speed difference was due to the
cache/volatile difference or the difference between many
small files and a few large files or something else.
However, this answers the question that I was after: we'd like
to make some modest improvements to our local computing at
IU and one of the options is to plumb our system to share a
dedicated 10 Gb line that the local ATLAS group uses, which
I used for these transfers, as opposed to the current scheme
of sharing a 1 Gb line with the entire physics building. It seems
like this is a worthwhile move for us as JLab can serve up
the data at speeds that would saturate our current networking
scheme.
I'm also happy to pursue the idea Richard proposed of hosting
an offsite mirror for some data. We should have many
tens of terabytes of capacity on our local machine. It may
be sufficient to start and if it is useful, we can pursue
other solutions. First thing is to understand what people
would like to have and if it is really more useful than
obtaining things directly from JLab.
Matt
---------------------------------------------------------------------
Matthew Shepherd, Associate Professor
Department of Physics, Indiana University, Swain West 265
727 East Third Street, Bloomington, IN 47405
Office Phone: +1 812 856 5808
> On Jan 27, 2015, at 2:10 PM, Chip Watson <watson at jlab.org> wrote:
>
> Matt,
>
> The gateway node is intentionally not tuned for best TCP streaming so that we don't consume more than 50% of the lab's bandwidth. Further, JLab is on a MAN shared with ODU and W&M, with a single uplink to D.C. People generally get 2-5 Gbps using globus online to do the transfers (gridftp under the hood).
>
> You also need to keep in mind that the JLab file systems are all pretty busy serving the batch system. It is not at all unusual for a single file stream to only achieve 1 Gbps. If you want to use scp, you will need to use multiple instances of it to match what globus and gridftp can achieve.
>
> Chip
>
> On 1/27/15 2:01 PM, Shepherd, Matthew wrote:
>> Hi all,
>>
>> Does anyone have suggestions for doing high speed
>> file copies offsite?
>>
>> I consulted this page:
>>
>>
>> https://scicomp.jlab.org/docs/node/11
>>
>>
>> Which suggests to use the node: globus.jlab.org
>>
>> This node has real limited functionality. I can't get rsync to
>> work and there is no tar to use with ssh.
>>
>> I tried just:
>>
>> scp -r
>>
>> This works, but I'd like to know if I can go faster.
>> I'm getting speeds that are orders of magnitude below
>> the 10 Gbps pipe that is at JLab and also at my
>> end. I'm sure there is other competing traffic, but
>> I think part of the problem might be that scp is poorly
>> optimized for copying many small files.
>>
>> Any experience with this in the offline group?
>>
>> Matt
>>
>> ---------------------------------------------------------------------
>> Matthew Shepherd, Associate Professor
>> Department of Physics, Indiana University, Swain West 265
>> 727 East Third Street, Bloomington, IN 47405
>>
>> Office Phone: +1 812 856 5808
>>
>>
>> _______________________________________________
>> Halld-offline mailing list
>>
>> Halld-offline at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/halld-offline
>