[Halld-offline] speeds for download of secondary GlueX datasets from JLab
Chip Watson
watson at jlab.org
Mon May 23 18:39:26 EDT 2016
All,
If you really want this as a production service, you are going to have
to say what you need, including network bandwidth, staging area, and
tape bandwidth. I'm happy to look into what it takes to provision this
as a resource, and we may also need to consider QoS. Hall B might be
interested in having whatever GlueX is granted, so we'll need to take
that into account as well.
regards, Chip
On 5/23/16 5:25 PM, Richard Jones wrote:
> Dear Matt,
>
> Following up on your query regarding download speeds for fetching
> secondary datasets from JLab to offsite storage resources, I have the
> following experience to share.
>
> 1. /first bottleneck:/ switch buffer overflows (last 6 feet) -- the
>    data path was 10Gb from the source to my server until the last 6
>    feet, where it dropped to 1Gb. TCP performance was highly
>    asymmetric: 95 MB/s upload speed, but a poor and oscillating
>    download speed averaging *15 MB/s*. This asymmetry was due to
>    buffer overflows at the switch port where the path necks down from
>    10Gb to 1Gb -- TCP has no back-pressure mechanism other than
>    packet loss, which tends to be catastrophic over high-latency
>    paths with the standard Linux kernel congestion algorithms (cubic,
>    htcp). A short sketch for checking the receiver's TCP settings
>    follows this list.
> 2. /second bottleneck:/ disk speed on the receiving server -- as soon
>    as I replaced the last 6 feet with a 10Gb NIC / transceiver, I
>    moved up to the next resistance point, around *140 MB/s* on my
>    server. Using diagnostics I could see that my disk drives (two
>    commodity 1TB SATA drives in parallel) were both saturating their
>    write queues. At this speed I was filling up my disks fast, so I
>    had to start simultaneous jobs to flush these files from the
>    temporary filesystem on the receiving server to permanent storage
>    in my dcache. Once the drives were interleaving reads and writes,
>    the download performance dropped to around 70 MB/s net for both
>    drives. (A disk-utilization sketch follows this list.)
> 3. /third bottleneck:/ fear of too much success -- to see what the
>    next limiting point might be, I switched to a data transfer node
>    that the UConn data center made available for testing. It combines
>    a 10Gb NIC connected to a central campus switch with what Dell
>    calls a high-performance RAID (Dell H700, 500GB, probably a large
>    fraction of it SSD). On this system I never saw the disks saturate
>    their read/write queues. However, the throughput rose quickly as
>    the transfers started, and as soon as I saw transfers exceeding
>    *300 MB/s* I remembered Chip's warning and cancelled the job. I
>    then decreased the number of parallel streams (from the Globus
>    Online defaults) to limit the impact on JLab infrastructure. Using
>    just 1 simultaneous transfer / 2 parallel streams (the Globus
>    default is 2 / 4), I was seeing a steady-state rate between 150
>    and *200 MB/s* average download speed, even with simultaneous
>    downloading and pushing from the fast RAID to my dcache (multiple
>    parallel jobs) -- which was necessary to keep from overflowing
>    this 500GB partition in a matter of minutes. Decreasing the Globus
>    options to just 1 / 1, I was able to limit the speed to
>    *120 MB/s*, which is still enough to make me happy for now. (A
>    scripted equivalent of these throttling settings is sketched after
>    this list.)
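>
> In case it helps anyone hitting the same wall as bottleneck 1, here
> is a minimal Python sketch (assuming a Linux receiver; the sysctl
> names are the standard kernel ones, nothing GlueX-specific) for
> checking which congestion-control algorithm and which TCP buffer
> limits the receiving host is actually using:
>
>     # Minimal sketch (Linux only): report the TCP tunables behind the
>     # packet-loss behaviour described under bottleneck 1.
>     from pathlib import Path
>
>     def read_sysctl(name):
>         """Return a kernel tunable as a string, e.g. 'cubic'."""
>         return Path("/proc/sys/" + name.replace(".", "/")).read_text().strip()
>
>     for key in ("net.ipv4.tcp_congestion_control",
>                 "net.ipv4.tcp_available_congestion_control",
>                 "net.ipv4.tcp_rmem",   # min/default/max receive buffer (bytes)
>                 "net.core.rmem_max"):  # hard cap on socket receive buffers
>         print(key, "=", read_sysctl(key))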
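>
> The write-queue saturation in bottleneck 2 can be watched for without
> installing anything by sampling /proc/diskstats, the same counters
> iostat uses to compute %util. This is only a rough sketch, and the
> sda/sdb device names are placeholders for whatever drives back your
> staging area:
>
>     # Rough sketch: per-disk busy fraction over a 5-second window, from
>     # /proc/diskstats field 13 (milliseconds spent doing I/O).
>     import time
>
>     DEVICES = ("sda", "sdb")   # placeholder device names
>
>     def busy_ms():
>         stats = {}
>         for line in open("/proc/diskstats"):
>             f = line.split()
>             if f[2] in DEVICES:
>                 stats[f[2]] = int(f[12])
>         return stats
>
>     before = busy_ms()
>     time.sleep(5)
>     after = busy_ms()
>     for dev in DEVICES:
>         print(f"{dev}: {(after[dev] - before[dev]) / 50.0:.0f}% busy")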
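>
> For anyone scripting the transfers in bottleneck 3 instead of using
> the Globus Online web interface, the same throttling can be expressed
> with globus-url-copy. This is just a sketch, and the endpoint URLs
> are placeholders rather than real paths:
>
>     # Sketch: a throttled recursive GridFTP pull.  -cc / -p correspond
>     # to the "simultaneous transfers / parallel streams" settings
>     # quoted above.
>     import subprocess
>
>     SRC = "gsiftp://<jlab-gridftp-host>/<dataset-path>/"   # placeholder
>     DST = "file:///<staging-area>/"                        # placeholder
>
>     subprocess.check_call([
>         "globus-url-copy",
>         "-cc", "1",   # one simultaneous file transfer
>         "-p", "2",    # two parallel TCP streams per transfer
>         "-vb",        # report the transfer rate as it goes
>         "-r",         # recurse into the source directory
>         SRC, DST,
>     ])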
>
> I know that, without any fiddling, you were able to get somewhere between
> bottlenecks 1 and 2 above. From this log of lessons learned, I suspect
> you will know what steps you might take to increase your speed to the
> next resistance point. One suggestion for the future: we should
> coordinate this. For example, anyone who wants offsite access to the
> PS triggers should get them from UConn, not fetch them again from
> JLab, since we already have the full set of them from Spring 2016 in
> gridftp-accessible storage at UConn. Likewise for what you pull to IU?
> Perhaps we should set up a central place where we record what GlueX
> data is available, where, and by what protocol.
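>
> To make that concrete, a record in such a catalog would not need to
> be more than something like this (the field names are only
> illustrative, not a proposed schema):
>
>     # Illustrative sketch of one catalog entry; nothing here is agreed.
>     catalog_entry = {
>         "dataset":  "PS triggers, Spring 2016",
>         "site":     "UConn",
>         "protocol": "gridftp",
>         "url":      "gsiftp://<uconn-gridftp-host>/<path>",  # placeholder
>     }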
>
> -Richard Jones