[Halld-offline] speeds for download of secondary GlueX datasets from JLab
Richard Jones
richard.t.jones at uconn.edu
Mon May 23 17:25:45 EDT 2016
Dear Matt,
Following up on your query regarding download speeds for fetching secondary
datasets from JLab to offsite storage resources, I have the following
experience to share.
1. *first bottleneck:* switch buffer overflows (last 6 feet) -- the data
path was 10Gb from the source to my server until the last 6 feet, where it
dropped to 1Gb. TCP performance was highly asymmetric: 95 MB/s upload speed,
but a poor and oscillating download speed averaging *15 MB/s*. This asymmetry
was due to buffer overflows at the switch port where the path necks down from
10Gb to 1Gb -- TCP has no back-pressure mechanism other than packet loss,
which tends to be catastrophic over high-latency paths with the standard
Linux kernel congestion algorithms (cubic, htcp). A back-of-the-envelope
illustration of this is the first sketch after this list.
2. *second bottleneck:* disk speed on the receiving server -- as soon as I
replaced the last 6 feet with a 10Gb NIC / transceiver, I moved up to the
next resistance point, around *140 MB/s* on my server. Diagnostics showed
that my disk drives (2 commodity SATA 1TB drives in parallel) were both
saturating their write queues. At this speed I was filling up my disks fast,
so I had to start simultaneous jobs to flush these files from the temporary
filesystem on the receiving server to permanent storage in my dCache. Once
the drives were interleaving reads and writes, the download performance
dropped to around *70 MB/s* net for both drives (the second sketch after
this list puts rough numbers on this).
3. *third bottleneck:* fear of too much success -- to see what the next
limiting point might be, I switched to a data transfer node that the UConn
data center made available for testing. It combines a 10Gb NIC connected to
a central campus switch with what Dell calls a high-performance RAID (Dell
H700, 500GB, probably a large fraction of it SSD). On this system I never
saw the disks saturate their read/write queues. The throughput rose quickly
as the transfers started, however, and as soon as I saw transfers exceeding
*300 MB/s* I remembered Chip's warning and cancelled the job. I then
decreased the number of parallel streams (from the Globus Online defaults)
to limit the impact on JLab infrastructure. Using just 1 simultaneous
transfer / 2 parallel streams (the Globus default is 2 / 4) I saw a
steady-state rate between 150 and *200 MB/s* average download speed, even
while simultaneously downloading and pushing from the fast RAID to my dCache
(multiple parallel jobs) -- which was necessary to keep from overflowing
this 500GB partition in a matter of minutes (see the third sketch after
this list). Decreasing the Globus options to just 1 / 1, I was able to
limit the speed to *120 MB/s*, which is still enough to make me happy for
now.
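To put rough numbers behind the first bottleneck, the standard Mathis et al.
approximation for loss-limited TCP throughput, rate ~ (MSS/RTT) * 1.22/sqrt(p),
shows how even a small packet-loss rate at WAN latencies caps a single stream
far below the 1Gb line rate. The Python sketch below is only illustrative: the
RTT and loss values are plausible guesses, not measurements on the JLab-UConn
path.

    # Loss-limited TCP throughput, Mathis approximation:
    #   rate ~ (MSS / RTT) * 1.22 / sqrt(p), single stream, no window limit.
    # The RTT and loss rates below are illustrative guesses, not measurements.
    from math import sqrt

    MSS = 1460     # bytes per segment (typical Ethernet MTU minus headers)
    RTT = 0.030    # seconds round trip, a plausible wide-area latency

    for loss in (1e-5, 1e-4, 1e-3):  # packets dropped at the 10Gb->1Gb port
        rate = (MSS / RTT) * 1.22 / sqrt(loss)   # bytes per second
        print(f"loss {loss:.0e}: ~{rate / 1e6:.1f} MB/s per stream")

    # loss 1e-05: ~18.8 MB/s, 1e-04: ~5.9 MB/s, 1e-03: ~1.9 MB/s.
    # Even modest loss from the overflowing switch buffer explains a ~15 MB/s
    # download average while the loss-free upload direction runs at wire speed.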
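For the second bottleneck, the need for flush jobs is just arithmetic: at
140 MB/s the two 1TB staging drives fill in about four hours, and once they
serve the flush reads as well as the incoming writes, the net ingest roughly
halves. The rates in this sketch are the ones quoted above; nothing else is
measured.

    # Fill time of the staging disks, and the cost of flushing them in place.
    capacity = 2 * 1e12           # bytes: two commodity 1TB SATA drives
    ingest_write_only = 140e6     # bytes/s when the drives only take writes
    ingest_interleaved = 70e6     # bytes/s net once flush reads are interleaved

    fill_time_h = capacity / ingest_write_only / 3600.0
    print(f"time to fill 2 TB at 140 MB/s: {fill_time_h:.1f} h")   # ~4.0 h

    # Without flush jobs the transfer stalls after ~4 hours. Flushing to dCache
    # concurrently keeps the staging area from filling, but interleaving reads
    # with writes cuts the net download rate roughly in half, to ~70 MB/s.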
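The same bookkeeping explains why the 500GB partition on the data transfer
node is so unforgiving, and why throttling the Globus Online settings helps:
the partition survives only if the flush jobs to dCache drain it faster than
the download fills it. In the sketch below the flush rate of 100 MB/s is an
assumed figure for illustration, not something I measured.

    # Time until the 500GB staging partition overflows at various download
    # rates, given a concurrent flush to dCache at an assumed 100 MB/s.
    staging = 500e9      # bytes on the Dell H700 RAID partition
    flush = 100e6        # bytes/s drained to dCache (assumed, illustrative)

    for label, download in (("uncapped, >300 MB/s", 300e6),
                            ("1 transfer / 2 streams, ~175 MB/s", 175e6),
                            ("1 transfer / 1 stream, ~120 MB/s", 120e6)):
        net = download - flush                 # bytes/s of net growth
        if net <= 0:
            print(f"{label}: never overflows, flush keeps up")
        else:
            print(f"{label}: overflows in ~{staging / net / 60:.0f} min")

    # Roughly: 42 min uncapped, 111 min at 1/2, 417 min at 1/1 -- throttling
    # the Globus settings is what buys the flush jobs enough margin to keep up.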
I know that without any fiddling you were able to get somewhere between
bottlenecks 1 and 2 above. From this log of lessons learned, I suspect you
will know what steps you might take to increase your speed to the next
resistance point. One suggestion for the future: we should coordinate this.
For example, anyone who wants offsite access to the PS triggers should get
them from UConn, not fetch them again from JLab, since we already have the
full set of them from Spring 2016 in gridftp-accessible storage at UConn.
The same would go for whatever you pull to IU. Perhaps we should set up a
central place where we record what GlueX data is available, where, and by
what protocol.
-Richard Jones