[Halld-offline] request for university compute resources

Richard Jones richard.t.jones at uconn.edu
Mon May 28 07:26:22 EDT 2018


Dear Gluexers,

As we begin to ramp up Monte Carlo production for the experiment on the
OSG, I have started thinking about how we can increase the resources
available for these jobs, and it occurs to me that many of us have access
to shared local research clusters. For the most part we do not own these
clusters, but we have accounts on them and can get a share of them when we
need it. Here at UConn, I have accounts on two such research clusters: the
Storrs HPC UConn cluster <https://hpc.uconn.edu/> and the UConn Health
HPCF <https://health.uconn.edu/high-performance-computing/>. So over the
past few days I put together a software channel [bosco] that funnels jobs
waiting in our GlueX OSG queues into these third-party resources. The
framework I set up takes care of making our OASIS and Singularity images
visible on those resources, so the jobs think they are running on the OSG.
As of this weekend, GlueX jobs are running on both of these UConn shared
resources, with 500 cores between them currently running continuously for
GlueX.

Now imagine what might be possible if 5-6 other universities with similar
shared resources could pitch in at a similar level. I can think of 5-6 of
our institutions where getting 500 cores as a normal user, from clusters
scattered around campus, might be possible. That would mean increasing
GlueX OSG resources by a large factor (roughly 2500-3000 additional cores)
WITH ALMOST ZERO effort from you. All you need to do is:

   1. Request account(s) from the managers of these resources.
   2. As soon as you have the userid/password that gives you access to the
   submit node, call me on the phone.
   3. I give you a login on my bosco submit node [bosco is OSG lingo for
   the linkage between the OSG infrastructure and campus clusters], and you
   run the command "bosco_cluster --add" and supply your account and login
   information. The bosco system then sets up an ssh key in a secure
   location for accessing this account on your behalf, submitting jobs,
   fetching back the results, etc.
   4. You tell me the rules for which jobs can go to this cluster, and I set
   those rules in the bosco router.
   5. I keep a record of all work done on behalf of GlueX by your resource,
   and you can report back to the resource managers the list of publications
   that come out as our simulations and analysis advance.
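To make steps 4 and 5 a bit more concrete, here is a minimal Python sketch of the kind of per-cluster admission rules and work accounting described above. It is purely illustrative: the names `ClusterRule`, `Job`, `fits`, `route`, and `Ledger`, the limit fields (wall time, memory, cores), and the cluster names "cluster-a" and "cluster-b" are all hypothetical. This is not the configuration format of the bosco router or of the HTCondor JobRouter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterRule:
    """Hypothetical limits a cluster owner might impose on GlueX jobs."""
    name: str
    max_walltime_hours: float
    max_memory_gb: float
    max_cores: int


@dataclass
class Job:
    """Resource request of one job waiting in the OSG queue (illustrative)."""
    job_id: str
    walltime_hours: float
    memory_gb: float
    cores: int


def fits(job: Job, rule: ClusterRule) -> bool:
    """True if the job's request is within every limit of the rule."""
    return (job.walltime_hours <= rule.max_walltime_hours
            and job.memory_gb <= rule.max_memory_gb
            and job.cores <= rule.max_cores)


def route(job: Job, rules: List[ClusterRule]) -> Optional[str]:
    """Return the first cluster (in priority order) whose rule admits the job,
    or None to leave the job on the regular OSG queue."""
    for rule in rules:
        if fits(job, rule):
            return rule.name
    return None


@dataclass
class Ledger:
    """Accumulates core-hours delivered to GlueX by each cluster."""
    core_hours: Dict[str, float] = field(default_factory=dict)

    def record(self, cluster: str, cores: int, hours: float) -> None:
        # Add cores x hours actually used to that cluster's running total.
        self.core_hours[cluster] = self.core_hours.get(cluster, 0.0) + cores * hours


# Usage: two hypothetical clusters, listed in order of preference.
rules = [ClusterRule("cluster-a", 12, 4, 1), ClusterRule("cluster-b", 48, 16, 8)]
ledger = Ledger()
for job in [Job("j1", 8, 2, 1), Job("j2", 24, 8, 4), Job("j3", 72, 2, 1)]:
    target = route(job, rules)
    if target is not None:
        ledger.record(target, job.cores, job.walltime_hours)
print(ledger.core_hours)  # {'cluster-a': 8.0, 'cluster-b': 96.0}
```

In this sketch the first matching rule wins, so the order of the rule list expresses which cluster should be preferred, and any job that fits no rule (like "j3" above) simply stays on the regular OSG queue.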

You can look up bosco on the web. It supports local campus clusters running
HTCondor, PBS, Slurm, and LSF, and maybe SGE (I am not sure whether the SGE
support is current).

Right now we are getting a good allocation of OSG opportunistic cycles
(see the attached image for this weekend). Eventually, though, our priority
will drop as our usage accumulates, and our jobs will start getting pushed
to the back of the line. If this sounds interesting to you, please let me
know.

-Richard Jones
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot 2018-05-28 at 7.23.09 AM.png
Type: image/png
Size: 255870 bytes
Desc: not available
URL: <https://mailman.jlab.org/pipermail/halld-offline/attachments/20180528/ee126c51/attachment-0001.png>

