[Halld-offline] added capabilities in osg run scripts

David Lawrence davidl at jlab.org
Thu Jul 20 07:07:18 EDT 2017


Hi Guys,

  In the past we have had problems with the servers getting hit really hard when lots of farm jobs
try to access them at once. Pointing RCDB_CONNECTION to a JLab server is OK for now if
only a few jobs are using it, but I think we are going to need a solution where the RCDB
(or the relevant parts of it) is packaged in the container itself.
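
A minimal sketch of what that could look like in a job script, assuming the container ships an SQLite snapshot of the RCDB. The path /opt/rcdb/rcdb.sqlite and the helper name choose_rcdb_connection are hypothetical, not something that exists in the container today; the MySQL URL is the default suggested further down in this thread.

```shell
#!/bin/sh
# Hypothetical location of an RCDB snapshot inside the container (assumption).
RCDB_SQLITE="${RCDB_SQLITE:-/opt/rcdb/rcdb.sqlite}"

choose_rcdb_connection() {
    # $1: path to a candidate SQLite snapshot.
    # Prefer the local snapshot when it is readable; otherwise fall back
    # to the JLab MySQL server mentioned in this thread.
    if [ -r "$1" ]; then
        echo "sqlite:///$1"
    else
        echo "mysql://rcdb@hallddb.jlab.org/rcdb"
    fi
}

RCDB_CONNECTION="$(choose_rcdb_connection "$RCDB_SQLITE")"
export RCDB_CONNECTION
```

With an absolute snapshot path this yields a connection string of the form sqlite:////path/to/file, so jobs would only reach the JLab server when no snapshot is present.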

Regards,
-David


> On Jul 17, 2017, at 3:58 PM, Richard Jones <richard.t.jones at uconn.edu> wrote:
> 
> Sean,
> > Thanks a lot, this sounds perfect.  I presume the http_proxy changes were made in the container, since the template script wasn't updated?
> Yes, in the container.
> > One more comment:  it may be useful to have a default setting for the connection to the RCDB, e.g.  RCDB_CONNECTION=mysql://rcdb@hallddb.jlab.org/rcdb
> Yes, I agree. I have added this. It might take another 30 minutes before the update appears in /cvmfs.
> 
> -Richard J.
> 
> On Mon, Jul 17, 2017 at 3:23 PM, Sean Dobbs <s-dobbs at northwestern.edu> wrote:
> Richard,
> 
> Thanks a lot, this sounds perfect.  I presume the http_proxy changes were made in the container, since the template script wasn't updated?
> One more comment:  it may be useful to have a default setting for the connection to the RCDB, e.g.  RCDB_CONNECTION=mysql://rcdb@hallddb.jlab.org/rcdb
> 
> ---Sean
> 
> 
> On Mon, Jul 17, 2017 at 1:57 PM Richard Jones <richard.t.jones at uconn.edu> wrote:
> Sean,
> 
> [ability to download the latest ccdb.sqlite file from halldweb.jlab.org/dist and use it, instead of copying from a standard location on cvmfs, where it is likely to be stale] This is helpful, but is there no good way to automatically update the SQLite file on OASIS (perhaps weekly)?  I worry that when we do a large scale production, having every job copying from halldweb would place too much of a load on the server, so that we would need to do some server-side caching/load balancing.
> For things that are expected to be updated on a regular schedule, OSG users should use the standard OSG_SQUID_LOCATION variable so that multiple jobs running on a site share a single copy. The cvmfs cache is per-host, whereas the squid cache is per-site, so this is much more efficient in terms of site cache space for something like the sqlite file. I had forgotten to translate the OSG_SQUID_LOCATION variable into the environment variables http_proxy and https_proxy, and this was why you were seeing repeated hits on the JLab server for each job. This should be fixed now. The site squid cache will be invalidated on each site once per day, each time the sqlite file is updated, so there should be a maximum of one download from halldweb.jlab.org per site per day. Is this going to be OK?
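
For illustration, the translation described above might look roughly like the sketch below. The actual code in the container is not shown in this thread; the helper name squid_to_proxy is hypothetical, and the sketch assumes that OSG_SQUID_LOCATION holds a bare host:port value and is empty or set to UNAVAILABLE on sites without a squid.

```shell
#!/bin/sh
# Sketch only: assumes OSG_SQUID_LOCATION looks like "host:port".
squid_to_proxy() {
    # $1: value of OSG_SQUID_LOCATION
    case "$1" in
        ""|UNAVAILABLE) return 1 ;;        # no site squid advertised (assumed convention)
        http://*|https://*) echo "$1" ;;   # already a full URL, pass through
        *) echo "http://$1" ;;             # bare host:port, add the scheme
    esac
}

# Only set the proxy variables when a usable squid location was found.
if proxy="$(squid_to_proxy "${OSG_SQUID_LOCATION:-}")"; then
    export http_proxy="$proxy" https_proxy="$proxy"
fi
```

Tools such as wget honor http_proxy, so a later download of the sqlite file in the same job would go through the site squid rather than straight to the origin server.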
> [ability to use non-sequential run lists for simulating a series of runs, instead of assuming that you want to fill in a sequential range with every run number; if you supply a plain text file called runsequence.txt with your job then this file will be used to look up the run sequence instead of assuming a contiguous sequence.] Could you please add the attached file into some standard location?  I think it would be useful for users to have standard run lists stored somewhere in the CVMFS namespace.
> OK, this is done. The file you shared with me has been uploaded to /cvmfs/oasis.opensciencegrid.org/gluex/templates
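
As a rough sketch of the runsequence.txt lookup described above (not the actual template script): the function name run_number, the zero-based job offset, and the one-run-number-per-line file format are all assumptions made for illustration.

```shell
#!/bin/sh
# run_number FIRST_RUN OFFSET [SEQUENCE_FILE]
# OFFSET is a zero-based job offset (hypothetical convention).
run_number() {
    first=$1
    offset=$2
    seqfile=${3:-runsequence.txt}
    if [ -r "$seqfile" ]; then
        # Assumed format: one run number per line; print line OFFSET+1.
        sed -n "$((offset + 1))p" "$seqfile"
    else
        # No run list supplied: fall back to a contiguous range.
        echo "$((first + offset))"
    fi
}
```

Note that this sketch prints nothing when the offset runs past the end of the list, so a real script would need to decide how to handle that case.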
> [You also asked if we could have more than one build of sim-recon available simultaneously on the oasis filesystem. That is already a feature of the system I set up, so no updates were required to provide this.] That's great to hear.  When do you think you can provide a copy of sim-recon branch recon-2017_01-ver01-batch01-mcsmear?
> This is available now. Just do "source $GLUEX_TOP/.hdpm/env/recon-2017_01-ver01-batch01-mcsmear.sh" at the beginning of the job script, and the binaries and libraries in your path variables will come from the branch build.
> 
> -Richard Jones
> 
> 
> On Sun, Jul 16, 2017 at 7:13 PM, Sean Dobbs <s-dobbs at northwestern.edu> wrote:
> Richard,
> 
> Thanks for adding these features.  I have some comments below:
> 
> On Sat, Jul 15, 2017 at 6:00 AM Richard Jones <richard.t.jones at uconn.edu> wrote:
> 
> ability to download the latest ccdb.sqlite file from halldweb.jlab.org/dist and use it, instead of copying from a standard location on cvmfs, where it is likely to be stale.
> 
> This is helpful, but is there no good way to automatically update the SQLite file on OASIS (perhaps weekly)?  I worry that when we do a large scale production, having every job copying from halldweb would place too much of a load on the server, so that we would need to do some server-side caching/load balancing.
> 
> For what it's worth, in some ~50 test job runs during the week, I noticed that sometimes wget'ing the SQLite file took ~30 minutes (though apparently it's faster during the weekend!)
>  
> ability to use non-sequential run lists for simulating a series of runs, instead of assuming that you want to fill in a sequential range with every run number; if you supply a plain text file called runsequence.txt with your job then this file will be used to look up the run sequence instead of assuming a contiguous sequence.
> 
> Could you please add the attached file into some standard location?  I think it would be useful for users to have standard run lists stored somewhere in the CVMFS namespace.
>  
> You also asked if we could have more than one build of sim-recon available simultaneously on the oasis filesystem. That is already a feature of the system I set up, so no updates were required to provide this.
> 
> 
> That's great to hear.  When do you think you can provide a copy of sim-recon branch recon-2017_01-ver01-batch01-mcsmear?
> 
> 
> ----Sean
> 
> 
>  
> -Richard Jones
> 
> 
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline
