[Halld-physics] NERSC + monitoring launch 2018-01 ver17
Alexander Austregesilo
aaustreg at jlab.org
Sun Sep 9 16:51:51 EDT 2018
Dear Colleagues,
The post-processing for the monitoring launch over the 2018-01 data is
complete.
The usual monitoring plots can be found on the web pages:
https://halldweb.jlab.org/data_monitoring/Plot_Browser.html
https://halldweb.jlab.org/cgi-bin/data_monitoring/monitoring/runBrowser.py
https://halldweb.jlab.org/cgi-bin/data_monitoring/monitoring/versionBrowser.py
The merged histogram files can be found on the work disk at:
/work/halld/data_monitoring/RunPeriod-2018-01/mon_ver17/rootfiles/
The REST files and several skims can be accessed here:
/cache/halld/offline_monitoring/RunPeriod-2018-01/ver17/REST/
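If you want to poke at these from a script, the sketch below may be a useful
starting point. It only assumes that PyROOT is available in your environment;
the ".hddm" pattern for the REST files and the one-directory-per-run layout
are assumptions on my part, so adjust as needed.

    import glob
    import ROOT  # PyROOT

    mon_dir  = "/work/halld/data_monitoring/RunPeriod-2018-01/mon_ver17/rootfiles/"
    rest_dir = "/cache/halld/offline_monitoring/RunPeriod-2018-01/ver17/REST/"

    # Open the first merged histogram file and list its top-level contents
    mon_files = sorted(glob.glob(mon_dir + "*.root"))
    if mon_files:
        f = ROOT.TFile.Open(mon_files[0])
        if f:
            print(mon_files[0])
            for key in f.GetListOfKeys():
                print("   %s  %s" % (key.GetClassName(), key.GetName()))
            f.Close()

    # Count REST files under each run directory (file pattern assumed)
    for run_dir in sorted(glob.glob(rest_dir + "*")):
        n = len(glob.glob(run_dir + "/*.hddm"))
        print("%s : %d REST files" % (run_dir, n))
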
Best regards,
Alex
On 9/6/2018 1:51 PM, David Lawrence wrote:
> Hi All,
>
> This is a bit of a delayed announcement, but I’m declaring the
> Spring 2018 monitoring pass ver17 basically done. Swif2 reports
> 5462 of the 5495 jobs completed successfully. I was able to
> resubmit some jobs and have them succeed on subsequent tries,
> but the last 33 seem to have fatal errors that will need to be
> explored further.
>
> Some points of interest:
>
> - This was the first large-scale test of processing raw data offsite
> at NERSC
>
> - This launch included the first 10 files of all runs with
> @is_2018production and @status_approved (see the RCDB query sketch
> after this list)
>
> - It took approximately 12 days to run through them after submitting
> them all to swif2 on Aug. 21st
>
> - The rate of processing them was not stable. See the red curve on the
> top plot here:
> https://halldweb.jlab.org/data_monitoring/launch_analysis/2018_01/launches/offline_monitoring_RunPeriod2018_01_ver17/
> This was partly due to the multiple stages and the ripple effects
> between them: tape->cache, cache->globus->NERSC, NERSC->Slurm, and then
> all of that in reverse.
>
> - The default mechanism for submitting jobs resulted in only about 300
> jobs running simultaneously, compared to the 600-700 we have seen on
> the JLab farm. We will explore using a reservation for the next NERSC
> launch to see whether we can improve on that.
>
> - We saw a steady rate of only about 2 Gbps when copying to NERSC, with
> rare bursts of >=7 Gbps. SciComp and Networking folks diagnosed this
> and have worked to improve it. The biggest gain is expected to come
> when they reboot the external router later this month so that it will
> support jumbo frames.
>
> - Output can be found in
> /mss/halld/offline_monitoring/RunPeriod-2018-01/ver17 or
> the associated cache directory.
>
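> For reference, a run list matching those conditions can be pulled from
> RCDB with something along these lines. This is only a sketch: the
> connection string and run range are placeholders rather than what the
> launch scripts actually use.
>
>     # Sketch: select approved 2018-01 production runs via the RCDB
>     # python API. Connection string and run range are placeholders;
>     # only the first 10 files of each run were processed in this launch.
>     import rcdb
>
>     db = rcdb.RCDBProvider("mysql://rcdb@hallddb.jlab.org/rcdb")
>     runs = db.select_runs("@is_2018production and @status_approved",
>                           40000, 49999)
>     for run in runs:
>         print(run.number)
>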
> Regards,
> -David
>
> -------------------------------------------------------------
> David Lawrence Ph.D.
> Staff Scientist, Thomas Jefferson National Accelerator Facility
> Newport News, VA
> davidl at jlab.org <mailto:davidl at jlab.org>
> (757) 269-5567 W
> (757) 746-6697 C