[Halld-offline] Offline Software Meeting Minutes, September 18, 2018
Mark Ito
marki at jlab.org
Wed Sep 19 17:01:57 EDT 2018
Colleagues,
Please find the minutes below and here
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_September_18,_2018>.
-- Mark
__________________________________
Minutes, GlueX Offline Meeting, September 18, 2018
Present:
* CMU: Naomi Jarvis, Curtis Meyer
* FSU: Sean Dobbs
* JLab: Alex Austregesilo, Thomas Britton, Mark Ito (chair), Justin
Stevens, Beni Zihlmann
* MIT: Cristiano Fanelli
* Raleigh, NC: David Lawrence
* Yerevan: Hrach Marukyan
There is a recording of this meeting <https://bluejeans.com/s/GoPvv/> on
the BlueJeans site. Use your JLab credentials to access it.
Announcements
1. New releases: AmpTools 0.9.4, halld_sim 3.4.0, MCwrapper 2.0.1
<https://mailman.jlab.org/pipermail/halld-offline/2018-September/003355.html>.
The linked email has links to the release notes for each package.
* AmpTools 0.9.4: Support for weighted events
* halld_sim 3.4.0: Gets parameters for the drift time-to-distance
relationship from the CCDB.
* MCwrapper 2.0.1: Uses generator binary names consistent with
their containing directories.
o This version has a problem with the bggen bash script. See
Thomas if you are having difficulties.
2. New version of build_scripts: version 1.39
<https://github.com/JeffersonLab/build_scripts/releases/tag/1.39>:
Fixes the cernlib build so that it finds the LAPACK and BLAS
libraries and successfully creates the PAW binary.
3. Status of Launches.
* Alex reports that the reconstruction launch over the 2017 data is
100% complete. It took 30 days. There was no tape I/O bottleneck
this time. Up to 1000 jobs were running on the farm simultaneously.
Extrapolating, the Spring 2018 data should take 90 days to
reconstruct if resources are similar.
* A quick analysis launch was done for Lubomir.
* A complete analysis launch with 60 reactions
<https://halldweb.jlab.org/wiki-private/index.php/Spring_2017_Analysis_Launch#Version19>
was also done on the 2017 data[?]. It took only 20 hours to
complete
<https://halldweb.jlab.org/data_monitoring/analysis/summary_swif_output_analysis_2017-01_ver19_batch01.html>.
* A problem came up during the merging of the output. The merge jobs
appear to put such a high load on the work disk server that other
users are crowded out and do not get decent service. The same
activity did not cause any problems with the Lustre disks. Alex is
working with Dave Rackley of SciComp to track down the problem.
Review of minutes from the September 4 meeting
We went over the minutes
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_September_4,_2018#Minutes>.
* There are still issues with two pull requests mentioned last time:
1. Fix event order dependence, #8
<https://github.com/JeffersonLab/halld_recon/pull/8>
2. Warn if JANA_CALIB_CONTEXT not set, #9
<https://github.com/JeffersonLab/halld_recon/pull/9>
* Mark encouraged us, once again, to start using the new gxenv and
gxclean commands
<https://github.com/JeffersonLab/build_scripts/wiki/gluex_env_boot_jlab.%28c%29sh>
when setting up our shells. See the sketch after this list.
* Thomas and Maria Patsyuk have resolved her problem with the particle
gun by setting TGTWIDTH appropriately in control.in.
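For reference, shell set-up with the new commands looks roughly like
the following. This is a minimal sketch, assuming the standard
group-disk location of build_scripts at JLab and that gxenv accepts an
optional version-set XML file; see the linked wiki page for the
authoritative instructions.

    # boot-strap the gxenv/gxclean commands (bash; a csh variant also exists)
    source /group/halld/Software/build_scripts/gluex_env_boot_jlab.sh
    # set up the GlueX environment from a version-set XML file (example path)
    gxenv /group/halld/www/halldweb/html/halld_versions/version.xml
    # later, remove the GlueX settings from the current shell
    gxclean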
Review of the HDGeant4 Meeting from September 11
We went over the minutes
<https://halldweb.jlab.org/wiki/index.php/HDGeant4_Meeting,_September_11,_2018#Minutes>.
On the problem with forward protons at high momentum (Issue #66
<https://github.com/JeffersonLab/HDGeant4/issues/66>), Beni reports that
he sees hits from the downstream FDC chambers missing on tracks in both
HDG3 and HDG4 at the wire-based stage, but in the HDG3 case those hits
are recovered during time-based tracking. In HDG4, the hits remain lost.
He is investigating why this is the case.
Thomas reminded us that the simulations we are studying were done with
all physics processes turned off.
Beni also occasionally sees events with no FDC hits at all in
HDG4[?], neither simulated nor "truth", when scanning events with
hdview2. Justin reported seeing similar pseudo-point multiplicity
distributions when comparing HDG3 and HDG4. These observations seem
to be in tension with each other.
Software Items on the Collaboration Meeting Agenda
We looked at the agenda
<https://halldweb.jlab.org/wiki/index.php/GlueX-Collaboration-Sep-2018#Saturday_September_29.2C_2018_.28CC-L102.29>
put together by Sean. Looks good!
NERSC Update
David gave us an update. For plots see the recording starting at 51:00.
Since his last report, David completed the monitoring launch at NERSC.
He ran 5000 jobs and it took 12 days.
Since then he has done a bandwidth test, using Globus Online to
transfer three complete runs (over 200 files per run) out to NERSC,
outside of the SWIF2 framework. He saw transfer speeds matching the
advertised bandwidth of the Lab's link, with peaks at 10 Gb/s (and
sometimes a bit more).
After the files were transferred, jobs were submitted against them
(about 700 jobs). These took over 5 days to complete, and there were
slack periods of up to a day when no jobs were running.
Computing and Software Review
Mark reviewed the materials we have received so far on the review to be
held November 27 and 28.
* Email from Rolf
<https://halldweb.jlab.org/talks/2018/rolf_computing_review.pdf>
* Charge
<https://halldweb.jlab.org/talks/2018/12_GeV_Experimental_Computing_Review_Committee_Charge_v0-7.docx>
* Agenda
<https://halldweb.jlab.org/talks/2018/12GeV_Software_Review_2018_agenda_v1.docx>
Review of recent pull requests
* halld_recon
<https://github.com/JeffersonLab/halld_recon/pulls?q=is%3Aclosed+is%3Apr>
* halld_sim
<https://github.com/JeffersonLab/halld_sim/pulls?q=is%3Aclosed+is%3Apr>
Alex mentioned that he has seen crashes with the latest version of
halld_recon when he tries to reconstruct a full run. Many of the jobs
crash.
Review of recent discussion on the GlueX Software Help List
We went over the list
<https://groups.google.com/forum/#%21forum/gluex-software> without a lot
of comment.
Action Item Review
1. Fix the work disk.
2. Find the reason for track loss in HDG4.
* Find the timing problem.
* Understand the events with no FDC hits.
3. Release a new version of gluex_MCwrapper with the bggen bash script
fixed.
4. Find out the schedule for getting more cache disk from SciComp.
--
Mark Ito, marki at jlab.org, (757)269-5295