[Halld-offline] Software Meeting Minutes, August 18, 2020
Mark Ito
marki at jlab.org
Tue Aug 18 18:52:12 EDT 2020
Folks,
Please find the minutes here
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_August_18,_2020#Minutes>
and below.
-- Mark
GlueX Software Meeting, August 18, 2020, Minutes
Present: Sean Dobbs, Mark Ito (chair), Igal Jaegle, Naomi Jarvis, Justin
Stevens, Nilanga Wickramaarachchi, Beni Zihlmann
There is a recording of this meeting
<https://bluejeans.com/s/kAV6kSo3kEe/> on the BlueJeans site. Use your
JLab credentials to authenticate.
Announcements
1. Draft of DSelector documentation
<https://mailman.jlab.org/pipermail/halld-offline/2020-July/008283.html>
Sean noted that the latest version of the document is on Overleaf.
Naomi is the owner. Please contact her if you would like to
review or contribute to the document.
* Naomi, Justin, and Beni noted that JLab now has an Overleaf license
of some sort. Mark will contact David Lawrence about the details.
[Added in press: Mark forwarded the response he got from David
to the aforementioned folks.]
* Naomi reported on an issue with "Zombie ProofLite servers". Her
report:
o We were having a problem with zombie proofserv.exe processes
left behind on our cluster nodes after the jobs completed;
they were not cleaned up by Slurm. Every so often, one of
many worker threads would fail to initialize properly, as
seen in the note on how many threads had gone parallel. The
job output was unaffected, although presumably it took
fractionally longer than it should have done, but after
hundreds of DSelector jobs had been run, there were a large
number of zombie processes left behind.
o It turned out to be sort of a network issue, which caused
TProof to time out (after only 5 seconds) on some of the
forked threads. Since it timed out, it skipped that thread
and did not destroy it when ROOT exited. Setting a longer
timeout solved the problem.
o This can be done by appending the following line to
${HOME}/.rootrc: "ProofLite.StartupTimeOut 1800", which
gives a very generous timeout allowance of 30 minutes.
2. New version set: 4.24.1
<https://mailman.jlab.org/pipermail/halld-offline/2020-August/008293.html>
Mark pointed out:
1. This version set has the Python-3 build changes incorporated.
2. A new container was created to track the new RPM required for
HDGeant4 (tirpc-devel).
3. The patch version set solves the "hang on first event" problem
of HDGeant4.
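For the ProofLite timeout fix in Naomi's report above, a minimal sketch of how the line could be appended to a .rootrc file without duplicating it on repeated runs. The setting line is reproduced exactly as quoted in the report; the function name `ensure_proof_timeout` is hypothetical and introduced only for illustration.

```python
from pathlib import Path

# The setting quoted in the report (1800 s = 30 minutes).
SETTING_KEY = "ProofLite.StartupTimeOut"
SETTING_LINE = f"{SETTING_KEY} 1800"


def ensure_proof_timeout(rootrc: Path) -> bool:
    """Append SETTING_LINE to rootrc unless the key is already set.

    Returns True if the file was modified, False otherwise.
    """
    existing = rootrc.read_text() if rootrc.exists() else ""
    for line in existing.splitlines():
        stripped = line.strip()
        # Ignore comment lines when looking for an existing entry.
        if stripped.startswith("#"):
            continue
        # Any existing entry for the key is left untouched.
        if stripped.startswith(SETTING_KEY):
            return False
    # Make sure the new entry starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with rootrc.open("a") as handle:
        handle.write(f"{separator}{SETTING_LINE}\n")
    return True


# Example call (not executed here):
#   ensure_proof_timeout(Path.home() / ".rootrc")
```

The function deliberately leaves an existing entry alone, so a value someone has already tuned by hand is not overwritten.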
Review of Minutes from the Last Software Meeting
We went over the minutes from the August 4 meeting
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_August_4,_2020#Minutes>.
* Since the last meeting, no one has noticed the slow wiki access on
halldweb.jlab.org that we discussed. The only known change was the
doubling of the period between refreshes of the MCwrapper web page
that Thomas Britton reported last time.
* The dE/dx theta correction for the CDC seems happy in its new home,
merged onto the master branch of halld_recon.
* The Python 3 compatible build that Mark described was not as
universal as he had hoped. It continues to work for both CentOS 7
and Fedora 32, but Ubuntu 20 and CentOS 8 did not work without
modifications to the scheme. The problem is the way different
distributions treat the interpretations (or lack thereof) of Python,
Python 2, Python 3, and of SCons, SCons 2, SCons 3. It turns out that
each distribution is a special case. This has delayed deployment of
a CentOS 8 container.
* Naomi reported a new feature that Theo Larrieu added to the
electronic logbook that allows selection of multiple images to be
uploaded at one time. This has worked for her. Mark will check to
see if Mark Dalton has used the feature.
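To illustrate the Python and SCons naming differences mentioned above, here is a small sketch that reports which command names resolve on the current machine. The list of candidate names is an assumption made for illustration; the actual names differ between distributions, and this is not the build scheme Mark described.

```python
import shutil

# Candidate command names; this set is an assumption for illustration and
# will not match every distribution.
CANDIDATES = ["python", "python2", "python3", "scons", "scons-2", "scons-3"]


def probe_commands(names):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def format_report(found):
    """Render one line per command, marking the ones that were not found."""
    width = max(len(name) for name in found)
    return "\n".join(
        f"{name:<{width}}  {path or '(not found)'}" for name, path in found.items()
    )


print(format_report(probe_commands(CANDIDATES)))
```

Running this on each target distribution shows at a glance which names exist there.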
Report from the Last HDGeant4 Meeting
We went over the minutes from the meeting on August 11
<https://halldweb.jlab.org/wiki/index.php/HDGeant4_Meeting,_August_11,_2020#Minutes>
without significant comment.
Report from SciComp Meeting, August 6
Mark showed a slide
<https://docs.google.com/presentation/d/19TLCf974jsYtdRElfub4QMEFOu8DRbOXkP8kxrljeq8/edit?usp=sharing>
summarizing items from the meeting. The big message is that Scientific
Computing is moving to substantial support of the OSG at JLab, including
contributing compute resources. Details of the plan are not known at
present, but they are working on getting all of the component pieces
working.
Restoration of Execution Tests for Pull Request Builds
Sean described bringing back execution of binaries
<https://mailman.jlab.org/pipermail/halld-offline/2020-July/008272.html>,
e.g., hd_root, as part of the automatic pull request test procedure.
This has been broken for many moons. In the process, Mark improved
the environment set-up procedure. We need to do similar tests for
hdgeant4 next. Sean will look into it.
Review of recent issues and pull requests
Sean discussed halld_recon Pull Request #432
<https://github.com/JeffersonLab/halld_recon/pull/432>: "Suppress
geometry-related warning messages". At present, merging is waiting on a
new version of JANA to appear in a future version set. This is on Mark's
list.
Action Item Review
1. Ask David about JLab's Overleaf license (Mark, done)
2. Create an FAQ on the zombie ProofLite solution (Mark)
3. Ask Mark D. about multiple file uploads with the electronic logbook
(Mark)
4. HDGeant4 run-time testing for pull requests (Sean)
5. Create a version set with a new version of JANA sourced from GitHub
(Mark)
* This page was last modified on 18 August 2020, at 18:50.