[Halld-offline] Software Meeting Minutes, March 30, 2022
Mark Ito
marki at jlab.org
Thu Mar 31 11:58:24 EDT 2022
Folks,
Please find the minutes here
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_March_30,_2022#Minutes>
and below.
-- Mark
_________________________________
Minutes, GlueX Software Meeting, March 30, 2022
Present: Alex Austregesilo, Nathan Brei, Thomas Britton, Eugene
Chudakov, Sergey Furletov, Mark Ito (chair), Igal Jaegle, Richard Jones,
David Lawrence, Curtis Meyer, Justin Stevens, Simon Taylor, Beni Zihlmann
There is a recording of this meeting
<https://jlab-org.zoomgov.com/rec/share/vdGXuc7qvEizUrKTSXF0VgCYpWs2-5CTYCLFHxTVfMB7yx4GsGPfMgas9HndnCOL.vvqhRHJX3xjlNhI4>
on the ZoomGov site. (Passcode: 7itcSj^X)
Announcements
This will be the last Software Meeting that Mark will chair. He is
retiring from the Lab.
Review of Minutes from the Last Software Meeting
We went over the minutes from the meeting on March 16th
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_March_16,_2022#Minutes>.
* Mark mentioned that he is seeing an increasing number of requests
from users to be added to the halld Slurm account when they are already
members of the halld Unix group. He does not know why. The command
line actions linked from FAQ
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_FAQ#What_is_this_sbatch_error_that_I_am_getting.3F>
for managing the Slurm account work well.
FAQ of the Fortnight: How do I use secure shell (ssh) to access
GitHub?
Mark led us through the FAQ
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_FAQ#How_do_I_use_secure_shell_.28ssh.29_to_access_GitHub.3F>
that describes a method for password-less interaction with the
GitHub site, including pushes, using ssh keys. Using this method became
more attractive after GitHub removed the ability to use one's account
password for authentication. A personal access token is now required in
lieu of the password.
Richard remarked that support for ssh authentication may be dropped in
the not-so-distant future.
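One practical detail when switching an existing clone to ssh keys is that the remote has to use GitHub's ssh-style address (git@github.com:OWNER/REPO.git) rather than the HTTPS URL. The following is a minimal sketch of that rewrite. The helper name https_to_ssh is hypothetical and is not taken from the FAQ.

```python
import re


def https_to_ssh(url: str) -> str:
    """Rewrite https://github.com/OWNER/REPO(.git) as git@github.com:OWNER/REPO.git.

    Hypothetical helper for illustration; it only handles github.com HTTPS URLs.
    """
    match = re.fullmatch(r"https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?", url)
    if match is None:
        raise ValueError(f"not a GitHub HTTPS URL: {url}")
    owner, repo = match.groups()
    return f"git@github.com:{owner}/{repo}.git"


if __name__ == "__main__":
    # Repository taken from the link later in these minutes.
    print(https_to_ssh("https://github.com/rjones30/dilog.git"))
```

The resulting address can be given to "git remote set-url origin" so that subsequent pushes authenticate with the ssh key.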
JANA2
Nathan described efforts to validate REST file production with JANA2 by
comparing results with the same activity using JANA. By exploiting the
toStrings member function, present in all JANA classes, he can create a
text dump of produced objects in the two contexts and compare them. That
produced a large number of diffs that are hard to interpret. To address
that problem, he recently developed a technique where objects can be
compared in the order that they are produced. That way the difference
produced furthest upstream can be identified and separated from
resulting downstream differences. In addition he has developed a new
tool, JInspector, that allows inspection of factories during execution.
The scheme allows invocation of JInspector from a gdb session. The next
step is to build a system that automatically compares JANA output with
that of JANA2, halts when differences are found, and allows interactive
invocation of JInspector.
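The idea of comparing objects in production order, so that the furthest-upstream difference is reported first, can be sketched as follows. This is an illustration only and is not Nathan's code. It assumes each run has already been reduced to an ordered list of (label, text dump) pairs, and the labels and values in the example are made up.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

# (object label, text dump), listed in the order the objects were produced.
Record = Tuple[str, str]


def first_divergence(
    old: Iterable[Record], new: Iterable[Record]
) -> Optional[Tuple[int, Optional[Record], Optional[Record]]]:
    """Return (index, old_record, new_record) at the first mismatch, else None.

    A record of None means that one run ended before the other.
    """
    for index, (a, b) in enumerate(zip_longest(old, new)):
        if a != b:
            return index, a, b
    return None


if __name__ == "__main__":
    # Hypothetical dumps: the "tracks" difference is upstream of "vertex".
    old_run = [("hits", "x=1.0"), ("tracks", "p=2.5"), ("vertex", "z=65.0")]
    new_run = [("hits", "x=1.0"), ("tracks", "p=2.6"), ("vertex", "z=65.1")]
    print(first_divergence(old_run, new_run))
```

Stopping at the first mismatch keeps the downstream "vertex" difference, which is presumably a consequence of the "tracks" difference, out of the report.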
Richard remarked that he has developed a tool that does similar things
in the context of ensuring reproducibility of results on repeated runs
of our code. To use the tool, one instruments the code being examined
such that intermediate results from selected points of execution (i.e.,
those instrumented) are dumped to a binary file the first time through
execution. Subsequent execution of the same code will notice that the
binary log exists and compare the results from the current execution to
those obtained in the initial pass. One complication the tool addresses
is that many discrepancies occur only in a multi-threaded context,
where results are generally not produced in the same order from
run to run. This means that there has to be a search of the log, forward
and backward, for each result as it is produced in the current execution
of the program. When a difference is detected, the user is dropped into
the debugger. The tool is available on GitHub
<https://github.com/rjones30/dilog.git>.
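The record-then-compare scheme can be sketched in a few lines. This is not the dilog API. The class ReplayLog, its window parameter, and the use of pickle for the binary log are all assumptions made for illustration. The first run writes the log, later runs look for each result near the current position in the log, searching both backward and forward to tolerate reordering.

```python
import os
import pickle
import tempfile


class ReplayLog:
    """Hypothetical sketch: record on the first run, verify on later runs."""

    def __init__(self, path, window=8):
        self.path = path
        self.window = window  # how far to search on either side of the cursor
        self.recording = not os.path.exists(path)
        self.entries = [] if self.recording else self._load()
        self.matched = [False] * len(self.entries)
        self.cursor = 0  # index of the first log entry not yet matched

    def _load(self):
        with open(self.path, "rb") as f:
            return pickle.load(f)

    def check(self, label, value):
        """Record the value (first run) or verify it; return True if consistent."""
        item = (label, value)
        if self.recording:
            self.entries.append(item)
            return True
        lo = max(0, self.cursor - self.window)
        hi = min(len(self.entries), self.cursor + self.window + 1)
        for i in range(lo, hi):
            if not self.matched[i] and self.entries[i] == item:
                self.matched[i] = True
                # Move the cursor past entries that are already accounted for.
                while self.cursor < len(self.entries) and self.matched[self.cursor]:
                    self.cursor += 1
                return True
        return False  # a real tool could call breakpoint() here

    def close(self):
        """Write the log to disk if this was the recording pass."""
        if self.recording:
            with open(self.path, "wb") as f:
                pickle.dump(self.entries, f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "run.log")
        first = ReplayLog(path)
        for item in [("a", 1), ("b", 2), ("c", 3)]:
            first.check(*item)
        first.close()

        second = ReplayLog(path)
        # Same results in a different order, as a multi-threaded run might give.
        print([second.check(*item) for item in [("b", 2), ("a", 1), ("c", 3)]])
```

Marking entries as matched prevents one logged result from satisfying two results in the current run.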
Nathan asked Richard how he compares ROOT histograms generated
from his test runs. Richard wrote some personal code
<https://drive.google.com/file/d/1eKbAZFHMkX4H7IFg6K47n9Mu3Q8NBLVd/view?usp=sharing>
to do the job.
Alex asked Richard where he stands on the reproducibility front.
Richard has put the effort aside temporarily after noticing that
additional issues have cropped up in recent bouts of code development.
Reproducibility is a major concern for Nathan when comparing JANA and
JANA2. If JANA is not producing the same results every time, then that
will show up as a difference with JANA2 having nothing to do with JANA2
itself.
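One way to keep run-to-run noise from being blamed on JANA2 is to run the old code twice and set aside anything that already differs between those two runs. The sketch below assumes each run has been reduced to a dictionary from a label to a text dump. The function names and the example values are hypothetical, and this is not a procedure that was presented at the meeting.

```python
def differing_labels(run_a, run_b):
    """Labels whose dumps differ between two runs (missing counts as differing)."""
    labels = set(run_a) | set(run_b)
    return {k for k in labels if run_a.get(k) != run_b.get(k)}


def attributable_differences(old_run1, old_run2, new_run):
    """Differences between old and new code on labels the old code reproduces."""
    noisy = differing_labels(old_run1, old_run2)
    return sorted(differing_labels(old_run1, new_run) - noisy)


if __name__ == "__main__":
    # Hypothetical dumps: "vertex" is not reproducible even with the old code.
    old_run1 = {"hits": "x=1.0", "tracks": "p=2.5", "vertex": "z=65.0"}
    old_run2 = {"hits": "x=1.0", "tracks": "p=2.5", "vertex": "z=65.2"}
    new_run = {"hits": "x=1.1", "tracks": "p=2.5", "vertex": "z=65.3"}
    print(attributable_differences(old_run1, old_run2, new_run))
```

Only "hits" is reported here, since "vertex" already varies between two runs of the old code.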
Richard explained that the main contributor to non-reproducibility is
the ANALYSIS library. The code has many history-driven complications,
complications that not only induce problems, but make debugging difficult.
Nathan has been looking only at REST production. Justin suggested that
he look at the entire suite of plugins used in reconstruction launches
as an acid test of result registration between JANA and JANA2.
Justin asked at what point we declare victory. Mark thought
that giving up on bit-for-bit reproducibility and bit-for-bit
registration between JANA and JANA2 would be a form of surrender. They
underlie validation of any amount of future software development; one
must know that things that should not be changing are in fact not
changing. On the other hand, those goals are formidable from where we sit
now. There might be a judgment to make on how seriously this
uncomfortable position affects physics results.
Changing BMS_OSNAME
Mark presented a proposal for changing the value of BMS_OSNAME, the
environment variable that we use to identify properties of the operating
system in use. One of the main goals is to use a single build for both
the CentOS 7 nodes on the JLab farm and the CentOS 7 container on the
OSG and HPCs. See his slides
<https://halldweb.jlab.org/doc-public/DocDB/ShowDocument?docid=5547> for
the details. From the discussion:
* There are cases where the processor architecture field (e.g.,
x86_64) is useful. We should consider keeping it.
* Richard exploits the feature of having different OSs with separate
bin, lib, etc. directories in a built package, with all OSs sharing
the same code.
* We were not ready to make a decision on going forward with the
proposal. More discussion is needed.
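To make the points above concrete, here is a sketch of how an OS identifier that includes an architecture field can select per-OS bin and lib directories inside one package tree. The tag format used here is invented for illustration and is not the current or proposed BMS_OSNAME format (see the slides for that). The function names and paths are likewise hypothetical.

```python
import platform
from pathlib import Path


def os_tag(system=None, machine=None, release_label="unknown"):
    """Build an illustrative OS tag such as Linux_CentOS7-x86_64.

    Defaults come from the running system; the format is an assumption.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return f"{system}_{release_label}-{machine}"


def install_dirs(package_root, tag):
    """Per-OS bin and lib directories under a shared package root."""
    root = Path(package_root) / tag
    return {name: root / name for name in ("bin", "lib")}


if __name__ == "__main__":
    # If a farm node and a container report the same tag, they share one build.
    farm = os_tag("Linux", "x86_64", "CentOS7")
    container = os_tag("Linux", "x86_64", "CentOS7")
    print(farm == container, install_dirs("/hypothetical/pkg", farm)["bin"])
```

With this layout the source code sits once under the package root while each distinct tag gets its own binaries, which is the arrangement Richard described.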
Succession of software coordinator responsibilities
We took a quick look at the Google Doc
<https://docs.google.com/document/d/1QF8fWjIoLHejyhXPgAzwjp6s5JLl6PM64efow1WnoBQ/edit>
listing the responsibilities. Time was running short so Alex proposed a
dedicated meeting of interested parties later in the week.