[Halld-offline] Software Meeting Minutes, March 30, 2022

Thu Mar 31 11:58:24 EDT 2022

Folks,

Please find the minutes here 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_March_30,_2022#Minutes> 
and below.

   -- Mark

      _________________________________

  Minutes, GlueX Software Meeting, March 30, 2022

Present: Alex Austregesilo, Nathan Brei, Thomas Britton, Eugene 
Chudakov, Sergey Furletov, Mark Ito (chair), Igal Jaegle, Richard Jones, 
David Lawrence, Curtis Meyer, Justin Stevens, Simon Taylor, Beni Zihlmann

There is a recording of this meeting 
<https://jlab-org.zoomgov.com/rec/share/vdGXuc7qvEizUrKTSXF0VgCYpWs2-5CTYCLFHxTVfMB7yx4GsGPfMgas9HndnCOL.vvqhRHJX3xjlNhI4> 
on the ZoomGov site. (Passcode: 7itcSj^X)

      Announcements

This will be the last Software Meeting that Mark will chair. He is 
retiring from the Lab.

      Review of Minutes from the Last Software Meeting

We went over the minutes from the meeting on March 16th 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_March_16,_2022#Minutes>. 

  * Mark mentioned that he is seeing an increasing number of requests
    from users to be added to the halld Slurm account when they are
    already members of the halld Unix group. He does not know why. The
    command line actions linked from the FAQ
    <https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_FAQ#What_is_this_sbatch_error_that_I_am_getting.3F>
    for managing the Slurm account work well.

      FAQ of the Fortnight: How do I use secure shell (ssh) to access
      GitHub?

Mark led us through the FAQ 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_FAQ#How_do_I_use_secure_shell_.28ssh.29_to_access_GitHub.3F> 
that describes a method for password-less interaction with the 
GitHub site, including pushes, using ssh keys. Using this method became 
more attractive after GitHub removed the ability to use one's account 
password for authentication. A personal access token is now required in 
lieu of the password.

Richard remarked that support for ssh authentication may be dropped in 
the not-so-distant future.

      JANA2

Nathan described efforts to validate REST file production with JANA2 by 
comparing results with the same activity using JANA. By exploiting the 
toStrings member function, present in all JANA classes, he can create a 
text dump of produced objects in the two contexts and compare them. That 
produced a large number of diffs that are hard to interpret. To address 
that problem, he recently developed a technique where objects can be 
compared in the order that they are produced. That way the difference 
produced furthest upstream can be identified and separated from 
resulting downstream differences. In addition he has developed a new 
tool, JInspector, that allows inspection of factories during execution. 
The scheme allows invocation of JInspector from a gdb session. The next 
step is to build a system that automatically compares JANA output with 
that of JANA2, halts when differences are found, and allows interactive 
invocation of JInspector.
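
The ordered comparison Nathan described can be illustrated with a minimal 
sketch. It assumes each run has already been dumped to a list of text 
records, one per object, in the order the objects were produced. The 
function name and the sample records are hypothetical and are not taken 
from JANA, JANA2, or Nathan's tooling.

```python
from itertools import zip_longest


def first_divergence(reference, candidate):
    """Return (index, ref_record, cand_record) for the first differing
    record, or None if the two dumps are identical.

    Both arguments are sequences of text records in production order.
    A missing record (one dump shorter than the other) shows up as None.
    """
    for index, (ref, cand) in enumerate(zip_longest(reference, candidate)):
        if ref != cand:
            return index, ref, cand
    return None


# Hypothetical dumps, invented for illustration only.
jana_dump = ["TrackHit id=1 p=1.25", "ShowerHit id=7 E=0.50", "Vertex z=65.0"]
jana2_dump = ["TrackHit id=1 p=1.25", "ShowerHit id=7 E=0.51", "Vertex z=65.2"]

# Only the furthest-upstream difference is reported; the later Vertex
# difference may simply be a downstream consequence of it.
result = first_divergence(jana_dump, jana2_dump)
print(result)
```

Stopping at the first mismatch, rather than listing every diff, is what 
separates the upstream cause from the downstream differences it induces.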

Richard remarked that he has developed a tool that does similar things 
in the context of ensuring reproducibility of results on repeated runs 
of our code. To use the tool, one instruments the code being examined 
such that intermediate results from selected points of execution (i.e., 
those instrumented) are dumped to a binary file the first time through 
execution. Subsequent execution of the same code will notice that the 
binary log exists and compare the results from the current execution to 
those obtained in the initial pass. One complication that is addressed 
is that many discrepancies only occur in a multi-threaded context, 
where the results are generally not produced in the same order from 
run to run. This means that there has to be a search of the log, forward 
and backward, for each result as it is produced in the current execution 
of the program. When a difference is detected, the user is dropped into 
the debugger. The tool is available on GitHub 
<https://github.com/rjones30/dilog.git>.
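
The record-then-compare scheme can be sketched roughly as follows. This 
is not the dilog API: the class name, method names, and file layout are 
all hypothetical, and the sketch writes a JSON-lines text file where the 
tool described above uses a binary file. It only illustrates the idea of 
recording on the first pass and, on later passes, searching the log 
forward and backward for an unmatched record equal to the current result.

```python
import json
import os
import tempfile


class ReplayLog:
    """Record results on the first run; compare against them afterwards."""

    def __init__(self, path):
        self.path = path
        # If no log exists yet, this run becomes the reference.
        self.recording = not os.path.exists(path)
        self.records = []
        if not self.recording:
            with open(path) as f:
                self.records = [json.loads(line) for line in f]
        self.matched = [False] * len(self.records)
        self.cursor = 0

    def check(self, label, value):
        """Record the result, or return whether it matches the reference."""
        entry = [label, value]
        if self.recording:
            self.records.append(entry)
            return True
        # Results may arrive in a different order from run to run, so
        # search outward from the last match, forward and backward.
        n = len(self.records)
        for offset in range(n):
            for i in (self.cursor + offset, self.cursor - offset):
                if 0 <= i < n and not self.matched[i] and self.records[i] == entry:
                    self.matched[i] = True
                    self.cursor = i
                    return True
        # No unmatched reference record equals this result. A real tool
        # could drop into a debugger at this point.
        return False

    def close(self):
        """Write the reference log if this was the recording pass."""
        if self.recording:
            with open(self.path, "w") as f:
                for entry in self.records:
                    f.write(json.dumps(entry) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "reference.jsonl")

    first = ReplayLog(log_path)            # first pass: records
    first.check("energy", 1.5)
    first.check("momentum", 2.5)
    first.close()

    second = ReplayLog(log_path)           # second pass: compares
    reordered_ok = second.check("momentum", 2.5) and second.check("energy", 1.5)
    changed_ok = second.check("energy", 1.6)
    print(reordered_ok, changed_ok)
```

Marking each reference record as matched once it has been used keeps a 
duplicated or spurious result from being accepted against a record that 
an earlier result already consumed.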

Nathan asked Richard about how he compares ROOT histograms generated 
from his test runs. Richard wrote some personal code 
<https://drive.google.com/file/d/1eKbAZFHMkX4H7IFg6K47n9Mu3Q8NBLVd/view?usp=sharing> 
to do the job.

Alex asked Richard about where he is on the reproducibility front. 
Richard has put the effort aside temporarily after noticing that 
additional issues have cropped up in recent bouts of code development. 
Reproducibility is a major concern for Nathan when comparing JANA and 
JANA2. If JANA is not producing the same results every time, then that 
will show up as a difference with JANA2 having nothing to do with JANA2 
itself.

Richard explained that the main contributor to non-reproducibility is 
the ANALYSIS library. The code has many history-driven complications, 
complications that not only induce problems, but make debugging difficult.

Nathan has been looking only at REST production. Justin suggested that 
he look at the entire suite of plugins used in reconstruction launches 
as an acid test of result registration between JANA and JANA2.

Justin asked at what point we declare victory. Mark thought that giving 
up on bit-for-bit reproducibility and bit-for-bit registration between 
JANA and JANA2 would be a form of surrender. They underlie validation 
of any number of future software developments; one must know that 
things that should not be changing are in fact not changing. On the 
other hand, those goals are formidable from where we sit now. There 
might be a judgment to make on how seriously this uncomfortable 
position affects physics results.

      Changing BMS_OSNAME

Mark presented a proposal for changing the value of BMS_OSNAME, the 
environment variable that we use to identify properties of the operating 
system in use. One of the main goals is to use a single build for both 
the CentOS 7 nodes on the JLab farm and the CentOS 7 container on the 
OSG and HPCs. See his slides 
<https://halldweb.jlab.org/doc-public/DocDB/ShowDocument?docid=5547> for 
the details. From the discussion:

  * There are cases where the processor architecture field (e.g.,
    x86_64) is useful. We should consider keeping it.
  * Richard exploits the feature of having different OSs with separate
    bin, lib, etc. directories in a built package, with all OSs sharing
    the same code.
  * We were not ready to make a decision on going forward with the
    proposal. More discussion is needed.

      Succession of software coordinator responsibilities

We took a quick look at the Google Doc 
<https://docs.google.com/document/d/1QF8fWjIoLHejyhXPgAzwjp6s5JLl6PM64efow1WnoBQ/edit> 
listing the responsibilities. Time was running short so Alex proposed a 
dedicated meeting of interested parties later in the week.
