[Halld-offline] Software Meeting Minutes, April 28, 2021

Mark Ito marki at jlab.org
Thu Apr 29 14:36:01 EDT 2021


Folks,

Please find the minutes here 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_April_27,_2021#Minutes> 
and below.

   -- Mark

      ______________________________________________


    GlueX Software Meeting, April 27, 2021, Minutes

Present: Alexander Austregesilo, Edmundo Barriga, Thomas Britton, Sean 
Dobbs, Mark Ito (chair), Igal Jaegle, Naomi Jarvis, Simon Taylor, 
Nilanga Wickramaarachchi, Jon Zarling, Beni Zihlmann

There is a recording of this meeting 
<https://bluejeans.com/s/kxr2tk1kaQB/>. Log into the BlueJeans site 
<https://jlab.bluejeans.com> first to gain access (use your JLab 
credentials).


      Announcements

 1. version set 4.37.1 and mcwrapper v2.5.2
    <https://mailman.jlab.org/pipermail/halld-offline/2021-April/008511.html>.
    Thomas described the changes in the latest version of MCwrapper.
    Luminosity is now used to normalize the number of events to produce
    for each run number requested (the idea is sketched after this list).
 2. New: HOWTO copy a file from the ifarm to home
    <https://halldweb.jlab.org/wiki/index.php/HOWTO_copy_a_file_from_the_ifarm_to_home>.
    Mark pointed us to the new HOWTO. Sean told us that one could do the
    same thing from ftp.jlab.org without having to set up an ssh tunnel
    (also sketched after this list). Mark will make the appropriate
    adjustments to the documentation.
 3. To come: HOWTO use AmpTools on the JLab farm GPUs
    <https://halldweb.jlab.org/wiki/index.php/HOWTO_use_AmpTools_on_the_JLab_farm_GPUs>.
    Alex described his HOWTO (still under construction).
 4. work disk full again
    <https://mailman.jlab.org/pipermail/halld-offline/2021-April/008517.html>.
    Mark described the current work disk crunch, including plots of
    recent usage history
    <https://halldweb.jlab.org/doc-public/DocDB/ShowDocument?docid=5082>.
    More clean-up will be needed until the arrival of new work disk
    servers this summer.
 5. RedHat-6-era builds on group disk slated for deletion
    <https://mailman.jlab.org/pipermail/halld-offline/2021-April/008523.html>.
    Mark reminded us that the deletion of these builds has been carried out.
 6. Bug fix release of halld_recon: restore the REST version number
    <https://mailman.jlab.org/pipermail/halld-offline/2021-April/008516.html>.
    Mark reviewed the reason for the new version sets.
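
Two small Python sketches related to items 1 and 2 above. First, the idea
behind MCwrapper's luminosity-based normalization: split a requested total
number of events across runs in proportion to each run's luminosity. This
is a sketch of the concept only, not MCwrapper's actual code; the run
numbers and luminosities are made up:

    # Distribute a requested event total across runs by luminosity weight.
    total_events = 1_000_000                             # hypothetical request
    lumi_per_run = {30274: 2.1, 30275: 1.4, 30276: 0.7}  # hypothetical luminosities (arbitrary units)

    total_lumi = sum(lumi_per_run.values())
    events_per_run = {run: round(total_events * lumi / total_lumi)
                      for run, lumi in lumi_per_run.items()}
    print(events_per_run)  # {30274: 500000, 30275: 333333, 30276: 166667}

Second, the direct-copy alternative to the ifarm HOWTO that Sean described,
assuming ftp.jlab.org accepts scp with JLab credentials; the username and
file paths are placeholders only:

    # Pull a file from JLab to a home machine by way of ftp.jlab.org,
    # without tunneling through the ifarm. Username and paths are
    # placeholders, not real locations.
    import subprocess

    user = "jlabuser"   # hypothetical JLab username
    remote = f"{user}@ftp.jlab.org:/work/halld/example/histograms.root"
    local = "./histograms.root"

    # scp prompts for the JLab password (or uses an ssh key) as usual.
    subprocess.run(["scp", remote, local], check=True)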


      Review of Minutes from the Last Software Meeting

We went over the minutes from the meeting on March 30th 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_March_30,_2021#Minutes>. 


  * It turns out that there is no pull-request-triggered test for
    HDGeant4. Mark has volunteered to set one up à la the method Sean
    instituted for halld_recon and halld_sim.
  * Some significant progress has been made on releasing CCDB 2.0.
      o The unit tests for CCDB 1.0 have been broken for some time. Mark
        and Dmitry Romanov found and fixed a problem, related to cache
        access, with fetching constants in the form map<string, string>.
        The same problem is likely present in the CCDB 2.0 branch as well.
      o Dmitry has started on reviving the MySQL interface for CCDB 2.0.
      o Dmitry has moved us to a new workflow for CCDB pull requests
        (sketched after this list).
          + Developers will fork the JeffersonLab/ccdb repository to
            their personal accounts and work on branches created there
            as they see fit.
          + When a change is ready, they will submit a pull request back
            to the JeffersonLab/ccdb repository for merging.
          + This workflow is common outside Hall D. For example, Hall C
            uses it, as do many groups outside the Lab. We may consider
            using it within Hall D as well. It makes it easier to put up
            safeguards against spurious errors from inadvertent or faulty
            commits and to add any code review mechanism we may want to
            have. It also solves the problem of the confusing
            proliferation of branches in the main repository that we have
            seen. We could move to it with no structural changes to the
            repositories themselves.
          + Sean pointed out that such a workflow might require minor
            changes to the automatic-pull-request-triggered tests.
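
To make the fork-based workflow concrete, a minimal sketch of the steps
described above, written as the git commands a developer would run and
wrapped in Python only for illustration; the GitHub username "gluex-dev"
and the branch name are hypothetical:

    # Sketch of the fork-based workflow for CCDB changes.
    import subprocess

    def run(*cmd):
        """Run a command, echoing it first."""
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    user = "gluex-dev"  # hypothetical personal GitHub account holding the fork

    # 1. Fork JeffersonLab/ccdb to the personal account on github.com (web UI),
    #    then clone the fork and track the main repository as "upstream".
    run("git", "clone", f"git@github.com:{user}/ccdb.git")
    run("git", "-C", "ccdb", "remote", "add", "upstream",
        "https://github.com/JeffersonLab/ccdb.git")

    # 2. Work on a branch in the fork, created as the developer sees fit.
    run("git", "-C", "ccdb", "checkout", "-b", "fix-cache-access")

    # ... edit and commit ...

    # 3. Push the branch to the fork, then open a pull request back to
    #    JeffersonLab/ccdb on GitHub for merging.
    run("git", "-C", "ccdb", "push", "-u", "origin", "fix-cache-access")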


      Minutes from the Last HDGeant4 Meeting

We went over the minutes from the HDGeant4 meeting on April 6th 
<https://halldweb.jlab.org/wiki/index.php/HDGeant4_Meeting,_April_6,_2021#Minutes>. 
Sean noted that the overall focus of the HDGeant4 group is to compare
Monte Carlo with data and, using the two simulation engines at our
disposal, G3 and G4, to drill down to see where differences arise at a
basic physical level in HDGeant4, and then to adjust the model to get
agreement with data. This approach is preferred over one where empirical
correction factors are imposed as an after-burner on the simulation.


      Report from the April 20th SciComp Meeting

Mark presented slides
<https://halldweb.jlab.org/doc-public/DocDB/ShowDocument?docid=5081>,
the first two reproducing Bryan Hess's agenda for the meeting and the
third summarizing some of the discussion. Please see the slides for the
details.

Sean asked if we could prioritize recovery of certain files over others. 
Mark will ask.


        Handling of Recon Launch Output from Off-site

Alex raised the issue of disk use when bringing results of
reconstruction launches, performed off-site, back to JLab. All data land
on volatile and, after reprocessing, get written to cache and from there
to tape. He is worried about this procedure for two reasons:

 1. Data on volatile is subject to deletion (oldest files get deleted
    first) and we do not want to lose launch output to the disk cleaner.
 2. The array of problems we have always seen with Lustre disks. Both
    volatile and cache are Lustre systems.

Mark showed a plot indicating that the amount of data we have on volatile
has been well under the deletion level for months now. His claim was that
premature deletion from volatile has not been a problem for quite a
while. Alex did not think that the graph was accurate; it showed too
little variation in usage level given that Alex knows there has been
significant activity on the disk, an argument that Mark found convincing.
Mark will have to check on the source of his data. That aside, disk usage
in this context should be reviewed.


        Consolidation of Skim Files onto Fewer Tapes

Sean has noticed that at times reprocessing skimmed data can take a long
time due to the retrieval times of files from tape. He suspects that this
is because the files are scattered across many tapes, so a large number
of tape mounts and file skips are needed to get all of the data. He
proposed a project where, for certain skims, we rewrite the data onto a
smaller number of tapes.

Mark had some comments:

  * We should only start such a project on skims for which there is some
    reasonable expectation that retrieval will be done repeatedly in the
    future. The consolidation step itself involves reading and writing
    all of the files of interest, so those files have to be read at
    least a couple of times after consolidation before the exercise
    shows a net gain (a rough break-even sketch follows this list).
  * The way we write data to tape, by putting skim files on the
    write-through cache over several weeks, guarantees that the files
    will be scattered on different tapes. With the write-through cache
    we would do better to buffer data on disk until a significant
    fraction of one tape's worth has been accumulated and then manually
    trigger the write to tape.
  * It is possible to set up tape "volume sets" (sets of specific
    physical tapes) in advance in the tape library and then direct
    selected data types to specific volume sets. The tapes in the volume
    sets will then be dense in the data types so directed. This is
    already done for raw data, and there is no structural impediment to
    doing it for other types of data. This approach has the advantage
    that there is no need to develop software to make it happen.
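
A rough way to see the break-even point mentioned in the first bullet, as a
small Python sketch; all of the numbers are purely illustrative, not
measurements of our actual skims or tape drives:

    # Rough break-even estimate for tape consolidation (illustrative numbers).
    # Consolidation costs one full read plus one full write of the skim; the
    # payoff is the mount/seek overhead saved on each later full retrieval.
    import math

    skim_size_tb = 50.0              # hypothetical total skim size
    read_rate_tb_per_hr = 1.0        # hypothetical effective tape read rate
    write_rate_tb_per_hr = 1.0       # hypothetical effective tape write rate
    overhead_scattered_hr = 40.0     # hypothetical: many mounts/seeks per retrieval
    overhead_consolidated_hr = 5.0   # hypothetical: few mounts once consolidated

    consolidation_cost_hr = (skim_size_tb / read_rate_tb_per_hr
                             + skim_size_tb / write_rate_tb_per_hr)
    savings_per_retrieval_hr = overhead_scattered_hr - overhead_consolidated_hr

    # Number of future full retrievals needed before consolidation is a net win.
    break_even = math.ceil(consolidation_cost_hr / savings_per_retrieval_hr)
    print(f"Net gain after ~{break_even} future retrievals "
          f"(cost {consolidation_cost_hr:.0f} h, "
          f"saving {savings_per_retrieval_hr:.0f} h per retrieval)")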

Something does have to be done on this front. Sean and Mark will discuss 
the issue further.


      ROOTWriter and DSelector Updates

Jon presented a list of ideas and improvements for our data analysis
software. See his wiki page
<https://halldweb.jlab.org/wiki-private/index.php/ROOTWriter_and_DSelectorUpdates2021>
for the complete list.

The items and subsequent discussion were in two broad classes:

  * How we use the ROOT toolkit: Are there more efficient practices? Are
    there features we don't exploit but should?
  * How we analyze the data: Are there new features in the Analysis
    Library that we should develop? Should the contents of the REST
    format be expanded? Are there things we do in Analysis that should
    be done in reconstruction or vice-versa?

One thing that came up was our use of TLorentzVector. Jon has seen 
others use a smaller (member-data-wise) class. Alex pointed out that the 
current ROOT documentation has marked this class as deprecated. Yet our 
use of TLorentzVector is ubiquitous. Several expressed interest in 
looking into this more closely.
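
As a concrete point of comparison, a minimal PyROOT sketch of the same
four-vector built with TLorentzVector and with ROOT::Math::PxPyPzEVector,
one of the template-based classes the current ROOT documentation points to
instead; the kinematic values are arbitrary:

    # Compare the legacy TLorentzVector with a ROOT::Math four-vector.
    # The ROOT::Math classes do not inherit from TObject, so they carry
    # less per-object overhead. Values below are arbitrary.
    import ROOT

    px, py, pz, E = 0.1, 0.2, 1.5, 1.6  # GeV, arbitrary example kinematics

    v_old = ROOT.TLorentzVector()       # legacy class, ubiquitous in our code
    v_old.SetPxPyPzE(px, py, pz, E)

    v_new = ROOT.Math.PxPyPzEVector(px, py, pz, E)  # suggested replacement

    print("mass:", v_old.M(), v_new.M())
    print("pT:  ", v_old.Pt(), v_new.Pt())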

Jon encouraged us to think about where we might want to expend effort. 
This will likely come up again at a future meeting.


      Production of Missing Random Trigger Files

Sean reported that he and Peter Pauli are very close to filling in all 
of the gaps in the random trigger file coverage for Fall 2018. Peter may 
give a presentation on this work at a future meeting.


      Action Item Review

 1. Set up pull-request-triggered tests for HDGeant4. (Mark)
 2. Modify the documentation to feature ftp.jlab.org. (Mark)
 3. Prioritizing specific tapes to be recovered. (Mark)
 4. Review disk usage when repatriating recon launch data. (Alex, Mark)
 5. Check input data for volatile usage plot. (Mark)
 6. Make a plan for structuring tape writing for efficient file
    retrieval. (Sean, Mark)
 7. Look into how we use TLorentzVector. (Alex, Simon, Jon)
 8. Think about Jon's list of improvements. (all)



