<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Folks,</p>
    <p>Please find the minutes <a moz-do-not-send="true"
href="https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_March_30,_2021#Minutes">here</a>
      and below.</p>
    <p>  -- Mark</p>
    <p>     _______________________________________________</p>
    <div id="globalWrapper">
      <div id="column-content">
        <div id="content" class="mw-body" role="main">
          <h2 id="firstHeading" class="firstHeading" lang="en"><span
              dir="auto">GlueX Software Meeting, March 30, 2021, </span><span
              class="mw-headline" id="Minutes">Minutes</span></h2>
          <div id="bodyContent" class="mw-body-content">
            <div id="mw-content-text" dir="ltr" class="mw-content-ltr"
              lang="en">
              <p>Present: Alexander Austregesilo, Thomas Britton, Sean
                Dobbs, Mark Ito (chair), Igal Jaegle, David Lawrence,
                Justin Stevens, Simon Taylor, Nilanga Wickramaarachchi,
                Beni Zihlmann
              </p>
              <p>There is a <a rel="nofollow" class="external text"
                  href="https://bluejeans.com/s/ZrnFl4s1d9M/">recording
                  of this meeting</a>. Log into the <a rel="nofollow"
                  class="external text"
                  href="https://jlab.bluejeans.com">BlueJeans site</a>
                first to gain access (use your JLab credentials).
              </p>
              <h3><span class="mw-headline" id="Announcements">Announcements</span></h3>
              <ol>
                <li> <a rel="nofollow" class="external text"
href="https://halldweb.jlab.org/wiki-private/index.php/SciComp_Issue_Tracking">SciComp
                    Issue Tracking</a>. Sean has put up a wiki page to
                  collect problem reports regarding the farm, ifarm, and
                  other SciComp resources at JLab.
                  <ul>
                    <li> Alex reported on a problem that has been fixed:
                      SWIF jobs have recently been failing at a high
                      rate (10-20% of jobs). The cause was traced back
                      to slow database access. Jobs that had finished
                      properly were timing out while trying to transmit
                      their state to the database and thus were getting
                      marked as "SWIF system error". Chris Larrieu
                      discovered a way to increase the speed of the
                      database queries by a factor of twelve,
                      eliminating the time-outs and greatly improving
                      the success rate. Alex will follow up with Chris
                      and get more information on the miracle cure.</li>
                  </ul>
                </li>
                <li> <a rel="nofollow" class="external text"
href="https://mailman.jlab.org/pipermail/halld-offline/2021-March/008497.html">New
                    version set: version_4.37.0.xml</a>. The new version
                  set came out Sunday. Note that HDGeant4 now has the
                  fix to the calculation of DOCA in the FDC.</li>
              </ol>
              <h3><span class="mw-headline"
                  id="Review_of_Minutes_from_the_Last_Software_Meeting">Review
                  of Minutes from the Last Software Meeting</span></h3>
              <p>We went over the <a
href="https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_March_16,_2021#Minutes"
                  title="GlueX Software Meeting, March 16, 2021">minutes
                  from the meeting on March 16</a>. Mark pointed out
                that Alex highlighted his exploitation of AmpTools'
                ability to use GPUs in his talk at the recent Exotic
                Search Review. Thomas reported that the purchase order
                for the new farm nodes, including those with GPUs
                on-board, has been signed.
              </p>
              <h3><span class="mw-headline"
                  id="Minutes_from_the_Last_HDGeant4_Meeting">Minutes
                  from the Last HDGeant4 Meeting</span></h3>
              <p>We went over the <a
href="https://halldweb.jlab.org/wiki/index.php/HDGeant4_Meeting,_March_23,_2021#Minutes"
                  title="HDGeant4 Meeting, March 23, 2021">minutes from
                  the meeting on March 23</a>. There was some sentiment
                for closing the issue of drift distances in the FDC, but
                we did not reach a firm decision about where to continue
                discussion of agreement with data.
              </p>
              <h3><span class="mw-headline" id="OSG_Issues">OSG Issues</span></h3>
              <p>Thomas reported on a recent fix to a problem that has
                been with us for over a year with job submission to the
                OSG. A cap of 1,000 on the number of idle jobs had been
                imposed back then, when job execution ground to a halt
                due to a large number of such jobs. The root problem was
                not understood at the time. Recently the cap was lifted,
                and for a period three weeks ago the number of CLAS12
                jobs in the idle state ballooned to 70,000. All
                monitoring and job progress stopped, and queries to
                Condor would not return. Thomas traced the
                problem to the jobs writing their Condor logs to
                volatile, generating many small disk accesses to Lustre
                and rendering the system unresponsive. All jobs going
                from the JLab submit host (scosg16.jlab.org) were
                affected. The solution was to dismount all Lustre disk
                systems from scosg16.
              </p>
              <p>Things are fine now, but there is a backlog of GlueX
                jobs that are still working their way through the
                system.
              </p>
              <p>Thomas also mentioned two areas for controlling OSG
                jobs and their relative priority.
              </p>
              <ol>
                <li> There is a priority mechanism built into MCwrapper.
                  Requests for adjustment should be directed to Thomas.</li>
                <li> After discussions with Justin and others, Thomas is
                  instituting an upper limit of 250 million events per
                  MCwrapper project. This will prevent inadvertent
                  submissions from swamping the system and allow for the
                  work-around of submitting more than one project if
                  more events are needed.</li>
              </ol>
              <h3><span class="mw-headline"
                  id="Software_Testing_Discussion">Software Testing
                  Discussion</span></h3>
              <p>Mark reminded us where some of the existing
                documentation on our test procedures resides, <a
href="https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Software#Testing_and_Debugging"
                  title="GlueX Offline Software">linked from the Offline
                  Software wiki page</a>. He also went through the
                list of items we discussed at the <a
href="https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_February_2,_2021#Standardized_Tests"
                  title="GlueX Software Meeting, February 2, 2021">Software
                  Meeting on February 2</a>. We had a somewhat inconclusive
                discussion on how to make progress on the quality and
                quantity of our testing regime. Mark suggested a meeting
                of a small group of us to frame the issue. Justin
                cautioned us that there was a lot on our plate with the
                upcoming APS Meeting and that a portion of a Software
                Meeting after the APS Meeting might be a good place to
                come up with a plan.
              </p>
              <h3><span class="mw-headline"
                  id="Review_of_recent_issues_and_pull_requests">Review
                  of recent issues and pull requests</span></h3>
              <p>Mark called our attention to <a rel="nofollow"
                  class="external text"
                  href="https://github.com/JeffersonLab/halld_sim/issues/190">halld_sim
                  Issue #190</a>, "Run-to-run efficiency variation in
                2018 run periods". It will be discussed at tomorrow's
                Production and Analysis meeting, so we did not discuss
                it directly.
              </p>
              <p>On a related point Justin mentioned that there were
                several recent changes that should get done before new
                Monte Carlo is produced.
              </p>
              <ol>
                <li> Tagger energy assignment improvement.
                  <ul>
                    <li> Reconstruction-launch-compatible halld_recon
                      versions need patches to apply the new scheme.</li>
                    <li> REST data sets from previous launches need a
                      new reader to undo the old scheme and apply the
                      new one.</li>
                  </ul>
                </li>
                <li> Fix to FDC efficiency in Spring 2018.
                  <ul>
                    <li> This is the issue mentioned above.</li>
                  </ul>
                </li>
                <li> Additional random trigger file skim.
                  <ul>
                    <li> Sean reported that there will be an effort to
                      fill in some of the gaps in our coverage of runs
                      with corresponding random triggers.</li>
                  </ul>
                </li>
              </ol>
              <p>Mark mentioned two of his favorite recent outstanding
                pull requests:
              </p>
              <ul>
                <li> Diracxx: <a rel="nofollow" class="external text"
                    href="https://github.com/JeffersonLab/Diracxx/pull/2">Introduce
                    two make variables: #2</a>. This will allow Diracxx
                  to build on more advanced distributions like CentOS 8
                  and Ubuntu 20.</li>
                <li> gluex_root_analysis: <a rel="nofollow"
                    class="external text"
                    href="https://github.com/JeffersonLab/gluex_root_analysis/pull/147">Top
                    level make mmi #147</a>. This is a reworking of the
                  build system to use a makefile at all levels rather
                  than a mixture of makefiles and shell scripts. The old
                  mixed system would not halt when errors were generated
                  in the build. Also, the dependence of the build on
                  ROOT_ANALYSIS_HOME was removed, making it easier to
                  build a local version of the package. [Added in press:
                  Alex tested and merged the pull request soon after the
                  meeting.]</li>
              </ul>
              <h3><span class="mw-headline" id="The_Work_Disk_is_Full">The
                  Work Disk is Full</span></h3>
              <p>Simon broke the news to us. The proximate cause is the
                mysterious fluctuation of our quota on the disk server.
                See the red points in the plot below:
              </p>
              <p><a
href="https://halldweb.jlab.org/wiki/index.php/File:Work_disk_2021-03-30.png"
                  class="image"><img alt="Work disk 2021-03-30.png"
src="https://halldweb.jlab.org/wiki/images/thumb/a/a8/Work_disk_2021-03-30.png/500px-Work_disk_2021-03-30.png"
                    width="500" height="352"></a>
              </p>
              <h3><span class="mw-headline" id="Action_Item_Review">Action
                  Item Review</span></h3>
              <ol>
                <li> Make sure that the automatic tests of HDGeant4 pull
                  requests have been fully implemented. (Mark I., Sean)</li>
                <li> Finish conversion of halld_recon to use JANA2.
                  (Nathan)</li>
                <li> Release CCDB 2.0 (Dmitry, Mark I.)</li>
              </ol>
            </div>
            <br>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>