<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Folks,</p>
<p>Please find the minutes <a moz-do-not-send="true"
href="https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_March_30,_2021#Minutes">here</a>
and below.</p>
<p> -- Mark</p>
<p> _______________________________________________</p>
<p>
</p>
<div id="globalWrapper">
<div id="column-content">
<div id="content" class="mw-body" role="main">
<h2 id="firstHeading" class="firstHeading" lang="en"><span
dir="auto">GlueX Software Meeting, March 30, 2021, </span><span
class="mw-headline" id="Minutes">Minutes</span></h2>
<div id="bodyContent" class="mw-body-content">
<div id="mw-content-text" dir="ltr" class="mw-content-ltr"
lang="en">
<p>Present: Alexander Austregesilo, Thomas Britton, Sean
Dobbs, Mark Ito (chair), Igal Jaegle, David Lawrence,
Justin Stevens, Simon Taylor, Nilanga Wickramaarachchi,
Beni Zihlmann
</p>
<p>There is a <a rel="nofollow" class="external text"
href="https://bluejeans.com/s/ZrnFl4s1d9M/">recording
of this meeting</a>. Log into the <a rel="nofollow"
class="external text"
href="https://jlab.bluejeans.com">BlueJeans site</a>
first to gain access (use your JLab credentials).
</p>
<h3><span class="mw-headline" id="Announcements">Announcements</span></h3>
<ol>
<li> <a rel="nofollow" class="external text"
href="https://halldweb.jlab.org/wiki-private/index.php/SciComp_Issue_Tracking">SciComp
Issue Tracking</a>. Sean has put up a wiki page to
collect problem reports regarding the farm, ifarm, and
other SciComp resources at JLab.
<ul>
<li> Alex reported on a problem that has since been fixed:
                SWIF jobs had recently been failing at a high
                rate (10-20% of jobs). The cause was traced back
                to slow database access. Jobs that had finished
                properly were timing out while trying to transmit
                their state to the database and thus were getting
marked as "SWIF system error". Chris Larrieu
discovered a way to increase the speed of the
database queries by a factor of twelve,
eliminating the time-outs and greatly improving
the success rate. Alex will follow up with Chris
and get more information on the miracle cure.</li>
</ul>
</li>
<li> <a rel="nofollow" class="external text"
href="https://mailman.jlab.org/pipermail/halld-offline/2021-March/008497.html">New
version set: version_4.37.0.xml</a> The new version
set came out Sunday. Note that HDGeant4 now has the
              fix to the calculation of the distance of closest approach (DOCA) in the FDC.</li>
</ol>
<h3><span class="mw-headline"
id="Review_of_Minutes_from_the_Last_Software_Meeting">Review
of Minutes from the Last Software Meeting</span></h3>
<p>We went over the <a
href="https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_March_16,_2021#Minutes"
title="GlueX Software Meeting, March 16, 2021">minutes
from the meeting on March 16</a>. Mark pointed out
that Alex highlighted his exploitation of AmpTools
ability to use GPUs in his talk at the recent Exotic
Search Review. Thomas reported that the purchase order
for the new farm nodes, including those with GPUs
on-board, has been signed.
</p>
<h3><span class="mw-headline"
id="Minutes_from_the_Last_HDGeant4_Meeting">Minutes
from the Last HDGeant4 Meeting</span></h3>
<p>We went over the <a
href="https://halldweb.jlab.org/wiki/index.php/HDGeant4_Meeting,_March_23,_2021#Minutes"
title="HDGeant4 Meeting, March 23, 2021">minutes from
the meeting on March 23</a>. There was some sentiment
for closing the issue of drift distances in the FDC, but
we did not reach a firm decision about where to continue
discussion of agreement with data.
</p>
<h3><span class="mw-headline" id="OSG_Issues">OSG Issues</span></h3>
          <p>Thomas reported on a recent fix to a problem with job
            submission to the OSG that has been with us for over a
            year. A cap of 1,000 had been imposed on the number of
            idle jobs back when job execution ground to a halt due to
            a large number of such jobs; the root problem was not
            understood at the time. Recently the cap was lifted, and
            for a period three weeks ago the number of CLAS12 jobs in
            the idle state ballooned to 70,000. All monitoring and
            job progress stopped, and queries to Condor would not
            return. Thomas traced the problem to the jobs writing
            their Condor logs to /volatile, generating many small
            disk accesses to Lustre and rendering the system
            unresponsive. All jobs going from the JLab submit host
            (scosg16.jlab.org) were affected. The solution was to
            dismount all Lustre disk systems from scosg16.
</p>
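          <p>For illustration, here is a minimal sketch, using the
            HTCondor Python bindings, of a submit description whose
            user log is kept on local scratch rather than on a Lustre
            area such as /volatile. The script name and paths are
            hypothetical, not the actual MCwrapper/OSG configuration.
          </p>
          <pre>
import getpass
import htcondor  # HTCondor Python bindings

# Hypothetical submit description: the point is the choice of "log" location.
# Keeping the user log on local scratch avoids the many small writes to
# Lustre that made the submit host unresponsive.
sub = htcondor.Submit({
    "executable": "run_mc.sh",                       # hypothetical job script
    "arguments":  "$(Process)",
    "log":        f"/scratch/{getpass.getuser()}/mc_$(Cluster).log",  # local disk
    "output":     "out.$(Process)",
    "error":      "err.$(Process)",
})
print(sub)  # prints the generated submit description
          </pre>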
<p>Things are fine now, but there is a backlog of GlueX
jobs that are still working their way through the
system.
</p>
          <p>Thomas also mentioned two mechanisms for controlling OSG
            jobs and their relative priority.
</p>
<ol>
            <li> There is a priority mechanism built into MCwrapper.
              Requests for adjustment should be directed to Thomas.</li>
            <li> After discussions with Justin and others, Thomas is
              instituting an upper limit of 250 million events per
              MCwrapper project. This will prevent inadvertent
              submissions from swamping the system while still
              allowing the work-around of submitting more than one
              project when more events are needed (see the sketch
              after this list).</li>
</ol>
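          <p>As a sketch of the work-around mentioned in the last
            item, a request larger than the cap can simply be divided
            across several projects. The helper below is hypothetical
            and not part of MCwrapper; only the 250-million-event
            figure comes from the discussion.
          </p>
          <pre>
# Hypothetical helper, not part of MCwrapper: split a large event request
# into per-project counts that each respect the 250-million-event cap.
CAP = 250_000_000

def split_request(total_events, cap=CAP):
    """Return per-project event counts, each no larger than cap."""
    counts = []
    remaining = total_events
    while remaining > 0:
        counts.append(min(cap, remaining))
        remaining -= counts[-1]
    return counts

print(split_request(600_000_000))  # [250000000, 250000000, 100000000]
          </pre>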
<h3><span class="mw-headline"
id="Software_Testing_Discussion">Software Testing
Discussion</span></h3>
<p>Mark reminded us where some of the existing
documentation on our test procedures resides, <a
href="https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Software#Testing_and_Debugging"
title="GlueX Offline Software">linked from the Offline
              Software wiki page</a>. He also went through the list
            of items we discussed at the <a
              href="https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_February_2,_2021#Standardized_Tests"
              title="GlueX Software Meeting, February 2, 2021">Software
              Meeting on February 2</a>. We had a somewhat
            inconclusive discussion on how to improve the quality and
            quantity of our testing regime. Mark suggested that a
            small group of us meet to frame the issue. Justin
            cautioned that there is a lot on our plate with the
            upcoming APS Meeting and that a portion of a Software
            Meeting after the APS Meeting might be a good place to
            come up with a plan.
</p>
<h3><span class="mw-headline"
id="Review_of_recent_issues_and_pull_requests">Review
of recent issues and pull requests</span></h3>
          <p>Mark called our attention to <a rel="nofollow"
              class="external text"
              href="https://github.com/JeffersonLab/halld_sim/issues/190">halld_sim
              Issue #190</a>, "Run-to-run efficiency variation in
            2018 run periods", which will be discussed at tomorrow's
            Production and Analysis meeting, so we did not discuss
            it directly.
</p>
          <p>On a related point, Justin mentioned several recent
            changes that should be in place before new Monte Carlo
            is produced.
</p>
<ol>
<li> Tagger energy assignment improvement.
<ul>
<li> Reconstruction-launch-compatible halld_recon
versions need patches to apply the new scheme.</li>
<li> REST data sets from previous launches need a
new reader to undo the old scheme and apply the
new one.</li>
</ul>
</li>
<li> Fix to FDC efficiency in Spring 2018
<ul>
<li> This is the issue mentioned above.</li>
</ul>
</li>
            <li> Additional random trigger file skim
<ul>
<li> Sean reported that there will be an effort to
fill in some of the gaps in our coverage of runs
with corresponding random triggers.</li>
</ul>
</li>
</ol>
<p>Mark mentioned two of his favorite recent outstanding
pull requests:
</p>
<ul>
<li> Diracxx: <a rel="nofollow" class="external text"
href="https://github.com/JeffersonLab/Diracxx/pull/2">Introduce
two make variables: #2</a>. This will allow Diracxx
              to build on newer distributions such as CentOS 8
and Ubuntu 20.</li>
<li> gluex_root_analysis: <a rel="nofollow"
class="external text"
href="https://github.com/JeffersonLab/gluex_root_analysis/pull/147">Top
level make mmi #147</a>. This is a reworking of the
build system to use a makefile at all levels rather
than a mixture of makefiles and shell scripts. The old
mixed system would not halt when errors were generated
              in the build. Also, the dependence of the build on
ROOT_ANALYSIS_HOME was removed, making it easier to
build a local version of the package. [Added in press:
Alex tested and merged the pull request soon after the
meeting.]</li>
</ul>
<h3><span class="mw-headline" id="The_Work_Disk_is_Full">The
Work Disk is Full</span></h3>
<p>Simon broke the news to us. The proximate cause is the
mysterious fluctuation of our quota on the disk server.
See the red points in the plot below:
</p>
<p><a
href="https://halldweb.jlab.org/wiki/index.php/File:Work_disk_2021-03-30.png"
class="image"><img alt="Work disk 2021-03-30.png"
src="https://halldweb.jlab.org/wiki/images/thumb/a/a8/Work_disk_2021-03-30.png/500px-Work_disk_2021-03-30.png"
width="500" height="352"></a>
</p>
<h3><span class="mw-headline" id="Action_Item_Review">Action
Item Review</span></h3>
<ol>
<li> Make sure that the automatic tests of HDGeant4 pull
requests have been fully implemented. (Mark I., Sean)</li>
<li> Finish conversion of halld_recon to use JANA2.
(Nathan)</li>
<li> Release CCDB 2.0 (Dmitry, Mark I.)</li>
</ol>
</div>
<br>
</div>
</div>
</div>
</div>
</body>
</html>