[Halld-offline] Software Meeting Minutes, July 7, 2020

Mark Ito marki at jlab.org
Tue Jul 7 19:28:10 EDT 2020


Friends,

Please find the minutes of today's meeting here 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_July_7,_2020#Minutes> 
and below.

   -- Mark


    GlueX Software Meeting, July 7, 2020, Minutes

Present: Alex Austregesilo, Thomas Britton, Sean Dobbs, Mark Ito 
(chair), Igal Jaegle, Richard Jones, Naomi Jarvis, David Lawrence, Keigo 
Mizutani, Susan Schadmand, Simon Taylor, Nilanga Wickramaarachchi, Beni 
Zihlmann

There is a recording of this meeting 
<https://bluejeans.com/s/MR_xjcUQ0sR/> on the BlueJeans site. Use your 
JLab credentials to authenticate.


      Announcements

 1. New version set with upgrade to Geant4: version_4.21.3.xml
    <https://mailman.jlab.org/pipermail/halld-offline/2020-June/004104.html>
    Ready for testers of an updated version of Geant4.
 2. multi-threaded hdgeant4 now working
    <https://mailman.jlab.org/pipermail/halld-offline/2020-July/004107.html>
    Richard implemented patches to Geant4 that fixed issues that
    prevented multi-threaded running from giving sensible results.
      * To address thread safety in the HDGeant4 code proper, he made a
        change to provide each thread with its own copy of the magnetic
        field map at the cost of 300 MB of memory per additional thread.
        This was done out of an abundance of caution; if not necessary
        he will back out the change to recover the memory used.
      * Richard proposed changing the CCDB so that the magnetic field
        is no longer suppressed inside selected volumes, in particular
        in the BCAL. Other volumes that would get changed are the FCAL,
        TOF, DIRC, and CCAL where the effect was not manifest because
        the field is so much lower there. He discovered that the
        suppression resulted in field discontinuities that caused crashes
        in particle propagation reported by Naomi. The suppression had
        originally been introduced to speed up shower development in
        GEANT, but Geant4 seems not to be affected by having the field
        on. We endorse the proposal.
 3. New release of Build Scripts: version 1.58
    <https://mailman.jlab.org/pipermail/halld-offline/2020-July/004108.html>
    Although this new release will accommodate future versions of Geant4,
    it currently has a problem with our default version, 10.02.p02. Mark
    addressed this presently.
 4. New version set: version_4.22.0.xml
    <https://mailman.jlab.org/pipermail/halld-offline/2020-July/004109.html>.
    This version set incorporates the fix to multi-threaded running of
    HDGeant4 mentioned above.
 5. Checksum changes for files written to tape
    <https://halldweb.jlab.org/talks/2020/crc_md5.pdf> The Computer
    Center is dropping MD5 checksums and will rely on CRC32 sums for tape
    data validation.
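
For anyone who wants to check a file against a CRC32 sum by hand, here is a
minimal Python sketch. It assumes the standard CRC-32 as implemented in zlib;
the minutes do not say which tool or CRC variant the Computer Center uses, so
treat the function name and the output format as illustrative only. The MD5
sum is computed alongside for comparison with older records.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1 << 20):
    """Return (crc32_hex, md5_hex) for a file, reading it in chunks.

    Assumption: "CRC32" means the standard zlib CRC-32. This is a
    hypothetical helper, not the Computer Center's tool.
    """
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large raw data files do not
        # have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
            md5.update(chunk)
    # Mask to 32 bits and print as 8 hex digits.
    return format(crc & 0xFFFFFFFF, "08x"), md5.hexdigest()
```

Both sums are accumulated in a single pass over the file, so checking both
costs little more than checking one.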


      Review of Minutes from the Last Software Meeting

We went over the minutes from June 9 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Software_Meeting,_June_9,_2020#Minutes>. 



        New Release of JANA: version 0.8.2

David has addressed the request to suppress warnings when 
geometry paths are requested but not present. This is often a normal 
situation, e.g., when probing the geometry to see how the data should be 
analyzed. His fix is in a new release of JANA, version 0.8.2.


        JANA 2.0

Nathan Brei continues work on porting halld_recon to use JANA 2.0. We 
will get a report at a future meeting on this new major release.


      Report from Recent HDGeant4 Meetings

We went over the minutes from the June 16 meeting 
<https://halldweb.jlab.org/wiki/index.php/HDGeant4_Meeting,_June_16,_2020#Minutes> 
and those from the June 30 meeting 
<https://halldweb.jlab.org/wiki/index.php/HDGeant4_Meeting,_June_30,_2020#Minutes> 
without much comment.


      Report from the SciComp Meeting

Please see Mark's slides 
<https://docs.google.com/presentation/d/1bmy-IYamSmEtv4HsOeXHgceD66vVPstLZXymLNf4tE0/edit?usp=sharing> 
for Scientific Computing news from the Computer Center.


      NERSC status

David brought us up-to-date on processing at NERSC. Please see his 
slides 
<https://docs.google.com/presentation/d/1BGYUvKztfzSGDGwcfTrs6DoS_EUUvaeTHReeYFAEKvc/edit?usp=sharing> 
for all of the details. Some broad points:

  * We have moved to two-hour jobs rather than the 6 to 8 hour jobs run
    in past campaigns. This gives us much better access to the backfill
    mechanism at NERSC.
  * The move involved significant development to our workflow to (a)
    process only selected parts of our 20 GB raw data files in a single
    job and (b) recombine the resulting output files to correspond to
    the original 20 GB file.
  * One significant challenge has been to run monitoring launches, with
    their 57 plugins, to give complete ROOT output files for each
    selection of the raw data file.
  * David pointed out that the current version of the Oasis image of our
    software does not support development (i.e., building new versions
    of software), only running. He has put up a Docker container that
    remedies this. Richard has also run into this issue; he has put the
    missing pieces in an undisclosed location on Oasis.
      o Mark remarked that having a developer-friendly version of Oasis
        would involve very little work, only real estate on Oasis. He
        will look into this.
  * Alex suggested that one way forward is to abandon processing of
    monitoring launches at NERSC and concentrate on REST file production
    which uses a smaller set of plugins, plugins that have had better
    past records of success.
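
The split-and-recombine idea in the second bullet can be sketched in a few
lines of Python. This is only an illustration: the minutes do not describe
how the real workflow selects parts of a raw data file (by event, by block,
or otherwise), so the event-range bookkeeping and the function names below
are assumptions, and plain Python lists stand in for the per-job output
files.

```python
def split_ranges(n_events, n_pieces):
    """Split [0, n_events) into contiguous, near-equal half-open ranges."""
    base, extra = divmod(n_events, n_pieces)
    ranges, start = [], 0
    for i in range(n_pieces):
        # The first `extra` pieces each take one leftover event.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def recombine(outputs):
    """Concatenate per-piece outputs in order, refusing gaps or overlaps.

    `outputs` maps (start, stop) -> list of per-event results for that
    piece (a stand-in for one job's output file).
    """
    combined, expected = [], 0
    for start, stop in sorted(outputs):
        if start != expected:
            raise ValueError(f"missing or overlapping piece at event {expected}")
        piece = outputs[(start, stop)]
        if len(piece) != stop - start:
            raise ValueError(f"piece {start}-{stop} is incomplete")
        combined.extend(piece)
        expected = stop
    return combined
```

Checking for gaps at recombination time means a failed or missing job shows
up as an error rather than as a silently short merged file.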


      OSG Jobs and mcsmear

Today, Thomas noticed that many jobs were crashing in mcsmear when 
accessing the SQLite form of the CCDB on the OSG. Two issues:

 1. What is wrong with the SQLite file?
 2. Why is it that mcsmear exits with status code = 0 (i.e., success)
    after bombing?

We were only able to address the first issue. David noticed that today's 
SQLite file was a bit smaller than usual, making it suspect. Mark 
promised to recreate the SQLite file and ship it out via Oasis. [Added 
in press: (a) Mark made good on his promise 
<https://mailman.jlab.org/pipermail/halld-offline/2020-July/004115.html> 
and regenerated the SQLite file. (b) David had reported this issue via 
email to Mark early this morning. Suffice it to say that Mark is behind 
on his email.]
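
A suspect SQLite file like the one above can be screened before it is shipped
out. The Python sketch below uses SQLite's built-in PRAGMA integrity_check
together with a minimum-size test. The function name and the min_bytes
threshold are hypothetical, and nothing here reflects how the CCDB SQLite
file is actually produced or how mcsmear handles errors.

```python
import os
import sqlite3
from pathlib import Path


def sqlite_looks_healthy(path, min_bytes=0):
    """Return True if the file exists, is large enough, and passes
    SQLite's own integrity check. Hypothetical helper for illustration."""
    if not os.path.isfile(path) or os.path.getsize(path) < min_bytes:
        return False
    # Open read-only so the check can never modify or create the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.Error:
        # Truncated or non-SQLite files typically fail here.
        return False
    # A healthy database reports a single row containing "ok".
    return rows == [("ok",)]
```

A wrapper script could call sys.exit(1) when this returns False, so that a
bad file produces a nonzero exit status instead of a silent success.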


  * This page was last modified on 7 July 2020, at 19:25.
