[Halld-offline] Offline Software Meeting Minutes, February 9, 2018

Mark Ito marki at jlab.org
Sat Feb 10 15:04:36 EST 2018


Gluons,

Please find the minutes below and at

https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_February_9,_2018 
.

   -- Mark

_________________


    GlueX Offline Meeting, February 9, 2018, Minutes

Present:

  * *CMU:* Curtis Meyer
  * *FSU:* Sean Dobbs
  * *JLab:* Alex Austregesilo, Thomas Britton, Hovanes Egiyan, Mark Ito
    (chair), David Lawrence, Beni Zihlmann
  * *Yerevan:* Hrach Marukyan

There is a recording of this meeting <https://bluejeans.com/s/oJ0JA/> on 
the BlueJeans site. Use your JLab credentials to access it.


      Announcements

 1. Reclamation of halld-scratch volume set
    <https://mailman.jlab.org/pipermail/halld-offline/2018-February/003101.html>.
    19 of our tapes are about to be erased.
 2. *New top-level directory: /mss/halld/detectors*. A new tape
    directory for data related to specific detectors, e. g.,
    /mss/halld/detectors/DIRC.
 3. New sim-recon release: version 2.23.0
    <https://mailman.jlab.org/pipermail/halld-offline/2018-February/003106.html>.
    This release includes Simon's newly tuned parameters for track
    matching to calorimeter clusters.
 4. *New simple email list: online_calibrations*. Sean has created a
    Simple Email List
    <https://halldweb.jlab.org/wiki/index.php/Email_Lists#Simple_Email_Lists>
    for those interested in keeping track of the latest calibration results.
 5. *Launches*. Thomas caught us up.
      * There is a monitoring launch that will start soon.
      * The reconstruction launch with recent software on Spring 2016
        data has started.
      * Several people are looking at the anomaly that Beni has
        identified, visible in some TOF monitoring plots (though the TOF
        is blameless).


      Review of minutes from the January 26 meeting

We went over the minutes 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_January_26,_2018#Minutes>. 



        Optional Packages

In the context of discussing recent changes to how the RCDB is used in
the offline software, we started a more general discussion of whether
certain packages, and RCDB in particular, should be optional. David
explained that certain packages (among them RCDB and the ET system)
have been maintained as optional. The mechanism:

 1. If the "home" environment variable (e. g., RCDB_HOME) is not
    defined, then at build time the build system (i. e., SCons) omits
    the corresponding include paths and libraries from the build
    commands. No dependence on the package is built into the resulting
    code. A warning that this is happening is printed during the build,
    but the build is not halted.
 2. If any program requires the package in question, then it is built
    so that when the program is run, it exits immediately with an error
    informing the user that the environment variables for the missing
    package need to be defined and that package built. The program can
    then be rebuilt.
      o To implement this behavior, appropriate C-preprocessor
        directives need to be included in the source code so that the
        resulting program behaves correctly depending on the setting
        (or non-setting) of the home environment variable.
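The run-time half of this mechanism can be illustrated with a minimal C++ sketch. It assumes, hypothetically, that the build system defines a preprocessor macro named HAVE_RCDB only when RCDB_HOME is set; that macro name and the helper functions rcdb_compiled_in, missing_package_message, and require_rcdb are illustrative and are not taken from sim-recon.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// HYPOTHETICAL: assume the build system adds -DHAVE_RCDB to the compile
// flags only when RCDB_HOME was defined at build time.

// Reports whether RCDB support was compiled into this binary.
bool rcdb_compiled_in() {
#ifdef HAVE_RCDB
    return true;
#else
    return false;
#endif
}

// Builds the error text shown to a user whose binary lacks a package.
std::string missing_package_message(const std::string& package) {
    return "This program needs " + package + " but was built without it. "
           "Define " + package + "_HOME, build " + package +
           ", and then rebuild this program.";
}

// Call at start-up in any program that needs RCDB: if RCDB was left out
// of the build, print the error and exit immediately.
void require_rcdb() {
    if (!rcdb_compiled_in()) {
        std::fprintf(stderr, "%s\n", missing_package_message("RCDB").c_str());
        std::exit(EXIT_FAILURE);
    }
}
```

In a real program, require_rcdb() would be called near the top of main(). The build-time half (dropping the include paths and libraries and printing a warning when RCDB_HOME is unset) lives in the SCons scripts and is not shown here.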

There was a lot of discussion on whether this is a useful feature that 
should be maintained, mainly between two of us. An incomplete summary:

  * *David*. Having the ability to exclude packages whose functionality
    is completely irrelevant to the software builder is very convenient:
    extraneous packages need not be built and the disk footprint of the
    resulting code is reduced.
  * *Mark*. Having this flexibility requires extra coding and creates
    trap doors that software builders can fall into. If a needed
    environment variable is missing, then the warning at build time can
    easily be missed and the error at run time may leave the user (who
    may not be the builder) at a loss as to how to proceed.

Mark will put the issue on the agenda of a future meeting.


        Release Management Thoughts

In the course of discussing Sean's presentation from last time, Thomas 
brought up the idea of breaking up the sim-recon repository into two 
repositories. In a moment of inspiration he came up with a concept for 
the split: one repository for simulation and one for reconstruction. 
This would simplify the task of "release management" (as defined last 
time). The technical advantage is that simulation code can be versioned 
independently of the reconstruction code. Right now, Sean has to 
maintain a reconstruction-fixed-simulation-changing branch of sim-recon 
to get the right behavior. If the two functions were versioned 
separately, this recon-fixed-sim-changing property would be manifest.

We noted that the two sides (sim and recon) are functionally closely 
tied together. The question is the degree of independence of the 
development streams on the two sides. E. g. if drift time in tracking 
chambers is improved in simulation, then does the tracking 
reconstruction have to change? Likely not, the sim side can go ahead 
independently. On the other hand, if there is reconstruction code that 
does one thing for real data and another for simulation, and the 
simulation is improved so that unequal treatment is no longer necessary, 
then both sides have to change together. In the latter case it would be 
easier if both sides were in the same repository.

Mark will put the issue on the agenda of a future meeting.


        Not-the-TOF Anomaly in Monitoring Histograms

Alex noted that the problem first appeared several months ago when the 
material maps were changed in the CCDB. Beni has reported that the most 
recent version of the code does not exhibit the problem, so the current 
mystery is how the problem could have possibly fixed itself.


      GlueX + NERSC

David has succeeded in running GlueX reconstruction jobs on two of the 
NERSC <http://www.nersc.gov/> supercomputers.

  * Cori I: Haswell (comparable to the JLab farm)
  * Cori II: Knights Landing (KNL)

He analyzed two runs on both architectures. See the results on his slide 
<https://docs.google.com/presentation/d/1FcOhSprdD-YPCZnagvaJl-oicK7y3zoSAuB-Nmo_vSc/edit?usp=sharing>. 
He notes that the KNL jobs run 2.4 times slower than the Haswell jobs
even though they use four times as many threads, i. e., roughly ten
times slower on a per-thread basis.
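The per-thread figure follows from multiplying the wall-time ratio by the thread ratio. A minimal sketch of that arithmetic, assuming both architectures processed the same events with all threads kept busy (per_thread_slowdown is a hypothetical helper, not part of David's analysis):

```cpp
#include <cassert>
#include <cmath>

// Per-thread slowdown of machine B relative to machine A, assuming both
// process the same events with every thread kept busy:
//   (time_B / time_A) * (threads_B / threads_A)
double per_thread_slowdown(double wall_time_ratio, double thread_ratio) {
    return wall_time_ratio * thread_ratio;
}

// Figures quoted above: KNL takes 2.4 times as long as Haswell while
// running four times as many threads.
const double knl_vs_haswell = per_thread_slowdown(2.4, 4.0);  // 9.6
```

The product, 2.4 × 4 = 9.6, is what the minutes round to "roughly ten".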


      Collaboration Meeting

Sean has put together a nice little session for us on the Collaboration 
Meeting agenda 
<https://halldweb.jlab.org/wiki/index.php/GlueX-Collaboration-Feb-2018#Saturday_February_24.2C_2018>. 



      SciComp Meeting Report

Mark reported items from the Scientific Computing meeting held Thursday, 
February 1.

 1. Change to fair share allocations...
    <https://mailman.jlab.org/pipermail/halld-offline/2018-February/003093.html>.
    The change was discussed at the meeting.
 2. ENP consumption of disk space under /work
    <https://mailman.jlab.org/pipermail/halld-offline/2018-February/003092.html>.

      * The second shelf of traditional RAID is up and running. Our work
        disk quota has been increased to 110 TB from 66 TB.
      * Hall B migration to the new work disk is underway. That should
        free up 85 TB of cache space. Mark emphasized that Hall D needs
        more cache space.
 3. Hall B needs 40 GB per job for the Java virtual machine. This is
    causing cores to go idle as nodes are running in a memory-limited mode.
 4. *New tape drives*. They are testing four new LTO-7 drives and four
    new LTO-8 drives and are planning the upgrade path.


      AMD benchmark results

Sean has purchased a box with new AMD EPYC processors and has run
benchmarks of hd_root on it. For comparison he ran the same tests on
gluon119, which has Intel Xeon processors. See his slide
<https://halldweb.jlab.org/wiki/images/f/fe/Sdobbs_Offline_020918.pdf>
for details and results. Scaling for the two systems is comparable as
the number of threads is increased, though the AMD processors come at
one-third the price (for the CPU package itself).


      Meeting on Containers

Mark announced a meeting later in the day to discuss use of containers 
(Docker, Singularity) in various computing contexts (NERSC, OSG, JLab 
farm, personal laptops). There is a lot of ground to cover here; a 
series of meetings is likely.

