[Halld-offline] Offline Software Meeting Minutes, February 9, 2018
marki at jlab.org
Sat Feb 10 15:04:36 EST 2018
Please find the minutes below and at
GlueX Offline Meeting, February 9, 2018, Minutes
* *CMU:* Curtis Meyer
* *FSU:* Sean Dobbs
* *JLab:* Alex Austregesilo, Thomas Britton, Hovanes Egiyan, Mark Ito
(chair), David Lawrence, Beni Zihlmann
* *Yerevan:* Hrach Marukyan
There is a recording of this meeting <https://bluejeans.com/s/oJ0JA/> on
the BlueJeans site. Use your JLab credentials to access it.
1. Reclamation of halld-scratch volume set
19 of our tapes are about to be erased.
2. *New top-level directory: /mss/halld/detectors*. A new tape
directory for data related to specific detectors, e. g.,
3. New sim-recon release: version 2.23.0
This release includes Simon's newly tuned parameters for track
matching to calorimeter clusters.
4. *New simple email list: online_calibrations*. Sean has created a
Simple Email List
for those interested in keeping track of the latest calibration results.
5. *Launches*. Thomas caught us up.
* There is a monitoring launch that will start soon.
* The reconstruction launch with recent software on Spring 2016
data has started.
* Several people are looking at the anomaly that Beni has
identified, visible in some TOF monitoring plots (though the TOF
Review of minutes from the January 26 meeting
We went over the minutes
In the context of discussing the recent changes with using the RCDB in
the offline software, we started a more general discussion of whether
certain packages, and RCDB in particular, should be optional. David
explained that there are certain packages (among them RCDB, the ET
system) that have been maintained as optional. The mechanism:
1. If the "home" environment variable (e. g., RCDB_HOME) is not
defined, then at build time, the build system (i. e., SCons)
does not include the corresponding include paths and libraries
in the build commands. No dependence on the package is built
into the resulting code. A warning that this is happening is
printed during the build. This does not halt the build.
2. If any program requires the package in question, then it is
built so that when the program is run, it exits immediately with
an error informing the user that the environment variable for
the missing package needs to be defined and that package built.
Then the build of the program can be redone.
o To implement this behavior, appropriate C-preprocessor
directives need to be included in the source code so that
the resulting program has the appropriate behavior depending
on the setting (or non-setting) of the home environment
variable.
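The build-time half of the mechanism described above can be sketched in
plain Python (the language SCons scripts are written in). This is only
an illustration under stated assumptions, not the actual sim-recon build
code: the function name optional_package_flags, the include/ and lib/
layout under the home directory, and the HAVE_<PACKAGE> macro name are
all hypothetical. The dictionary keys are the names of standard SCons
construction variables.

```python
import sys


def optional_package_flags(name, environ):
    """Return build settings for an optional package.

    Hypothetical helper: if <NAME>_HOME is not set in `environ`, print a
    warning and return empty settings so that no dependence on the
    package is built in. The build is not halted.
    """
    home_var = f"{name.upper()}_HOME"
    home = environ.get(home_var)
    if not home:
        # Warning only; the build continues without the package.
        print(f"WARNING: {home_var} is not defined; building without {name}",
              file=sys.stderr)
        return {"CPPPATH": [], "LIBPATH": [], "LIBS": [], "CPPDEFINES": []}
    # Assumed layout: headers in <home>/include, libraries in <home>/lib.
    return {
        "CPPPATH": [f"{home}/include"],
        "LIBPATH": [f"{home}/lib"],
        "LIBS": [name.lower()],
        # Assumed macro name that the C-preprocessor directives would test.
        "CPPDEFINES": [f"HAVE_{name.upper()}"],
    }
```

In a build script one might call optional_package_flags("rcdb",
os.environ) and merge the result into the build environment. Source code
guarded by the assumed HAVE_RCDB macro would then either use the package
or compile to the exit-with-error stub described in item 2.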
There was a lot of discussion on whether this is a useful feature that
should be maintained, mainly between two of us. An incomplete summary:
* *David*. Having the ability to exclude packages whose functionality
is completely irrelevant to the software builder is very convenient:
extraneous packages need not be built and the resulting disk
footprint of the code is reduced.
* *Mark*. Having this flexibility requires coding and creates trap
doors that software builders can fall into. If a needed
environment variable is missing, then the warning at build time can
easily be missed and the error at run time may leave the user (who
may not be the builder) at a loss as to how to proceed.
Mark will put the issue on the agenda of a future meeting.
Release Management Thoughts
In the course of discussing Sean's presentation from last time, Thomas
brought up the idea of breaking up the sim-recon repository into two
repositories. In a moment of inspiration he came up with a concept for
the split: one repository for simulation and one for reconstruction.
This would simplify the task of "release management" (as defined last
time). The technical advantage is that simulation code can be versioned
independently of the reconstruction code. Right now, Sean has to
maintain a reconstruction-fixed-simulation-changing branch of sim-recon
to get the right behavior. If the two functions were versioned
separately, this recon-fixed-sim-changing property would be manifest.
We noted that the two sides (sim and recon) are functionally closely
tied together. The question is the degree of independence of the
development streams on the two sides. E. g., if drift time in tracking
chambers is improved in simulation, then does the tracking
reconstruction have to change? Likely not; the sim side can go ahead
independently. On the other hand, if there is reconstruction code that
does one thing for real data and another for simulation, and the
simulation is improved so that unequal treatment is no longer necessary,
then both sides have to change together. In the latter case it would be
easier if both sides were in the same repository.
Mark will put the issue on the agenda of a future meeting.
Not-the-TOF Anomaly in Monitoring Histograms
Alex noted that the problem first appeared several months ago when the
material maps were changed in the CCDB. Beni has reported that the most
recent version of the code does not exhibit the problem, so the current
mystery is how the problem could have possibly fixed itself.
GlueX + NERSC
David has succeeded in running GlueX reconstruction jobs on two of the
NERSC <http://www.nersc.gov/> supercomputers.
* Cori I: Haswell (comparable to the JLab farm)
* Cori II: Knights Landing (KNL)
He analyzed two runs on both architectures. See the results on his slide
He notes that the KNL jobs run 2.4 times slower than the Haswell jobs
even though they are using four times the number of threads, i. e.,
roughly ten times slower on a per-thread basis.
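The per-thread figure follows from multiplying the two ratios quoted
above. A minimal check, in which the function name is hypothetical and
the only inputs are the 2.4 and 4 from the minutes:

```python
def per_thread_slowdown(wall_time_ratio, thread_ratio):
    """Slowdown per thread when a job takes `wall_time_ratio` times
    longer while using `thread_ratio` times as many threads."""
    return wall_time_ratio * thread_ratio


# KNL vs. Haswell: 2.4 times slower with four times the threads.
knl_vs_haswell = per_thread_slowdown(2.4, 4)
print(f"{knl_vs_haswell:.1f}")  # 9.6, i.e. roughly ten
```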
Sean has put together a nice little session for us on the Collaboration
SciComp Meeting Report
Mark reported items from the Scientific Computing meeting held Thursday,
1. Change to fair share allocations...
The change was discussed at the meeting.
2. ENP consumption of disk space under /work
* The second shelf of traditional raid is up and running. Our work
disk quota has been increased to 110 TB from 66 TB.
* Hall B migration to the new work disk is underway. That should
free up 85 TB of cache space. Mark emphasized that Hall D needs
more cache space.
3. Hall B needs 40 GB per job for the Java virtual machine. This is
causing cores to go idle as nodes are running in a memory-limited mode.
4. *New tape drives*. They are playing with four new LTO-7 drives and
four new LTO-8 drives and are planning the upgrade path.
AMD benchmark results
Sean has purchased a box with new AMD EPYC processors and ran benchmarks
of hd_root on it. For comparison he ran the same tests on gluon119 which
has Intel Xeon processors. See his slide
for details and results. Scaling for the two systems is comparable as
the number of threads is increased, though the AMD processors come at
one-third the price (on the CPU package itself).
Meeting on Containers
Mark announced a meeting later in the day to discuss use of containers
(Docker, Singularity) in various computing contexts (NERSC, OSG, JLab
farm, personal laptops). There is a lot of ground to cover here; a
series of meetings is likely.