[Halld-offline] Offline Software Meeting Minutes, November 1, 2017
Mark Ito
marki at jlab.org
Fri Nov 3 13:50:14 EDT 2017
Folks,
Please find the minutes below and at
https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_November_1,_2017#Minutes
- Mark
_________________________
Minutes, GlueX Offline Meeting, November 1, 2017
Present:
* CMU: Curtis Meyer
* FSU: Sean Dobbs
* JLab: Alex Austregesilo, Thomas Britton, Brad Cannon, Eugene
  Chudakov, Sebastian Cole, Mark Ito (chair), Simon Taylor, Beni Zihlmann
There is a recording of this meeting <https://bluejeans.com/s/L5K9O/> on
the BlueJeans site. Use your JLab credentials to access it.
Announcements
1. Change to cache deletion policy
<https://mailman.jlab.org/pipermail/halld-offline/2017-October/002970.html>
Files that were requested from tape by farm jobs are now first in
line for deletion once the requesting job has finished. This extends
the life of files that users create or request directly on the cache
disk.
2. sim-recon 2.18.0
<https://github.com/JeffersonLab/sim-recon/releases/tag/2.18.0> This
release went out on October 10. It has the extensive changes to the
Analysis Library from Paul Mattione.
3. MCwrapper v1.9
<https://mailman.jlab.org/pipermail/halld-offline/2017-October/002984.html>
Thomas has added a facility to automatically generate an event
sample, in a user-specified run range, with runs populated in
proportion to the number of events in the real data.
4. Launches. Alex reported that the most recent analysis launch had
   been inadvertently running with a job-count restriction left over
   from an incident a few weeks back in which SciComp had to throttle
   the number of jobs. The restriction was lifted this morning.
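The run-proportional sampling described in item 3 can be sketched as follows. This is a hypothetical illustration, not MCwrapper's actual code; the function name, run numbers, and event counts are all invented for the example.

```python
# Sketch of run-proportional event allocation: given the number of
# real-data events recorded in each run, split a requested Monte Carlo
# sample across the runs in the same proportions.
# (Illustrative only -- not MCwrapper's implementation.)

def allocate_events(real_events_per_run, total_mc_events):
    """Distribute total_mc_events across runs in proportion to the
    real-data event counts, assigning the leftover events from
    rounding to the runs with the largest fractional remainders."""
    total_real = sum(real_events_per_run.values())
    shares = {run: total_mc_events * n / total_real
              for run, n in real_events_per_run.items()}
    alloc = {run: int(s) for run, s in shares.items()}
    leftover = total_mc_events - sum(alloc.values())
    # Hand one extra event to the runs with the biggest fractional parts
    # so that the total comes out exactly to total_mc_events.
    for run in sorted(shares, key=lambda r: shares[r] - alloc[r],
                      reverse=True)[:leftover]:
        alloc[run] += 1
    return alloc

if __name__ == "__main__":
    # Invented per-run real-data event counts for three runs.
    real = {30274: 1_000_000, 30275: 3_000_000, 30276: 1_000_000}
    print(allocate_events(real, 50_000))
```

In practice the user would supply only the run range and total sample size; the per-run real-data event counts would come from the run database.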
Review of minutes from the last meeting
We went over the minutes from October 4
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_October_4,_2017#Minutes>.
OASIS file system
Mark discussed the cron-job update problem with Richard Jones. An
alternative approach has been identified; the problem is resolved.
HDvis Update
Thomas reported a bug in the JANA server (which sends events up to the
web browser for display). It crashes on RHEL7 and CentOS7 due to a
language feature not supported by the gcc 4.8.5 compiler found on those
platforms. It turns out to work on RHEL6 and CentOS6 (gcc 4.9.2). Several
avenues for a fix are being pursued, in consultation with Dmitry Romanov.
Work Disk Clean-Up
Mark reported that, in the latest twist, we will not need to reduce our
usage on /work to a level below 45 TB before moving to the new,
non-Lustre file server. Our current 55 TB will fit, since not all of the
Halls will be moving at once.
For the final cut-over from Lustre to non-Lustre, we will have to stop
writing to /work for a short period of time. This may (or may not)
present a problem if we have a large launch underway. This issue needs
further discussion, but should not be a big problem.
Report from the SciComp-Physics meeting
* Chip Watson suggested that we try to stress the data stream
bandwidth from the Counting House to the Computer Center before the
December run, ideally in concert with a similar test by Hall B.
* ifarm1101 and the CentOS6 farm nodes will not be upgraded from
CentOS 6.5 to 6.9. It is not worth the effort for a handful of nodes.
* Physics will initiate another procurement of Lustre-based disk, 200
TB worth, to augment our volatile and work space. There is the
possibility for more if we can justify it.
Computing Milestones
Mark showed us an email from Amber Boehnlein
<https://halldweb.jlab.org/talks/2017/milestone_planning.pdf> from back
in August, proposing an effort to develop "milestones" for Scientific
Computing in the 12 GeV era, as an aid to Lab management for gauging
progress. Work on this has languished and needs to be picked up again.
Review of recent pull requests
We went over the list of open and closed pull requests
<https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr>.
Pull request #947 <https://github.com/JeffersonLab/sim-recon/pull/947>,
"Updated FCAL geometry to new block size" has implications for detector
calibrations. It comes in concert with updates to the FCAL geometry from
Richard (HDDS pull requests #38
<https://github.com/JeffersonLab/hdds/pull/38> and
#42 <https://github.com/JeffersonLab/hdds/pull/42>). So far, the observed
effects on the calibrations have been slight. Sean is monitoring the
situation to see where recalibrations may be necessary.
Review of recent discussion on the GlueX Software Help List
We went over recent items
<https://groups.google.com/forum/#%21forum/gluex-software> with no
significant discussion.
CPU Usage Projections
Beni asked about projections for our demands on the farm during the
upcoming run.
* Sean thought that now that firmware problems have been understood,
calibrations should go faster.
* Alex expressed doubt about whether we can support a full
reconstruction launch during the run with its attendant monitoring
jobs. Mark pointed out that priorities can be adjusted between
accounts so that time-critical jobs run first.
* Mark pointed out that we need to review our CPU needs in light of our
newly acquired experience, much like we are doing for disk usage
<https://docs.google.com/spreadsheets/d/1QQQ2R3QrJkJgN37lt9yRlhVylk4KDJQKAn8m6Is8paM/edit?usp=sharing>.
--
Mark Ito, marki at jlab.org, (757)269-5295