[Halld-offline] Offline Software Meeting Minutes, November 1, 2017

Mark Ito marki at jlab.org
Fri Nov 3 13:50:14 EDT 2017


Folks,

Please find the minutes below and at

https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_November_1,_2017#Minutes

   - Mark

_________________________


    Minutes, GlueX Offline Meeting, November 1, 2017


Present:

  * *CMU*: Curtis Meyer
  * *FSU*: Sean Dobbs
  * *JLab*: Alex Austregesilo, Thomas Britton, Brad Cannon, Eugene
    Chudakov, Sebastian Cole, Mark Ito (chair), Simon Taylor, Beni Zihlmann

There is a recording of this meeting <https://bluejeans.com/s/L5K9O/> on 
the BlueJeans site. Use your JLab credentials to access it.


      Announcements

 1. Change to cache deletion policy
    <https://mailman.jlab.org/pipermail/halld-offline/2017-October/002970.html>
    Files that were requested from tape by farm jobs are now first in
    line for deletion once the requesting job has finished. This extends
    the life of files that users create or request directly on the cache
    disk.
 2. sim-recon 2.18.0
    <https://github.com/JeffersonLab/sim-recon/releases/tag/2.18.0> This
    release went out on October 10. It has the extensive changes to the
    Analysis Library from Paul Mattione.
 3. MCwrapper v1.9
    <https://mailman.jlab.org/pipermail/halld-offline/2017-October/002984.html>
    Thomas has added a facility to automatically generate an event
    sample, in a user-specified run range, with runs populated in
    proportion to the number of events in the real data.
 4. *Launches*. Alex reported that the most recent analysis launch has
    been inadvertently running with a job-count restriction left over
    from an incident with an analysis launch a few weeks back where
    SciComp had to throttle the number of jobs. The restriction was
    lifted this morning.
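
As an illustration of the proportional run population described in
item 3, the following is a minimal sketch, not MCwrapper's actual
implementation. The function name allocate_events, the run numbers, and
the event counts are all hypothetical; the sketch assumes the real-data
event count per run is already known and uses largest-remainder
rounding so that the per-run counts add up exactly to the requested
total.

```python
def allocate_events(real_counts, total_events):
    """Split total_events across runs in proportion to real_counts.

    real_counts: dict mapping run number -> number of real-data events
                 (hypothetical input; how MCwrapper obtains these
                 numbers is not covered here).
    Returns a dict mapping run number -> number of events to generate.
    """
    grand_total = sum(real_counts.values())
    if grand_total <= 0:
        raise ValueError("no real-data events in the requested run range")

    shares = {}
    remainders = []
    for run, n in real_counts.items():
        # Integer part of the proportional share, plus what is left over.
        quotient, remainder = divmod(n * total_events, grand_total)
        shares[run] = quotient
        remainders.append((remainder, run))

    # Hand out the events lost to rounding, largest remainder first
    # (ties go to the lower run number).
    leftover = total_events - sum(shares.values())
    remainders.sort(key=lambda item: (-item[0], item[1]))
    for _, run in remainders[:leftover]:
        shares[run] += 1
    return shares


# Hypothetical run numbers and event counts, for illustration only.
real_data = {30274: 1000, 30275: 3000, 30276: 6000}
print(allocate_events(real_data, 100))
# -> {30274: 10, 30275: 30, 30276: 60}
```

Integer arithmetic is used throughout so that the result does not
depend on floating-point rounding.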


      Review of minutes from the last meeting

We went over the minutes from October 4 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_October_4,_2017#Minutes>. 



        OASIS file system

Mark discussed the cron job update problem with Richard Jones. An
alternative approach has been identified, and the problem is resolved.


        HDvis Update

Thomas reported a bug in the JANA server (which sends events up to the
web browser for display). It crashes on RHEL7 and CentOS7 because of a
language feature not supported by the gcc 4.8.5 compiler found on those
platforms. It works on RHEL6 and CentOS6 (gcc 4.9.2). Several
avenues for a fix are being pursued, in consultation with Dmitry Romanov.


        Work Disk Clean-Up

Mark reported that in the latest twist, we will not need to reduce our
usage on /work to a level below 45 TB before moving to the new,
non-Lustre file server. Our current 55 TB will fit since not all of the
Halls will be moving at once.

For the final cut-over from Lustre to non-Lustre, we will have to stop
writing to /work for a short period of time. This may (or may not)
present a problem if we have a large launch underway. This issue needs
further discussion, but should not be a big problem.


      Report from the SciComp-Physics meeting

  * Chip Watson suggested that we try to stress the data stream
    bandwidth from the Counting House to the Computer Center before the
    December run, ideally in concert with a similar test by Hall B.
  * ifarm1101 and the CentOS6 farm nodes will not be upgraded from
    CentOS 6.5 to 6.9. It is not worth the effort for a handful of nodes.
  * Physics will initiate another procurement of Lustre-based disk, 200
    TB worth, to augment our volatile and work space. There is the
    possibility for more if we can justify it.


      Computing Milestones

Mark showed us an email from Amber Boehnlein
<https://halldweb.jlab.org/talks/2017/milestone_planning.pdf> from back
in August, proposing an effort to develop "milestones" for Scientific
Computing in the 12 GeV era, as an aid to Lab management for gauging
progress. Work on this has languished, and needs to be picked up again.


      Review of recent pull requests

We went over the list of open and closed requests
<https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr>.

Pull request #947 <https://github.com/JeffersonLab/sim-recon/pull/947>, 
"Updated FCAL geometry to new block size" has implications for detector 
calibrations. It comes in concert with updates to the FCAL geometry from 
Richard (HDDS pull requests #38 
<https://github.com/JeffersonLab/hdds/pull/38> and 
#42 <https://github.com/JeffersonLab/hdds/pull/42>). So far, the observed
effects on the calibrations have been slight. Sean is monitoring the 
situation to see where recalibrations may be necessary.


      Review of recent discussion on the GlueX Software Help List

We went over recent items 
<https://groups.google.com/forum/#%21forum/gluex-software> with no 
significant discussion.


      CPU Usage Projections

Beni asked about projections for our demands on the farm during the 
upcoming run.

  * Sean thought that now that firmware problems have been understood,
    calibrations should go faster.
  * Alex expressed doubt about whether we can support a full
    reconstruction launch during the run with its attendant monitoring
    jobs. Mark pointed out that priorities can be adjusted between
    accounts to give time-critical jobs priority.
  * Mark pointed out that we need to review our CPU needs in light of our
    newly acquired experience, much like we are doing for disk usage
    <https://docs.google.com/spreadsheets/d/1QQQ2R3QrJkJgN37lt9yRlhVylk4KDJQKAn8m6Is8paM/edit?usp=sharing>.

-- 
Mark Ito, marki at jlab.org, (757)269-5295
