[Halld-offline] Offline Software Meeting Minutes, September 20, 2017

Mark Ito marki at jlab.org
Wed Sep 20 20:38:55 EDT 2017


Folks,

Please find the minutes below and at 
https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_September_20,_2017#Minutes .

_______________________


    GlueX Offline Meeting, September 20, 2017, Minutes

Present:

  * *CMU:* Curtis Meyer
  * *FSU:* Brad Cannon, Sean Dobbs
  * *Glasgow:* Peter Pauli
  * *JLab:* Alex Austregesilo, Thomas Britton, Eugene Chudakov,
    Sebastian Cole, Sergey Furletov, Mark Ito (chair), David Lawrence,
    Justin Stevens, Simon Taylor, Beni Zihlmann

There is a recording of this meeting <https://bluejeans.com/s/M4QaP/> on 
the BlueJeans site. Use your JLab credentials to access it.


      Announcements

 1. Automatic updates of ccdb.sqlite on OASIS
    <https://mailman.jlab.org/pipermail/halld-offline/2017-September/002938.html>.
    Mark has set up a system to keep the CCDB up to date for grid jobs.
    A few authentication wrinkles in the system remain to be ironed out.
 2. HDvis progress
    <https://halldweb.jlab.org/talks/2017/HDvis2/js/event.html>. Thomas
    summarized recent progress. See the recording starting at 6:50 for
    the visuals.
      * Recent work has focused on performance. Overall, a speed-up of a
        factor of 3 to 5 has been realized.
      * Previously all of the BCAL modules were rendered; now only the
        ends with hits are shown.
      * Only the hit bars of the TOF are shown.
      * The FCAL no longer appears cut in half when viewed from certain
        angles.
      * There is a play/pause button now.
      * In the future, there will be a time-selecting slider bar.


      Review of minutes from the last meeting

We went over the minutes from September 6 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_September_6,_2017#Minutes>; 
there was no significant discussion.


      Report from the JLab Computing Steering Committee Meeting

Chip Watson gave a summary of SciComp activities at the meeting on 
August 23. See his slides 
<https://halldweb.jlab.org/talks/2017/IT%20Steering%20Committee%20Sept%202017%20SciComp.pptx> 
for the details. Some highlights directly relevant for us:

  * A new work disk server is coming soon.
  * An NERSC <http://www.nersc.gov/> share has been granted to explore
    using that facility to augment computing capability beyond that of
    the JLab farm.

Sean asked if SciComp was going to provide tools to help use this new 
resource. Mark replied that SciComp is willing to commit manpower to 
the development of tools, but that work has not started yet. Related to 
this, David reported that his Lab-Directed Research and Development 
(LDRD) proposal has been approved. This effort to develop a 
next-generation JANA also involves benchmarking the new system at 
NERSC. He has been awarded a share, independent of the one that Chip 
announced. That effort will give Hall D some in-house experience running 
on this new (to us) platform.


      Computing Resources

Alex brought up two important issues.


        Analysis of REST Files

Alex reported that there are a lot of jobs on the farm these days from 
individual GlueX collaborators running over REST files. There are no 
production launches in progress, so at present they do not cause 
conflict, but they could in the future. We should encourage use of the 
clusters at collaborating institutions for these tasks, as is being done 
at CMU and IU.

Sean pointed out that the analysis launches have been the most efficient 
way to process the REST data for the benefit of the greatest number of 
individual analyses. After some discussion we decided to institute 
monthly analysis launches (more often if there is pressing demand) to 
go through the REST data. [Added in press: launches will be on the 
last Friday of every month. The next one will therefore be on Friday, 
September 29. Mark your calendar.]


        Cache Disk Space

Alex reported that SciComp recently reduced the pin quota on our cache 
disk to 160 TB from 350 TB[?]. This has caused REST data to be deleted 
from the cache. The lack of REST data on disk has in turn slowed the 
progress of the individual-collaborator jobs mentioned in the previous 
section, since those jobs must wait for their input files to be 
retrieved from tape.

Mark reminded us that our original plans included a rough estimate of 
500 TB of disk space for reconstructed data. The plan was to put the 
data on the work disk, where we can control when data goes on the disk 
and when it comes off. Now that we have the pinning mechanism, the cache 
disk seems a much more natural place, since all of the data is on tape 
and the cache disk is a partial mirror of tape. In fact, we probably 
underestimated the amount needed; for the intermediate term, 1 PB seems 
a more reasonable number. Although disk space is budget-limited in the 
near term, we should think about increasing our request.

Other points:

  * Alex told us that the REST data for Spring 17 alone is 120 TB.
    Spring 16 is not much smaller than that. Spring 17 REST by itself
    would nearly saturate the current pin quota, which is already fully
    used by analysis launch products.
  * When raw data is processed, the raw data files are charged against
    our pin quota. We cannot unpin them when the jobs finish; that is
    under the control of the farm system. At times this can be a large
    amount of data.
  * Unpinned data are deleted on the basis of modification time (time
    since the file was put on disk) rather than access time (time since
    the file was last read). Using access time would be more efficient:
    frequently used files would then stay on disk, eliminating the need
    to re-fetch them from tape.
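To make the difference between the two deletion policies concrete, here is a minimal Python sketch. The file records, sizes, times, and the `evict` function are all hypothetical illustrations; this is not how the JLab cache system is actually implemented. Unpinned files are deleted oldest-first according to the chosen time until usage fits within the capacity.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    # Hypothetical record for a file on the cache disk.
    name: str
    size_tb: float
    mtime: float  # time the file was put on disk (arbitrary units, e.g. days)
    atime: float  # time the file was last read
    pinned: bool = False


def evict(files, capacity_tb, key):
    """Delete unpinned files, oldest first according to `key`, until the
    total size fits within `capacity_tb`. Returns the names of deleted files.
    (Hypothetical policy sketch, not the actual cache-manager logic.)"""
    used = sum(f.size_tb for f in files)
    evicted = []
    for f in sorted((f for f in files if not f.pinned), key=key):
        if used <= capacity_tb:
            break
        used -= f.size_tb
        evicted.append(f.name)
    return evicted


# Hypothetical example: "rest_hot" was staged long ago but is read constantly.
files = [
    CachedFile("rest_hot", 10, mtime=0, atime=9),
    CachedFile("rest_cold", 10, mtime=5, atime=5),
    CachedFile("raw_pinned", 10, mtime=1, atime=1, pinned=True),
    CachedFile("rest_new", 10, mtime=8, atime=8),
]

by_mtime = evict(files, capacity_tb=30, key=lambda f: f.mtime)
by_atime = evict(files, capacity_tb=30, key=lambda f: f.atime)
print(by_mtime)  # ['rest_hot']: the heavily used file is deleted
print(by_atime)  # ['rest_cold']: the heavily used file survives
```

With the modification-time policy the frequently read file is the first to go and would have to be re-fetched from tape; with the access-time policy an idle file is deleted instead.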

This discussion highlighted the need for a more detailed analysis of 
disk requirements. We should review previous estimates and understand 
what changes are called for now that we have real experience. Justin pointed out 
that we are planning another reconstruction pass on the raw data in a 
few weeks (with corrected FCAL calibration, updated CDC dE/dx 
calculations, updated TOF timing constants, etc.) and we should look 
ahead to what that will require.


      Review of recent pull requests

We went over the pull requests 
<https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr> 
since the last meeting.

Richard Jones checked in a change that simulates Cerenkov radiation in 
the FCAL light guides. David reminded us that he has seen this effect 
for particles that pass through the lead glass. He also noted that now 
that it is in the simulation, there needs to be an effort to incorporate 
it in mcsmear as an effective contribution to the "energy" for these 
particles.


      Review of recent discussion on the GlueX Software Help List

We looked at the list 
<https://groups.google.com/forum/#%21forum/gluex-software>. There was no 
significant discussion.


      Action Item Review

 1. Institute a monthly Analysis Launch. (Alex)
 2. Review and update the disk resource estimate. (Mark)
 3. Incorporate light-guide-induced Cerenkov radiation in mcsmear. (TBA)


  * This page was last modified on 20 September 2017, at 20:36.
