[Halld-offline] Offline Software Meeting Minutes, September 20, 2017
Mark Ito
marki at jlab.org
Wed Sep 20 20:38:55 EDT 2017
Folks,
Please find the minutes below and at
https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_September_20,_2017#Minutes
.
_______________________
GlueX Offline Meeting, September 20, 2017, Minutes
Present:
* CMU: Curtis Meyer
* FSU: Brad Cannon, Sean Dobbs
* Glasgow: Peter Pauli
* JLab: Alex Austregesilo, Thomas Britton, Eugene Chudakov,
Sebastian Cole, Sergey Furletov, Mark Ito (chair), David Lawrence,
Justin Stevens, Simon Taylor, Beni Zihlmann
There is a recording of this meeting <https://bluejeans.com/s/M4QaP/> on
the BlueJeans site. Use your JLab credentials to access it.
Announcements
1. Automatic updates of ccdb.sqlite on OASIS
<https://mailman.jlab.org/pipermail/halld-offline/2017-September/002938.html>.
Mark started a system to keep the CCDB up-to-date for grid jobs.
There are still a few authentication wrinkles to iron out of the system.
2. HDvis progress
<https://halldweb.jlab.org/talks/2017/HDvis2/js/event.html>. Thomas
summarized recent progress. See the recording starting at 6:50 for
the visuals.
* Recent work has been on performance. Overall, a factor of 3-5 in
speed has been realized.
* Previously all of the BCAL modules were rendered; now only the hit
ends are shown.
* Only the hit bars of the TOF are shown.
* The FCAL no longer appears cut in half when viewed from certain
angles.
* There is a play/pause button now.
* In the future, there will be a time-selecting slider bar.
Review of minutes from the last meeting
We went over the minutes from September 6
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_September_6,_2017#Minutes>;
there was no significant discussion.
Report from the JLab Computing Steering Committee Meeting
Chip Watson gave a summary of SciComp activities at the meeting on
August 23. See his slides
<https://halldweb.jlab.org/talks/2017/IT%20Steering%20Committee%20Sept%202017%20SciComp.pptx>
for the details. Some highlights directly relevant for us:
* A new work disk server is coming soon.
* A NERSC <http://www.nersc.gov/> share has been granted to explore
using that facility to augment computing capability beyond that of
the JLab farm.
Sean asked if SciComp was going to provide tools to help use this new
resource. Mark replied that SciComp is willing to commit manpower to
developing tools, but that work has not started yet. Relatedly,
David reported that his Lab-Directed Research and Development (LDRD)
proposal has been approved. This effort to develop a next-generation
JANA also involves benchmarking the new system at NERSC. He has been
awarded a share, independent of the one that Chip announced. That effort
will give Hall D some in-house experience running on this new (to us)
platform.
Computing Resources
Alex brought up two important issues.
Analysis of REST Files
Alex reported that there are a lot of jobs on the farm these days from
individual GlueX collaborators running over REST files. There are no
production launches in progress, so at present they do not cause
conflict, but there is potential for conflict in the future. We should
encourage use of the clusters at collaboration institutions for these
tasks, as is being done at CMU and IU.
Sean pointed out that the analysis launches have been the most efficient
way to go through the REST data in a way that benefits the greatest
number of individual analyses. After some discussion we decided to
institute monthly analysis launches, more if there is a pressing demand,
to go through the REST data. [Added in press: launches will be on the
last Friday of every month. The next one will therefore be on Friday,
September 29. Mark your calendar.]
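
For anyone scripting reminders, the "last Friday of every month" rule can be sketched as below. This is only an illustration of the date arithmetic; the function name last_friday is hypothetical and is not part of any GlueX or SciComp tool.

```python
import calendar
from datetime import date, timedelta


def last_friday(year: int, month: int) -> date:
    """Return the date of the last Friday in the given month."""
    # monthrange() returns (weekday of first day, number of days in month)
    days_in_month = calendar.monthrange(year, month)[1]
    last_day = date(year, month, days_in_month)
    # weekday(): Monday=0 ... Friday=4; step back to the most recent Friday
    offset = (last_day.weekday() - calendar.FRIDAY) % 7
    return last_day - timedelta(days=offset)


if __name__ == "__main__":
    print(last_friday(2017, 9))
```

For September 2017 this gives the 29th, matching the date quoted above.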
Cache Disk Space
Alex reported that the pin quota on our cache disk has been reduced
recently by SciComp, down to 160 TB from 350 TB[?]. This has caused REST
data to be deleted. The lack of spinning REST data has in turn slowed
the progress of the individual-collaborator jobs, mentioned in the
previous section, due to jobs having to wait on tape retrieval of input
files.
Mark reminded us that in our original plans we had a rough guess of disk
space needed for reconstructed data of 500 TB. The plan was to put the
data on work where we can control when data goes on the disk and when it
comes off. Now that we have the pinning mechanism, the cache disk seems
a much more natural place since all of the data is on tape and the cache
disk is a partial mirror of tape. In fact, we probably underestimated
the amount needed; in the intermediate term, 1 PB seems a more
reasonable number. Although disk space is budget-limited in the near
term, we should think about increasing our request.
Other points:
* Alex told us that the REST data for Spring 17 alone is 120 TB.
Spring 16 is not much smaller than that. Spring 17 REST nearly
saturates the current pin quota, which is already maxed out with
analysis launch products.
* When raw data is processed, those files are charged against our pin
quota. We cannot unpin them when the jobs finish; that is under
control of the farm system. At certain points in time this can be a
large amount of data.
* When unpinned data are deleted, it is done on the basis of
modification time (time since the file was put on disk) and not
access time (time since the file was last read). It would be more
efficient if access time could be used. Frequently used files could
hang around, eliminating the need to re-fetch them from tape.
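
The difference between the two deletion policies in the last bullet can be illustrated with a toy model. This is a minimal sketch and not the actual cache manager: the CachedFile class, the evict function, the file names, and the timestamps are all hypothetical, chosen only to show how the choice of sort key changes which file is removed.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size_tb: float
    mtime: float  # time the file was put on disk (arbitrary units)
    atime: float  # time the file was last read (arbitrary units)


def evict(files, need_tb, key):
    """Remove files in ascending order of `key` ("mtime" or "atime")
    until at least need_tb has been freed; return the names removed."""
    freed = 0.0
    removed = []
    for f in sorted(files, key=lambda f: getattr(f, key)):
        if freed >= need_tb:
            break
        removed.append(f.name)
        freed += f.size_tb
    return removed


# Hypothetical cache contents: an old but frequently read REST file,
# and a newer file that has not been read since shortly after staging.
files = [
    CachedFile("popular_rest", 10, mtime=1, atime=100),
    CachedFile("stale_skim", 10, mtime=50, atime=51),
]

by_mtime = evict(files, need_tb=10, key="mtime")
by_atime = evict(files, need_tb=10, key="atime")
```

In this toy case, ordering by modification time removes the frequently read file, which would then have to be re-fetched from tape, while ordering by access time removes the file nobody is reading.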
This discussion highlighted the need for a more detailed analysis of
disk requirements. We should review previous estimates and understand
changes called for now that we have real experience. Justin pointed out
that we are planning another reconstruction pass on the raw data in a
few weeks (with corrected FCAL calibration, updated CDC dE/dx
calculations, updated TOF timing constants, etc.) and we should look
ahead to what that will require.
Review of recent pull requests
We went over the pull requests
<https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr>
since the last meeting.
Richard Jones checked in a change that simulates Cerenkov radiation in
the FCAL light guides. David reminded us that he has seen this effect
for particles that pass through the lead glass. He also noted that now
that it is in the simulation, there needs to be an effort to incorporate
it in mcsmear as an effective contribution to the "energy" for these
particles.
Review of recent discussion on the GlueX Software Help List
We looked at the list
<https://groups.google.com/forum/#%21forum/gluex-software>. There was no
significant discussion.
Action Item Review
1. Institute a monthly Analysis Launch. (Alex)
2. Review and update the disk resource estimate. (Mark)
3. Incorporate light-guide-induced Cerenkov radiation in mcsmear. (TBA)
Retrieved from
"https://halldweb.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_September_20,_2017&oldid=83723"
* This page was last modified on 20 September 2017, at 20:36.