[Halld-offline] Offline Software Meeting Minutes, January 21, 2015
marki at jlab.org
Fri Jan 23 18:58:05 EST 2015
Find the minutes below and at
GlueX Offline Meeting Minutes, January 21, 2015
Present:
* *CMU*: Curtis Meyer
* *FIU*: Mahmoud Kamel
* *FSU*: Aristeidis Tsaris
* *JLab*: Alex Barnes, Mark Ito (chair), David Lawrence, Paul
Mattione, Kei Moriya, Eric Pooser, Simon Taylor, Beni Zihlmann
* *NU*: Sean Dobbs
Announcements
1. Our volatile disk was expanded recently. The reservation increased
from 10 to 20 TB, and the quota from 30 to 50 TB. We are using just
over 20 TB presently.
2. Marty Wise of Computing and Network Infrastructure (CNI) is working
on installing the Run Conditions Database (RCDB) on an Apache server.
3. CNI now has a desktop version of Red Hat Enterprise Linux 7 available
for beta testers. See Kelvin Edwards for an install image.
4. Our work disk, 14 TB at present, filled up this morning. Volunteers
deleting their files have brought it down to 75% used.
5. Mark remarked that we should review our long-term requests for disk
space and see if we can start to expand our disk portfolio in a
Review of Minutes from January 7
We went over the minutes.
Items were either resolved or appear on the agenda for this meeting.
Data Challenge 3
Mark has successfully run test jobs going all the way from event
generation to REST file production. Along the way EVIO-formatted data is
produced and read. He showed some statistics on the test jobs, as
presented at the last Software Review Preparation Meeting. The next step
is to scale the jobs up to the size of the real challenge. We hope to be
in production by the time of the Software Review.
Software Review Preparations
Curtis reviewed the discussion we had at last Friday's Software Review
We spent some time answering questions from Graham about our needs
vis-a-vis the schedule for computer procurements. We also ran down a
list of talking points and topics that we plan to present.
Commissioning Run Review
Offline Monitoring Report
Kei gave the report.
* He ran over all files (online plugins, 2-track EVIO skim, REST) 2
* Next launch of the entire process is this Friday.
* The group will be testing EventStore to mark events. This will take
some dedicated disk space.
* Kei showed slides giving an update on CentOS 6.5 use and
multi-threaded processing.
Simon reported that he and Mark have started working on migration of
code developed on the commissioning branch during the run to the trunk
in the source code repository. An initial attempt produced a version that
compiled and ran, but when the b1pi test was run with the code, no
successful kinematic fits were produced.
Paul asked if the Monte Carlo variation was being used; it was not. This
will be tried next.
Analysis of REST File Data
Justin reported that he has had success reproducing his recent
bump-hunting plots starting from REST-formatted data. This mode would
allow users to pursue similar studies without having to fetch the data
from tape and perform reconstruction, a big time savings. He did this
with a private version of the code. There is currently an issue with
unpacking tagger hits from the REST file. Hopefully this can be fixed
before the next generation of REST files is produced.
Handling Changing Magnetic Field Setting
Quoting from a recent email from Sean:
One of the bigger headaches of running over the fall data was keeping
track of all of the different magnetic field conditions, as the field
went up and down. It would be user-friendly if we could keep track of
this information as well, instead of forcing the user to specify the
correct magnetic field map on the command line every time. Naively, I'd
think that we could add a CCDB table that stored the name of the
magnetic field map to use, i.e., that same information that would be
passed in on the command line. Maybe this information is better stored
as geometry or something else, though?
Mark remarked that we did have a plan for handling this problem using
the CCDB and JANA Resources. David, Sean, and Mark will get together
offline to revisit the plan.
Quoting from the same email from Sean, three items:
Storing software information in REST files
Since we're storing information on the software conditions used for
reconstruction, it might be nice to store some of this information in
the "officially" created REST files themselves, for a certain amount of
Mark thought that it should be possible to add a "software version"
element to the REST format, independent of the physics events, at the
beginning of the file. Paul will ask Richard Jones about how this might
EVIO format definition for Level 3 trigger farm
Is running the L3 trigger farm a goal of the spring running? If so, it
would be useful to define the EVIO output format that would be used. I
seem to remember that even if we run in pass-through mode, the L3 farm
could be used to disentangle multi-block EVIO events, and output them in
David remarked that disentangling is fundamental to the L3 design and
that any output format from L3 would be in single-block form.
EventStore: implementation plan
One thing that could save the amount of disk space needed for handling
skims would be the EventStore DB, the development of which I've taken
back up. However, the user would still need access to these files, so it
would only help for people running over the data at JLab. So in the end,
there might still be a desire for us to make these files, for those
who want to analyze the files at their home institutions.
The exact model we will use has not been decided on. Mark thought that
to first order we would try to distribute at least the REST-formatted
data to each institution, so that each site could have a functional
EventStore-based system. This should be doable for early running at least.
Requests to SciComp on Farm Features
Kei led us through a set of questions and feature requests he sent to
SciComp. These were collected from the group working on offline monitoring.
1. Tools to track jobs:
   1. tools to track what percentage of nodes were being used by whom
      at a given time, preferably in both # of jobs and threads. We can
      see the pie charts for example in
      http://scicomp.jlab.org/scicomp/#/auger/usage but would like the
      information in a form that we can easily access and analyze.
   2. what % of nodes are currently available for each OS at a given time
   3. tools to track the life time of each stage of the job, such as
      sitting in queue, waiting for files from tape, running, etc.
   4. Would it be possible to make the stdout and stderr web-viewable?
   5. If possible, can you add the ability to search by “job name”
      (every job that includes the search term) in the auger custom
      job query website?
2. For more general requests:
   1. better transparency for whether there are problems in the
      system, such as heavy traffic due to users, broken disks, etc.
      Could there be an email list/webpage for that information?
   2. clarification of how 'priority' of jobs works between different
      halls and users.
   3. would it be possible for the system to auto-resubmit failed jobs
      if the failure is on the side of the system (e.g., bad farm
      nodes, temporary loss of connection)?
3. Additionally, ask for more space on cache disk?
There is a meeting tomorrow with SciComp personnel to go over the list.
Interested parties should attend.
Action Items
1. Ask Richard about a new software information element in the REST
2. Meet to figure out magnetic field map handling using CCDB and
Resources. (David, Sean, Mark)