[Halld-offline] Offline Software Meeting Minutes, January 21, 2015

Fri Jan 23 18:58:05 EST 2015

Folks,

Find the minutes below and at 
https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_January_21,_2015 
.

   -- Mark
_______________________________

GlueX Offline Meeting Minutes, January 21, 2015

Present:

  * *CMU*: Curtis Meyer
  * *FIU*: Mahmoud Kamel
  * *FSU*: Aristeidis Tsaris
  * *JLab*: Alex Barnes, Mark Ito (chair), David Lawrence, Paul
    Mattione, Kei Moriya, Eric Pooser, Simon Taylor, Beni Zihlmann
  * *NU*: Sean Dobbs

Announcements

 1. Our volatile disk was expanded recently. The reservation increased
    from 10 to 20 TB, and the quota from 30 to 50 TB. We are using just
    over 20 TB presently.
 2. Marty Wise of Computing and Network Infrastructure (CNI) is working
    on installing the Run Conditions Database (RCDB) on an Apache server.
 3. CNI now has a desktop version of RedHat Enterprise Linux 7 available
    for beta testers. See Kelvin Edwards for an install image.
 4. Our work disk filled up this morning. We have 14 TB in total at
    present. Volunteers deleting their files have brought usage down to
    75%.
 5. Mark remarked that we should review our long-term requests for disk
    space and see if we can start to expand our disk portfolio in a
    significant way.

Review of Minutes from January 7

We went over the minutes 
<https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_January_7,_2015#Minutes>. 
Items were either resolved or appear on the agenda for this meeting.

Data Challenge 3

Mark has successfully run test jobs going all the way from event 
generation to REST file production. Along the way, EVIO-formatted data 
is produced and read. He showed statistics on the test jobs that were 
presented at the last Software Review Preparation Meeting 
<https://halldweb1.jlab.org/wiki/index.php/Software_Review_January_16,_2015#Agenda>. 
The next step is to scale the jobs up to the size of a real data 
challenge. We hope to be in production by the time of the Software Review.

Software Review Preparations

Curtis reviewed the discussion we had at last Friday's Software Review 
Preparations Meeting 
<https://halldweb1.jlab.org/wiki/index.php/Software_Review_January_16,_2015>. 
We spent some time answering questions from Graham about our needs 
vis-a-vis the schedule for computer procurements. We also ran down a 
list of talking points and topics 
<https://halldweb1.jlab.org/wiki/index.php/Topics_for_the_2015_Software_Review> 
that we plan to present.

Commissioning Run Review

Offline Monitoring Report

Kei gave the report.

  * Two weeks ago he ran over all files (online plugins, 2-track EVIO
    skim, REST).
  * The next launch of the entire process is this Friday.
  * The group will be testing EventStore to mark events. This will take
    some dedicated disk space.
  * Kei showed slides
    <https://halldweb1.jlab.org/wiki/images/b/b5/2015-01-21-multithread.pdf>,
    giving an update on CentOS65 use and multi-thread processing.

Commissioning-Branch-to-Trunk Migration

Simon reported that he and Mark have started working on migration of 
code developed on the commissioning branch during the run to the trunk 
in the source code repository. An initial attempt produced a version that 
compiled and ran, but when the b1pi test was run with the code, no 
successful kinematic fits were produced.
Paul asked if the Monte Carlo variation was being used; it was not. This 
will be tried next.

Analysis of REST File Data

Justin reported that he has had success reproducing his recent 
bump-hunting plots starting from REST formatted data. This mode would 
allow users to pursue similar studies without having to fetch the data 
from tape and perform reconstruction, a big time saving. He did this 
with a private version of the code. There is currently an issue with 
unpacking tagger hits from the REST file. Hopefully this can be fixed 
before the next generation of REST files is produced.

Handling Changing Magnetic Field Setting

Quoting from a recent email from Sean:

One of the bigger headaches of running over the fall data was keeping 
track of all of the different magnetic field conditions, as the field 
went up and down. It would be user-friendly if we could keep track of 
this information as well, instead of forcing the user to specify the 
correct magnetic field map on the command line every time. Naively, I'd 
think that we could add a CCDB table that stored the name of the 
magnetic field map to use, i.e., the same information that would be 
passed in on the command line. Maybe this information is better stored 
as geometry or something else, though?

Mark remarked that we did have a plan for handling this problem using 
the CCDB and JANA Resources. David, Sean, and Mark will get together 
offline to revisit the plan.

Data Management

Quoting from the same email from Sean, three items:

Storing software information in REST files

Since we're storing information on the software conditions used for 
reconstruction, it might be nice to store some of this information in 
the "officially" created REST files themselves, for a certain amount of 
self-documentation.

Mark thought that it should be possible to add a "software version" 
element to the REST format, independent of the physics events, at the 
beginning of the file. Paul will ask Richard Jones about how this might 
be done.

EVIO format definition for Level 3 trigger farm

Is running the L3 trigger farm a goal of the spring running? If so, it 
would be useful to define the EVIO output format that would be used. I 
seem to remember that even if we run in pass-through mode, the L3 farm 
could be used to disentangle multi-block EVIO events, and output them in 
single-block format.

David remarked that disentangling was fundamental to the L3 design and 
that any output from L3 would be in single-block form.

EventStore: implementation plan

One thing that could reduce the amount of disk space needed for handling 
skims would be the EventStore DB, the development of which I've taken 
back up. However, the user would still need access to these files, so it 
would only help for people running over the data at JLab. So in the end, 
there might still be a desire for us to make these files, for those 
who want to analyze the files at their home institutions.

The exact model we will use has not been decided on. Mark thought that 
to first order we would try to distribute at least the REST formatted 
data to each institution and therefore each site could have a functional 
EventStore-based system. This should be doable for early running at least.

Requests to SciComp on Farm Features

Kei led us through a set of questions and feature requests he sent to 
SciComp. These were collected from the group working on offline monitoring.

 1. Tools to track jobs:
     1. Tools to track what percentage of nodes were being used by whom
        at a given time, preferably in both # of jobs and threads. We can
        see the pie charts, for example, at
        http://scicomp.jlab.org/scicomp/#/auger/usage but would like the
        information in a form that we can easily access and analyze.
     2. What % of nodes are currently available for each OS at a given
        time.
     3. Tools to track the lifetime of each stage of the job, such as
        sitting in the queue, waiting for files from tape, running, etc.
     4. Would it be possible to make the stdout and stderr web-viewable?
     5. If possible, can you add the ability to search by “job name”
        (every job that includes the search term) in the Auger custom
        job query website?
 2. More general requests:
     1. Better transparency about whether there are problems in the
        system, such as heavy traffic due to users, broken disks, etc.
        Could there be an email list/webpage for that information?
     2. Clarification of how 'priority' of jobs works between different
        halls and users.
     3. Would it be possible for the system to auto-resubmit failed jobs
        if the failure is on the side of the system (e.g., bad farm
        nodes, temporary loss of connection)?
 3. Additionally, could we get more space on the cache disk?

There is a meeting tomorrow with SciComp personnel to go over the list. 
Interested parties should attend.

Action Items

 1. Ask Richard about a new software information element in the REST
    format. (Paul)
 2. Meet to figure out magnetic field map handling using CCDB and
    Resources. (David, Sean, Mark)
