[Halld-offline] Offline Software Meeting Minutes, July 6, 2016

Mark Ito marki at jlab.org
Thu Jul 7 12:34:20 EDT 2016


People,

Please find the minutes below and at https://goo.gl/5Q5b1S .

   -- Mark
________________________________________________________


    GlueX Offline Meeting, July 6, 2016, Minutes

You can view a recording of this meeting <https://bluejeans.com/s/9Xee/> 
on the BlueJeans site.

Present:

  * *CMU*: Naomi Jarvis, Curtis Meyer, Mike Staib
  * *FSU*: Brad Cannon
  * *GSI*: Nacer Hamdi
  * *JLab*: Alexander Austregesilo, Alex Barnes, Mark Ito (chair), David
    Lawrence, Paul Mattione, Justin Stevens, Simon Taylor
  * *NU*: Sean Dobbs
  * *Regina*: Tegan Beattie
  * *UConn*: Richard Jones


      Announcements

 1. *Intel Lustre upgrade*. Mark reminded us about the upgrade
    <https://mailman.jlab.org/pipermail/jlab-scicomp-briefs/2016q2/000126.html>
    done a few weeks ago. Mark spoke with Dave Rackley earlier today,
    July 6, 2016.
      * An Intel version of Lustre was installed on the servers. Call-in
        support is available; we pay for it.
      * There were hangs after the upgrade. The Intel Lustre client was
        then installed on the ifarm nodes, and there have been no
        incidents since. Installs are still rolling out to the farm and
        HPC nodes.
      * Please report any issues you encounter.
 2. *New release: sim-recon 2.1.0*. This release
    <https://mailman.jlab.org/pipermail/halld-offline/2016-June/002387.html>
    came out about a month ago. A new release should arrive this week.
 3. *REST backwards compatibility now broken*. Paul's email
    <https://mailman.jlab.org/pipermail/halld-physics/2016-May/000675.html>
    describes the situation. You cannot read old REST files with new
    sim-recon code.
 4. *Raw data copy to cache*. After some discussion with the Computer
    Center <https://halldweb.jlab.org/talks/2016/raw_data_to_cache.pdf>,
    we will now have the first files from each run appear on the cache
    disk without having to fetch them from the Tape Library.
 5. *New HDPM "install" command*. Nathan Sparks explains it in his email
    <https://mailman.jlab.org/pipermail/halld-offline/2016-May/002369.html>.
    It replaces the "fetch-dist" command.


      New wiki documentation for HDDM

Richard led us through his new wiki page 
<https://mailman.jlab.org/pipermail/halld-offline/2016-July/002408.html> 
which consolidates and updates documentation for the HDDM package. A new 
feature is a Python API for HDDM. Here is the table of contents:

     1 Introduction
     2 Templates and schemas
     3 How to get started
     4 HDDM in python
         4.1 writing hddm files in python
         4.2 reading hddm files in python
         4.3 advanced features of the python API
     5 HDDM in C++
         5.1 writing hddm files in C++
         5.2 reading hddm files in C++
         5.3 advanced features of the C++ API
     6 HDDM in c
         6.1 writing hddm files in c
         6.2 reading hddm files in c
         6.3 advanced features of the c API
     7 Advanced features
         7.1 on-the-fly compression/decompression
         7.2 on-the-fly data integrity checks
         7.3 random access to hddm records
     8 References

Some notes from the discussion:

  * If a lot of sparse single-event access is anticipated, the zip
    format may be better because of its smaller buffer size. Bzip2 is
    now the default.
  * The random-access feature provides "bookmarks" for individual
    events that can be saved and used for quick access later, even for
    compressed files.
  * The Python API can be used in conjunction with PyROOT to write ROOT
    tree generators quickly and economically, using any HDDM file as
    input (see the sketch below).
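
As an illustration of that last point, here is a minimal sketch of an 
HDDM-to-ROOT-tree converter in Python. It is not taken from the wiki 
page: the hddm_r module name, the istream() iteration, and the 
getChargedTracks() accessor are assumptions modeled on the documented 
conventions, while the PyROOT part uses standard TFile/TTree calls.

    # Minimal sketch: convert a REST (hddm_r) file into a flat ROOT tree.
    # The hddm_r module, istream(), and the record accessor below are
    # assumptions modeled on the wiki documentation, not verified code.
    import array
    import ROOT
    import hddm_r   # assumed Python module generated by the HDDM tools

    fout = ROOT.TFile("rest_summary.root", "RECREATE")
    tree = ROOT.TTree("events", "summary of REST records")
    ntracks = array.array("i", [0])
    tree.Branch("ntracks", ntracks, "ntracks/I")

    for record in hddm_r.istream("sample_rest.hddm"):  # assumed iteration API
        # getChargedTracks() is a hypothetical accessor, for illustration only
        ntracks[0] = len(record.getChargedTracks())
        tree.Fill()

    fout.Write()
    fout.Close()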


        REST file I/O

Mike described a throughput limit he has seen for compressed REST data 
vs. non-compressed. See his slides 
<https://halldweb.jlab.org/wiki/images/c/c0/RestRates6Jul2016.pdf> for 
plots and details. The single-threaded HDDM reader limits scaling with 
the number of event analysis threads if it is reading compressed data. 
The curve turns over at about 6 or 7 threads. On the other hand, 
compressed data presents less load on disk-read bandwidth, and so 
multiple jobs contending for that bandwidth might do better with 
compressed data.

Richard agreed to buffer input and launch a user-defined number of 
threads to do HDDM input. That should prevent starvation of the event 
analysis threads.
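
For illustration only, the sketch below shows the general 
producer/consumer pattern in Python: a reader thread fills a bounded 
buffer that several analysis workers drain, so slow single-threaded 
input does not starve them. It is not the HDDM code Richard will write; 
read_records() is a stand-in for the real reader.

    # Illustration of the buffering idea, not HDDM code.
    import queue
    import threading

    buf = queue.Queue(maxsize=1000)          # bounded buffer of decoded records
    STOP = object()                          # sentinel marking end of input

    def read_records(path):
        """Stand-in for the single-threaded HDDM/REST reader."""
        for i in range(10000):               # pretend these are decompressed records
            yield {"event": i}

    def producer(path):
        for rec in read_records(path):
            buf.put(rec)                     # blocks when the buffer is full
        buf.put(STOP)

    def worker():
        while True:
            rec = buf.get()
            if rec is STOP:
                buf.put(STOP)                # let the other workers see the sentinel
                break
            pass                             # event analysis would go here

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    producer("sample_rest.hddm")             # placeholder file name
    for t in threads:
        t.join()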


      Review of minutes from June 8

We went over the minutes 
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_June_8,_2016#Minutes>. 


  * *Small files are still being retained on the cache disk*, without
    automatic archiving to tape. Mark will repeat his plea for small
    file deletion soon.
      o Alex A. pointed out that it is now possible to pin small files
        and to force a write to tape. That was not the case a couple of
        weeks ago.
      o Sean reminded us that we had put in a request for a get-and-pin
        command from jcache. Mark will check on its status.
  * *RCDB is now fully integrated into the build_scripts system*. It is
    built on the JLab CUE on nodes where C++11 features are supported.
    You can now incorporate RCDB C++ API calls in sim-recon plugins and
    SCons will do the right thing build-wise, as long as you have the
    RCDB_HOME environment variable defined properly (a Python query
    sketch follows this list).
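
As a related illustration (not something discussed at the meeting), 
RCDB also ships a Python provider. The sketch below shows what a simple 
condition query might look like; the RCDBProvider import, the 
connection string, the run number, and the "event_count" condition name 
are assumptions to be checked against the RCDB documentation for your 
installation.

    # Minimal sketch (assumed API): query a run condition from RCDB in Python.
    import os
    from rcdb.provider import RCDBProvider

    # RCDB_HOME should be set so the build and PYTHONPATH are configured
    assert "RCDB_HOME" in os.environ, "define RCDB_HOME before building/running"

    db = RCDBProvider("mysql://rcdb@hallddb.jlab.org/rcdb")
    condition = db.get_condition(11366, "event_count")   # hypothetical run number
    print(condition.value if condition else "condition not found")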


      Spring 2016 Run Processing Status


        Distributing REST files from initial launch

Richard, Curtis, and Sean commented on the REST file distribution 
process. Matt Shepherd copied "all" of the files from JLab to IU and has 
pushed them to UConn, CMU, and Northwestern, as per his proposal 
<https://mailman.jlab.org/pipermail/halld-offline/2016-June/002390.html>, using 
Globus Online. He was able to get about 10 MB/s from JLab to IU, and the 
university-to-university transfers achieved similar speeds, within a 
factor of a few. All cautioned that one needs to think carefully about 
network and networking-hardware configurations to get acceptable 
bandwidth.

Alex A. cautioned us that some small files in Batch 1, and to a lesser 
extent in Batch 2, either were lost before being archived to tape, or 
are in the Tape Library but were not pinned and have disappeared from 
the cache disk.


        Launch Stats

Alex pointed us to the Launch Stats webpage 
<https://halldweb.jlab.org/data_monitoring/launch_analysis/index.html> 
which now contains links to the statistics pages for the full 
reconstruction launches. We looked at the page for Batch 01 
<https://halldweb.jlab.org/data_monitoring/recon/summary_swif_output_recon_2016-02_ver01_batch01.html>. 


  * The page shows statistics on jobs run.
  * We discussed the plot of the number of jobs at each state of farm
    processing as a function of time. For the most part we were limited
    by the number of farm nodes, but there were times when we were
    waiting for raw data files from tape.
  * We never had more than about 500 jobs running at a time.
  * Memory usage was about 7 GB for Batch 1, a bit more for Batch 2.
  * The jobs ran with 14 threads.
  * One limit on farm performance was CLAS jobs that required so much
    memory that farm nodes were running with large fractions of their
    cores idle.


      mcsmear and CCDB variation setting

David noticed last week that he had to choose the mc variation of CCDB 
to get sensible results from the FCAL when running mcsmear. This was 
because of a change in the way the non-linear energy correction was 
being applied. He asked whether we want to make this the default for 
mcsmear, since it is only run on simulated data.

The situation is complicated by the fact that not all simulated data 
should use the mc variation. That is only appropriate for getting the 
"official" constants intended for simulating data already in the can. 
Note that if no variation is specified at all, then the default 
variation is used; that was the problem that David discovered.

After some discussion, we decided to ask Sean to put a warning in 
mcsmear if no variation is named at all (a sketch of the idea follows 
below).
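
For context, the CCDB variation for a JANA-based program such as 
mcsmear is normally selected through the JANA_CALIB_CONTEXT environment 
variable, e.g. JANA_CALIB_CONTEXT="variation=mc". The sketch below is a 
hypothetical wrapper script, not the change Sean will make; it simply 
warns, in the spirit of the requested mcsmear warning, when no 
variation has been named. The input file name is a placeholder.

    # Hypothetical wrapper, not the planned mcsmear change: emit a loud
    # warning when no CCDB variation has been named via JANA_CALIB_CONTEXT
    # before running mcsmear on a simulated file.
    import os
    import subprocess
    import sys

    context = os.environ.get("JANA_CALIB_CONTEXT", "")
    if "variation=" not in context:
        print("WARNING: no CCDB variation named in JANA_CALIB_CONTEXT; "
              "the default variation will be used (set variation=mc for "
              "official simulation constants)", file=sys.stderr)

    subprocess.check_call(["mcsmear", "hdgeant_smeared.hddm"])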


      ROOT 6 upgrade?

Mark has done a test build of a recent version of our software with ROOT 
6. We had said that we should transition from ROOT 5 to 6 once we had 
established use of a C++11-compliant compiler; that compiler transition 
has now been done.

Paul pointed out that the change may break some ROOT macros used by 
individuals, including some used for calibration. On the other hand, the 
change has to happen at some point.

Mark told us he will not make the change for the upcoming release, but 
will consider it for the one after that. In any case, we will discuss it 
further.



-- 
marki at jlab.org, (757)269-5295


