[Halld-offline] Offline Software Meeting Minutes, July 6, 2016
Mark Ito
marki at jlab.org
Thu Jul 7 12:34:20 EDT 2016
People,
Please find the minutes below and at https://goo.gl/5Q5b1S .
-- Mark
________________________________________________________
GlueX Offline Meeting, July 6, 2016, Minutes
You can view a recording of this meeting <https://bluejeans.com/s/9Xee/>
on the BlueJeans site.
Present:
* *CMU*: Naomi Jarvis, Curtis Meyer, Mike Staib
* *FSU*: Brad Cannon
* *GSI*: Nacer Hamdi
* *JLab*: Alexander Austregesilo, Alex Barnes, Mark Ito (chair), David
Lawrence, Paul Mattione, Justin Stevens, Simon Taylor
* *NU*: Sean Dobbs
* *Regina*: Tegan Beattie
* *UConn*: Richard Jones
Announcements
1. *Intel Lustre upgrade*. Mark reminded us about the upgrade
<https://mailman.jlab.org/pipermail/jlab-scicomp-briefs/2016q2/000126.html>
done a few weeks ago. He spoke with Dave Rackley earlier today,
July 6.
* An Intel version of Lustre was installed on the servers. Call
support is available; we pay for it.
* There were hangs after the upgrade; after that the Intel Lustre
client was installed on the ifarm nodes, and there have been no
incidents since. Installs are still rolling out to the farm and HPC
nodes.
* Please report issues if they are encountered.
2. *New release: sim-recon 2.1.0*. This release
<https://mailman.jlab.org/pipermail/halld-offline/2016-June/002387.html>
came out about a month ago. A new release should arrive this week.
3. *REST backwards compatibility now broken*. Paul's email
<https://mailman.jlab.org/pipermail/halld-physics/2016-May/000675.html>
describes the situation. You cannot read old REST files with new
sim-recon code.
4. *Raw data copy to cache*. After some discussion with the Computer
Center <https://halldweb.jlab.org/talks/2016/raw_data_to_cache.pdf>,
we will now have the first files from each run appear on the cache
disk without having to fetch them from the Tape Library.
5. *New HDPM "install" command*. Nathan Sparks explains it in his email
<https://mailman.jlab.org/pipermail/halld-offline/2016-May/002369.html>.
It replaces the "fetch-dist" command.
New wiki documentation for HDDM
Richard led us through his new wiki page
<https://mailman.jlab.org/pipermail/halld-offline/2016-July/002408.html>
which consolidates and updates documentation for the HDDM package. A new
feature is a Python API for HDDM. Here is the table of contents:
1 Introduction
2 Templates and schemas
3 How to get started
4 HDDM in python
4.1 writing hddm files in python
4.2 reading hddm files in python
4.3 advanced features of the python API
5 HDDM in C++
5.1 writing hddm files in C++
5.2 reading hddm files in C++
5.3 advanced features of the C++ API
6 HDDM in c
6.1 writing hddm files in c
6.2 reading hddm files in c
6.3 advanced features of the c API
7 Advanced features
7.1 on-the-fly compression/decompression
7.2 on-the-fly data integrity checks
7.3 random access to hddm records
8 References
Some notes from the discussion:
* If a lot of sparse single-event access is anticipated, the zip
format may be better because of its smaller buffer size. Bzip2 is the
default now.
* The random-access feature allows "bookmarks" for individual events
to be saved and used for quick access later, even for compressed
files.
* The Python API can be used in conjunction with PyROOT to write ROOT
tree generators, using any HDDM file as input, quickly and
economically; see the sketch below.
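As an illustration of that last point, here is a minimal sketch of an
HDDM-to-ROOT-tree generator in Python. The module name hddm_r and the
record accessor used below are placeholders; the exact generated
module and accessor names for a given schema are documented on
Richard's wiki page and should be checked there.

    import hddm_r          # module generated from the REST schema (name assumed)
    import ROOT
    from array import array

    fout = ROOT.TFile("rest_summary.root", "recreate")
    tree = ROOT.TTree("events", "per-event summary from a REST file")
    eventno = array("l", [0])
    tree.Branch("eventNo", eventno, "eventNo/L")

    # istream iterates over HDDM records, decompressing on the fly if
    # the file is zip- or bzip2-compressed
    for rec in hddm_r.istream("sample_rest.hddm"):
        evt = rec.getReconstructedPhysicsEvent()   # hypothetical accessor
        eventno[0] = evt.eventNo
        tree.Fill()

    tree.Write()
    fout.Close()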
REST file I/O
Mike described a throughput limit he has seen for compressed REST data
vs. non-compressed. See his slides
<https://halldweb.jlab.org/wiki/images/c/c0/RestRates6Jul2016.pdf> for
plots and details. The single-threaded HDDM reader limits scaling with
the number of event analysis threads if it is reading compressed data.
The curve turns over at about 6 or 7 threads. On the other hand,
compressed data presents less load on disk-read bandwidth, and so
multiple jobs contending for that bandwidth might do better with
compressed data.
Richard agreed to buffer input and launch a user-defined number of
threads to do HDDM input. That should prevent starvation of the event
analysis threads.
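The following is a generic producer/consumer sketch of that scheme, in
Python for brevity (the real change would live in the HDDM C++ I/O
layer): dedicated reader threads fill a bounded buffer, so
decompression keeps pace with the analysis threads instead of becoming
the bottleneck.

    import queue
    import threading

    BUFFER_DEPTH = 64    # bounded buffer between input and analysis
    N_READERS = 2        # user-defined number of input threads
    N_ANALYZERS = 8      # event-analysis threads

    buf = queue.Queue(maxsize=BUFFER_DEPTH)
    DONE = object()      # sentinel marking end of input

    def process(record):
        pass             # stand-in for the real per-event analysis

    def reader(shard):
        # read ahead of the analyzers; put() blocks when the buffer
        # is full, which bounds memory use
        for rec in shard:
            buf.put(rec)

    def analyzer():
        while True:
            rec = buf.get()
            if rec is DONE:
                break
            process(rec)

    # interleaved ranges stand in for shards of a compressed stream
    shards = [range(i, 100000, N_READERS) for i in range(N_READERS)]
    readers = [threading.Thread(target=reader, args=(s,)) for s in shards]
    analyzers = [threading.Thread(target=analyzer) for _ in range(N_ANALYZERS)]
    for t in readers + analyzers:
        t.start()
    for t in readers:
        t.join()
    for _ in range(N_ANALYZERS):
        buf.put(DONE)    # release the analyzers once input is exhausted
    for t in analyzers:
        t.join()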
Review of minutes from June 8
We went over the minutes
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_June_8,_2016#Minutes>.
* *Small files are still being retained on the cache disk*, without
automatic archiving to tape. Mark will repeat his plea for small
file deletion soon.
o Alex A. pointed out that it is now possible to pin small files
and to force a write to tape. That was not the case a couple of
weeks ago.
o Sean reminded us that we had put in a request for a get-and-pin
command from jcache. Mark will check on status.
* *RCDB is now fully integrated into the build_scripts system*. It is
built on the JLab CUE, on nodes where C++11 features are supported.
You can now incorporate RCDB C++ API calls in sim-recon plugins, and
SCons will do the right thing build-wise as long as you have the
RCDB_HOME environment variable defined properly (see the sketch after
this list).
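For reference, RCDB also ships a Python interface that follows the
same access pattern as the C++ API. A minimal sketch, assuming the
rcdb package's RCDBProvider/get_condition interface and the usual
Hall D connection string (verify both against the RCDB documentation):

    import os
    from rcdb.provider import RCDBProvider

    # the Python API needs only a connection string; taken here from
    # the environment, with an assumed read-only string as a fallback
    con_str = os.environ.get("RCDB_CONNECTION",
                             "mysql://rcdb@hallddb.jlab.org/rcdb")
    db = RCDBProvider(con_str)

    run_number = 11366                      # illustrative run number
    run = db.get_run(run_number)
    cond = db.get_condition(run, "event_count")
    print(run_number, "event_count =", cond.value)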
Spring 2016 Run Processing Status
Distributing REST files from initial launch
Richard, Curtis, and Sean commented on the REST file distribution
process. Matt Shepherd copied "all" of the files from JLab to IU and has
pushed them to UConn, CMU, and Northwestern, as per his proposal
<https://mailman.jlab.org/pipermail/halld-offline/2016-June/002390.html>, using
Globus Online. He was able to get about 10 MB/s from JLab to IU. Similar
speeds, within factors of a few, were obtained in the
university-to-university transfers. All cautioned that one needs to
think a bit carefully about network and networking hardware
configurations to get acceptable bandwidth.
Alex A. cautioned us that there were some small files in Batch 1, and
to a lesser extent in Batch 2, that either were lost before being
archived to tape, or are in the Tape Library but were not pinned and
so disappeared from the cache disk.
Launch Stats
Alex pointed us to the Launch Stats webpage
<https://halldweb.jlab.org/data_monitoring/launch_analysis/index.html>,
which now contains links to the statistics pages for the full
reconstruction launches. We looked at the page for Batch 01
<https://halldweb.jlab.org/data_monitoring/recon/summary_swif_output_recon_2016-02_ver01_batch01.html>.
* The page shows statistics on jobs run.
* We discussed the plot of the number of jobs at each state of farm
processing as a function of time. For the most part we were limited
by the number of farm nodes, but there were times when we were
waiting for raw data files from tape.
* We never had more than about 500 jobs running at a time.
* Memory usage was about 7 GB for Batch 1, a bit more for Batch 2.
* The jobs ran with 14 threads.
* One limit on farm performance was CLAS jobs that required large
amounts of memory, leaving farm nodes running with large fractions of
idle cores.
mcsmear and CCDB variation setting
David noticed last week that he had to choose the mc variation in
CCDB to get sensible results from the FCAL when running mcsmear. This
was because of a change in the way the non-linear energy correction
was being applied. He asked whether we want to make the mc variation
the default for mcsmear, since mcsmear is only run on simulated data.
The situation is complicated by the fact that not all simulated data
should use the mc variation. That is only appropriate for getting the
"official" constants intended for simulating data already in the can.
Note that if no variation is specified at all, then the default
variation is used; that was the problem that David discovered.
After some discussion, we decided to ask Sean to add a warning to
mcsmear if no variation is named at all.
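A sketch of the intended check, in Python for illustration (mcsmear
itself is C++): with JANA, the CCDB variation is normally selected
through the JANA_CALIB_CONTEXT environment variable, e.g.
JANA_CALIB_CONTEXT="variation=mc", and CCDB falls back to the default
variation when nothing is set, which is the trap David hit.

    import os
    import sys

    # warn if no CCDB variation has been named in the calib context
    context = os.environ.get("JANA_CALIB_CONTEXT", "")
    if "variation=" not in context:
        print("WARNING: no CCDB variation specified; the 'default' "
              "variation will be used. Simulated data usually wants "
              "variation=mc.", file=sys.stderr)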
ROOT 6 upgrade?
Mark has done a test build of a recent version of our software with
ROOT 6. We had said that we should transition from ROOT 5 to 6 once
we had established use of a C++11-compliant compiler; that has now
been done.
Paul pointed out that the change may break some ROOT macros used by
individuals, including some used for calibration. On the other hand,
the change has to happen at some point.
Mark told us he will not make the change for the upcoming release,
but will consider it for the one after that. In any case, we will
discuss it further.
--
marki at jlab.org, (757)269-5295