[Halld-offline] Offline Software Meeting Minutes, January 18, 2017
Mark Ito
marki at jlab.org
Thu Jan 19 19:59:25 EST 2017
Folks,
Find the minutes below and at
https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_January_18,_2017#Minutes
.
-- Mark
____________________
GlueX Offline Meeting, January 18, 2017, Minutes
Present:
* *FIU*: Mahmoud Kamel
* *JLab*: Alexander Austregesilo, Nathan Baltzell, Alex Barnes, Thomas
Britton, Brad Cannon, Mark Ito (chair), Nathan Sparks, Kurt
Strosahl, Simon Taylor, Beni Zihlmann
* *MIT*: Cristiano Fanelli
* *NU*: Sean Dobbs
* *UConn*: Richard Jones + 2
* *W&M*: Justin Stevens
There is a recording of this meeting <https://bluejeans.com/s/tgAip/> on
the BlueJeans site.
Announcements
 1. *Backups of the RCDB database in SQLite* form are now being kept on
    the write-through cache, in /cache/halld/home/gluex/rcdb_sqlite/.
    See Mark's email
    <https://mailman.jlab.org/pipermail/halld-offline/2017-January/002573.html>
    for more details. A minimal backup sketch appears after this list.
 2. *Development of a wrapper for signal MC generation*. Thomas has
    written scripts to wrap the basic steps of signal Monte Carlo
    generation. One can specify the number of events and the .input file
    to use for genr8, and jobs will be submitted via SWIF. Paul thought
    the average user would find this useful. Mark suggested that the
    code could be version controlled with the hd_utilities repository
    <https://github.com/JeffersonLab/hd_utilities> on GitHub. A sketch
    of the idea appears after this list.
 3. *More Lustre space*. Mark reported that our total Lustre space has
    been increased from a quota of 200 TB to 250 TB. See his email
    <https://mailman.jlab.org/pipermail/halld-offline/2017-January/002594.html>
    for a few more details.
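
Regarding the RCDB backups in item 1: the snippet below is a minimal
illustration only (not the actual backup mechanism in use), showing how
an SQLite snapshot could be written into the cache area with Python's
sqlite3 backup API. The source file name is an assumption.

    import sqlite3
    from datetime import date

    # Hypothetical source file name; the destination is the
    # write-through cache directory mentioned in announcement 1.
    src = sqlite3.connect("rcdb.sqlite")
    dst = sqlite3.connect(
        "/cache/halld/home/gluex/rcdb_sqlite/rcdb_%s.sqlite"
        % date.today().isoformat())
    with dst:
        src.backup(dst)  # consistent copy even if the DB is in use
    dst.close()
    src.close()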
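
Regarding the wrapper in item 2: the sketch below illustrates the
general idea only; it is not Thomas's actual script, and the job script
name run_mc.sh and the swif option names are assumptions.

    #!/usr/bin/env python
    # Illustrative wrapper: split the requested number of events into
    # SWIF jobs, each of which would run the generation/simulation/
    # reconstruction chain via a (hypothetical) run_mc.sh job script.
    import argparse
    import subprocess

    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True,
                        help="genr8 .input file for the signal channel")
    parser.add_argument("--nevents", type=int, required=True,
                        help="total number of events to generate")
    parser.add_argument("--events-per-job", type=int, default=10000)
    parser.add_argument("--workflow", default="signal_mc")
    args = parser.parse_args()

    njobs = (args.nevents + args.events_per_job - 1) // args.events_per_job
    for i in range(njobs):
        subprocess.check_call(
            ["swif", "add-job", "-workflow", args.workflow,
             "-name", "mcgen_%04d" % i,
             "run_mc.sh", args.input, str(args.events_per_job), str(i)])
    print("Submitted %d jobs to workflow %s" % (njobs, args.workflow))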
Lustre system status
Kurt Strosahl, of JLab SciComp, dropped by to give us a report on the
recent problems with the Lustre file system
<https://en.wikipedia.org/wiki/Lustre_%28file_system%29>. This has
affected our work, cache, and volatile directories. Lustre aggregates
multiple partitions on multiple RAID arrays, "block devices" or "Object
Store Targets" (OSTs), and presents to users a view of one large disk
partition. There are redundant metadata systems to keep track of which
files are where.
On New Year's Day, due to Infiniband problems, a fail-over from one
metadata system to the other was mistakenly initiated. In the confusion,
both systems tried to mount a few of the OSTs, corrupting the metadata
for five of the 74 OSTs. This was the first time a fail-over had
occurred for a production system at JLab. Intel and SciComp have been
working together to recover the metadata. The underlying files appear to
be OK, but without the metadata they cannot be accessed. So far,
metadata for four of the five OSTs has been repaired and it appears that
their files have reappeared intact. This work has been going on for over
two weeks now; there is no definite estimate on when the last OST will
be recovered. Fail-over has been inhibited for now.
We asked Kurt about recent troubles with ifarm1102. That particular
node has been having issues with its Infiniband interface and has now
been removed from the rotation of ifarm machines.
Review of minutes from the last meeting
We went over the minutes from December
<https://halldweb.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_December_21,_2016#Minutes>
(all):
* The problem with reading data with multiple threads turned out indeed
  to be due to corrupted data. There was an issue with the RAID arrays
  in the counting room.
* Sean has had further discussions on how we handle HDDS XML files.
There is a plan now.
* Mark will ask about getting us an update on the OSG appliance.
Launches
Alex A. gave the report.
2016-10 offline monitoring ver02
Alex wanted to start this launch before the break, but calibrations were
not ready. Instead, processing started the first week of January, but
since there was not a lot of data in the run period, it finished in a
week. The gxproj1 account was used. There were some minor problems with
the post-processing scripts that have now been fixed.
We are waiting for new calibration results before starting ver03,
perhaps sometime next week. There was an issue with the propagation
delay calibration for the TOF that has now been resolved, and there are
ongoing efforts with BCAL and FCAL calibrations. The monitoring launch
gives us a check on the quality of the calibrations for the entire
running period.
2016-02 analysis launch ver05
This launch started before the break. Jobs are running with only six
threads. Large variation in execution time and peak memory use has been
observed. The cause has been traced to a few channels that require many
photons (e.g., 3π^0) and can generate huge numbers of combinations,
stopping progress on a single thread; an illustrative count is given
below. Several solutions were discussed, including re-writing parts of
the analysis library and cutting off processing for events that generate
too many combinations. In addition, in the future the list of plugins
may get trimmed. This launch took the philosophy of running "everything"
to see how many channels we could reasonably get through.
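
To give a feel for the scale of the combinatoric problem (the shower
multiplicities below are made up for illustration): for a 3π^0 final
state, six photons must be chosen from the reconstructed showers and
grouped into three π^0 candidates, and the count grows very quickly.

    from math import comb, factorial

    def n_3pi0_photon_combos(n_showers):
        # choose 6 showers, then split them into 3 unordered pairs:
        # 6! / (2!^3 * 3!) = 15 pairings per choice of six showers
        pairings = factorial(6) // (2 ** 3 * factorial(3))
        return comb(n_showers, 6) * pairings

    for n in (8, 12, 16, 20):
        print(n, n_3pi0_photon_combos(n))
    # 20 showers already give 38760 * 15 = 581400 photon combinations,
    # before the other particles in the reaction are even considered.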
Sim 1.2
Mark reported that the 50 k jobs that have been submitted are going
through very slowly. Processing started in the middle of the break and
is only 20% done, and this batch is only 20% of the total we planned to
simulate; a rough projection is given below. The processing time is
dominated by the generation of electromagnetic background independently
for each event. After some discussion of the purpose of the resulting
data set, we decided to re-launch the effort without generation of E&M
background. The data should still be useful for studying
efficiency/acceptance.
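
A rough projection makes the decision concrete; the elapsed time below
is an assumption (processing started in the middle of the break), while
the two 20% figures are the ones quoted above.

    # Back-of-the-envelope Sim 1.2 schedule projection (illustrative).
    elapsed_weeks = 3.0   # assumed: middle of the break to this meeting
    frac_done     = 0.20  # fraction of the 50 k submitted jobs finished
    frac_of_total = 0.20  # this batch's share of the planned simulation

    weeks_this_batch = elapsed_weeks / frac_done         # ~15 weeks
    weeks_full_plan  = weeks_this_batch / frac_of_total  # ~75 weeks at this rate
    print(weeks_this_batch, weeks_full_plan)
    # With per-event EM background generation dominating the time,
    # dropping it in the relaunch should speed things up substantially.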
HDGeant/HDGeant4 Update
Richard gave us an update on the development effort.
* He is doing a tag-by-tag comparison of the output from HDGeant ("G3"
for our purposes) and HDGeant4 ("G4"), comparing both truth and hit
information. For 90% of the discrepancies he finds it is the new G4
code that needs fixing, but the other 10% come from G3 errors,
mostly in truth information that is not looked at as often.
* To do the comparison he has developed a new tool, hddm-root, that
creates a ROOT tree auto-magically directly from an HDDM file. This
allows quick histogramming of quantities for comparison.
* Detectors where agreement has been verified: CDC, FDC, BCAL, FCAL,
  TOF, tagger, pair spectrometer (coarse and fine), triplet polarimeter.
* The triplet polarimeter simulation was adapted from code from Mike
  Dugger and was originally implemented in G4, but has also been
  back-ported to G3.
* To test the TPOL simulation, a new card has been introduced,
  GENB[?], that will track beam photons down the beamline from the
  radiator. It has three modes: pre-coll, post-coll, and post-conv,
  which end tracking at the collimator, at the converter, and on
  through to the TPOL, respectively. The generated particle information
  can be written out in HDDM format and serve as input to either G3 or
  G4, just as for any of our other event generators.
* The coherent bremsstrahlung generator has been implemented in G4 and
compared to that of G3.
* "Fake" tagger hits are now being generated in G4 in the same manner
as was done in G3. Also a new tag RFTime[?] has been introduced. It
is a single time that sets the "true" phase of the RF used in the
simulation.
* Other detectors implemented: the DIRC, the MWPC (for the CPP
experiment), and for completeness the gas RICH, the gas Cerenkov,
and the UPV.
* The MCTRAJECTORY card has been implemented in G4 and its
implementation in G3 fixed. This allows output of position
information for particle birth, death, and/or points in between for
primary and/or secondary particles in a variety of combinations of
those items.
* The following G3 cards have been implemented in G4. The secretary
will refer the reader to the documentation in the sample control.in
for definitions of most of these.
o KINE
o SCAT
o TARGET
o BGGATE
o BGRATE
* The following cards will not be implemented in G4
o CUTS
o SWITCH
+ CUTS and SWITCH do not fit into the Geant4 design philosophy
o GELHAD
+ photonuclear interactions are now provided natively in Geant4
* The following cards are being implemented now:
o HADR
+ The meaning in G4 has been modified to control turning
on/off all hadronic interaction processes to save users the
bother of doing so one by one
o CKOV
o LABS
o NOSEC
o AUTO
o BFIELD_MAP
o PSFIELD_MAP
o SAVEHIT
o SHOWERS_IN_COLLIMATOR
o DRIFT_CLUSTERS
o MULS
o BREMS
o COMPT
o LOSS
o PAIR
o DECAY
o DRAY
Beni will describe the scheme he implemented in G3 to preserve the
identity of secondary particles and transmit the description to Richard.
Performance remains an issue but is not an area of focus at this stage.
Richard has seen a slow-down of a factor of 24 per thread going from G3
to G4. At this point G4 is generating two orders of magnitude more
secondary particles, mostly neutrons, compared to G3. A simple kinetic
energy threshold adjustment did not make much of a difference.
Sean made a couple of comments:
1. The problem Richard discovered with missing TDC hits in the BCAL has
been traced to the generation of digi-hits for the BCAL. CCDB
constants had to be adjusted to bring those hits back.
2. Caution should be used with the current pair spectrometer field map
    called for in the CCDB. It is only a preliminary rough guess.
Mark needs to create a standard build of G4.
Richard requested that if folks have problems, questions, or
suggestions, they should log an issue on GitHub
<https://github.com/rjones30/HDGeant4/issues>.