[Halld-offline] Offline Software Meeting Minutes, October 16, 2013
Mark M. Ito
marki at jlab.org
Thu Oct 17 11:51:20 EDT 2013
Folks,
Find the minutes below and at
https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_October_16,_2013#Minutes
.
-- Mark
__________________________________________________________
GlueX Offline Meeting, October 16, 2013
Minutes
A [29]recording of the meeting (audio and slides) is available for a
month or so.
Present:
* CMU: Paul Mattione, Curtis Meyer
* IU: Kei Moriya, Matt Shepherd
* JLab: Mark Dalton, Hovanes Egiyan, Mark Ito (chair), David
Lawrence, Simon Taylor, Elliott Wolin, Beni Zihlmann
* Northwestern: Sean Dobbs
* UConn: Alex Barnes
Announcements
The work disk at JLab got full yesterday. Collectively, we deleted 1 TB
and 2 TB more should be added by the Computer Center today, bringing
the total to 12 TB.
Elliott told us there will be a much larger amount of disk space
available in the counting house soon, before it is needed for online
tasks.
Review of minutes from the last meeting
We went over the [30]minutes from September 18.
* The decay chain reporting issue last mentioned at the collaboration
meeting has an interim solution from Mark and Beni using a new
attribute for the product element in the hddm_s data model (on a
branch). A seamless genealogy with generations allowed both in the
generator (bggen and others) and the detector simulation (hdgeant)
is reported out. They are working on a tweak that will eliminate
the need for that new attribute; when finished it will appear on
the trunk.
* Simon discovered a bug in hdgeant that was causing the single-track
reconstruction test to fail part-way through the job. It had to do
with the code keeping track of secondary vertices. The next run
should have the full complement of events.
+ Mark pointed out that the change to the decay chain reporting
(see above) would re-do this portion of the code completely.
Simon's change is welcome nonetheless, for the interim.
Software Review Planning
Curtis led us through the preparations thus far. All relevant
information is collected on a [31]wiki page, linked from the [32]main
reviews page.
We had an [33]initial planning meeting last Friday. We blocked out the
talks and topics for the review and identified speakers. Another
meeting is planned for tomorrow.
Curtis is also preparing a [34]comprehensive document for distribution
to the committee in advance of the review. We will be able to describe
progress made since the last review in detail here; many topics will
have to be omitted or touched on only lightly in the oral presentations
because of time. He used the analogous document from the last review as
a starting point.
Data Challenge 2
Mark has run two more mini-DCs since the collaboration meeting (1000
jobs each).
The first of these showed the same calibration database problem as the
one reported on at the collaboration meeting. He was able to catch the
database server in the act of non-responsiveness. It turns out to be a
memory problem on the server that occurs when many jobs are running at
the same time and a large data set, such as the magnetic field, is
requested.
The problem was solved in the second mini-DC by using the SQLite version
of the database. This is a server-less, file-based system and does
not suffer from the memory limitation. In addition, this is a
convenient solution for distributing calibration information to remote
sites for mass processing; a single file encapsulates all needed
calibration constants.
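As a rough illustration of why a file-based database avoids the server
memory limit, here is a minimal sketch using Python's standard sqlite3
module. The table name "constants", its columns, and the constant names
are hypothetical and are not the CCDB schema; the point is only that each
job opens a local file, so no server process holds per-job state.

```python
import os
import sqlite3
import tempfile


def write_constants(path, rows):
    """Create (or update) a SQLite file holding (name, value) pairs."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS constants "
                "(name TEXT PRIMARY KEY, value REAL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO constants (name, value) VALUES (?, ?)",
                rows,
            )
    finally:
        conn.close()


def read_constant(path, name):
    """Look up one constant in the local file; return None if absent."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT value FROM constants WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical file and constant names, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "calib.sqlite")
    write_constants(path, [("fcal/gain", 1.25), ("bcal/offset", -0.5)])
    print(read_constant(path, "fcal/gain"))
```

The single file written here is what would be copied to a remote site;
every job there reads it directly rather than contacting a server.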
Remarks on this issue:
* The magnetic field is not a good fit for a relational database.
This incident illustrates the problem. It is only in the database
as a natural consequence of evolution from a standard
directory-tree/file-based system.
* David mentioned that there is a feature in JANA that will dump all
calibration constants used in a job to a local file system. Those
files could then be distributed for use by others. This would also
by-pass the need for each job to connect to a database server.
* One component of the problem is a persistent database connection
for all running jobs for the duration of the job. Each running job
therefore consumes memory on the server. The CCDB has a feature
under development to close these connections after a suitable time
and re-connect later if necessary. That would also solve the
problem and will be implemented.
* We have already done most of the development on using "resources"
in JANA. This is a system for caching large files (such as
magnetic field maps) locally as needed. Full implementation of
this feature would also have avoided this problem. This also is
still in the plan.
* It is also possible to increase the memory limit on the server, but
now that seems moot.
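The idea of closing idle connections and re-connecting on demand,
mentioned in the remarks above, can be sketched as follows. This is not
the CCDB implementation; the class name, the "connect" factory, and the
timeout handling are all assumptions made for illustration. The job would
call close_if_idle() periodically (for example between events) and get()
whenever it actually needs the database.

```python
import time


class IdleClosingConnection:
    """Hold a connection only while it is in use.

    `connect` is any zero-argument callable returning an object with a
    close() method (hypothetical stand-in for a real database client).
    """

    def __init__(self, connect, idle_timeout, clock=time.monotonic):
        self._connect = connect
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._conn = None
        self._last_used = None
        self.connects = 0  # how many times a connection was opened

    def get(self):
        """Return a live connection, opening a new one if needed."""
        if self._conn is None:
            self._conn = self._connect()
            self.connects += 1
        self._last_used = self._clock()
        return self._conn

    def close_if_idle(self):
        """Close the connection if unused for at least idle_timeout seconds."""
        if self._conn is None:
            return
        if self._clock() - self._last_used >= self._idle_timeout:
            self._conn.close()
            self._conn = None
```

With this pattern a job that fetches its constants once at start-up stops
consuming server memory after the timeout, at the cost of one extra
connection setup if it later needs the database again.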
Now that this problem looks solved, the next task is to concentrate on
the few percent of jobs that fail due to other reasons. More mini-DCs
are to come.
Vertex Smearing
Kei showed [35]slides outlining the methods we use to randomize the
location of the primary interaction vertex in our simulation code and
proposed that we unify the scheme in a single code location, his
preference being hdgeant.
There are different smearing schemes for the different event
generators. David pointed out that the only place to have a single scheme
is indeed in hdgeant since the particle gun, for example, is internal
to hdgeant. He volunteered to look into adding this feature. It would
be optional, controlled by the control.in file.
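For readers unfamiliar with the technique, a minimal sketch of optional
vertex smearing follows. The minutes do not specify the distributions;
the choice here (uniform in z along the target, Gaussian in x and y about
the beam axis) and all parameter names and values are assumptions for
illustration, not the hdgeant implementation or its control.in syntax.

```python
import random


def smear_vertex(rng, z_min, z_max, sigma_x, sigma_y, x0=0.0, y0=0.0):
    """Return a randomized primary vertex (x, y, z).

    z is drawn uniformly in [z_min, z_max]; x and y are drawn from
    Gaussians centered on (x0, y0) with widths sigma_x and sigma_y.
    """
    x = rng.gauss(x0, sigma_x)
    y = rng.gauss(y0, sigma_y)
    z = rng.uniform(z_min, z_max)
    return x, y, z


def primary_vertex(rng, nominal, smear_enabled, **smear_args):
    """Return the nominal vertex unless smearing is switched on."""
    if not smear_enabled:
        return nominal
    return smear_vertex(rng, **smear_args)


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed for reproducibility
    # Illustrative numbers only (units: cm).
    args = dict(z_min=50.0, z_max=80.0, sigma_x=0.1, sigma_y=0.1)
    print(primary_vertex(rng, (0.0, 0.0, 65.0), True, **args))
```

Applying the smearing in one place means every generator, including an
internal particle gun, gets the same vertex distribution.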
Addition of this feature would have to be re-done for the Geant4
version of hdgeant, but that should not be a huge problem.
Retrieved from
"[36]https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_October_16,_2013"
References
29. https://cc.readytalk.com/play?id=d9hgmq
30. https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_September_18,_2013#Minutes
31. https://halldweb1.jlab.org/wiki/index.php/2013_Software_Review
32. https://halldweb1.jlab.org/wiki/index.php/Reviews
33. https://halldweb1.jlab.org/wiki/index.php/SoftwareReview_October_11#Minutes
34. http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2350
35. https://halldweb1.jlab.org/wiki/images/8/80/2013-10-16-offline-vertex.pdf
36. https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_October_16,_2013
--
Mark M. Ito, Jefferson Lab, (757)269-5295, marki at jlab.org