[Halld-offline] Offline Software Meeting Minutes, October 16, 2013

Mark M. Ito marki at jlab.org
Thu Oct 17 11:51:20 EDT 2013


Folks,

Find the minutes below and at

https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_October_16,_2013#Minutes 
.

   -- Mark
__________________________________________________________

GlueX Offline Meeting, October 16, 2013
Minutes

    A [29]recording of the meeting (audio and slides) is available for a
    month or so.

    Present:
      * CMU: Paul Mattione, Curtis Meyer
      * IU: Kei Moriya, Matt Shepherd
      * JLab: Mark Dalton, Hovanes Egiyan, Mark Ito (chair), David
        Lawrence, Simon Taylor, Elliott Wolin, Beni Zihlmann
      * Northwestern: Sean Dobbs
      * UConn: Alex Barnes

Announcements

    The work disk at JLab got full yesterday. Collectively, we deleted 1 TB
    and 2 TB more should be added by the Computer Center today, bringing
    the total to 12 TB.

    Elliott told us there will be a much larger amount of disk space
    available in the counting house soon, before it is needed for online
    tasks.

Review of minutes from the last meeting

    We went over the [30]minutes from September 18.
      * The decay chain reporting issue, last mentioned at the collaboration
        meeting, has an interim solution from Mark and Beni using a new
        attribute for the product element in the hddm_s data model (on a
        branch). The result is a seamless genealogy, with generations
        allowed both in the generator (bggen and others) and in the
        detector simulation (hdgeant). They are working on a tweak that
        will eliminate the need for that new attribute; when finished it
        will appear on the trunk.
      * Simon discovered a bug in hdgeant that was causing the single-track
        reconstruction test to fail part-way through the job. It had to do
        with the code keeping track of secondary vertices. The next run
        should have the full complement of events.
           + Mark pointed out that the change to the decay chain reporting
             (see above) would re-do this portion of the code completely.
             Simon's change is welcome nonetheless, for the interim.

Software Review Planning

    Curtis led us through the preparations thus far. All relevant
    information is collected on a [31]wiki page, linked from the [32]main
    reviews page.

    We had an [33]initial planning meeting last Friday. We blocked out the
    talks and topics for the review and identified speakers. Another
    meeting is planned for tomorrow.

    Curtis is also preparing a [34]comprehensive document for distribution
    to the committee in advance of the review. We will be able to describe
    progress made since the last review in detail here; many topics will
    have to be omitted or touched on only lightly in the oral presentations
    because of time. He used the analogous document from the last review as
    a starting point.

Data Challenge 2

    Mark has run two more mini-DCs since the collaboration meeting (1000
    jobs each).

    The first of these showed the same calibration database problem as the
    one reported on at the collaboration meeting. He was able to catch the
    database server in the act of non-responsiveness. It turns out to be a
    memory problem on the server that occurs when many jobs run at the
    same time and each requests a large data set, such as the magnetic
    field map.

    The problem was solved in the second mini-DC by using the SQLite
    version of the database. This is a server-less, file-based system and
    does not suffer from the memory limitation. In addition, it is a
    convenient solution for distributing calibration information to remote
    sites for mass processing; a single file encapsulates all needed
    calibration constants.
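    The server-less character of SQLite can be illustrated with a minimal
    sketch using Python's standard sqlite3 module. The file name, table
    layout, and constant below are hypothetical stand-ins, not the actual
    CCDB file or schema:

```python
import sqlite3

# Open the calibration file directly: no database server is involved,
# so no per-job server-side memory is consumed.
# "ccdb.sqlite" and the "constants" table are hypothetical stand-ins
# for the real CCDB file and schema.
conn = sqlite3.connect("ccdb.sqlite")
conn.execute("CREATE TABLE IF NOT EXISTS constants (path TEXT, value REAL)")
conn.execute("INSERT INTO constants VALUES ('Magnets/Solenoid/bmap', 2.24)")

# Any number of jobs can read the same file concurrently without
# holding a connection open on a central server.
row = conn.execute(
    "SELECT value FROM constants WHERE path = 'Magnets/Solenoid/bmap'"
).fetchone()
print(row[0])
conn.close()
```

    Shipping the single file to a remote site gives that site the full set
    of constants with no network dependence at job run time.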

    Remarks on this issue:
      * The magnetic field is not a good fit for a relational database.
        This incident illustrates the problem. It is only in the database
        as a natural consequence of evolution from a standard
        directory-tree/file-based system.
      * David mentioned that there is a feature in JANA that will dump all
        calibration constants used in a job to a local file system. Those
        files could then be distributed for use by others. This would also
        bypass the need for each job to connect to a database server.
      * One component of the problem is that each running job holds a
        persistent database connection open for its entire duration and
        therefore consumes memory on the server. The CCDB has a feature
        under development to close these connections after a suitable idle
        time and re-connect later if necessary. That would also solve the
        problem and will be implemented.
      * We have already done most of the development on using "resources"
        in JANA. This is a system for caching large files (such as
        magnetic field maps) locally as needed. Full implementation of
        this feature would also have avoided this problem. This also is
        still in the plan.
      * It is also possible to increase the memory limit on the server,
        but now that seems moot.
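    The close-idle-connections idea from the remarks above can be sketched
    generically. The class, timeout value, and use of SQLite here are
    illustrative assumptions, not the actual CCDB implementation:

```python
import sqlite3
import time

class LazyConnection:
    """Close the underlying connection after an idle timeout and
    transparently reopen it on the next query. Illustrative sketch
    only; not the actual CCDB code."""

    def __init__(self, path, idle_timeout=1.0):
        self.path = path
        self.idle_timeout = idle_timeout
        self._conn = None
        self._last_used = 0.0

    def query(self, sql):
        now = time.monotonic()
        # Drop a connection that has sat idle too long, freeing
        # server-side resources held on its behalf.
        if self._conn is not None and now - self._last_used > self.idle_timeout:
            self._conn.close()
            self._conn = None
        # Reconnect on demand, so the server only sees active clients.
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
        self._last_used = now
        return self._conn.execute(sql).fetchall()

# Demo against an in-memory database (a real client would point at a
# persistent database so state survives a reconnect).
db = LazyConnection(":memory:")
db.query("CREATE TABLE t (x INTEGER)")
db.query("INSERT INTO t VALUES (42)")
print(db.query("SELECT x FROM t"))
```

    With this pattern, a thousand mostly-idle jobs would hold far fewer
    simultaneous server connections than a thousand persistent clients.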

    Now that this problem looks solved, the next task is to concentrate on
    the few percent of jobs that fail due to other reasons. More mini-DCs
    are to come.
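    The "resources" mechanism mentioned in the remarks above, caching a
    large file locally on first use, can be sketched roughly as follows.
    The function name, cache layout, and fetch hook are hypothetical, not
    the JANA API:

```python
import os

def get_resource(name, cache_dir="resource_cache", fetch=None):
    """Return a local path for a named resource, fetching it only if it
    is not already cached. Sketch only; not the JANA resource API."""
    local_path = os.path.join(cache_dir, name)
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        fetch(name, local_path)  # e.g. an HTTP download of a field map
    return local_path

# Demo: count how often the (fake) remote fetch actually runs.
calls = []
def fake_fetch(name, dest):
    calls.append(name)
    with open(dest, "w") as f:
        f.write("field map data")

p1 = get_resource("Magnets/Solenoid/bmap", fetch=fake_fetch)
p2 = get_resource("Magnets/Solenoid/bmap", fetch=fake_fetch)
print(len(calls))  # the second call is served from the local cache
```

    A cached field map would then be read from local disk by every job on
    the node instead of being requested from the database each time.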

Vertex Smearing

    Kei showed [35]slides outlining the methods we use to randomize the
    location of the primary interaction vertex in our simulation code and
    proposed that we unify the scheme in a single code location, his
    preference being hdgeant.

    There are different smearing schemes for the different event
    generators. David pointed out that the only place to have a single
    scheme is indeed in hdgeant, since the particle gun, for example, is
    internal to hdgeant. He volunteered to look into adding this feature.
    It would be optional, controlled by the control.in file.

    Addition of this feature would have to be re-done for the Geant4
    version of hdgeant, but that should not be a huge problem.
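    A common form of vertex smearing can be sketched as follows: Gaussian
    spread in x and y about the beam line, uniform spread in z along the
    target. The function and all parameter values are illustrative
    placeholders, not the actual hdgeant code or control.in settings:

```python
import random

def smear_vertex(x0, y0, z0, sigma_xy=0.1, z_min=50.0, z_max=80.0):
    """Randomize the primary interaction vertex: Gaussian in x and y
    around the nominal beam position, uniform in z over the target
    length. Values in cm are illustrative placeholders only."""
    x = random.gauss(x0, sigma_xy)
    y = random.gauss(y0, sigma_xy)
    z = random.uniform(z_min, z_max)
    return x, y, z

random.seed(1)  # fixed seed for a reproducible example
print(smear_vertex(0.0, 0.0, 65.0))
```

    Centralizing such a function in one place, with its parameters read
    from a run-control file, is the kind of unification Kei proposed.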

Retrieved from
"[36]https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_October_16,_2013"

References

   29. https://cc.readytalk.com/play?id=d9hgmq
   30. https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_September_18,_2013#Minutes
   31. https://halldweb1.jlab.org/wiki/index.php/2013_Software_Review
   32. https://halldweb1.jlab.org/wiki/index.php/Reviews
   33. https://halldweb1.jlab.org/wiki/index.php/SoftwareReview_October_11#Minutes
   34. http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2350
   35. https://halldweb1.jlab.org/wiki/images/8/80/2013-10-16-offline-vertex.pdf
   36. https://halldweb1.jlab.org/wiki/index.php/GlueX_Offline_Meeting,_October_16,_2013

-- 
Mark M. Ito, Jefferson Lab, (757)269-5295, marki at jlab.org



