<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Folks,<br>
<br>
Find the minutes below and at <a class="moz-txt-link-freetext" href="https://goo.gl/fMao1Y">https://goo.gl/fMao1Y</a> .<br>
<br>
-- Mark<br>
_________________<br>
<br>
<div id="globalWrapper">
<div id="column-content">
<div id="content" class="mw-body" role="main">
<h1 id="firstHeading" class="firstHeading" lang="en"><span
dir="auto">GlueX Offline Meeting, May 25, 2016,</span>
Minutes</h1>
<div id="bodyContent" class="mw-body-content"><span
class="mw-headline" id="Minutes"></span>
<div id="mw-content-text" dir="ltr" class="mw-content-ltr"
lang="en">
<p>You can <a rel="nofollow" class="external text"
href="https://bluejeans.com/s/9JCK/">view a recording
of this meeting</a> on the BlueJeans site.
</p>
<p>Present:
</p>
<ul>
<li> <b>CMU</b>: Curtis Meyer</li>
<li> <b>FIU</b>: Mahmoud Kamel</li>
<li> <b>IU</b>: Matt Shepherd</li>
<li> <b>JLab</b>: Alexander Austregesilo, Amber
Boehnlein, Graham Heyes, Mark Ito (chair), David
Lawrence, Paul Mattione, Sandy Philpott, Nathan
Sparks, Justin Stevens, Adesh Subedi, Simon Taylor,
Chip Watson</li>
<li> <b>MIT</b>: Christiano Fanelli</li>
<li> <b>NU</b>: Sean Dobbs</li>
</ul>
<h3><span class="mw-headline"
id="Review_of_.5BGlueX_Offline_Meeting.2C_April_27.2C_2016.23Minutes.7Cminutes_from_April_27.5D.5D">Review
of [GlueX Offline Meeting, April 27,
2016#Minutes|minutes from April 27]]</span></h3>
<ul>
<li> Reminder: we switch to GCC version 4.8 or greater
on June 1.</li>
<li> Mark raised the question of whether corrections to
data values in the RCDB should properly be reported as
a software issue on GitHub. An alternate forum was not
proposed. We left it as something to think about.</li>
</ul>
<h3><span class="mw-headline"
id="Announcement:_Write-Through_Cache">Announcement:
Write-Through Cache</span></h3>
<ul>
<li> Mark reminded us about <a rel="nofollow"
class="external text"
href="https://mailman.jlab.org/pipermail/jlab-scicomp-briefs/2016q2/000124.html">Jie
Chen's email</a> announcing the switch-over from the
read-only cache to the write-through cache.
The change moves toward symmetrization of operations
between LQCD and ENP.</li>
<li> Mark cautioned us that we should treat this as an
interface to the tape library, and not as an infinite
"work" disk. This means we should not be writing many
small files to it and we need to think carefully about
the implied directory structure in the /mss tree that
will result from new directories created on the cache
disk.</li>
<li> Chip pointed out that the write-through cache
facilitates writing data from off-site to the tape
library by allowing a pure disk access to do the job.</li>
<li>The size threshold for writing to tape will be set
to zero. Other parameters will be documented on the
SciComp web pages.</li>
<li> At some point we need to do a purge of small files
already existing on the cache. These will not be a
problem for a few months, so we have some time to get
around to it.</li>
<li> There is an ability to write data immediately to
tape and optionally delete it from the cache.</li>
<li> At some point in the future, Paul will give us a
talk on best-practices for using the write-through
cache.</li>
</ul>
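<p>As an illustration of the "few large files" advice above, a
hypothetical sketch (not an official SciComp recipe; a temporary
directory stands in for the real cache mount point so the example is
self-contained):</p>

```python
import pathlib
import tarfile
import tempfile

# A temporary directory stands in for the real cache mount point
# (e.g. somewhere under /cache) so this sketch is runnable anywhere.
work = pathlib.Path(tempfile.mkdtemp())
cache = pathlib.Path(tempfile.mkdtemp())

# Pretend these are the many small skim files from one run.
skims = work / "skims"
skims.mkdir()
for i in range(3):
    (skims / f"skim_{i}.evio").write_text(f"skim {i}\n")

# Stage one archive per run to the cache instead of many small
# files, each of which would otherwise become its own tape entry.
archive = cache / "run001_skims.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(skims, arcname="skims")

print(archive.exists())  # True
```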
<h3><span class="mw-headline"
id="Copying_data_to_off-site_locations">Copying data
to off-site locations</span></h3>
<h4><span class="mw-headline" id="Data_Copying">Data
Copying</span></h4>
<p>We went over the <a rel="nofollow" class="external
text"
href="https://mailman.jlab.org/pipermail/halld-offline/2016-May/thread.html#start">recent
email thread centered around bandwidth for
transferring data off-site (expected bandwidth
offsite)</a>.
</p>
<ul>
<li> Matt had started the thread by asking what the
maximum possible transfer rate might be.</li>
<li> Concern was expressed that if we truly wanted to go
as fast as possible, that might cause back-ups on-site
for data transfers related to farm operation.</li>
<li> Numbers from Chip:
<ul>
<li> network pipe to the outside: 10 Gbit/s</li>
<li> tape bandwidth: 2 GByte/s</li>
<li> disk access: 10-20 GByte/s</li>
</ul>
</li>
<li> If collaborators were to try to pull data
(indirectly) from tape in a way that saturated the
network going out, that could use roughly half of the
tape bandwidth. However, that scenario is not very
likely. The interest is in REST data, and those data
are produced at a rate much, much less than the full
bandwidth of the tape system. We had agreed that an
average rate of 100 MB per second was sufficient for
the collaboration and easily provided given the
current configuration at the Lab. Special measures to
throttle use will probably not be necessary.</li>
<li> Matt's offer of resources he gets quasi-free from
IU could be used to create a staging site for data
off-site from which other collaborating institutions
can draw the data. This could off-load much of the
demand on the Lab, potentially by a factor of
several.</li>
<li> These points were largely settled in the course of
the email discussion.</li>
</ul>
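<p>A back-of-envelope check of the numbers above (units as quoted;
this is just arithmetic, not a measurement):</p>

```python
# Compare the quoted bandwidths: 10 Gbit/s is 1.25 GByte/s, so a
# saturated network pipe would consume roughly half (0.62) of the
# 2 GByte/s tape bandwidth, while the agreed 100 MB/s average REST
# rate uses only a small fraction of the pipe.
GBIT_PER_S = 1e9 / 8                 # bytes per second in 1 Gbit/s
pipe_bytes_per_s = 10 * GBIT_PER_S   # network pipe: 10 Gbit/s
tape_bytes_per_s = 2e9               # tape bandwidth: 2 GByte/s
rest_rate = 100e6                    # agreed average: 100 MB/s

tape_fraction = pipe_bytes_per_s / tape_bytes_per_s
pipe_fraction = rest_rate / pipe_bytes_per_s

print(f"pipe: {pipe_bytes_per_s / 1e9:.2f} GB/s")           # 1.25 GB/s
print(f"fraction of tape bandwidth: {tape_fraction:.2f}")   # 0.62
print(f"fraction of pipe at 100 MB/s: {pipe_fraction:.2f}") # 0.08
```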
<h4><span class="mw-headline"
id="Discussion_with_OSG_Folks">Discussion with OSG
Folks</span></h4>
<p>Amber, Mark, and Richard Jones had a video-conference
on Monday (May 23) with Frank Wuerthwein and Rob Gardner
of the Open Science Grid to discuss strengthening
OSG-related capabilities at JLab.
</p>
<ul>
<li> Frank proposed working on two issues: job
submission from JLab and data transfer in and out of
the Lab using OSG resources and tools. Initially he
proposed getting the job submission going first, but
after hearing about our need to ship REST data
off-site, modified the proposal to give the efforts
equal weight.</li>
<li> Richard was able to fill the OSG team in on what
was already in place and what had been done in the
past with the GlueX Virtual Organization (VO).</li>
<li> For data transfer, XROOTD was identified as a
likely technology to use despite the fact that REST
data is not ROOT based.</li>
<li> Rob and Frank will go away and think about firming
up details of the proposal in both areas given their
understanding of our requirements. They will get back
to us when they have a formulation.</li>
</ul>
<h4><span class="mw-headline"
id="Future_Practices_for_Data_Transfer">Future
Practices for Data Transfer</span></h4>
<p>Amber encouraged us to think about long-term solutions
to the data transfer problem, rather than focusing
exclusively on the "current crisis". In particular we
should be thinking about solutions that are (a)
appropriate for the other Halls and (b) that have robust
support in the HENP community generally. The LHC
community has had proven success in this area and we are
in a good position to leverage their experience.
</p>
<p>To do this will require more detailed specification of
requirements than we have done thus far as well as
communication of such requirements among the Halls. With
this planning, possible modest improvements in
infrastructure and on-site support of activities can
proceed with confidence.
</p>
<p>We all agreed that this was a pretty good idea.
</p>
<h3><span class="mw-headline"
id="Disk_space_requirements_for_Spring_2016_data">Disk
space requirements for Spring 2016 data</span></h3>
<p>Sean discussed the disk space being taken up by skims
from the recent data run. <a
href="https://halldweb.jlab.org/wiki/index.php/Calibration_Train#Run_Groups"
title="Calibration Train">See his table</a> for the
numbers [FCAL and BCAL column headings should be
reversed.]
</p>
<ul>
<li> Currently, all of the pair spectrometer skims are
pinned. Files need to be present on the disk before a
Globus Online request for copy can be made against
them. Richard is trying to get them all to UConn.</li>
<li> The FCAL and BCAL skims have likely been wiped off
the disk by the auto-deletion algorithm.
<ul>
<li> Both can be reduced in size by dropping some of
the tags. Sean has code to do that, but it needs
more testing before being put into production.</li>
</ul>
</li>
<li> The total size of all skims is about 10% of the raw
data.</li>
</ul>
<p>We discussed various aspects of managing our Lustre
disk space. It looks like we will need about 200 TB of
space for this summer, counting volatile, cache, and
work. The system is new to us; we have some learning to
do before we can use it efficiently.
</p>
<h3><span class="mw-headline"
id="Spring_2016_Run.2C_Processing_Plans">Spring 2016
Run, Processing Plans</span></h3>
<p>Paul noted that this was discussed fully at yesterday's
Analysis Meeting. In summary we will produce:
</p>
<ul>
<li> reconstructed data</li>
<li> skims of raw data</li>
<li> TTrees</li>
<li> EventStore meta-data</li>
</ul>
<p>In addition, the job submission scripts are in need of
some re-writing.
</p>
<h4><span class="mw-headline"
id="Meta-data_for_this_processing_launch">Meta-data
for this processing launch</span></h4>
<p>Sean presented a system for classifying and documenting
key aspects of the various data sets that we will have
to handle. He guided us through a <a rel="nofollow"
class="external text"
href="https://halldweb.jlab.org/cgi-bin/data_monitoring/monitoring/dataVersions.py">web
page he put together that displays the information</a>.
It has a pull-down menu to choose the run period being
queried. There is a legend at the bottom that describes
each of the fields.
</p>
<p>This system has already been in use for the monitoring
launch and the information is used in the monitoring web
pages to navigate the different versions of those
launches. Also, the data will be used to correlate data
sets with the EventStore. Sean is proposing that the
data version string be written into each REST file so
that there is a two-way link between EventStore
meta-data and the data itself.
</p>
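<p>A minimal sketch of what such an embedded version string might
look like; the field names and format below are illustrative
assumptions, not Sean's actual schema:</p>

```python
# Hypothetical data-version record; field names are assumptions for
# illustration, not the actual fields on Sean's web page.
data_version = {
    "run_period": "RunPeriod-2016-02",
    "data_type": "rest",
    "revision": 2,
}

def version_string(v):
    """Compact string that could be written into each REST file,
    giving a two-way link to the EventStore meta-data record."""
    return "{run_period}:{data_type}:ver{revision:02d}".format(**v)

print(version_string(data_version))  # RunPeriod-2016-02:rest:ver02
```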
<p>Mark suggested that we might want to have a more formal
relational database structure for the information. This
would require some re-writing of a working system but
may be worth the effort.
</p>
<p>Sean has written <a rel="nofollow" class="external
text"
href="http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=3062">a
GlueX Note motivating and documenting the system</a>.
</p>
<h3><span class="mw-headline"
id="Negative_Parity_in_Tracking_code">Negative Parity
in Tracking code</span></h3>
<p>David is working on new code to parse the EVIO data. In
the course of that work he was comparing results between
the old and new parsers and noticed some small
differences. <a
href="https://halldweb.jlab.org/wiki/images/f/fc/20160525_lawrence_tracking.pdf"
class="internal" title="20160525 lawrence
tracking.pdf">See his slides</a> for all of the
details. He tracked these down to several issues. Some
of them have to do with the new parser presenting the
hits to the reconstruction code in a different order
than that presented by the old parser.
</p>
<p>The issues were:
</p>
<ol>
<li> In the track finding code, there is an assumption
about the order of hits, although that order was never
enforced in the code. As a result, a variance used as
a cut criterion for hit inclusion on a track was
calculated on different numbers of hits, depending on
order.</li>
<li> In FDC pseudo hit creation the "first hit" on a
wire was chosen for inclusion. That choice was
manifestly hit-order dependent.</li>
<li> There was a bug in the FDC Cathode clustering code
that caused multiple hits on a single cathode to be
split onto different clusters. That bug manifested
itself in a way that depended on hit order.</li>
<li> For some events, a perfectly fine start counter hit
was ignored in determining the drift start time for a
track. That was a result of the reference trajectory
in the fringe field area being calculated using a
field that depended on the reference trajectory
object's recycling history. In certain
cases a bad, left-over field value would give a
trajectory where the intersection of the track with
the projection of the start counter plane in the r-phi
view was unacceptably far downstream of the physical
start counter in z.</li>
</ol>
<p>These have all been fixed on a branch. David will look
at moving the changes onto the master branch.
</p>
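<p>The common thread in the first three issues is reconstruction
results that depend on the order in which the parser presents hits.
A sketch of the general remedy (names are illustrative, not the
actual sim-recon code): sort hits on a stable physical key before
any order-sensitive selection, so both parsers give identical
results:</p>

```python
from statistics import pvariance

def select_hits(hits, cut=2.0):
    """Keep hits whose time is within `cut` sigma of the mean.
    Sorting on a stable key first makes the result independent of
    the order in which the parser delivered the hits."""
    ordered = sorted(hits, key=lambda h: (h["wire"], h["time"]))
    times = [h["time"] for h in ordered]
    mean = sum(times) / len(times)
    sigma = pvariance(times) ** 0.5
    return [h for h in ordered if abs(h["time"] - mean) <= cut * sigma]

hits_a = [{"wire": 3, "time": 10.0},
          {"wire": 1, "time": 11.0},
          {"wire": 2, "time": 30.0}]
hits_b = list(reversed(hits_a))  # same hits, opposite parser order
assert select_hits(hits_a) == select_hits(hits_b)
```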
<h3><span class="mw-headline" id="HDGeant4_Workflow">HDGeant4
Workflow</span></h3>
<p>At the collaboration meeting Mark talked to Richard
about developing a new work flow for HDGeant4 that
would allow early testing by
collaborators as well as controlled contributions. Mark
reported that they agreed to <a rel="nofollow"
class="external text"
href="https://docs.google.com/document/d/1bjMoszd5fJuiuh3DDJGgnTOnxFowObYSsP6ndVhOmF0/edit?usp=sharing">a
plan in principle</a> last week.
</p>
<p>Note that this work flow is not the same as the one we
have been using for sim-recon; in particular, it will
require contributors to compose pull requests based on a
private clone of the HDGeant4 repository on GitHub
rather than from a branch of the "Jefferson Lab"
repository. Most collaborators will not have the
privileges to create such a branch on the JLab repository.
</p>
</div>
<div class="printfooter">
Retrieved from "<a dir="ltr"
href="https://halldweb.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_May_25,_2016&oldid=75392">https://halldweb.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_May_25,_2016&oldid=75392</a>"</div>
</div>
</div>
</div>
<div id="footer" role="contentinfo">
<ul id="f-list">
<li id="lastmod"> This page was last modified on 1 June 2016,
at 12:20.</li>
</ul>
</div>
</div>
<br>
<pre class="moz-signature" cols="72">--
<a class="moz-txt-link-abbreviated" href="mailto:marki@jlab.org">marki@jlab.org</a>, (757)269-5295
</pre>
</body>
</html>