<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>What happens if file 50000.hddm has one configuration, file 50001.hddm has a different configuration, the two are skimmed for an analysis, and then the two (very small) skimmed files are merged together?  Maybe we don't do this if we use EventStore, but stripped-down files would save disk space, and merging them could be convenient, if only so that "ls" commands don't take forever listing a folder with ~100k files.  </div><div><br></div><div>The header would then have to allow for multiple configurations per file.  Maybe just using a database is better?  </div><div><br></div><div>For my own curiosity (and to make things more confusing, just for fun), what happens if someone decides to grab data using a factory tag that is event-dependent?  This might be ridiculous (I'm not planning on doing it), but what if several slow pions cause the track reconstruction code to make 40 tracks, and in that case you (the user performing the analysis) want to use a custom DChargedTrackHypothesis factory, for that event only, to weed out the junk?  Maybe then you REALLY want this to be in a database...</div><div><br></div><div> - Paul</div><br><div><div>On Jul 18, 2012, at 6:26 PM, Mark M. Ito wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">

  

  
  <div bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Richard and David,<br>

      <br>

      Now that Richard has convinced me that statelessness is to be

      respected, I will make the following point: if we put the config

      info into an opaque string, then there will be the temptation to

      pull back the curtain and use the information to influence the way

      we process the subsequent data. Just sayin'...<br>

      <br>

        -- Mark<br>

      <br>

      On 07/18/2012 04:16 PM, Richard Jones wrote:<br>

    </div>

    <blockquote cite="mid:500719AE.9020108@uconn.edu" type="cite">

      <div class="moz-cite-prefix">Hello list,<br>

        <br>

        <blockquote type="cite">  I think storing the config info as a

          string either in the first event or somewhere else in the

          header of the file using the existing mechanisms already built

          into HDDM is still a good idea, and not just because of

          redundancy with the database. </blockquote>

        <br>

        OK, can do.  How about just an opaque string?  Would that do?  I
        would want to cap its length at something like 30 kB. 

        Would that make sense?  The format and meaning of the string

        would be opaque to the analysis framework, and one would not

        need to make the hddm stream stateful.  It would be like a

        decoration that an application could print out in its log, and a

        human reader could browse for reference, but the individual

        analysis threads would not be expected to have deterministic

        access to it, right?<br>
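<br>
A minimal sketch of the opaque-string idea, written in Python for brevity and assuming nothing about the real hddm API: the writer attaches the config text once and enforces a size cap (30 kB, taken from the suggestion above). Readers get it back only as an uninterpreted string to print in a log. The functions attach_config and describe_config, and the plain dictionary standing in for the file header, are hypothetical.<br>
<pre wrap="">
```python
# Hypothetical sketch of an opaque config string carried alongside a stream.
# Nothing here is the real hddm API; the header is modelled as a plain dict.

MAX_CONFIG_BYTES = 30 * 1024  # assumed cap, following "say 30kB or so"


def attach_config(header, config_text):
    """Return a copy of the header with the config string stored verbatim.

    The text is never parsed, so the event stream itself stays stateless.
    """
    size = len(config_text.encode("utf-8"))
    if size > MAX_CONFIG_BYTES:
        raise ValueError(f"config string is {size} bytes, limit is {MAX_CONFIG_BYTES}")
    new_header = dict(header)
    new_header["config"] = config_text
    return new_header


def describe_config(header):
    """Build a log line for a human reader; analysis code should not parse it."""
    text = header.get("config", "")
    return f"[config, {len(text)} chars] {text[:60]}"
```
</pre>
Since no analysis thread is expected to interpret the string, keeping it opaque is a convention rather than something the code can enforce.<br>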

        <br>

        -Richard J.<br>

        <br>

        <br>

        On 7/18/2012 3:34 PM, David Lawrence wrote:<br>

      </div>

      <blockquote cite="mid:50070FAC.8010208@jlab.org" type="cite"> <br>

        Hi Richard,<br>

        <br>

          I think storing the config info as a string either in the

        first event or somewhere else in the header of the file using

        the existing mechanisms already built into HDDM is still a good

        idea, and not just because of redundancy with the database. One

        could imagine having a file and wanting to extract the

        configuration used to make it in order to view or use it. I see

        this as very analogous to how the HDDM schema is stored at
        the front of the HDDM file, with a tool to pull it out if
        needed. So, to answer your devil's-advocate question with
        another: why not put the HDDM schema in a database instead of
        keeping it in the file?<br>

        <br>

        Regards,<br>

        -David<br>

        <br>

        On 7/18/12 3:09 PM, Richard Jones wrote:

        <blockquote cite="mid:50070A05.5050400@uconn.edu" type="cite">Hello,


          <br>

          <br>

          I have no objection to storing a string tag for each object,

          representing the GetTag() string from JANA.  That can be done
          either on an event-by-event basis or globally.  Event-by-event
          should only be adopted if the analysis can handle tags that
          switch dynamically within a job, or if we want to store more
          than one tag (say, both the default and the "KLOE" BCAL
          clusters) and let the user decide which to use.  That

          would require changes to the current DEventSourceREST.cc, but

          would be easy to do.  If tags are stored globally, then the

          hddm system will ensure automatically that only streams with

          the same tag strings get merged together as a result of a skim

          or by hddm-cat.  It would also provide a better way for the

          danarest plugin to decide which tag to use for each output

          object, instead of the provisional way I am handling it right

          now for DBCALShower objects, which David points out is

          incorrect in some cases. <br>
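<br>
The global-tag rule above can be sketched as follows, in Python for brevity. The assumption is that each stream header records one JANA factory tag per output class, and that two streams may be concatenated only if those tags agree. The functions tag_conflicts and merge_streams and the header layout are hypothetical; this is not how hddm or hddm-cat is actually implemented.<br>
<pre wrap="">
```python
# Hypothetical sketch: refuse to concatenate streams whose globally stored
# factory tags differ. Headers are plain dicts: {"tags": {class_name: tag}}.


def tag_conflicts(tags_a, tags_b):
    """Return, sorted, the class names whose recorded tags differ."""
    classes = set(tags_a) | set(tags_b)
    return sorted(c for c in classes if tags_a.get(c) != tags_b.get(c))


def merge_streams(header_a, events_a, header_b, events_b):
    """Concatenate two event lists only if their global tags are identical."""
    conflicts = tag_conflicts(header_a["tags"], header_b["tags"])
    if conflicts:
        raise ValueError("incompatible tags for: " + ", ".join(conflicts))
    return header_a, list(events_a) + list(events_b)
```
</pre>
A class missing from one header counts as a conflict here, which is one possible policy; treating a missing entry as the default tag would be another.<br>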

          <br>

          As to the idea of flooding the REST file header with analysis

          qualifiers, that is not something that hddm can do right now. 

          I could add the capability, but I question the need.  The only

          function of the hddm header, as currently conceived, is to

          document to the hddm toolkit how to unpack the event data and

          what their meaning and relationships are. That is all it

          does.  It is not a place to record random comments like the

          name of the application that wrote the file, or the command

          line switches.  User code does not normally even access the

          header, it is just handled by the hddm library.  So at

          present, storing runconfig-type information would require

          adding special events to the stream, AND the huge change of

          making hddm streams stateful.... <br>

          <br>

          Just like root trees, hddm streams are designed to be stateless. 

          This is an important design feature that I am not eager to

          concede. Think about trying to stick config-type information

          into a root tree, and then analyze it with a TSelector on

          PROOF.  You are going to have to do major gymnastics to get

          that information to every analysis session that gets started

          to run your job.  Building single-threaded concepts like this

          into the analysis sounds like we are still working like we did

          20 years ago. <br>

          <br>

          It was not my original intent to embed metadata about the

          conditions of the production inside the file, because later I
          want to be able to string these events together and create
          skims.  In general, I want to avoid "stateful" streams in hddm,
          relying instead on global keys like (runnumber, eventnumber)

          to reference database records for this information, similar to

          how root trees work.  By keeping the streams stateless I avoid

          all kinds of ordering and synchronization issues.  A related

          issue is the "skip to event NNN" action, which is very fast in

          hddm because you don't have to read in every event. Imagine a

          sparse skim, which in the limit would consist of one state

          record for every event.  Do I stop and check every time I hit

          a state record, do a bit-by-bit comparison with the current

          state, and throw an exception on incompatible changes?  Much

          better to check that compatibility ahead of time, from a

          database lookup, wouldn't it? <br>
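<br>
One way to picture the ahead-of-time check is the Python sketch below. It validates a job's input once, before any events are read, by looking up each run number, and it never compares state records mid-stream. The dictionary CONFIG_DB stands in for the real database, and the function name, run numbers, and configuration labels are all hypothetical.<br>
<pre wrap="">
```python
# Hypothetical sketch: check configuration compatibility once, up front,
# from a lookup keyed on run number, instead of inspecting in-stream state.

# Stand-in for a database table: run number -> configuration identifier.
CONFIG_DB = {
    10000: "recon-A",
    10001: "recon-B",
    10002: "recon-A",
}


def check_runs_compatible(run_numbers, db=CONFIG_DB):
    """Return the single configuration shared by all runs, or raise.

    Only run-number keys are consulted, so events can still be read in any
    order and "skip to event NNN" never has to look at state records.
    An unknown run number raises KeyError.
    """
    configs = {db[run] for run in set(run_numbers)}
    if len(configs) != 1:
        raise ValueError("runs use different configurations: " + ", ".join(sorted(configs)))
    return configs.pop()
```
</pre>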

          <br>

          If we want to make sure we don't lose the metadata, why not

          store them in two separate databases, or promote the database

          to a higher data security level?  We are not going to be able

          to analyze a REST file without access to a database (required

          to re-swim reference trajectories).  Playing devil's advocate,

          why are we not storing the magnetic field map in the event

          file?  Without the magnetic field we cannot get back the

          original DTrackTimeBased objects. <br>

          <br>

          -Richard J. <br>

          <br>

          <br>

          <br>

          <fieldset class="mimeAttachmentHeader"></fieldset>

          <br>

          <pre wrap="">_______________________________________________

Halld-offline mailing list

<a moz-do-not-send="true" href="mailto:Halld-offline@jlab.org">Halld-offline@jlab.org</a>

<a moz-do-not-send="true" href="https://mailman.jlab.org/mailman/listinfo/halld-offline">https://mailman.jlab.org/mailman/listinfo/halld-offline</a></pre>

        </blockquote>

      </blockquote>

      <br>

      <br>


    </blockquote>

    <br>

    <br>

  </div>


</blockquote></div><br></body></html>