Hi Eugene,<br>The trip files are all ready to go (thanks Diane!), and so far I am only making a skim of the raw data with the proper banks and scalar events. If the scalar events are written every 10 seconds and gflux only drops the first part of the file, then I would expect the average data lost to be 5 seconds and not 10.<br>
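<br>A rough sketch of that estimate, assuming scalar events fire at a fixed 10-second period and that a file can start anywhere within that period with equal probability. The function names, the 1237-second file length and the offset grid are illustrative only and are not taken from gflux:<br>

```python
from fractions import Fraction

PERIOD_S = 10  # assumed spacing between scalar events, in seconds

def truncated_pieces(start_s, length_s, period_s=PERIOD_S):
    """Seconds of data before the first and after the last scalar event in a file."""
    head = (-start_s) % period_s             # file start -> first scalar event
    tail = (start_s + length_s) % period_s   # last scalar event -> file end
    return head, tail

def mean_truncation(n=1000, length_s=1237, period_s=PERIOD_S):
    """Average head and tail over n evenly spaced file-start offsets."""
    heads, tails = [], []
    for k in range(n):
        # Midpoint of the k-th offset bin, kept exact with Fraction.
        start = Fraction(2 * k + 1, 2 * n) * period_s
        head, tail = truncated_pieces(start, length_s, period_s)
        heads.append(head)
        tails.append(tail)
    return sum(heads) / n, sum(tails) / n

mean_head, mean_tail = mean_truncation()
print(float(mean_head), float(mean_tail), float(mean_head + mean_tail))
```

Under these assumptions the first truncated piece averages 5 seconds, and the first and last pieces together average 10 seconds per file.<br>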

<br>Also, I thought the DAQ pushes all the events to the ET ring and another process just fills the raw data files from this ring. No truncation is done as far as I know -- CODA doesn&#39;t know when a new file is starting to be written.<br>

<br>Could I run gflux on three consecutive files and have it only output the flux of the middle file? Would this solve our problem and allow us to keep the flux on a file by file basis?<br><br><div class="gmail_quote">On Tue, Apr 12, 2011 at 11:54 AM, Eugene Pasyuk &lt;<a href="mailto:pasyuk@jlab.org">pasyuk@jlab.org</a>&gt; wrote:<br>

<blockquote class="gmail_quote">Johann,<br>

perhaps you are too fast to start running it on a mass scale. We need to tailor and correct trip files properly before we proceed.<br>

<br>

If we proceed with the entire run at once, we drop ONLY the very first interval in the A00 file. It has to be dropped anyway because the events there have junk, and it is usually less than 10 seconds. The very last interval in the very last file of the run is never dropped. In that respect you don&#39;t lose anything. Where did you get this 4%?<br>


And it is not 10 seconds at the beginning and at the end of the file. It is on average a total of 10 sec per file.<br>

The reason for dropping the first and the last intervals in a file is that they are truncated and we don&#39;t know the exact livetime duration of those truncated pieces.<br>

<br>

-Eugene<div><div></div><div class="h5"><br>

<br>

On 4/12/11 11:34 AM, Johann Goetz wrote:<br>

</div></div><blockquote class="gmail_quote"><div><div></div><div class="h5">

Hi g12ers,<br>

<br>

OK, I have started the process to run gflux on a per run basis. However,<br>

I have a concern about runs where there are few files. In this case, we<br>

are still losing a large chunk of data -- it would be a 4% effect for a<br>

10 file run, of which we have several. It seems silly to remove these<br>

good events from any analysis because of the flux calculation.<br>

<br>

Also, more than half of our runs have one or two files which were<br>

unsuccessful when cooking during pass1. So we&#39;ll have to run gflux only<br>

on the files that were successful and keep a record of this somewhere<br>

which will complicate any analysis. It&#39;s not to say that they will never<br>

be cooked, just no one has taken the time to do it. If they do get<br>

cooked, then we would have to run gflux once again since we treat runs as a<br>

whole. We could avoid this problem if we had gflux for each raw data file.<br>

<br>

Is there no way we can modify gflux to be reliable on a per-file basis?<br>

Is it a matter of the scalar banks? If so, can we get the first scalar<br>

bank in the file and use that for the first part (up to the first 10<br>

seconds) of the file?<br>

<br>

Or if we are running gflux on a single file, I could have the job cache<br>

the previous data file, get the last scalar bank and use that for the<br>

beginning of the file. This would work out fine I think, but then I am<br>

not sure how to handle the A00 file of a run.<br>

<br>

I understand the 10 seconds gflux cuts out at the beginning of a job,<br>

but I don&#39;t know why it cuts out the last 10 seconds.<br>

<br>

On Mon, Apr 11, 2011 at 3:53 PM, Eugene Pasyuk &lt;<a href="mailto:pasyuk@jlab.org">pasyuk@jlab.org</a><br></div></div><div class="im">

&lt;mailto:<a href="mailto:pasyuk@jlab.org">pasyuk@jlab.org</a>&gt;&gt; wrote:<br>

<br>

    Good news from Sandy. All farm nodes have large disks. So we can<br>

    submit large chunks of data in a single job.<br>

<br>

    -Eugene<br>

<br>

<br>

    -------- Original Message --------<br>

    Subject:    Re: farm local dsks<br>

    Date:       Mon, 11 Apr 2011 15:27:03 -0400<br>

    From:       Sandy Philpott &lt;<a href="mailto:sandy.philpott@jlab.org">sandy.philpott@jlab.org</a>&gt;<br></div><div class="im">

    &lt;mailto:<a href="mailto:sandy.philpott@jlab.org">sandy.philpott@jlab.org</a>&gt;<br>

    To:         Eugene Pasyuk &lt;<a href="mailto:pasyuk@jlab.org">pasyuk@jlab.org</a>&gt; &lt;mailto:<a href="mailto:pasyuk@jlab.org">pasyuk@jlab.org</a>&gt;<br>

<br>

<br>

<br>

<br>

    Hi Eugene,<br>

<br>

    Most of the batch farm nodes have 500GB disks, except the newest batch<br>

    of farm11* systems which all have 1000GB (1TB) disks. I don&#39;t think<br>

    there&#39;s a max a job can request, so long as it physically fits under<br>

    these sizes. We are looking at ways to make sure the local disk<br>

    isn&#39;t an<br>

    I/O bottleneck as it can be for very I/O intensive jobs.<br>

<br>

    Sandy<br>

<br>

     &gt; Hi Sandy,<br>

     &gt;<br>

     &gt; What is the size of the local disks on the batch farm nodes these<br>

    days<br>

     &gt; and what is the max a job can request for DISK_SPACE?<br>

     &gt;<br>

     &gt; Thanks,<br>

     &gt;<br>

     &gt; -Eugene<br>

     &gt;<br>

<br>

<br>

<br>

<br>

--<br>

Johann T. Goetz, PhD.<br>

</div><a href="mailto:jgoetz@ucla.edu">jgoetz@ucla.edu</a> &lt;mailto:<a href="mailto:jgoetz@ucla.edu">jgoetz@ucla.edu</a>&gt;<div class="im"><br>

Nefkens Group, UCLA Dept. of Physics &amp; Astronomy<br>

Hall-B, Jefferson Lab, Newport News, VA<br>

Office: <a href="tel:757-269-5465">757-269-5465</a> (CEBAF Center F-335)<br>

Mobile: <a href="tel:757-768-9999">757-768-9999</a><br>

<br>

</div></blockquote>

</blockquote></div><br><br clear="all"><br>-- <br>Johann T. Goetz, PhD.<br><a href="mailto:jgoetz@ucla.edu">jgoetz@ucla.edu</a><br>Nefkens Group, UCLA Dept. of Physics &amp; Astronomy<br>Hall-B, Jefferson Lab, Newport News, VA<br>


Office: 757-269-5465 (CEBAF Center F-335)<br>Mobile: 757-768-9999<br><br>