Hi g12ers,<br><br>OK, I have started the process to run gflux on a per-run basis. However, I have a concern about runs with few files: in that case we are still losing a sizable chunk of data -- it would be a 4% effect for a 10-file run, of which we have several. It seems silly to remove these good events from any analysis because of the flux calculation.<br>
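<br>Just to show where the 4% comes from -- a back-of-envelope check, assuming the loss is the 10 seconds gflux cuts at each end of a per-run job:<br>
<pre>
# back-of-envelope check of the 4% figure, assuming the loss is the
# 20 s (10 s at each end) that gflux cuts from a per-run job
seconds_cut   = 2 * 10
fraction_lost = 0.04                  # the 4% quoted above
run_length    = seconds_cut / fraction_lost
print(run_length)                     # ~500 s of beam for a 10-file run
print(run_length / 10)                # i.e. roughly 50 s per file
</pre><br>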
<br>Also, more than half of our runs have one or two files that failed to cook during pass1. So we'll have to run gflux only on the files that were successful and keep a record of this somewhere, which will complicate any analysis. It's not that those files will never be cooked, just that no one has taken the time to do it. If they do get cooked, we would have to run gflux once again, since we treat runs as a whole. We could avoid this problem if we had gflux output for each raw data file.<br>
<br>Is there no way we can modify gflux to be reliable on a per-file basis? Is it a matter of the scaler banks? If so, can we get the first scaler bank in the file and use it for the first part of the file (up to the first 10 seconds)?<br>
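<br>Something like this, maybe -- just a sketch of the idea in Python, not gflux's actual code (the event/scaler interface here is made up): instead of dropping the events that arrive before the first scaler bank of a file, buffer them and count them against that first scaler reading.<br>
<pre>
from collections import namedtuple

# stand-in for BOS events; is_scaler marks a scaler-bank event
Event = namedtuple("Event", ["is_scaler", "payload"])

def credit(events, scaler):
    # placeholder for the real flux bookkeeping
    print("crediting %d events to scaler %r" % (len(events), scaler.payload))

def flux_per_file(events):
    pending = []      # physics events seen before any scaler bank
    current = None    # most recent scaler reading
    for ev in events:
        if ev.is_scaler:
            if current is None:
                credit(pending, ev)  # head of the file goes to the first scaler bank
                pending = []
            current = ev
        elif current is None:
            pending.append(ev)
        else:
            credit([ev], current)

# example: the first two events are credited to scaler-1, then e3 as usual
evts = [Event(False, "e1"), Event(False, "e2"),
        Event(True, "scaler-1"), Event(False, "e3")]
flux_per_file(evts)
</pre><br>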
<br>Or, if we are running gflux on a single file, I could have the job cache the previous data file, get its last scaler bank, and use that for the beginning of the file. I think this would work out fine, but I am not sure how to handle the A00 file of a run.<br>
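<br>The bookkeeping for that is simple enough -- here is a hypothetical helper (it assumes the usual .A00, .A01, ... file-name extensions) that finds the previous raw file so the job can cache it and read its last scaler bank. It returns None for the A00 file, which is exactly the case I don't know how to handle.<br>
<pre>
import re

def previous_file(name):
    """Return the preceding raw file in the run, or None for the A00 file."""
    m = re.match(r"(.*\.A)(\d{2})$", name)
    if m is None:
        raise ValueError("unexpected file name: %s" % name)
    index = int(m.group(2))
    if index == 0:
        return None                        # A00: nothing earlier in this run
    return "%s%02d" % (m.group(1), index - 1)

# example names only
print(previous_file("clas_056789.A05"))    # clas_056789.A04
print(previous_file("clas_056789.A00"))    # None
</pre><br>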
<br>I understand why gflux cuts out the first 10 seconds of a job, but I don't know why it cuts out the last 10 seconds.<br><br><div class="gmail_quote">On Mon, Apr 11, 2011 at 3:53 PM, Eugene Pasyuk <<a href="mailto:pasyuk@jlab.org">pasyuk@jlab.org</a>> wrote:<br>
<blockquote class="gmail_quote">
<div bgcolor="#ffffff" text="#000000">
<tt>Good news from Sandy. All farm nodes have large disks. So we
can submit large chunks of data in a single job.</tt><br>
<br>
<tt>-Eugene</tt><br>
<br>
<br>
<tt>-------- Original Message --------</tt>
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<th align="RIGHT" valign="BASELINE" nowrap><tt>Subject:
</tt></th>
<td><tt>Re: farm local disks</tt></td>
</tr>
<tr>
<th align="RIGHT" valign="BASELINE" nowrap><tt>Date:
</tt></th>
<td><tt>Mon, 11 Apr 2011 15:27:03 -0400</tt></td>
</tr>
<tr>
<th align="RIGHT" valign="BASELINE" nowrap><tt>From:
</tt></th>
<td><tt>Sandy Philpott <a href="mailto:sandy.philpott@jlab.org"><sandy.philpott@jlab.org></a></tt></td>
</tr>
<tr>
<th align="RIGHT" valign="BASELINE" nowrap><tt>To: </tt></th>
<td><tt>Eugene Pasyuk <a href="mailto:pasyuk@jlab.org"><pasyuk@jlab.org></a></tt></td>
</tr>
</tbody>
</table>
<br>
<br>
<br>
<tt>Hi Eugene,<br>
<br>
Most of the batch farm nodes have 500GB disks, except the newest batch of
farm11* systems, which all have 1000GB (1TB) disks. I don't think there's a
max a job can request, so long as it physically fits under these sizes. We
are looking at ways to make sure the local disk isn't an I/O bottleneck, as
it can be for very I/O intensive jobs.<br>
<br>
Sandy<br>
<br>
> Hi Sandy,<br>
><br>
> What is the size of the local disks on the batch farm nodes these days<br>
> and what is the max a job can request for DISK_SPACE?<br>
><br>
> Thanks,<br>
><br>
> -Eugene<br>
><br>
</tt>
</div>
</blockquote></div><br><br clear="all"><br>-- <br>Johann T. Goetz, PhD.<br><a href="mailto:jgoetz@ucla.edu">jgoetz@ucla.edu</a><br>Nefkens Group, UCLA Dept. of Physics & Astronomy<br>Hall-B, Jefferson Lab, Newport News, VA<br>
Office: 757-269-5465 (CEBAF Center F-335)<br>Mobile: 757-768-9999<br><br>