<div dir="ltr"><div class="gmail_quote"><br><div dir="ltr">Hi Data Challengers,<div><br></div><div>For what it's worth, I'm getting job times of roughly 9 to 12 hours, including at the lowest EM background setting that Kei tried. I'll work on having more stats for this Friday's meeting.</div>
<div><br></div><div>The success rate is up, so thanks to Paul, Simon, et al. for reducing memory usage =) Too many of my jobs still crash at the beginning of REST creation, so I am tracking down the cause.</div>
<div><br></div><div><br></div><div><br></div><div>Cheers,<br>Sean</div><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Feb 25, 2014 at 8:42 AM, Mark Ito <span dir="ltr">&lt;<a href="mailto:marki@jlab.org" target="_blank">marki@jlab.org</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Justin, et al.,<br>
<br>
Overnight Sunday to Monday, using the new tag, I saw similar results:<br>
1,000 jobs, 50k events each, no crashes. This time I did increase the<br>
memory request to 6 GB from 3 GB and the time request to 24 hours from<br>
15 hours. Still no EM background. But about 60 of the jobs had short REST<br>
files (there are some ambiguous cases very close to the nominal file<br>
length). The short REST file thing seems to be the next frontier.<br>
<br>
I am not counting the DTrackFitterKalmanSIMD.cc message as a crash<br>
anymore. Simon tells me that is more like a warning.<br>
<br>
One question I have is whether there is some standard way to gracefully punt<br>
on an event. We should, of course, keep careful accounting of that in each<br>
job. On the other hand, there is this:<br>
<a href="http://grantland.com/features/grantland-channel-coach-never-punts/" target="_blank">http://grantland.com/features/grantland-channel-coach-never-punts/</a><br>
<br>
On the length of a job, I would prefer to go to 24-hour jobs (assuming,<br>
of course, that there is a high probability of them finishing). That way<br>
there are fewer output files. My understanding is that on the grid this is<br>
too long when we are in opportunistic mode: the longer you run in that case,<br>
the higher your chance of being pre-empted. Perhaps we should have<br>
different run times at the different sites.<br>
<span><font color="#888888"><br>
-- Mark<br>
</font></span><div><div><br>
On 02/25/2014 08:59 AM, Justin Stevens wrote:<br>
> Hi Data Challengers,<br>
><br>
> FYI, I used these tagged versions for a test on the MIT Openstack cluster: 100 jobs completed with 25K events each, using the "standard" EM background that Kei and Paul have also used (10^7 g/s and +/- 200 ns gate). With this configuration I don't see any crashes, but 2 jobs finished with a small REST output file that contains only a fraction of the 25K events. I think Paul also saw this earlier: JANA reports that it processed the full number of events, but the REST file is too small. As a reminder, these jobs are deployed as 8 parallel processes running on an 8-core blade and sharing 14 GB of memory, so they're not as sensitive to the memory spikes as some of the other tests we've seen.<br>
><br>
> One other thought on job length: since the EM background results in much more time spent in hdgeant, should we lower the number of events per job to keep the jobs from getting too long? These 25K-event jobs take 6 hours here, and that will only go up if we extend the gate or raise the rate for the EM background.<br>
><br>
> -Justin<br>
><br>
> On Feb 23, 2014, at 6:01 PM, Mark Ito wrote:<br>
><br>
>> correction, see below<br>
>> On 02/22/2014 09:18 PM, Mark Ito wrote:<br>
>>> Folks,<br>
>>><br>
>>> I changed the data challenge 2 sim-recon and hdds branches to get the<br>
>>> latest versions of everything, this time including src/libraries/TOF.<br>
>>> This will bring in Paul's recent changes to ameliorate the memory issues<br>
>>> we have been fighting with. The new hdds is necessary for the new TOF<br>
>>> geometry.<br>
>>><br>
>>> New tagged versions reflect the branch a week ago:<br>
>>><br>
>>> tags/hdds-dc-2.0<br>
>>> tags/hdds-dc-2.1<br>
>> that last line should have been: tags/sim-recon-dc-2.1<br>
>>> As before, the branch has not been tested yet, but the word should go<br>
>>> out whenever it changes. Let me know if you see weirdness.<br>
>>><br>
>>> -- Mark<br>
>>><br>
>> _______________________________________________<br>
>> Halld-offline mailing list<br>
>> <a href="mailto:Halld-offline@jlab.org" target="_blank">Halld-offline@jlab.org</a><br>
>> <a href="https://mailman.jlab.org/mailman/listinfo/halld-offline" target="_blank">https://mailman.jlab.org/mailman/listinfo/halld-offline</a><br>
<br>
</div></div><div>--<br>
Mark M. Ito, Jefferson Lab, <a href="mailto:marki@jlab.org" target="_blank">marki@jlab.org</a>, <a href="tel:%28757%29269-5295" value="+17572695295" target="_blank">(757)269-5295</a><br>
<br>
</div><div><div>
</div></div></blockquote></div><br></div>
</div></div></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Sean Dobbs<br>Department of Physics &amp; Astronomy <br>Northwestern University<br>phone: 847-467-2826</div></div>