[Halld-offline] data challenge green light
David Lawrence
davidl at jlab.org
Thu Dec 6 07:41:46 EST 2012
-PJANA:BATCH_MODE=1
On 12/6/12 1:14 AM, Richard Jones wrote:
> Mark,
>
> The only way to know if we are ready is to start production and find
> out what hits the fan. I pushed out a small batch this morning just
> to test the framework. Things have changed on several sites since
> Igor last ran on the grid (software areas have moved, policies have
> changed here and there) so I am tweaking things to get them to run
> everywhere, but events are flowing.
>
> I started production this morning at 7:30 with a batch of 1000 jobs.
> As of 18:00 hours this afternoon all but a few stragglers were done,
> and 46 million rest events (920 files of 50k events each) have landed
> on the srm at UConn. Log files all look good. It seems that the
> software is more robust than I feared. Not a single job has crapped
> out due to memory ceiling violations so far. None of the rest files
> has exactly the same size, which means that we didn't goof up with our
> random numbers in the most obvious way.
>
> I stopped production at 18:00 to implement changes in a couple of
> scripts, and just launched again, this time a batch of 500 million
> events. That should keep the pipe full until the weekend. Then it is
> time to do the real challenge -- 5 billion events in one batch job!
>
> One small change I wish we had made is to get rid of the ticker tape
> of how many events have been processed so far. This is nice for
> interactive running, but in batch mode it fills up the log file with
> very little purpose, and makes browsing the log file a pain. A quick
> glance at JApplication did not show an obvious way to turn it off with
> a command line option, so I created a cron job that scans the log
> files as they come back from the grid and compresses out the ticker
> tape lines. Saves an order of magnitude in space in my log directory.
>
> -Richard J.
>
>
> On 12/5/2012 4:13 PM, Paul Mattione wrote:
>> Things are fine on my end. I still have an ~8% failure rate though
>> (due to RAM limitations). 13.2 million events and counting.
>>
>> - Paul
>>
>> On Dec 5, 2012, at 4:02 PM, Mark M. Ito wrote:
>>
>>> OK. Have not heard from Richard but that's OK...
>>>
>>> I was wondering if there are any decisions left or technical
>>> questions or concerns at this point. Otherwise I think we can start.
>>> I'm assuming that Richard is done with the changes to the branch and
>>> Paul thinks that they are fine. I'll make a tag by tomorrow morning
>>> if I do not hear that that is a bad idea.
>>>
>>> If either of you think we really do need a conference call, let me
>>> know and I will set one up.
>>>
>>> On 12/05/2012 02:01 PM, Mark M. Ito wrote:
>>>> Richard and Paul,
>>>>
>>>> Can we have a quick, touch-base-like phone meeting on the data
>>>> challenge status today? Say at 4 pm. Or suggest a time. We can use
>>>> the ReadyTalk thing.
>>>>
>>>> -- Mark
>>>>
>>> --
>>> Mark M. Ito
>>> Jefferson Lab
>>> marki at jlab.org
>>> (757)269-5295
>
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline
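
For log files that were already written without the batch-mode option, the cleanup pass Richard describes (scanning returned logs and squeezing out the ticker lines) could be sketched roughly as follows in Python. This is only an illustration: the thread does not show the actual ticker format, so the pattern in TICKER_RE is an assumption, and the names strip_ticker and clean_log are hypothetical helpers, not part of JANA or of Richard's cron job.

```python
import re
import sys

# ASSUMPTION: ticker lines look something like "  1200 events processed  (45.0 Hz)".
# The real format is not shown in the thread; adjust this pattern to match actual logs.
TICKER_RE = re.compile(r"^\s*\d+\s+events processed\b")


def strip_ticker(text):
    """Drop ticker lines from log text, keeping the last line of each ticker run.

    Keeping the final ticker line preserves the last reported event count.
    """
    # Tickers are often redrawn with carriage returns, so treat "\r" as a line break too.
    lines = re.split(r"\r\n|\r|\n", text)
    kept = []
    for i, line in enumerate(lines):
        is_ticker = TICKER_RE.match(line) is not None
        next_is_ticker = (
            i + 1 < len(lines) and TICKER_RE.match(lines[i + 1]) is not None
        )
        if is_ticker and next_is_ticker:
            continue  # a later ticker line supersedes this one
        kept.append(line)
    return "\n".join(kept)


def clean_log(path):
    """Rewrite one log file in place with the ticker lines squeezed out."""
    with open(path, "r", errors="replace", newline="") as f:
        original = f.read()
    cleaned = strip_ticker(original)
    if cleaned != original:
        with open(path, "w", newline="") as f:
            f.write(cleaned)


if __name__ == "__main__":
    # Hypothetical usage: python clean_logs.py job_0001.log job_0002.log ...
    for log_path in sys.argv[1:]:
        clean_log(log_path)
```

A script like this could be run from cron over newly returned logs. For new jobs, passing the batch-mode option on the command line avoids producing the ticker output in the first place.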