[Halld-offline] data challenge green light

Richard Jones richard.t.jones at uconn.edu
Thu Dec 6 01:14:19 EST 2012


Mark,

The only way to know if we are ready is to start production and find out what hits the fan.  I pushed out a small batch this morning just to test the framework.  Things have changed on several sites since Igor last ran on the grid (software areas have moved, policies have changed here and there) so I am tweaking things to get them to run everywhere, but events are flowing.

I started production this morning at 7:30 with a batch of 1000 jobs.  As of 18:00 all but a few stragglers were done, and 46 million rest events (920 files of 50k events each) have landed on the srm at UConn.  The log files all look good.  It seems that the software is more robust than I feared: not a single job has crapped out due to memory ceiling violations so far.  No two of the rest files have exactly the same size, which means that we didn't goof up with our random numbers in the most obvious way.
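The file-size check above can be sketched in a few lines of Python.  This is only an illustration, not the script actually used: the function name files_sharing_a_size and the idea of passing in a list of local paths are assumptions.  Note that distinct sizes only rule out the crudest failure (jobs reusing the same seed and producing identical output); they do not prove the random streams are independent.

```python
import os
from collections import defaultdict


def files_sharing_a_size(paths):
    """Group files by size in bytes and return only the sizes that repeat.

    Hypothetical helper: an empty result means no two files have the
    same size, which is the crude sanity check described in the text.
    """
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    # Keep only sizes shared by more than one file.
    return {size: names for size, names in by_size.items() if len(names) > 1}
```

An empty dictionary is the good outcome; any entry lists the files worth a closer look.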

I stopped production at 18:00 to implement changes in a couple of scripts, and just launched again, this time a batch of 500 million events.  That should keep the pipe full until the weekend.  Then it is time to do the real challenge -- 5 billion events in one batch job!

One small change I wish we had made is to get rid of the ticker tape of how many events have been processed so far.  This is nice for interactive running, but in batch mode it fills up the log file to very little purpose, and makes browsing the log file a pain.  A quick glance at JApplication did not show an obvious way to turn it off with a command line option, so I created a cron job that scans the log files as they come back from the grid and strips out the ticker tape lines.  That saves an order of magnitude in space in my log directory.
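A minimal sketch of what such a log filter might look like, in Python.  The actual cron script is not shown in this message, so everything here is an assumption: the function name strip_ticker is made up, and the pattern assumes ticker lines that start with a number followed by "events processed", which may not match the real output format exactly.

```python
import re

# Assumed ticker format, e.g. " 2000 events processed  (2000 events read)  12.5Hz".
# Adjust the pattern to whatever the framework really prints.
TICKER_RE = re.compile(r"^\s*[\d.]+\s*[kMG]?\s*events processed\b")


def strip_ticker(text):
    """Return the log text with all ticker lines removed.

    splitlines() also breaks on a bare carriage return, which
    interactive tickers often use to overwrite the same line.
    """
    kept = [line for line in text.splitlines() if not TICKER_RE.match(line)]
    return "\n".join(kept) + ("\n" if kept else "")
```

A cron job could apply this to each new log file and write the result back in place; keeping the last ticker line of each file would be an easy variation if the final event count is wanted.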

-Richard J.


On 12/5/2012 4:13 PM, Paul Mattione wrote:
> Things are fine on my end.  I still have an ~8% failure rate though (due to RAM limitations).  13.2 million events and counting.
>
>   - Paul
>
> On Dec 5, 2012, at 4:02 PM, Mark M. Ito wrote:
>
>> OK. Have not heard from Richard but that's OK...
>>
>> I was wondering if there are any decisions left or technical questions or concerns at this point. Otherwise I think we can start. I'm assuming that Richard is done with the changes to the branch and Paul thinks that they are fine. I'll make a tag by tomorrow morning if I do not hear that that is a bad idea.
>>
>> If either of you thinks we really do need a conference call, let me know and I will set one up.
>>
>> On 12/05/2012 02:01 PM, Mark M. Ito wrote:
>>> Richard and Paul,
>>>
>>> Can we have a quick, touch-base-like phone meeting on the data challenge
>>> status today? Say at 4 pm. Or suggest a time. We can use the ReadyTalk
>>> thing.
>>>
>>>    -- Mark
>>>
>> -- 
>> Mark M. Ito
>> Jefferson Lab
>> marki at jlab.org
>> (757)269-5295

