[Lowq] Cooking Issue
Stepan Stepanyan
stepanya at jlab.org
Sun Mar 21 08:47:49 EDT 2010
Hi all,
During Saturday's session of the collaboration meeting, Dennis
reported that the machine that holds clasdb and the wiki has been having
problems for the last couple of days. They are looking into it.
Stepan
Hovanes Egiyan wrote:
> Hi Lamiaa,
>
> I noticed that the "clasdb" MySQL database server was down for a while.
> I think the computer center is doing some maintenance work related to clasdb,
> but I am not absolutely sure that it is the reason for those job crashes.
>
> Hovanes.
>
>
> Lamiaa El Fassi wrote:
>
>> Hi,
>>
>> Here are some cooking statistics:
>>
>> Completed jobs: 688
>> Good jobs : 249
>> Crashed jobs : 439
>>
>> In the last hour, things have been getting worse. Below is an example: the
>> time stamp summary for file clas_41470.A40, taken from the auger web
>> page. The log file of this job shows the same segmentation fault
>> that I mentioned in my last email.
>>
>>
>>
>> Time stamps
>>
>> Submitted: Mar 18, 2010 10:30:17 PM
>> Cleared Dependencies: Mar 20, 2010 2:15:29 PM
>> Started Copying Input Files: Mar 20, 2010 2:22:08 PM
>> Started Executing: Mar 20, 2010 2:25:12 PM
>> Started Copying Output Files: Mar 20, 2010 2:25:19 PM
>> Completed: Mar 20, 2010 2:25:19 PM
>>
>> Best regards,
>>
>> Lamiaa
>>
>>
>>
>> ************************************************************
>> * Lamiaa El Fassi email: elfassi at jlab.org
>> * Research Associate @ Rutgers University
>> * Phone: (757) 269-7011 // Fax: (757) 269-5703
>> * Jefferson Lab., 12000 Jefferson Ave.
>> * Suite# 4, MS 12H3
>> * Newport News, VA. 23606
>> ************************************************************
>>
>>
>> On Sat, Mar 20, 2010 at 12:32 PM, Lamiaa El Fassi <elfassi at jlab.org> wrote:
>>
>> Hi,
>>
>> Upon request, I am reprocessing the elastic runs for the RTPC
>> calibration. I have noticed that almost half of the jobs completed so
>> far crashed during cooking. All of these crashed jobs show
>> "status:success & exit code: 0"
>> on the auger web page, but when I check their output log files I find
>> "Segmentation fault & size of raw data equal 0"
>> Could this failure to get the raw data be caused by insufficient disk
>> space on the farm machine, or by something else?
>> In the submission script of each job I request 8 GB of disk space,
>> which covers the size of the input and output files of each
>> processed job.
>> Is there a problem on the farm machines that may be causing this?
>>
>> Best regards,
>>
>> Lamiaa
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Lowq mailing list
>> Lowq at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/lowq
>>
>
>
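A note for anyone hitting the same symptom: since auger reports
"status:success & exit code: 0" for these jobs, the exit status alone
cannot separate good jobs from crashed ones. Below is a minimal Python
sketch of a check based on the log contents and the time stamps quoted
above. The function names are hypothetical, and the assumption that a
crashed job's log contains the literal text "Segmentation fault" is taken
from the description in this thread; none of this is part of auger or
the cooking scripts.

```python
from datetime import datetime

# Assumption: crashed jobs contain this literal marker in their log output,
# as described in the thread. The exact log format is not known here.
CRASH_MARKER = "Segmentation fault"

# Format of the time stamps as quoted from the auger web page,
# e.g. "Mar 20, 2010 2:25:12 PM".
STAMP_FORMAT = "%b %d, %Y %I:%M:%S %p"


def job_crashed(log_text):
    """Return True if the log text contains the crash marker."""
    return CRASH_MARKER in log_text


def elapsed_seconds(start, end):
    """Seconds between two time stamps in the auger web page format."""
    t0 = datetime.strptime(start, STAMP_FORMAT)
    t1 = datetime.strptime(end, STAMP_FORMAT)
    return (t1 - t0).total_seconds()


def summarize(logs):
    """Count good and crashed jobs from a mapping of job name -> log text."""
    crashed = sum(1 for text in logs.values() if job_crashed(text))
    return {
        "completed": len(logs),
        "good": len(logs) - crashed,
        "crashed": crashed,
    }
```

For clas_41470.A40, elapsed_seconds("Mar 20, 2010 2:25:12 PM",
"Mar 20, 2010 2:25:19 PM") gives 7 seconds between "Started Executing"
and "Started Copying Output Files", which is consistent with the job
failing almost immediately rather than processing any raw data.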