[Lowq] Cooking Issue

Sun Mar 21 08:47:49 EDT 2010

Hi all,

During Saturday's session of the collaboration meeting, Dennis
reported that the machine that hosts clasdb and the wiki started having
problems over the last couple of days. They are looking into it.

Stepan

Hovanes Egiyan wrote:
> Hi Lamiaa,
>
> I noticed that the "clasdb" MySQL database server was down for a while.
> I think the computer center is doing some maintenance work related to clasdb,
> but I am not absolutely sure that this is the cause of those job crashes.
>
> Hovanes.
>
>
> Lamiaa El Fassi wrote:
>   
>> Hi,
>>  
>> Here are some cooking statistics:
>>
>> Completed jobs: 688
>> Good jobs:      249
>> Crashed jobs:   439
>>  
>> Over the last hour, things have gotten worse. Below is an example
>> timestamp summary for file clas_41470.A40, taken from the Auger web
>> page. The log file for this job shows the same segmentation fault
>> that I mentioned in my last email.
>>
>> Time stamps:
>>
>> Submitted: Mar 18, 2010 10:30:17 PM
>> Cleared Dependencies: Mar 20, 2010 2:15:29 PM
>> Started Copying Input Files: Mar 20, 2010 2:22:08 PM
>> Started Executing: Mar 20, 2010 2:25:12 PM
>> Started Copying Output Files: Mar 20, 2010 2:25:19 PM
>> Completed: Mar 20, 2010 2:25:19 PM
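
The time stamps above can be turned into per-stage durations to see how little time the job spent executing. The following is only a minimal sketch: the stamp values are copied from the summary above, while the format string and the function name stage_durations are assumptions made for illustration, not part of Auger.

```python
from datetime import datetime

# Time stamps copied from the Auger summary quoted above.
STAMPS = [
    ("Submitted", "Mar 18, 2010 10:30:17 PM"),
    ("Cleared Dependencies", "Mar 20, 2010 2:15:29 PM"),
    ("Started Copying Input Files", "Mar 20, 2010 2:22:08 PM"),
    ("Started Executing", "Mar 20, 2010 2:25:12 PM"),
    ("Started Copying Output Files", "Mar 20, 2010 2:25:19 PM"),
    ("Completed", "Mar 20, 2010 2:25:19 PM"),
]

# Assumed layout of the web page's time stamps (12-hour clock with AM/PM).
FORMAT = "%b %d, %Y %I:%M:%S %p"


def stage_durations(stamps):
    """Return (stage, seconds until the next time stamp) for each stage."""
    times = [(name, datetime.strptime(text, FORMAT)) for name, text in stamps]
    return [
        (name, (next_time - start).total_seconds())
        for (name, start), (_, next_time) in zip(times, times[1:])
    ]


if __name__ == "__main__":
    for name, seconds in stage_durations(STAMPS):
        print(f"{name}: {seconds:.0f} s")
```

For this job the interval between "Started Executing" and "Started Copying Output Files" is 7 seconds, which fits a job that crashed almost immediately rather than one that processed a full input file.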
>>
>> Best regards,
>>
>> Lamiaa
>>
>>                      
>>
>> ************************************************************
>> *  Lamiaa El Fassi             email: elfassi at jlab.org
>> *  Research Associate @ Rutgers University          
>> *  Phone: (757) 269-7011 // Fax: (757) 269-5703                
>> *  Jefferson Lab., 12000 Jefferson Ave.                
>> *  Suite# 4, MS 12H3
>> *  Newport News, VA. 23606
>> ************************************************************
>>
>>
>> On Sat, Mar 20, 2010 at 12:32 PM, Lamiaa El Fassi <elfassi at jlab.org> wrote:
>>
>>     Hi,
>>      
>>     Upon request, I am reprocessing the elastic runs for the RTPC
>>     calibration.
>>     I have noticed that almost half of the jobs completed so far
>>     crashed during cooking. All of these crashed jobs show
>>     "status: success" and "exit code: 0" on the Auger web page, but
>>     when I check their output log files I find a "Segmentation fault"
>>     and a raw data size equal to 0.
>>     Could this failure to get the raw data be caused by insufficient
>>     space on the farm machine, or by something else?
>>     In the submission script of each job I request 8 GB of disk
>>     space, which covers the size of the input and output files of
>>     each processed job.
>>     Is there any problem with the farm machines that might be causing this?
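
Because the crashed jobs are reported with status "success" and exit code 0, the reported status alone cannot separate good jobs from crashed ones; the log files have to be inspected. The sketch below shows one way that could be done, assuming the logs are plain-text files in a single directory. The directory, the "*.log" pattern, and the function names are hypothetical, and the marker string is taken from the report above, so the exact wording in a real log may differ.

```python
from pathlib import Path

# Marker taken from the report above; the exact wording in a real log may differ.
CRASH_MARKER = "Segmentation fault"


def log_has_crash(text, marker=CRASH_MARKER):
    """Return True if the log text contains the crash marker."""
    return marker in text


def classify_logs(log_dir, pattern="*.log"):
    """Split log files under log_dir into (good, crashed) lists of paths.

    log_dir and pattern are hypothetical; adjust them to the real layout.
    """
    good, crashed = [], []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has stray bytes.
        text = path.read_text(errors="replace")
        (crashed if log_has_crash(text) else good).append(path)
    return good, crashed


if __name__ == "__main__":
    import sys

    good, crashed = classify_logs(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"Good jobs: {len(good)}")
    print(f"Crashed jobs: {len(crashed)}")
    for path in crashed:
        print(f"  {path}")
```

A scan like this only counts the symptom; it does not say why the raw data size is 0.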
>>      
>>     Best regards,
>>      
>>     Lamiaa
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Lowq mailing list
>> Lowq at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/lowq
>>     