[Lowq] Cooking Issue

Hovanes Egiyan hovanes.egiyan at gmail.com
Sat Mar 20 14:45:45 EDT 2010


Hi Lamiaa,

I noticed that the "clasdb" MySQL database server was down for a while.
I think the computer center is doing some maintenance work related to clasdb,
but I am not absolutely sure that this is the reason for those job crashes.
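
Since the crashed jobs still report "status:success" and exit code 0, the log files are the only place where the crash shows up. In case it helps to separate crashed jobs from good ones, here is a minimal Python sketch that scans a directory of job logs for the "Segmentation fault" message. The directory name and the "*.log" file pattern are assumptions for illustration, not the actual cooking setup.

```python
from pathlib import Path

# Marker text taken from the job logs described in this thread.
CRASH_MARKER = "Segmentation fault"


def find_crashed_logs(log_dir, pattern="*.log"):
    """Return sorted names of log files in log_dir that contain CRASH_MARKER.

    log_dir and pattern are hypothetical; adjust them to the real log location.
    """
    crashed = []
    for log_path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has stray binary bytes.
        text = log_path.read_text(errors="replace")
        if CRASH_MARKER in text:
            crashed.append(log_path.name)
    return crashed


def summarize(log_dir, pattern="*.log"):
    """Return (total, crashed_count) for the logs matching pattern."""
    total = sum(1 for _ in Path(log_dir).glob(pattern))
    crashed = len(find_crashed_logs(log_dir, pattern))
    return total, crashed
```

Calling summarize("cooking_logs") on a (hypothetical) log directory would give the total and crashed counts, which could then be compared with the time window in which clasdb was unavailable.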

Hovanes.


Lamiaa El Fassi wrote:
> Hi,
>  
> Here are some cooking statistics:
>  
> Completed jobs: 688
> Good jobs     : 249
> Crashed jobs  : 439
>  
> In the last hour, things have become worse. Below is an example of the
> time-stamp summary for the file clas_41470.A40, taken from the Auger web
> page. The log file of this job shows the same segmentation fault that
> I mentioned in my last email.
>  
>
>
>       Time stamps
>
> Submitted: Mar 18, 2010 10:30:17 PM
> Cleared Dependencies: Mar 20, 2010 2:15:29 PM
> Started Copying Input Files: Mar 20, 2010 2:22:08 PM
> Started Executing: Mar 20, 2010 2:25:12 PM
> Started Copying Output Files: Mar 20, 2010 2:25:19 PM
> Completed: Mar 20, 2010 2:25:19 PM
>
> Best regards,
>
> Lamiaa
>
>                      
>
> ************************************************************
> *  Lamiaa El Fassi             email: elfassi at jlab.org
> *  Research Associate @ Rutgers University          
> *  Phone: (757) 269-7011 // Fax: (757) 269-5703                
> *  Jefferson Lab., 12000 Jefferson Ave.                
> *  Suite# 4, MS 12H3
> *  Newport News, VA. 23606
> ************************************************************
>
>
> On Sat, Mar 20, 2010 at 12:32 PM, Lamiaa El Fassi <elfassi at jlab.org> wrote:
>
>     Hi,
>      
>     Upon request, I am reprocessing the elastic runs for the RTPC
>     calibration.
>     I have noticed that almost half of the jobs completed so far crashed
>     during the cooking. All of these crashed jobs show "status:success &
>     exit code: 0" on the Auger web page, but when I check their output
>     log files I find "Segmentation fault & size of raw data equal 0".
>     Could this failure to get the raw data be caused by insufficient
>     disk space on the farm machine, or by something else?
>     In the submission script of each job I request 8 GB of disk space,
>     which covers the size of the input and output files of each
>     processed job.
>     Is there any problem with the farm machines that may be causing this?
>      
>     Best regards,
>      
>     Lamiaa
>
>     ************************************************************
>     *  Lamiaa El Fassi             email: elfassi at jlab.org
>     *  Research Associate @ Rutgers University          
>     *  Phone: (757) 269-7011 // Fax: (757) 269-5703                
>     *  Jefferson Lab., 12000 Jefferson Ave.                
>     *  Suite# 4, MS 12H3
>     *  Newport News, VA. 23606
>     ************************************************************
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lowq mailing list
> Lowq at jlab.org
> https://mailman.jlab.org/mailman/listinfo/lowq


