[G12] Subject: Re: CCPR 105653 UPDATE (Running jobs on nodes have different behavior)

Michael C. Kunkel mkunkel at jlab.org
Sat Oct 4 21:53:01 EDT 2014


Greetings,

Here is what I have surmised thus far.

1) The behavior on different nodes is still a mystery.

2) The error has to do with the RUN 10 parameters of 
calib_user.RunIndexg12_mk, specifically the system SC_CALIBRATIONS.

3) This error was not present prior to August. In fact, multiple users 
had used the calib_user.RunIndexg12_mk table for billions of simulated 
events, equating to more than 10,000 jobs on the farm, all of which 
were successful.

4) I was able to make calib_user.RunIndexg12_mkIII with all the 
parameters of calib_user.RunIndexg12_mk, except for the system 
SC_CALIBRATIONS, which is left at its default value. Running jobs on 
this runindex does not result in an error.

5) There is a difference in output from the simulations between the two 
different runindexes, calib_user.RunIndexg12_mk and 
calib_user.RunIndexg12_mkIII.

6) Noticing that the error is caused by one system in the runindex 
leads me to question why some nodes throw the fault error while others 
do not. Supposedly there is no difference in the builds, nor in the 
build of mysql on each node. What sets this error off on some nodes 
and not others? Also, why did this begin only after August 16, 2014?
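
To make the comparison in points 4 and 5 concrete, here is a minimal 
sketch of how the two runindexes could be diffed once their run 10 
entries have been exported. It assumes each runindex has been reduced to 
a mapping from system name to a constant-set identifier. The function 
name, the SYSTEM_A and SYSTEM_B entries, and all identifier values are 
placeholders for illustration, not real database contents, and no 
database access is shown.

```python
def diff_run_indexes(reference, candidate):
    """Return {system: (reference_value, candidate_value)} for each system that differs.

    A system missing from one mapping is reported with None on that side,
    which here stands for "left at its default value".
    """
    differences = {}
    for system in sorted(set(reference) | set(candidate)):
        ref_value = reference.get(system)
        cand_value = candidate.get(system)
        if ref_value != cand_value:
            differences[system] = (ref_value, cand_value)
    return differences


# Hypothetical run 10 exports; the numbers are made up.
run_index_mk = {"SC_CALIBRATIONS": 101, "SYSTEM_A": 7, "SYSTEM_B": 3}
run_index_mkIII = {"SYSTEM_A": 7, "SYSTEM_B": 3}  # SC_CALIBRATIONS left at default

for system, (old, new) in diff_run_indexes(run_index_mk, run_index_mkIII).items():
    print(f"{system}: RunIndexg12_mk={old} RunIndexg12_mkIII={new}")
```

If the two runindexes really differ only in SC_CALIBRATIONS, that should 
be the only system reported. Any other entry would mean the copy is not 
a clean comparison.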


BR
MK
