[G12] Subject: Re: CCPR 105653 UPDATE (Running jobs on nodes have different behavior)
Michael C. Kunkel
mkunkel at jlab.org
Sat Oct 4 21:53:01 EDT 2014
Greetings,
Here is what I have surmised thus far.
1) The behavior on different nodes is still a mystery.
2) The error has to do with the RUN 10 parameters of
calib_user.RunIndexg12_mk, specifically the system SC_CALIBRATIONS.
3) This error was not present prior to August. In fact, multiple users
had used the calib_user.RunIndexg12_mk table for billions of simulated
events, equating to more than 10,000 jobs on the farm, all of which
succeeded.
4) I was able to make calib_user.RunIndexg12_mkIII with all the
parameters of calib_user.RunIndexg12_mk except for the system
SC_CALIBRATIONS, which is left at its default value. Running jobs on
this runindex does not result in an error.
5) There is a difference in output from the simulations between the two
different runindexes, calib_user.RunIndexg12_mk and
calib_user.RunIndexg12_mkIII.
6) That the error is caused by one system in the runindex leads me to
question why some nodes do not throw the fault error while others do.
Supposedly there is no difference in the builds, nor in the build of
mysql, on each node. What sets this error off on some nodes and not
others? Also, why did this start occurring after August 16, 2014?
BR
MK