[Halld-offline] MCwrapper-bot issues

Thomas Britton tbritton at jlab.org
Fri Aug 2 15:12:35 EDT 2019


Dear collaborators,

Over the last two weeks MCwrapper-bot has struggled to produce MC. The reasons were twofold. The first affected the use of our Singularity container. Once that was remedied, all submitted jobs began failing instantly at UConn. Since UConn typically makes up a large share of our OSG cycles, the vast majority of submitted jobs failed. There is a mechanism to "hold" jobs that fail more than a set number of times, and almost all jobs in all submitted projects triggered this condition. Being out of town, I was unable to manually override this switch, and the decision was eventually made to blacklist UConn until the problem was remedied. This allowed jobs to run, although at a lower rate and only when I manually prompted them. Richard, Edgar, and I investigated, and the problem seems to have been resolved (thanks to a patch by Edgar) as of late Thursday afternoon. This delayed many projects by more than a week. Many projects have now finished, and a couple are still slowly working their way through.

Sorry for the delay.

Thomas

