<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">On Tuesday, Feb 20, as part of the scheduled monthly maintenance window, Slurm for
LQCD was upgraded to version 22. This was done in preparation for the 24s systems to support AlmaLinux 9. As part of pre-maintenance testing, the existing controller was cloned to a test system and the Slurm upgrade was vetted in a staging environment to
ensure that the maintenance-day work was fully understood ahead of time. Unfortunately, the test system interacted with the production system and caused jobs to fail, draining the queues. Controls normally prevent the test system from interacting with the production
system, and this approach has been used before. We will examine what differed this time and led to the job failures. The Slurm upgrade itself proceeded without further issue, and systems are now operating normally. </span></div>
<div id="Signature"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><br>
</span></div>
</body>
</html>