[Halld-offline] Fwd: [Jlab-scicomp-briefs] Scientific Computing Farm Maintenance Tuesday, May 26th 2020

Mark Ito marki at jlab.org
Fri May 22 08:57:50 EDT 2020


Please note Bryan's section on

   Notes about the new default /farm_out that may affect your batch jobs

which you have all received. This is a significant change. It is being 
done so that stdout can go to a non-Lustre disk (SSD) with better 
performance for that kind of usage, but with limited space.

   -- Mark

-------- Forwarded Message --------
Subject: 	[Jlab-scicomp-briefs] Scientific Computing Farm Maintenance Tuesday, May 26th 2020
Date: 	Thu, 21 May 2020 19:07:22 +0000
From: 	Bryan Hess <bhess at jlab.org>
To: 	jlab-scicomp-briefs at jlab.org <jlab-scicomp-briefs at jlab.org>

On Tuesday, May 26th at 8am the farm will be paused for software 
maintenance work and the ifarm machines will be rebooted. Once the farm 
nodes have been updated, jobs will be released and normal operations 
will resume. No action is required on your part, but please see the note 
below about /farm_out.

The following changes will be made:

  * All farm and ifarm machines will be upgraded to newer Lustre clients
  * Auger, SWIF, and other systems that rely on Lustre will be shifted
    to newer Lustre servers to improve performance and remove an
    operational dependency on the old Lustre system.
  * The /farm_out default location for farm job standard output and
    standard error will be moved from Lustre to a new file server better
    suited to the small IO operations of job logging.

Notes about the new default /farm_out that may affect your batch jobs:

  * standard output and standard error in /farm_out will be limited to
    1GB per user. Jobs that overrun this limit will still run, but their
    stdout/stderr will be truncated.
  * files older than 10 days will be automatically removed.
  * /farm_out files that are large may be compressed or have the middles
    trimmed to conserve space.
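
To see how close your job logs are to these limits, something like the following Python sketch may help. It is only an illustration, not a tool provided by Scientific Computing: the per-user directory layout (/farm_out/<username>), the use of file modification time to judge age, and the reading of 1GB as 1024^3 bytes are all assumptions, and summarize_log_dir is a hypothetical helper name.

```python
import getpass
import os
import time

# Limits taken from the announcement. Treating 1GB as 1024**3 bytes
# is an assumption; the real quota accounting may differ.
LIMIT_BYTES = 1024 ** 3
MAX_AGE_DAYS = 10


def summarize_log_dir(log_dir, now=None):
    """Return (total_bytes, stale_paths) for all files under log_dir.

    A file counts as stale when its modification time is more than
    MAX_AGE_DAYS old. Using mtime is an assumption about how the
    automatic cleanup decides what to remove.
    """
    now = time.time() if now is None else now
    cutoff = now - MAX_AGE_DAYS * 86400
    total = 0
    stale = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += st.st_size
            if st.st_mtime < cutoff:
                stale.append(path)
    return total, sorted(stale)


if __name__ == "__main__":
    # Assumed layout: one directory per user under /farm_out.
    log_dir = os.path.join("/farm_out", getpass.getuser())
    total, stale = summarize_log_dir(log_dir)
    print(f"{log_dir}: {total / LIMIT_BYTES:.1%} of the 1GB limit in use")
    print(f"{len(stale)} file(s) older than {MAX_AGE_DAYS} days")
```

If your jobs write their stdout/stderr somewhere other than the default location, pass that directory to summarize_log_dir instead.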

Reminder: End of life for CentOS 7.2. Please run on CentOS 7.7.

  * Jobs submitted using the "centos7" or "centos72" tags will soon be
    unsupported. Please migrate your jobs to use "centos77" or
    "general". The final 7.2 nodes from 2012 are being operated on a
    best-effort basis and will be taken out of service to make space for
    new hardware as needed.
