[Halld-offline] Fwd: [Jlab-scicomp-briefs] Scientific Computing Farm Maintenance Tuesday, May 26th 2020
Mark Ito
marki at jlab.org
Fri May 22 08:57:50 EDT 2020
Folks,
Please note the section of Bryan's announcement (which you have all
received) titled "Notes about the new default /farm_out that may affect
your batch jobs". This is a significant change. It is being
done so that stdout can go to a non-Lustre disk (SSD) with better
performance for that kind of usage, but with limited space.
-- Mark
-------- Forwarded Message --------
Subject: [Jlab-scicomp-briefs] Scientific Computing Farm Maintenance
Tuesday, May 26th 2020
Date: Thu, 21 May 2020 19:07:22 +0000
From: Bryan Hess <bhess at jlab.org>
To: jlab-scicomp-briefs at jlab.org <jlab-scicomp-briefs at jlab.org>
On Tuesday, May 26th at 8am the farm will be paused for software
maintenance work and the ifarm machines will be rebooted. Once the farm
nodes have been updated, jobs will be released and normal operations
will resume. No action is required on your part, but please see the note
below about /farm_out.
The following changes will be made:
* All farm and ifarm machines will be upgraded to newer Lustre clients
* Auger, SWIF, and other systems that rely on Lustre will be shifted
to newer Lustre servers to improve performance and remove an
operational dependency with the old Lustre system.
* The /farm_out default location for farm job standard output and
standard error will be moved from Lustre to a new file server better
suited to the small IO operations of job logging.
Notes about the new default /farm_out that may affect your batch jobs:
* Standard output and standard error in /farm_out will be limited to
1GB per user. Jobs that overrun this limit will still run, but their
stdout/stderr will be truncated.
* Files older than 10 days will be automatically removed.
* Large /farm_out files may be compressed, or have their middle
sections trimmed, to conserve space.
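
To see where you stand against these limits, something like the following sketch may help. It is not an official tool. It assumes that you pass in whatever directory actually holds your own job logs (the layout under /farm_out is not described here), that "1GB" means 1024^3 bytes, and that file age is judged by modification time. The function name summarize_logs is made up for this example.

```python
import os
import time

LIMIT_BYTES = 1024 ** 3   # 1GB per user, from the notes above (binary GB is an assumption)
MAX_AGE_DAYS = 10         # files older than this are removed automatically


def summarize_logs(log_dir, now=None):
    """Return (total_bytes, old_files) for regular files under log_dir.

    old_files lists paths whose modification time is more than
    MAX_AGE_DAYS in the past (using mtime is an assumption).
    """
    now = time.time() if now is None else now
    cutoff = now - MAX_AGE_DAYS * 86400
    total = 0
    old_files = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += st.st_size
            if st.st_mtime < cutoff:
                old_files.append(path)
    return total, sorted(old_files)
```

Calling summarize_logs on your log directory and dividing the first return value by LIMIT_BYTES gives the fraction of the quota in use. The second return value lists files that would already be past the 10-day window.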
Reminder: End of life for CentOS 7.2. Please run on CentOS 7.7.
* Jobs submitted using the "centos7" or "centos72" tags will soon be
unsupported. Please migrate your jobs to use "centos77" or
"general". The final 7.2 nodes from 2012 are being operated on a
best-effort basis and will be taken out of service to make space for
new hardware as needed.
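
If you keep many job submission files around, a quick text scan can point out which ones still mention the old tags. The sketch below makes no assumption about the submission file format: it only looks for the tag names quoted above as whole words in plain text. The function name find_deprecated_tags is hypothetical.

```python
import re

# Tag names taken from the reminder above; "centos77" and "general"
# are the suggested replacements and are not flagged.
DEPRECATED = {"centos7", "centos72"}
_TAG = re.compile(r"\bcentos\d*\b")


def find_deprecated_tags(text):
    """Return (line_number, tag) pairs for deprecated OS tags found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _TAG.finditer(line):
            if match.group(0) in DEPRECATED:
                hits.append((lineno, match.group(0)))
    return hits
```

Reading each submission file and passing its contents to find_deprecated_tags gives the line numbers to edit by hand. The scan deliberately does not rewrite anything, since the right replacement ("centos77" or "general") depends on the job.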