[Hps] [Jlab-scicomp-briefs] JLab Farm Upgrade to AlmaLinux 9
Laura Hild via Jlab-scicomp-briefs
jlab-scicomp-briefs at jlab.org
Tue May 14 10:25:17 EDT 2024
One week from now, on the upcoming CST Division monthly maintenance day,
Tuesday, May 21st, jobs which do not explicitly request `el7` will be
assigned to AlmaLinux 9 nodes. *Incompatible jobs will fail,* or,
depending on their other constraints, may be rejected at submission or
never start.
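As a minimal sketch (the job name, time limit, and payload are placeholders, not part of this announcement), a batch script can keep running on CentOS Linux 7 during the transition by requesting the `el7` feature through Slurm's `--constraint` option, and can log which OS it actually landed on:

```shell
#!/bin/bash
# Hypothetical stopgap job: explicitly request CentOS Linux 7 nodes via the
# el7 feature so the job is not assigned to AlmaLinux 9 by the new default.
#SBATCH --job-name=el7-stopgap
#SBATCH --constraint=el7
#SBATCH --time=00:10:00

# Log the host and OS this job ran on; /etc/os-release exists on both
# CentOS 7 and AlmaLinux 9. Falls back to "unknown OS" if it cannot be read.
os_name=$(grep -m1 '^PRETTY_NAME=' /etc/os-release 2>/dev/null | cut -d= -f2- | tr -d '"')
echo "Running on $(uname -n) under ${os_name:-unknown OS}"

# Placeholder for the real payload, e.g. ./run_analysis (hypothetical).
```

Submitting this file with `sbatch` has the same effect as passing `--constraint=el7` (or `-C el7`) on the `sbatch` command line. This is only a stopgap: once the remaining CentOS Linux 7 nodes are migrated, an `el7` request can no longer be satisfied.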
When we started the migration, we configured the batch system so that
submitted jobs would default to running exclusively on CentOS Linux 7.
This avoided intermittent failures from incompatible jobs landing on the
newly available AlmaLinux 9 nodes, while still allowing users to begin
migrating their jobs by overriding the default. We must change that
default before the migration is complete, so that any remaining
incompatible jobs begin to fail while users can still fall back to
explicitly requesting CentOS Linux 7 as they work to make their jobs
compatible with AlmaLinux 9.
See Knowledge Base Articles
https://jlab.servicenowservices.com/scicomp?id=kb_article&sysparm_article=KB0015330
which details how to select the desired operating system using Slurm
feature constraints, and
https://jlab.servicenowservices.com/scicomp?id=kb_article_view&sysparm_article=KB0015346
which describes differences between EL7 and EL9 Farm
nodes that may affect your work. We intend for the Farm to be 90%
migrated this week, and to migrate the remaining 10% on the June
maintenance day, the 18th. You can see the migration progress at
https://scicomp.jlab.org/scicomp/nodeStatus/os
or with a command like
sinfo -ho'%f %N' | awk '$1=substr($1,0,3)' | sort
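The one-liner above lists feature sets per node group. For a simple count instead, here is a minimal sketch that assumes AlmaLinux 9 nodes carry an `el9` feature (the announcement names only `el7` explicitly, so the `el9` name is an assumption); the function name `count_os_nodes` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: tally nodes by OS feature from "<node> <features>" lines.
# Assumes EL7 nodes carry an "el7" feature and EL9 nodes an "el9" feature
# (the el9 name is an assumption; see the KB article on feature constraints).
count_os_nodes() {
  awk '
    {
      f = "," $2 ","            # pad so every feature is delimited by commas
      if (f ~ /,el9,/) el9++
      else if (f ~ /,el7,/) el7++
    }
    END {
      total = el7 + el9
      printf "el7 %d\nel9 %d\n", el7, el9
      if (total > 0) printf "el9 share %.0f%%\n", 100 * el9 / total
    }'
}

# On a Farm host: -N prints one line per node per partition, so sort -u
# removes duplicates for nodes that belong to several partitions.
#   sinfo -h -N -o '%N %f' | sort -u | count_os_nodes
```

Matching whole comma-delimited features avoids miscounting a node whose feature list merely contains the text "el7" or "el9" inside a longer name.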
***Also of note, but unrelated to the operating system migration,***
all Farm GPU (sciml) nodes and a subset of Farm nodes (farm16) have been
reserved from 6am to 5pm on May 21st for a planned power outage that is
needed to install additional metering. Jobs which require those nodes
and whose TimeLimit stretches into the reservation period will remain
in the Pending state.
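Conversely, a job that needs those nodes can still start before the reservation if its time limit ends before 6am. A minimal sketch, assuming GNU `date` and a hypothetical helper name `minutes_before`, that computes the largest whole-minute `--time` value that fits:

```shell
#!/bin/bash
# Hypothetical helper: print the whole minutes from NOW (default: the current
# time) until START, suitable for sbatch --time=<minutes>. Requires GNU date;
# both times are interpreted in the local time zone of the host running it.
minutes_before() {
  local start=$1 now=${2:-now} s n mins
  s=$(date -d "$start" +%s) || return 1
  n=$(date -d "$now" +%s) || return 1
  mins=$(( (s - n) / 60 ))
  # Slurm reads --time=0 as "no limit", so refuse anything under one minute.
  if (( mins < 1 )); then
    echo "less than one minute before $start" >&2
    return 1
  fi
  echo "$mins"
}

# Example: submit a job that must finish before the 6am start on May 21.
#   sbatch --time="$(minutes_before '2024-05-21 06:00')" job.sh
# Inspect reservations, and the reason Slurm gives for pending jobs:
#   scontrol show reservation
#   squeue -u "$USER" -t PENDING -o '%.12i %.30r'
```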
Please file a ServiceNow Incident at
https://jlab.servicenowservices.com/scicomp
with any questions or concerns.
________________________________________
From: Bryan Hess <bhess at jlab.org>
Sent: Tuesday, February 27, 2024 10:28
To: jlab-scicomp-briefs at jlab.org
Subject: JLab Farm Upgrade to AlmaLinux 9
Farm Upgrade to AlmaLinux 9
The Jefferson Lab computing farm is being upgraded from CentOS 7 to AlmaLinux 9 in the coming months. This document outlines changes to the environment for all users of the interactive login nodes (ifarm), Slurm, and SWIF.
Farm Upgrade Schedule and Worker Node Selection
The farm is being upgraded in a series of steps. Between now and June, the farm composition will change from majority CentOS 7 to predominantly AlmaLinux 9. At the time of this writing, CentOS 7 is the default. This default will change at a later step in the conversion process. Users may currently select which nodes run their jobs using Slurm features/constraints. This article<https://jlab.servicenowservices.com/kb?id=kb_article_view&sysparm_article=KB0015330> provides details on feature-based node selection. SWIF can pass features through to Slurm. See the SWIF introduction<https://scicomp.jlab.org/docs/swif2> and SWIF command line reference<https://scicomp.jlab.org/cli/swif.html> for details.
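While both operating systems are in service, a job script can check where it landed and stop immediately instead of failing partway through. A minimal sketch; the helper names `os_major` and `require_el` are hypothetical, and the check relies only on the standard `VERSION_ID` field of `/etc/os-release`:

```shell
#!/bin/bash
# Hypothetical helpers: read the OS major version from an os-release file
# (VERSION_ID is "7" on CentOS 7 and "9.x" on AlmaLinux 9) and stop early
# if it is not the one the job was built for.
os_major() {
  local file=${1:-/etc/os-release} id
  # Drop an optional leading quote and everything from the first "." on.
  id=$(sed -n 's/^VERSION_ID="\{0,1\}\([^".]*\).*/\1/p' "$file") || return 1
  [ -n "$id" ] || return 1
  echo "$id"
}

require_el() {
  local want=$1 file=${2:-/etc/os-release} have
  if ! have=$(os_major "$file"); then
    echo "cannot determine OS version from $file" >&2
    return 1
  fi
  if [ "$have" != "$want" ]; then
    echo "expected EL$want but running on EL$have" >&2
    return 1
  fi
}

# In a job script submitted with a matching constraint, e.g. sbatch -C el7:
#   require_el 7 || exit 1
```

The guard complements the Slurm constraint rather than replacing it: the constraint steers placement, and the check makes a misplacement obvious in the job log.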
The interactive (ifarm) nodes currently run CentOS 7. A new machine, ifarm9.jlab.org, is available now for AlmaLinux 9 use. Two new ifarm machines that will run AlmaLinux 9 are on order. They will replace the existing ifarm machines and provide more memory per core and more temporary disk space.
Software Environment and Filesystem Changes
The /apps area is deprecated and is not available on Farm AlmaLinux 9 machines. Software is now distributed through CVMFS, rooted under OASIS, and can be used with modulefiles<https://jlab.servicenowservices.com/kb_view.do?sysparm_article=KB0014671> as before. For questions about software package availability, please submit a ServiceNow incident. For hall-specific software distribution questions, contact your computing coordinator<https://jlab.servicenowservices.com/kb?id=kb_article_view&sysparm_article=KB0014686>.
The legacy /site area has been removed. The path to Jasmine (tape) and cache tools will change from /site/bin to /usr/local/bin. The CUE /u/scratch area has also been removed.
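Job scripts that still point at /apps, /site, or /u/scratch will break on AlmaLinux 9. A minimal sketch for finding those references and moving /site/bin calls to /usr/local/bin; the helper names are hypothetical, the matching is deliberately simple (a path such as /data/site/bin would also be rewritten), and `.bak` copies are kept so each change can be reviewed with `diff`:

```shell
#!/bin/bash
# Hypothetical helper: list lines (file:line:text) under the given files or
# directories that reference /apps, /site, or /u/scratch as a top-level path.
find_legacy_paths() {
  # The character before the path must not belong to a longer path, so
  # /home/me/apps is not reported, but "/apps/..." or PATH=/site/bin is.
  grep -rnE '(^|[^[:alnum:]_./-])/(apps|site|u/scratch)(/|[^[:alnum:]_.-]|$)' "$@"
}

# Hypothetical helper: rewrite /site/bin to /usr/local/bin in place,
# keeping each original as FILE.bak for review.
rewrite_site_bin() {
  sed -E -i.bak 's#/site/bin(/|[^[:alnum:]_.-]|$)#/usr/local/bin\1#g' "$@"
}

# Example:
#   find_legacy_paths ~/jobs
#   rewrite_site_bin ~/jobs/run.sh && diff ~/jobs/run.sh.bak ~/jobs/run.sh
```

References to /apps and /u/scratch are only reported, not rewritten, because their replacements (CVMFS modules, another scratch area) depend on the software and workflow involved.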
Remote Access for Visual Studio Code and SSH
If you use Visual Studio Code for development and connect it to the ifarm hosts using the Remote-SSH extension, you may be aware that the current version has dropped support for CentOS 7 remote hosts. Since AlmaLinux 9 provides the newer system software it requires, you can use ifarm9 as the remote host instead. This guide to using SSH<https://jlab.servicenowservices.com/scicomp?id=kb_article&sysparm_article=KB0014918> with the farm includes VS Code details at the end.
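A minimal sketch of an `~/.ssh/config` entry that the Remote-SSH extension can use to reach ifarm9. The host alias and helper name are hypothetical, and offsite connections may need extra settings (such as a gateway host) that the SSH guide describes and this sketch omits:

```shell
#!/bin/bash
# Hypothetical helper: print an OpenSSH client config stanza for ifarm9.
# The user name defaults to the local $USER; pass your JLab account name
# as the first argument if it differs.
ssh_config_for_ifarm9() {
  local user_name=${1:-$USER}
  cat <<EOF
Host ifarm9
    HostName ifarm9.jlab.org
    User ${user_name}
EOF
}

# Example: append the stanza, then choose "ifarm9" from the
# "Remote-SSH: Connect to Host..." command in VS Code.
#   ssh_config_for_ifarm9 >> ~/.ssh/config
```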
--
This is an announcement-only list for Jefferson Lab Scientific Computing Updates.
Subscription and List Archive: https://mailman.jlab.org/mailman/listinfo/jlab-scicomp-briefs
For help: https://jlab.servicenowservices.com/scicomp