[Jlab-scicomp-briefs] Compute Farm Updates

Bryan Hess bhess at jlab.org
Fri Oct 22 11:46:59 EDT 2021

Upcoming Changes to the Scientific Computing Environment

On Tuesday morning, October 26th, the following changes will be made to the farm:

  *   ifarm1801 will be patched from CentOS 7.7 to CentOS 7.9 and rebooted.
  *   All farm13 nodes (and only farm13 nodes) will be patched from CentOS 7.7 to CentOS 7.9 and rebooted.
  *   OSG tools will be updated on 7.9 nodes.

Updating to 7.9 is a prerequisite for CUDA and Open Science Grid (OSG) tools. We are increasing the frequency of patching to improve cybersecurity and to be more responsive to grid computing software updates.

Recent Changes to the Scientific Computing Environment

  *   On the October 19th maintenance day, GPU-enabled machines (the sciml nodes) were updated to CentOS 7.9 to support CUDA 11.4.2.
  *   JupyterHub was similarly updated to include CUDA 11.4.2, TensorFlow 2.6.0, and Torch 1.9.1+cu111.

If you experience any software incompatibilities in the transition from 7.7 to 7.9, please submit a ServiceNow request. Barring unforeseen issues, we anticipate incrementally rolling the rest of the farm to CentOS 7.9 over the next month. As we do, Slurm will be updated to reflect the state of worker nodes as described in the next section.

Slurm Specific Notes

Most batch jobs run in the production partition, which is the default. Information about other Slurm capabilities, including GPU and Jupyter Notebook use, can be found online at https://scicomp.jlab.org/docs/farmsDocHome

During this transition from CentOS 7.7 to 7.9, all nodes will remain in the production partition in Slurm. In the unlikely case that you need to limit your jobs to a particular set of nodes, you can submit using the "--constraint" flag with a feature name (for example, --constraint=centos77). The features will be updated as nodes are patched and can be examined using the "sinfo" command. For example, this shows the number of nodes in the production partition and their features:

ifarm1901 ~ $ sinfo --partition=production -o "%8D %12P %f"
107      production*  farm19,amd,centos77,general
82       production*  farm14,centos77,general,xeon
40       production*  farm16,centos77,general,xeon
16       production*  farm13,centos77,general,xeon
81       production*  farm18,centos77,general,xeon
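
To make the "--constraint" usage concrete, here is a minimal sketch rather than an official recipe. The feature name centos77 is taken from the sinfo output above; centos79 is only an assumed name for the feature that patched nodes might carry, and myjob.sh is a hypothetical batch script. The first helper prints the sbatch command so it can be reviewed before it is run; the second sums node counts per centos* feature from sinfo-style lines, which may be handy for watching the transition progress.

```shell
#!/bin/sh
# Sketch only. "centos77" comes from the sinfo output above; "centos79" and
# "myjob.sh" are assumed names used purely for illustration.

# Print the sbatch command that limits a job to nodes with a given feature.
# $1 = feature name, $2 = batch script. Nothing is submitted here.
constrained_sbatch_cmd() {
    printf 'sbatch --partition=production --constraint=%s %s\n' "$1" "$2"
}

# Sum node counts per centos* feature from sinfo lines read on stdin.
# Expects the node count in the first field and the comma-separated
# feature list in the last field; other lines (e.g. a header) are ignored.
count_by_os() {
    awk '{
        n = split($NF, feats, ",")
        for (i = 1; i <= n; i++)
            if (feats[i] ~ /^centos/) total[feats[i]] += $1
    }
    END { for (f in total) print f, total[f] }' | sort
}

# Show the command that would submit the hypothetical myjob.sh
# to CentOS 7.7 nodes only.
constrained_sbatch_cmd centos77 myjob.sh

# On a farm login node with Slurm available, the tally could be run as:
#   sinfo --noheader --partition=production -o "%D %f" | count_by_os
```

Once the printed command looks right, it can be pasted at the prompt on an interactive farm node. Which feature names are valid at any given time should be confirmed from the sinfo output, since they change as nodes are patched.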