[Jlab-scicomp-briefs] News and Notes for JLab Scientific Computing

Bryan Hess bhess at jlab.org
Mon Sep 30 16:17:16 EDT 2019


News and Notes for JLab Scientific Computing





On Tuesday morning, October 15th, the batch controller for the farm will be upgraded to the latest version of Slurm (19.x). The change should not be visible to end users. Running jobs will continue to run during the upgrade, but new farm job starts will be paused for approximately an hour while the server is relocated to a more robust virtual machine and the database is upgraded. The upgrade provides bug fixes and updates, and enables new scheduling capabilities. Please let us know if these plans pose any scheduling problems.





Starting this fall the following changes will be made to the scientific computing environment:

  *   40 new farm nodes, each with 64 CPU threads and 256GB of memory, will be added to the farm.
  *   The farm will be upgraded to CentOS 7.7. This will be done in a series of steps, and more information about how to use the CentOS 7.7 nodes and the AMD CPUs will follow.
  *   An additional ~6 PB of space will be added to Lustre for /cache and /volatile, so that more data can be processed online without a round-trip to tape.
  *   Back-end management machines will be upgraded to provide more robust services.





A number of changes have been made to the experimental physics batch farm. This is a short summary of the recent ones, most of which relate to the integration of Slurm into the compute environment.



  *   The Farm web page has been updated to show both physical memory requested and physical memory allocated. This helps show whether running jobs are making efficient use of memory and CPU resources. https://scicomp.jlab.org/scicomp/index.html#/farmNodes
  *   Fair Share for the farm is now calculated using the Fair Tree algorithm. The web pages have been updated to represent this visually at https://scicomp.jlab.org/scicomp/index.html#/fairshare
  *   Slurm command-line tools are now installed on all farm nodes. As an example, the "squeue" and "sprio" commands can be used to show job status from the command line. "sbatch" is used to submit a job.
  *   For those users who want to try out using Slurm directly, the JLab Slurm documentation is being developed at https://scicomp.jlab.org/docs/farm_slurm_batch
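
For users who want to experiment with "sbatch" directly, the following is a minimal sketch in Python, not the documented JLab procedure. It builds a small batch script from standard #SBATCH directives and submits it only if "sbatch" is found on the PATH. The job name, resource values, and helper function names are hypothetical, and site-specific settings such as partition and account are deliberately left out; consult the JLab Slurm documentation for the values that apply on the farm.

```python
import shutil
import subprocess
import tempfile


def build_batch_script(job_name, command, cpus=1, mem="2G", time_limit="01:00:00"):
    """Return the text of a Slurm batch script built from standard #SBATCH directives.

    All values here are illustrative assumptions; partition and account
    settings are site-specific and intentionally omitted.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem}",
        f"#SBATCH --time={time_limit}",
        # %x expands to the job name and %j to the job ID
        "#SBATCH --output=%x-%j.out",
        "",
        command,
        "",
    ]
    return "\n".join(lines)


def submit(script_text):
    """Submit the script with sbatch if it is installed; otherwise return None."""
    if shutil.which("sbatch") is None:
        return None
    # Write the script to a temporary file so sbatch can read it
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as fh:
        fh.write(script_text)
        path = fh.name
    result = subprocess.run(
        ["sbatch", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    script = build_batch_script("hello-farm", "hostname")
    print(script)
    print(submit(script) or "sbatch not found; script not submitted")
```

Once a job has been submitted, "squeue -u $USER" lists that user's jobs and "sprio" shows the priority factors of pending jobs.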
