[Jlab-scicomp-briefs] Nov 19: Transition of the Farm /cache filesystem to read-only from farm and ifarm

Wesley Moore wmoore at jlab.org
Tue Nov 12 15:32:43 EST 2024


Scheduled Maintenance Announcement
Dear Users,
This is a reminder of upcoming scheduled maintenance for the Farm cluster, which will take place on November 19th beginning at 8am. During this time, we will be performing several important updates.
Notable Activities:

  *   Read-Only Cache: We will be reserving the cluster to transition to a read-only cache, which will result in a temporary halt to access.
  *   Brief Network Outages: There will be brief network outages as we transition to new row switches. These outages are expected to be minimal and will be performed while the cluster is reserved.

What to Expect:

  *   During the maintenance window, access to certain features or services (including JupyterHub) may be temporarily limited or unavailable.
  *   You may experience brief connectivity disruptions due to the network outages.
  *   We recommend completing any critical tasks before the maintenance period to avoid any inconvenience.

Read-Only Cache:
In addition to the information provided in the attached announcement, swif2 was updated to include a new output feature. This requested feature better handles output files determined at runtime. Please see details here: swif2 output<https://scicomp.jlab.org/cli/output.html>.
We apologize for any disruptions this may cause and appreciate your understanding. Should you have any questions or need assistance, please feel free to contact our support team at helpdesk at jlab.org<mailto:helpdesk at jlab.org>.
Thank you for your patience.
Sincerely,
JLab Scientific Computing Team


________________________________
From: Wesley Moore
Sent: Monday, October 28, 2024 4:02 PM
To: Laura Hild via Jlab-scicomp-briefs <jlab-scicomp-briefs at jlab.org>
Subject: Nov 19: Transition of the Farm /cache filesystem to read-only from farm and ifarm

Transition of the Farm /cache filesystem to read-only from farm and ifarm

What is changing?
On November 19, 2024, during monthly maintenance, the /cache filesystem will be changed to read-only access from farm, ifarm, Globus, and XRootD Data Transfer Nodes. After that date, files can be copied to /cache only in the following ways:

  *   The jcache command, which reads files from tape and writes them into /cache.
  *   The jput command, which writes files to tape and can optionally place a copy immediately in /cache when the -cache flag is specified.
  *   Data ingest from the experimental halls using the jmirror command with a regular expression pattern match for data retention in cache.


Why is this happening?

  *   The new system will ensure that files move to tape promptly and that /cache is an accurate subset of files stored on tape.
  *   In the current /cache filesystem, there are commonly cases where files conflict with tape storage, leading to work slowdowns.
  *   Small file handling has been a historic problem, and many small files in /cache were not stored on tape or backed up in any way.


How will this affect farm job workflows?

  *   Jobs that are part of a SWIF workflow with an output specification to /cache will continue to work.
  *   Jobs that attempt to write directly to /cache using open(), cp, mv, or other POSIX tools will fail. Output from Slurm jobs that are not part of a SWIF workflow should be stored on /volatile and will need to be moved to tape manually using jput on ifarm. Generally, Slurm workflows that need to interact with tape would be better implemented using SWIF.
  *   Note that jput is not available on (non-interactive) farm nodes because it may queue, stalling the farm node and potentially timing out the job.
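
For job scripts that currently write straight into /cache, the guidance above can be sketched as a small pre-flight check in Python. This is only an illustration, not an official tool: the helper names is_under_cache and staging_path are hypothetical, and mirroring the /cache layout under /volatile is an assumption; use whatever /volatile directory your group actually owns.

```python
from pathlib import PurePosixPath

CACHE_ROOT = PurePosixPath("/cache")
VOLATILE_ROOT = PurePosixPath("/volatile")


def is_under_cache(path: str) -> bool:
    """Return True if an absolute path is /cache itself or lies inside it."""
    p = PurePosixPath(path)
    return p == CACHE_ROOT or CACHE_ROOT in p.parents


def staging_path(path: str) -> str:
    """Map a /cache output path to a /volatile staging path.

    ASSUMPTION: keeping the same relative layout under /volatile is
    purely illustrative. Paths outside /cache are returned unchanged.
    """
    if not is_under_cache(path):
        return path
    relative = PurePosixPath(path).relative_to(CACHE_ROOT)
    return str(VOLATILE_ROOT / relative)


if __name__ == "__main__":
    # Hypothetical output paths taken from a job script.
    for out in ["/cache/halld/run1/out.root", "/volatile/halld/run1/log.txt"]:
        print(out, "->", staging_path(out))
```

Files staged on /volatile this way still have to be moved to tape afterwards with jput on ifarm, or the job can be converted to a SWIF workflow with an output specification.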


What is not changing?

  *   Cache deletion policy: remains unchanged.
  *   Cache file pinning: continues to work as before.
  *   jcache client: continues to work as before.
  *   SWIF outputs to /cache: continue to work as before.


References:

  *   KBA: Migration to read-only cache<https://jlab.servicenowservices.com/scicomp?id=kb_article_view&sysparm_article=KB0015468>
  *   KBA: Computing Coordinators<https://jlab.servicenowservices.com/scicomp?id=kb_article_view&sysparm_article=KB0014686>
