[Jlab-scicomp-briefs] Farm Lustre outage for /cache and /volatile - Sun Feb 6, 2022
Bryan Hess
bhess at jlab.org
Sun Feb 6 06:31:07 EST 2022
The /cache and /volatile Lustre metadata server for the farm rebooted unexpectedly Saturday night. The filesystem recovery process, which is normally quick, began last night and is still running; we estimate it will complete later this morning. We are monitoring the progress and will return the system to service as soon as possible.
Until the filesystem recovery completes, file I/O to /cache and /volatile will hang. Thank you for your patience.
Lustre usage tips for /cache and /volatile:
* Avoid small or unbuffered writes to Lustre when possible.
* Write log files to /farm_out when possible; it is tuned for high IOPS and small files.
* Avoid file metadata operations in tight loops, such as hundreds of open() and close() operations per second.
* Lustre performs best when files are read or written using large buffers.
* Per-user farm job IOPS are shown at https://scicomp.jlab.org/scicomp/slurmJob/jobLimit
* Farm jobs with an I/O mix that is challenging for Lustre are shown at https://scicomp.jlab.org/scicomp/slurmJob/jobLimit
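The buffering tips above can be illustrated with a minimal Python sketch. It is not a JLab-provided tool: the helper names, the 4 MiB buffer size, and the 100-byte record size are illustrative assumptions, and a temporary directory stands in for a /cache or /volatile path. The hypothetical CountingFileIO class counts how many write() calls reach the operating system, so you can compare one write() per small record against a single open() plus a large buffer that coalesces the same records into a few large writes.

```python
import io
import os
import tempfile

# Illustrative buffer size: an assumption, not a site-recommended value.
LARGE_BUFFER = 4 * 1024 * 1024  # 4 MiB


class CountingFileIO(io.FileIO):
    """Raw, unbuffered file that counts the write() calls it passes to the OS."""

    def __init__(self, path, mode="wb"):
        super().__init__(path, mode)
        self.write_calls = 0

    def write(self, b):
        self.write_calls += 1
        return super().write(b)


def write_unbuffered(path, records):
    """Anti-pattern: each small record becomes its own write() call."""
    raw = CountingFileIO(path, "wb")
    try:
        for rec in records:
            raw.write(rec)
    finally:
        raw.close()
    return raw.write_calls


def write_buffered(path, records, buffer_size=LARGE_BUFFER):
    """Preferred: open the file once and let a large buffer coalesce
    small records into a few large write() calls."""
    raw = CountingFileIO(path, "wb")
    # Closing the BufferedWriter flushes its buffer and closes the raw file.
    with io.BufferedWriter(raw, buffer_size=buffer_size) as f:
        for rec in records:
            f.write(rec)
    return raw.write_calls


if __name__ == "__main__":
    records = [b"x" * 100 for _ in range(1000)]  # 1000 small 100-byte records
    # A temporary directory stands in for a /cache or /volatile path here.
    with tempfile.TemporaryDirectory() as d:
        n_small = write_unbuffered(os.path.join(d, "small.dat"), records)
        n_large = write_buffered(os.path.join(d, "large.dat"), records)
        print("unbuffered write() calls:", n_small)
        print("buffered write() calls:  ", n_large)
```

Opening the output file once, rather than once per record, also avoids the repeated open()/close() metadata operations that the tip about tight loops warns against.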