[Clas_offline] Fwd: Re: [Jlab-scicomp-briefs] JLab HPC, farm, storage outage Tue Feb 23 (and Feb 24 if needed)
Harut Avakian
avakian at jlab.org
Wed Feb 24 11:11:17 EST 2016
Dear Offline,
Below is the latest update on the status of the Intel Lustre software upgrade.
Computing is working hard to restore services ASAP.
Please be patient.
Harut
-------- Forwarded Message --------
Subject: Re: [Jlab-scicomp-briefs] JLab HPC, farm, storage outage Tue Feb 23 (and Feb 24 if needed)
Date: Wed, 24 Feb 2016 11:03:32 -0500 (EST)
From: Sandy Philpott <philpott at jlab.org>
To: jlab-scicomp-briefs at jlab.org
Hello Users,
An update on our Intel Lustre software upgrade...
Intel is still working in our testbed to install Intel's Lustre software
in our environment. Once the issues with the CentOS 6.5 to 6.7 update
and the Lustre upgrade are resolved in the testbed, we can begin
upgrading our 26 production Lustre servers.
While production Lustre services are quiesced, we are performing a full
backup of the Lustre metadata and extended attributes; that process has
been running for 24 hours and is still going.
We will know more later today about the timeline for restoring services,
as the issues with the Intel Lustre install are resolved and we are able
to upgrade our production systems. At this point, it looks likely that
the upgrade will spill over into tomorrow.
Regards,
Sandy
----- Original Message -----
From: "Sandy Philpott" <philpott at jlab.org>
To: jlab-scicomp-briefs at jlab.org
Sent: Wednesday, February 17, 2016 9:40:51 AM
Subject: JLab HPC, farm, storage outage Tue Feb 23 (and Feb 24 if needed)
Dear JLab HPC, Farm, and MSS Users,
We are planning updates to our Lustre 2.5.3 filesystem on Tuesday, Feb
23, which will impact the HPC and batch farm clusters and their storage;
the Lustre-based /mss, /cache, /work, and /volatile filesystems will be
unavailable. File services may be restored by the end of the day
Tuesday; if more time is needed, work may continue through close of
business Wednesday. This Lustre update is part of our continuing effort
to improve the stability of the Lustre and ZFS fileservers, which have
been intermittently problematic since our upgrade from Lustre 1.8
began last May.
Once Lustre filesystem services are restored, we plan to
- add the 2 newest (2015) fileservers to production, adding 0.5 PB to
our existing 1.2 PB /lustre filesystem, and later decommission the
oldest (2011) servers
- implement a Lustre "work" pool for the farm to better handle small,
randomly accessed files, with each 12-disk set configured as 3 mirrors
of 4 striped disks
Please plan accordingly for this scheduled one- or two-day upgrade of
the Lustre filesystem next week.
Regards,
Sandy
Sandra C. Philpott, Operations Manager
_______________________________________________
Jlab-scicomp-briefs mailing list
Jlab-scicomp-briefs at jlab.org
https://mailman.jlab.org/mailman/listinfo/jlab-scicomp-briefs