[Jlab-scicomp-briefs] JLab HPC, farm, storage outage Tue Feb 23 (and Feb 24 if needed)

Sandy Philpott philpott at jlab.org
Wed Feb 24 11:03:32 EST 2016


Hello Users,

An update on our Intel Lustre software upgrade...

Intel is still working in our testbed to get its Lustre software installed in our environment. Once the issues with the CentOS 6.5 to 6.7 update and the Lustre upgrade are resolved in the testbed, we can begin upgrading our 26 production Lustre servers.

While production Lustre services are quiesced, we are performing a full backup of the Lustre metadata and extended attributes; that backup has been running for 24 hours and is still in progress.

We will know more about the timeline for restoring services later today, as the issues with the Intel Lustre install are resolved and we are able to upgrade our production systems. At this point, it looks likely that the upgrade will spill over into tomorrow.

Regards,
Sandy

----- Original Message -----
From: "Sandy Philpott" <philpott at jlab.org>
To: jlab-scicomp-briefs at jlab.org
Sent: Wednesday, February 17, 2016 9:40:51 AM
Subject: JLab HPC, farm, storage outage Tue Feb 23 (and Feb 24 if needed)

Dear JLab HPC, Farm, and MSS Users,

We are planning updates to our Lustre 2.5.3 filesystem on Tuesday, Feb 23, which will impact the HPC and batch farm clusters and their storage; the Lustre-based /mss, /cache, /work, and /volatile filesystems will be unavailable. File services may be restored at the end of the day Tuesday; if more time is needed, we may continue through close of business Wednesday. This Lustre update is part of our continuing effort to improve the stability of the Lustre and ZFS fileservers, which have been intermittently problematic since our upgrade from Lustre 1.8 started last May.

Upon restoring Lustre filesystem services, our plans are to

- add the 2 newest (2015) fileservers into production, adding 0.5 PB to our existing 1.2 PB /lustre filesystem, then later decommission the oldest (2011) servers
- implement a Lustre "work" pool for the farm, to better handle small random files, with each 12-disk set configured as 3 mirrors of 4 striped disks 

Please plan accordingly for this scheduled one- or two-day upgrade of the Lustre filesystem next week.

Regards,
Sandy

   Sandra C. Philpott, Operations Manager


