[Clas_offline] Fwd: Re: [Jlab-scicomp-briefs] JLab HPC, farm, storage outage Tue Feb 23 (and Feb 24 if needed)

Harut Avakian avakian at jlab.org
Wed Feb 24 11:11:17 EST 2016


Dear Offline,
Below is the latest update on the Intel Lustre software upgrade status.
Computing is working hard to restore the services ASAP.
Please be patient.
Harut



-------- Forwarded Message --------
Subject: Re: [Jlab-scicomp-briefs] JLab HPC, farm, storage outage Tue Feb 23 (and Feb 24 if needed)
Date: Wed, 24 Feb 2016 11:03:32 -0500 (EST)
From: Sandy Philpott <philpott at jlab.org>
To: jlab-scicomp-briefs at jlab.org

Hello Users,

An update on our Intel Lustre software upgrade...

Intel is still working in our testbed to get its Lustre software
installed in our environment. Once the issues with the CentOS 6.5 to
6.7 update and the Lustre upgrade are resolved in the testbed, we can
begin upgrading our 26 production Lustre servers.

While production Lustre services are quiesced, we are performing a
full backup of the Lustre metadata and extended attributes; that backup
has been running for 24 hours and has not yet finished.

We will know more later today about the timeline for restoring
services, as the issues with the Intel Lustre install are resolved and
we are able to upgrade our production systems. At this point, it looks
likely that the upgrade will spill over into tomorrow.

Regards,
Sandy

----- Original Message -----
From: "Sandy Philpott" <philpott at jlab.org>
To: jlab-scicomp-briefs at jlab.org
Sent: Wednesday, February 17, 2016 9:40:51 AM
Subject: JLab HPC, farm, storage outage Tue Feb 23 (and Feb 24 if needed)

Dear JLab HPC, Farm, and MSS Users,

We are planning updates to our Lustre 2.5.3 filesystem on Tuesday,
Feb 23, which will affect the HPC and batch farm clusters and their
storage; the Lustre-based /mss, /cache, /work, and /volatile filesystems
will be unavailable. File services may be restored at the end of the day
on Tuesday; if more time is needed, the work may continue through
close of business on Wednesday. This Lustre update is part of our
continuing effort to improve the stability of the Lustre and ZFS
fileservers, which have been intermittently problematic since our
upgrade from Lustre 1.8 started last May.

Once Lustre filesystem services are restored, we plan to:

- put the 2 newest (2015) fileservers into production, adding 0.5 PB to
our existing 1.2 PB /lustre filesystem, then later decommission the
oldest (2011) servers
- implement a Lustre "work" pool for the farm, to better handle small
random files, with each 12-disk set configured as 3 mirrors of 4 striped
disks

Please plan accordingly for this scheduled one- or two-day upgrade of
the Lustre filesystem next week.

Regards,
Sandy

    Sandra C. Philpott, Operations Manager



