[ace] Rough Plan for Monday Power Outage
Anthony Cuffe
cuffe at jlab.org
Fri Jun 9 14:51:07 EDT 2023
=================
Shutdown Summary
=================
Note that I will be on-site and will coordinate activities that overlap to ensure we are not stepping on each other.
7:15:
* Swap power supply in devmc02sw1 and power it from Bertha, turn off 2nd PS (Brad)
7:30:
* Move power for 1 power supply in opsmc02sw1 to Bertha, turn off 2nd PS (Brad)
* Move 1 power supply for opsfs, devfs and csmfs to Bertha (Anthony)
* Shutdown MYA Nodes (Chris)
7:45:
* Make a log entry and send out a notification email (Anthony)
8:00:
* Shutdown Non-Critical and Bare Metal Servers (Anthony)
8:20:
* Shutdown all Virtual Machines and then Hypervisors (Erik)
* Shutdown Database Systems (Theo/Anthony)
8:30:
* Shutdown srffs, itffs, felfs and csml00 (Anthony)
8:40:
* Force router switchover from VSS1 to VSS2. (Brad)
* Force switchover from firewall1 to firewall2 (Brad)
* Shutdown remaining network items (Brad)
8:??
* Turn off rack UPSs to ensure recovery order and avoid surges
* Notify Facilities they can proceed with power work.
========
Recovery
========
Recover Network (Brad)
* Verify VSS1 and Firewall1 are up
* Force switchover from VSS2 to VSS1 and verify; force switchover from firewall2 to firewall1 and verify
* opsmc02sw1 - turn on 2nd PS, move power for 1st supply back to UPS power
* devmc02sw1 - turn on 2nd PS, swap 1st supply back to original and connect to UPS
* Verify all network switches in MCC are up (script)
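
The switch check in the last step could look something like the sketch below. This is not the actual script referred to above, only an illustration of the idea: ping each switch once and list the ones that do not answer. The host list contains just the two switch names mentioned in this plan, the helper names ping and find_down are made up for this example, and the ping flags assume a Linux host.

```python
import subprocess

# Only the two switches named in this plan; a real list would cover all of MCC.
SWITCHES = ["opsmc02sw1", "devmc02sw1"]


def ping(host, timeout_s=2):
    """Return True if a single ICMP echo to host succeeds.

    Assumes the Linux iputils ping flags (-c count, -W timeout in seconds).
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 3,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping binary missing or the call hung: treat the host as down.
        return False
    return result.returncode == 0


def find_down(hosts, check=ping):
    """Return the hosts for which check(host) is False, in input order."""
    return [host for host in hosts if not check(host)]
```

Calling find_down(SWITCHES) returns the names that still need attention; an empty list means every listed switch answered. The check argument is there so the logic can be exercised without touching the network.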
Recover Systems: (Anthony, Erik and Theo)
* Move PS for opsfs, devfs and csmfs back to UPS (Anthony)
* Recover srffs, felfs, itffs and database systems (Theo/Anthony)
* Recover VMware nodes and VMs (Erik)
* Recover MYA nodes (Chris/Anthony)
* Recover remaining UPSs/systems (Theo/Erik/Anthony)
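
For anyone tracking progress during recovery, the ordering above can be written down as a simple checklist, as in the rough sketch below. The stage names paraphrase this plan; treating the order as strictly sequential is an assumption made for the example (in practice some steps will overlap), and next_stage is a made-up helper name.

```python
# Stage names paraphrase the recovery steps in this plan.
# Strictly sequential gating is an assumption made for illustration.
RECOVERY_STAGES = [
    "network",
    "file server power back on UPS",
    "srffs, felfs, itffs and database systems",
    "VMware nodes and VMs",
    "MYA nodes",
    "remaining UPSs/systems",
]


def next_stage(completed, stages=RECOVERY_STAGES):
    """Return the first stage not yet marked complete, or None when all are done."""
    for stage in stages:
        if stage not in completed:
            return stage
    return None
```

With nothing marked complete, next_stage(set()) returns "network"; once every stage is in the completed set it returns None.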
Post Recovery: (Team)
* Verify database and web services. (Theo L. and Ryan)
* Reboot angry remote systems.
* Restart/Verify other services using nagios, pingnode, etc ...
* Make logbook entry and email users.
* Assist ACS with IOC recovery operations.
* Coolie at Magic Mushroom