[Ace] OPS and DEV Outage Tomorrow at 11am -- Please log off after work tonight

Anthony Cuffe cuffe at jlab.org
Fri Oct 23 13:35:40 EDT 2020


Accelerator Users,

There will be an outage of all Linux systems in the OPS and DEV accelerator environments tomorrow starting at 11am, followed by system reboots.  The initial outage should last only around 15 minutes, and critical systems will be back up shortly after.  Cleanup of the remaining systems (such as workstations) may take up to an hour.  The impact on other enclaves such as CHL, ITF and SRF should be minimal and has been coordinated with the associated parties.  Please save your work and log off any OPS or DEV Linux systems at the end of your workday today.  A logbook notification will be sent at the start and end of this work.  Below are the ATLis task, a brief explanation of why this work must be completed, and a list of affected systems.

Thank you for your time and sorry for the inconvenience,

Anthony Cuffe

ATLis Task:

http://devweb.acc.jlab.org/CSUEApps/atlis/task/21441


Reason for Rollback of File Servers:

On RHEL 7 the default file system (XFS) creates 64-bit inode numbers for new files on file systems larger than 2 TB.  32-bit software not built with Large File Support (LFS) gets confused when stat()ing such files, usually resulting in crashes, missing UI pieces, and similar failures.  Additionally, legacy IOC systems have trouble even reading these file systems.  It took some time to understand this problem, mainly because it only affects our larger file systems (such as opsdata) and most of our systems are now 64-bit RHEL 7.
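
For illustration only (this sketch is not part of the original announcement): a binary compiled as 32-bit without LFS (e.g. with "gcc -m32", and without -D_FILE_OFFSET_BITS=64) will see stat() fail with EOVERFLOW on a file whose inode number does not fit in 32 bits, while the same source rebuilt with LFS enabled succeeds.

    /* check_inode.c -- minimal sketch of the 64-bit inode failure mode.
     * Built 32-bit without LFS, stat() on a file with a >32-bit inode
     * number returns -1 with errno == EOVERFLOW; built with
     * -D_FILE_OFFSET_BITS=64 the same call succeeds. */
    #include <stdio.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        struct stat st;
        const char *path = (argc > 1) ? argv[1] : ".";

        if (stat(path, &st) == -1) {
            /* EOVERFLOW: the inode number (or size) does not fit in
             * the 32-bit struct stat fields. */
            fprintf(stderr, "stat(%s) failed: %s\n", path, strerror(errno));
            return 1;
        }
        printf("%s: inode %llu\n", path, (unsigned long long)st.st_ino);
        return 0;
    }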

The previous opsfs and devfs were kept in production for just such a problem.  All files will be synced back before the rollback, so nothing created or changed in the interim will be affected.  The new file servers will be redeployed later with the same file system we currently use (EXT4), and care will be taken to ensure compatibility with legacy systems.

Affected System List:

cagw
cagwhla
cagwhlb
cagwhlc
cagwhld
cagwlcls
cagwops
cagwrad
cagwsite
cagwts
crl
hlal00
hlal01
hlbl00
opsbat0
opsbat1
opsbat2
opsbat3
opsbat4
opscam1
opsfs
opsfs0
opsl00
opsl01
opsl02
opsl03
opsl04
opsl05
opsl06
opsl07
opsl08
opsl09
opsl10
opsl12
opsl13
opsl15
opsl632
opsla0
opsla2
opsld1
opslfb1
opslfb2
opslin1
opslin2
opsltab
opsmdaq0
opsns
opsnx00
opsnx01
opsnx02
opsnx03
opsnx1
opsnx2
opsweb
subopsl01
subopsl02
subopsl03
subopsl04
subopsl05
subopsl06
subopsl07
subopsl08
subopsl09
subopsl10
subopsl11
subopsl12
subopsl13
subopsl14
subopsl15
subopsl16
subopsl77
svclb92
svclbsy1
svclbsyds
svclin1
svclin2
svclna1
svclnl1
svclnl2
svclnl3
svclnl4
svclnl5
svclsl1
svclsl2
svclsl3
svclsl4
svclsl5
svcltsb1
svclw1
svclw2
svclw3
cryol01
dcpl01
dcpl02
dcpl03
dcpl04
dcpl05
devbat0
devbat1
devbat2
devcam1
devfs0
devfs
devl00
devl01
devl02
devl04
devl05
devl06
devl07
devl08
devl10
devl101
devl11
devl12
devl14
devl15
devl16
devl17
devl18
devl19
devl20
devl21
devl22
devl23
devl26
devl27
devl65
devl66
devl68
devl72
devl76
devl77
devl79
devllrf1
devlmmf1
devlsim01
devlsim02
devlsim03
devlsim1
devlsim2
devlsim3
devlsim4
devlsim5
devns
devnx00
devtest
devweb
digil01
digil02
digil03
digil04
digil05
dvlweb
eesdcpl1
eesl01
eesl04
eesl05
eeslcamacstand
eesllrf1
eesllrf2
gpbl01
ssglmpsts
