[Ace] Log entry notification: IOC failure and recovery summary
elogadm at jlab.org
Fri May 25 11:40:02 EDT 2012
http://opweb.acc.jlab.org/CSUEApps/elog/entry/1787277
1787277: IOC failure and recovery summary
The software on-call was called in because injector IOCs (iocin3, iocpss, etc.) were not responding. The SysAdmin on-call was called in shortly afterwards.
Tracking down the source of the problem was difficult because of a lack of diagnostic tools, most notably Wireshark for inspecting network packet traffic. During troubleshooting, opsrv and opsbat3 were rebooted with no effect. The CDEV name server had to be restarted on opsrv.
At about 1100 we were finally able to get Wireshark set up on an OPS machine to look at network traffic, and found that iocse4 was flooding the network and causing a network storm. iocse4 had been experiencing errors in its hourly save files since 5/18. Once this IOC was rebooted, recovery could begin.
Wireshark has been installed on opsl00 (yum install wireshark-gnome) for future use.
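For future incidents, the search for a host flooding the network can be narrowed quickly by counting packets per source address in a capture. The following is only a sketch, not the procedure used during this event. It assumes the source addresses have already been exported one per line, for example with tshark (`tshark -r capture.pcap -T fields -e ip.src`), and that tshark is available alongside the Wireshark install. The file name capture.pcap, the function names, and the 50% threshold are all hypothetical.

```python
from collections import Counter


def count_sources(source_lines):
    """Count packets per source address, ignoring blank lines."""
    return Counter(line.strip() for line in source_lines if line.strip())


def top_talkers(source_lines, limit=5):
    """Return the busiest sources as (address, packet_count) pairs."""
    return count_sources(source_lines).most_common(limit)


def storm_suspects(source_lines, share_threshold=0.5):
    """Return sources responsible for at least share_threshold of all packets.

    The 0.5 default is an arbitrary illustrative value, not a tuned limit.
    """
    counts = count_sources(source_lines)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (address, n)
        for address, n in counts.most_common()
        if n / total >= share_threshold
    ]


def main(stream):
    """Print the top talkers from a stream of one source address per line."""
    for address, n in top_talkers(stream):
        print(f"{n:>10}  {address}")
```

To use it on piped tshark output, call `main(sys.stdin)` from a small wrapper script. If the storm consists of non-IP traffic such as ARP broadcasts, exporting `eth.src` instead of `ip.src` may be more useful.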
Weiwei restarted the CDEV name server on opsrv.
iocse6 and iocse13 are still down but are not needed for the PEPPO run. The software on-call and Suhring are aware of the issues with these IOCs.
To read this message with graphical attachments, up-to-date information and proper formatting,
you may type the following URL into a web browser: http://opsweb.acc.jlab.org/CSUEApps/elog/entry/1787277