[Ace] Network timeouts being reported from various applications -- feels ominous.
Scott Higgins
higgins at jlab.org
Tue Sep 7 11:23:54 EDT 2021
We had an issue on Friday, Sept. 3: I rebooted iocsoftnl05rf for a change, and myRestore restored values from Aug. 19 instead of
from Sept. 3, just before the reboot. This prevented OPS from restoring beam.
I worked through the issue with Theo and found that the archiver had stopped archiving some EPICS PVs that were part
of the active Restore Set for iocsoftnl05rf. These were random signals from the restore set with no noticeable pattern. The archiver
has an algorithm where, if it thinks a PV has gone away after some period of time, it stops archiving it. However, these PVs
never went away. This was the first time I have seen this happen.
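
A minimal sketch of a pre-reboot check that might catch this kind of gap, assuming the newest archived timestamp for each PV in a restore set can be obtained somehow. The function name, the PV names, the timestamps, and the one-hour threshold are all hypothetical, and nothing here reflects how the archiver or myRestore actually work:

```python
from datetime import datetime, timedelta


def find_stale_pvs(restore_set, last_archived, reference_time,
                   max_age=timedelta(hours=1)):
    """Return restore-set PVs whose newest archived sample is missing
    or older than max_age relative to reference_time.

    restore_set    -- iterable of PV names (hypothetical names below)
    last_archived  -- dict mapping PV name -> datetime of newest sample
    reference_time -- datetime to compare against, e.g. planned reboot time
    """
    stale = {}
    for pv in restore_set:
        newest = last_archived.get(pv)
        # A PV with no archived sample at all is treated as stale too.
        if newest is None or reference_time - newest > max_age:
            stale[pv] = newest
    return stale


# Illustrative data only: made-up PV names and times.
restore_set = ["DEMO:pv1", "DEMO:pv2", "DEMO:pv3"]
last_archived = {
    "DEMO:pv1": datetime(2021, 9, 3, 9, 0),    # archived shortly before reboot
    "DEMO:pv2": datetime(2021, 8, 19, 12, 0),  # archiving stopped weeks earlier
}
planned_reboot = datetime(2021, 9, 3, 9, 30)

stale = find_stale_pvs(restore_set, last_archived, planned_reboot)
for pv, newest in stale.items():
    print(pv, "last archived:", newest)
```

Run against a restore set before a reboot, a non-empty result would be a hint that a restore could pull in old values for those PVs.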
________________________________
From: Controls_dept <controls_dept-bounces at jlab.org> on behalf of Gary Croke <gcroke at jlab.org>
Sent: Tuesday, September 7, 2021 11:13 AM
To: Adam Carpenter <adamc at jlab.org>; Michele Joyce <erb at jlab.org>; Omar Garza <garza at jlab.org>; Theo Larrieu <theo at jlab.org>; ace at jlab.org <ace at jlab.org>; controls_dept at jlab.org <controls_dept at jlab.org>
Subject: Re: [Controls_dept] Network timeouts being reported from various applications -- feels ominous.
Here's one from Friday that probably falls in this category. Failed on some caputs, although channels were available:
https://logbooks.jlab.org/entry/3900716
Another attempt shortly after succeeded:
https://logbooks.jlab.org/entry/3900725
________________________________
From: Adam Carpenter <adamc at jlab.org>
Sent: Tuesday, September 7, 2021 11:06 AM
To: Michele Joyce <erb at jlab.org>; Gary Croke <gcroke at jlab.org>; Omar Garza <garza at jlab.org>; Theo Larrieu <theo at jlab.org>; ace at jlab.org <ace at jlab.org>; controls_dept at jlab.org <controls_dept at jlab.org>
Subject: Re: Network timeouts being reported from various applications -- feels ominous.
Omar,
Do you remember which IOCs these were? PC104s or something else? Was it part of a mass reboot or just a single IOC?
Thanks,
Adam
Adam Carpenter
Accelerator Operations Software Department
Thomas Jefferson National Accelerator Facility
________________________________
From: Ace <ace-bounces at jlab.org> on behalf of Michele Joyce <erb at jlab.org>
Sent: Tuesday, September 7, 2021 10:25 AM
To: Gary Croke <gcroke at jlab.org>; Omar Garza <garza at jlab.org>; Theo Larrieu <theo at jlab.org>; ace at jlab.org <ace at jlab.org>; controls_dept at jlab.org <controls_dept at jlab.org>
Subject: Re: [Ace] Network timeouts being reported from various applications -- feels ominous.
Is anything machine dependent?
From: Controls_dept <controls_dept-bounces at jlab.org> on behalf of Gary Croke <gcroke at jlab.org>
Date: Sunday, September 5, 2021 at 11:09 AM
To: Omar Garza <garza at jlab.org>, Theo Larrieu <theo at jlab.org>, ace at jlab.org <ace at jlab.org>, controls_dept at jlab.org <controls_dept at jlab.org>
Subject: Re: [Controls_dept] Network timeouts being reported from various applications -- feels ominous.
What's the suspicion? Has there been a big uptick in network traffic recently, or have we gradually been approaching some limit? Or just some failing hardware somewhere?
________________________________
From: Controls_dept <controls_dept-bounces at jlab.org> on behalf of Omar Garza <garza at jlab.org>
Sent: Saturday, September 4, 2021 7:58 PM
To: Theo Larrieu <theo at jlab.org>; ace at jlab.org <ace at jlab.org>; controls_dept at jlab.org <controls_dept at jlab.org>
Subject: Re: [Controls_dept] Network timeouts being reported from various applications -- feels ominous.
This may also be related.
In the past weeks, we have also performed hard reboots on IOCs where multiple tries failed to complete network connections.
It took 2-3 hard reboots to succeed.
Omar
________________________________
From: Ace <ace-bounces at jlab.org> on behalf of Theo Larrieu <theo at jlab.org>
Sent: Friday, September 3, 2021 5:27:32 PM
To: ace at jlab.org <ace at jlab.org>; controls_dept at jlab.org <controls_dept at jlab.org>
Subject: [Ace] Network timeouts being reported from various applications -- feels ominous.
In the past 24 hours, several seemingly unrelated applications have had trouble making connections to EPICS signals.
Cavity History
https://logbooks.jlab.org/entry/3900743
FOPT
https://logbooks.jlab.org/entry/3900317
myRestore
https://logbooks.jlab.org/entry/3900705
It might be wise to start looking for signs of distress in the name server, softIOC hosts, network switches, etc.
-Theo