[Ace] [Controls_dept] Advise on rebooting all console servers
Brian Bevins
bevins at jlab.org
Fri Jul 1 16:37:56 EDT 2022
This should definitely help reduce the number of hung consoles and I'm all for it. But there's something else that happens between the ioc and the console server that locks up the connection and is resolvable only by an ioc reboot.
For example, yesterday 6 of the 14 iocs in injsbcon4 did not have working consoles. Anthony rebooted the console server and that number dropped to 4. Now it is back up to 5. I would love to find out what causes this, but it's very difficult to investigate once the console is locked up.
The tool I run to reinitialize CA security attempts to use the console of every operational ioc and logs the ones with problems, while ignoring iocs it can't ping. Earlier today that number was 37 locked up out of 613 total iocs. Now it's 46. If we really want to cut down on this, it might be useful to run it every maintenance day to generate a report and reboot the console servers and/or iocs that aren't working (excepting cryo, etc.).
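A rough sketch of what such a maintenance-day report could look like, in Python. This is not the actual tool: the function names, the `can_ping` and `console_ok` probes, and the exclusion set are all hypothetical placeholders, and the real ping and console checks would have to be supplied by whoever runs it.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class ConsoleReport:
    checked: int = 0                                   # iocs that answered ping and were probed
    locked: List[str] = field(default_factory=list)    # pingable, but console did not respond
    skipped: List[str] = field(default_factory=list)   # not pingable, so ignored


def survey_consoles(
    iocs: Iterable[str],
    can_ping: Callable[[str], bool],     # hypothetical probe: True if the ioc answers ping
    console_ok: Callable[[str], bool],   # hypothetical probe: True if the console responds
) -> ConsoleReport:
    """Probe the console of every pingable ioc and record the ones that fail."""
    report = ConsoleReport()
    for ioc in iocs:
        if not can_ping(ioc):
            # Not operational: leave it out of the counts, as the tool described above does.
            report.skipped.append(ioc)
            continue
        report.checked += 1
        if not console_ok(ioc):
            report.locked.append(ioc)
    return report


def reboot_candidates(report: ConsoleReport, exclude: Set[str]) -> List[str]:
    """Locked-up iocs that may be rebooted, minus protected ones (e.g. cryo)."""
    return [ioc for ioc in report.locked if ioc not in exclude]
```

Keeping the probes as parameters means the same loop can be exercised with stub functions before it is pointed at real hardware, and the exclusion list stays an explicit input rather than something buried in the code.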
--Brian
________________________________
From: Ace <ace-bounces at jlab.org> on behalf of Anthony Cuffe <cuffe at jlab.org>
Sent: Friday, July 1, 2022 3:39 PM
To: Gary Croke <gcroke at jlab.org>; controls_dept at jlab.org <controls_dept at jlab.org>; ace at jlab.org <ace at jlab.org>
Subject: Re: [Ace] [Controls_dept] Advise on rebooting all console servers
We do monitor them, but there is really no way of knowing which ones have hung ports other than the connection attempts. Some of the problems are also the IOCs themselves. Since it seems to take many months and connections for the problems to arise, I think a reboot once (or twice) a month will mostly resolve the issues. I can also provide a mechanism for developers to reboot the console servers, although this is a bit involved and I have not figured out the best way to allow this globally for each of you.
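For reference, the once-a-month slot proposed at the start of this thread (a Wednesday at around 5am) could be computed along these lines. This is only a sketch: picking the first Wednesday of the month is an assumption, since the thread does not say which Wednesday, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta

WEDNESDAY = 2  # datetime.weekday() counts Monday as 0


def first_wednesday_5am(year: int, month: int) -> datetime:
    """05:00 on the first Wednesday of the given month (assumed policy)."""
    first = datetime(year, month, 1, 5, 0)
    days_ahead = (WEDNESDAY - first.weekday()) % 7
    return first + timedelta(days=days_ahead)


def next_reboot(now: datetime) -> datetime:
    """Next scheduled reboot strictly after `now`."""
    candidate = first_wednesday_5am(now.year, now.month)
    if candidate <= now:
        # This month's slot has passed; roll over to the following month.
        if now.month == 12:
            candidate = first_wednesday_5am(now.year + 1, 1)
        else:
            candidate = first_wednesday_5am(now.year, now.month + 1)
    return candidate
```

A twice-a-month variant would only need a second slot, for example two weeks after the first.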
=========================
Anthony Cuffe
voice : 757 269-6213
e-mail : cuffe at jlab.org
=========================
On 7/1/22 3:32 PM, Gary Croke wrote:
> We usually discover several console problems in the lead-up to, or in the early stages of, a run. We either realize we can't connect to a console, or look for information in the logs that isn't there because the console stopped logging at some point. I think anything that can be done to reduce instances of these problems is definitely worthwhile. If it's not easy to monitor the console servers, then proactively rebooting them periodically sounds like a good idea to me.
>
> ________________________________
> *From:* Controls_dept <controls_dept-bounces at jlab.org> on behalf of Anthony Cuffe <cuffe at jlab.org>
> *Sent:* Friday, July 1, 2022 2:29 PM
> *To:* controls_dept at jlab.org <controls_dept at jlab.org>; ace at jlab.org <ace at jlab.org>
> *Subject:* [Controls_dept] Advise on rebooting all console servers
> I would like to set up a task to reboot all console servers on a regular schedule. This would be to address things like port hang-ups. I propose doing this once a month on a Wednesday at around 5am. This way, the chances of anyone using one would be minimal, and if a console server does not reboot properly, one of us can address it when we arrive in the morning.
>
> Any suggestions or comments?
>
>
> --
> =========================
> Anthony Cuffe
> voice : 757 269-6213
> e-mail : cuffe at jlab.org
> =========================
> _______________________________________________
> Controls_dept mailing list
> Controls_dept at jlab.org
> https://mailman.jlab.org/mailman/listinfo/controls_dept
_______________________________________________
Ace mailing list
Ace at jlab.org
https://mailman.jlab.org/mailman/listinfo/ace