[Ace] [Controls_dept] Advise on rebooting all console servers

Brad Cumbia cumbia at jlab.org
Fri Jul 1 18:52:28 EDT 2022


I wonder if increasing the baud rate of the console connection would help. Just a thought. I’m not sure what the current iocs support, but I would be surprised if they didn’t support anything faster than 9600. 9600 was what the older 167/177 supported.

______________
Brad Cumbia

Accelerator Network Engineer
Accelerator SysAdmin Group

Thomas Jefferson National Accelerator Facility
12000 Jefferson Avenue
Newport News, Virginia 23606
Phone (757)269-5839

________________________________
From: Ace <ace-bounces at jlab.org> on behalf of Brian Bevins <bevins at jlab.org>
Sent: Friday, July 1, 2022 4:37:56 PM
To: Anthony Cuffe <cuffe at jlab.org>; Gary Croke <gcroke at jlab.org>; controls_dept at jlab.org <controls_dept at jlab.org>; ace at jlab.org <ace at jlab.org>
Subject: Re: [Ace] [Controls_dept] Advise on rebooting all console servers

This should definitely help reduce the number of hung consoles and I'm all for it. But there's something else that happens between the ioc and the console server that locks up the connection and is resolvable only by an ioc reboot.

For example, yesterday 6 of the 14 iocs in injsbcon4 did not have working consoles. Anthony rebooted the console server and that number dropped to 4. Now it is back up to 5. I would love to find out what causes this, but it's very difficult to investigate once the console is locked up.

The tool I run to reinitialize CA security attempts to use the console of every operational ioc and logs the ones with problems while ignoring iocs it can't ping. Earlier today that number was 37 locked up out of 613 total iocs. Now it's 46. If we really want to cut down on this, it might be useful to run it every maintenance day to generate a report and reboot the console servers and/or iocs that aren't working (excepting cryo, etc.).
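
A minimal sketch of the kind of maintenance-day report described above, not the actual tool. The names `survey_consoles`, `locked_percent`, and the two probe callables (`can_ping`, `console_responds`) are hypothetical stand-ins; a real version would plug in whatever ping and console checks are already in use.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class ConsoleReport:
    """Outcome of one pass over the ioc list."""
    working: List[str] = field(default_factory=list)
    locked_up: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)  # did not answer ping


def survey_consoles(
    iocs: Iterable[str],
    can_ping: Callable[[str], bool],          # hypothetical probe
    console_responds: Callable[[str], bool],  # hypothetical probe
) -> ConsoleReport:
    """Try the console of every pingable ioc and sort the results."""
    report = ConsoleReport()
    for ioc in iocs:
        if not can_ping(ioc):
            # Not operational: ignore it rather than count it as a console fault.
            report.skipped.append(ioc)
        elif console_responds(ioc):
            report.working.append(ioc)
        else:
            report.locked_up.append(ioc)
    return report


def locked_percent(locked: int, total: int) -> float:
    """Share of iocs with a locked-up console, as a percentage."""
    return 100.0 * locked / total if total else 0.0
```

With the numbers quoted above, 37 of 613 is about 6.0% and 46 of 613 is about 7.5%. The `locked_up` list is what a maintenance-day job would hand to whoever decides which console servers or iocs to reboot.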

--Brian
________________________________
From: Ace <ace-bounces at jlab.org> on behalf of Anthony Cuffe <cuffe at jlab.org>
Sent: Friday, July 1, 2022 3:39 PM
To: Gary Croke <gcroke at jlab.org>; controls_dept at jlab.org <controls_dept at jlab.org>; ace at jlab.org <ace at jlab.org>
Subject: Re: [Ace] [Controls_dept] Advise on rebooting all console servers

We do monitor them, but there is really no way of knowing which ones have hung ports other than the connection attempts.  Some of the problems are also the IOCs themselves.  Since it seems to take many months and connections for the problems to arise, I think a reboot once (or twice) a month will mostly resolve the issues.  I can also provide a mechanism for developers to reboot the console servers, although this is a bit involved and I have not figured out the best way to allow this globally for each of you.
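
For the schedule proposed at the start of this thread (once a month, on a Wednesday around 5am), the run time could be computed roughly as sketched below. Picking the first Wednesday of the month is an assumption, since the proposal does not say which Wednesday, and `first_wednesday_5am` is a hypothetical helper name.

```python
import datetime as dt


def first_wednesday_5am(year: int, month: int) -> dt.datetime:
    """Return 05:00 local time on the first Wednesday of the given month."""
    first = dt.date(year, month, 1)
    # date.weekday(): Monday is 0, so Wednesday is 2.
    days_until_wednesday = (2 - first.weekday()) % 7
    run_day = first + dt.timedelta(days=days_until_wednesday)
    return dt.datetime.combine(run_day, dt.time(5, 0))
```

For July 2022 this gives Wednesday, July 6 at 05:00.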


=========================
  Anthony Cuffe
  voice  : 757 269-6213
  e-mail : cuffe at jlab.org
=========================

On 7/1/22 3:32 PM, Gary Croke wrote:
> We usually discover several console problems in the lead up to, or in the early stages of, a run.  We either realize we can't connect to a console, or look for information in the logs that isn't there because the console stopped logging at some point.  I think anything that can be done to reduce instances of these problems is definitely worthwhile.  If it's not easy to monitor the console servers, then proactively rebooting them periodically sounds like a good idea to me.
>
> ------------------------------------------------------------------------
> *From:* Controls_dept <controls_dept-bounces at jlab.org> on behalf of Anthony Cuffe <cuffe at jlab.org>
> *Sent:* Friday, July 1, 2022 2:29 PM
> *To:* controls_dept at jlab.org <controls_dept at jlab.org>; ace at jlab.org <ace at jlab.org>
> *Subject:* [Controls_dept] Advise on rebooting all console servers
> I would like to set up a task to reboot all console servers on a regular schedule.  This would be to address things like port hang-ups.  I propose doing this once a month on a Weds at around 5am.  This way, the chances of anyone using one would be minimal and if a console server does not reboot properly, one of us can address it when we arrive in the morning.
>
> Any suggestions or comments?
>
>
> --
> =========================
>    Anthony Cuffe
>    voice  : 757 269-6213
>    e-mail : cuffe at jlab.org
> =========================
> _______________________________________________
> Controls_dept mailing list
> Controls_dept at jlab.org
> https://mailman.jlab.org/mailman/listinfo/controls_dept

_______________________________________________
Ace mailing list
Ace at jlab.org
https://mailman.jlab.org/mailman/listinfo/ace