[Ace] IOC Console Lockups [was Re: IOC Channel Access]

Adam Carpenter adamc at jlab.org
Fri Jun 30 14:19:55 EDT 2017


We rebooted the old console servers, since they are known to occasionally lock up ports and rebooting them is an easy fix.  However, we still couldn't talk to the IOCs, even though the console servers seemed more responsive afterwards.  The IOC console logs seem to show those IOCs printing output as of June 28 or 29.

Spoke with Brian and tried a few simple tests on the newer console server (mcccon12).  Everything seemed to indicate that the problem is the IOC and not the console server.
 - Disabling/Enabling the port on the console server didn't do anything.
 - Unplugging/replugging the cable on the device didn't fix anything.
 - Swapped ports between the IOC console cable and a UPS cable.  The UPS worked on the IOC's port; the IOC did not work on the UPS's port.
 - Cannot telnet to the IOC (it is a VxWorks IOC).
 - Tested several other ports.  All are working.
 - Rebooted console server.  No change.
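
Some of the port checks above could be scripted.  What follows is only a rough sketch, and it assumes things the thread does not state: that each console port is reachable as a plain TCP port on the console server, and that a healthy console answers a bare newline with some output (a VxWorks shell would normally print a fresh prompt).  The function name, host, and port are made up for illustration.

```python
import socket

def probe_console(host, port, timeout=3.0):
    """Classify one console port as 'unreachable', 'silent', or 'responsive'.

    host/port are hypothetical: the TCP address the console server
    is assumed to expose for a single serial port.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused or timed out: the console server itself did not answer.
        return "unreachable"
    with conn:
        conn.settimeout(timeout)
        try:
            conn.sendall(b"\r\n")      # nudge the console with a newline
            data = conn.recv(1024)     # wait for any bytes in reply
        except OSError:                # socket.timeout is an OSError
            return "silent"
    # An empty read means the far end closed without sending anything.
    return "responsive" if data else "silent"
```

A "silent" result only says that nothing came back through that port.  It does not by itself tell the IOC apart from the console server, which is what the cable swap above was for.  Some console servers also send a banner or telnet option bytes on connect; those would have to be filtered out before counting a port as responsive.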

My impression is that the older console servers randomly lock up individual ports, and that the newer console servers rarely, if ever, show that same problem.  Brian is currently running a full test to see which ioc_console calls are unresponsive.
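
To put numbers behind that impression, the results of a full sweep could be tallied per console server.  This is a hedged sketch only: the (console_server, ioc_name, status) record layout is an assumption, not the output of any existing tool, and the names in the example are invented.

```python
from collections import defaultdict

def hung_rate_by_server(results):
    """Summarize sweep results per console server.

    results: iterable of (console_server, ioc_name, status) tuples,
             where any status other than "responsive" counts as hung.
    Returns {console_server: (hung_count, total_count, hung_fraction)}.
    """
    counts = defaultdict(lambda: [0, 0])   # server -> [hung, total]
    for server, _ioc, status in results:
        counts[server][1] += 1
        if status != "responsive":
            counts[server][0] += 1
    return {s: (h, t, h / t) for s, (h, t) in counts.items()}

# Invented example data, for illustration only.
sweep = [
    ("old-server-A", "ioc1", "silent"),
    ("old-server-A", "ioc2", "responsive"),
    ("new-server-B", "ioc3", "responsive"),
]
for server, (hung, total, frac) in sorted(hung_rate_by_server(sweep).items()):
    print(f"{server}: {hung}/{total} hung ({frac:.0%})")
```

Repeating the same sweep over several days would also show whether the same ports keep turning up, or whether the lockups move around.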

Adam Carpenter
Thomas Jefferson National Accelerator Facility
System Administrator

----- Original Message -----
From: "Brad Cumbia" <cumbia at jlab.org>
To: "Brian Bevins" <bevins at jlab.org>
Cc: "ace" <ace at jlab.org>, "Pamela Kjeldsen" <kjeldsen at jlab.org>, "Adam Carpenter" <adamc at jlab.org>
Sent: Friday, June 30, 2017 1:17:49 PM
Subject: Re: [Ace] IOC Console Lockups [was Re:  IOC Channel Access]

The console servers for iocin7 and iocnl3 are the old model of console server, which does not offer much in the way of diagnostics.  I rebooted those console servers.  Adam wants to look at the console server for iocmodiag, since it also has the mya nodes on it and there was a disk failure in one of the mya nodes.

Brad Cumbia

Accelerator Network Engineer/Associate Coordinator
Accelerator SysAdmin Group
 
Thomas Jefferson National Accelerator Facility
12000 Jefferson Avenue 
Newport News, Virginia 23606 
Phone (757)269-5839 
Fax (757)269-7309

----- Original Message -----
From: "Brian Bevins" <bevins at jlab.org>
To: "ace" <ace at jlab.org>
Cc: "Pamela Kjeldsen" <kjeldsen at jlab.org>
Sent: Friday, June 30, 2017 12:52:36 PM
Subject: [Ace] IOC Console Lockups [was Re:  IOC Channel Access]

Currently iocin7, iocmodiag, and iocnl3 are all in the state where the 
console is non-functional but they are pingable and reachable by channel 
access. Are there any diagnostics we can try?


On 05/22/17 16:05, Brian Bevins wrote:
> I think it's very likely that this is frequently an ioc issue, and I 
> don't advocate rushing anything out to address it that might weaken 
> security. I had hoped that there was a port-by-port control that would 
> allow resetting the serial bus, in the hope that it would be easy to 
> expose and helpful.
>
> I am fine with monitoring this for now and trying to characterize how 
> often it happens. It's not unlikely that I see the same hung consoles 
> over and over when I'm routinely connecting to them in large numbers.
>
> It might be possible to implement something in the system database 
> that would allow triggering a serial reset from the ioc end via 
> channel access, if it is often the ioc that is hung. Once the ioc and 
> the console server aren't talking, it's hard to diagnose what's going 
> on without starting to just reboot things.
>
> --Brian
>
> On 05/22/17 13:22, Anthony Cuffe wrote:
>> Brian,
>>
>> I don't think that this is always the console server itself. Haven't 
>> some of the IOCs exhibited behaviour where their console is hung by 
>> some previous process like the iocaudit utility and the only way to 
>> resolve it is to reboot the IOC?
>>
>> We are not against creating a mechanism to reboot the console 
>> servers.  It is just not a trivial change.  We will have to enable a 
>> login shell for the vxuser account.  I want to look over this 
>> configuration before pushing it out everywhere.
>>
>>
>> +-------------------------+
>> | Anthony Cuffe           |
>> | voice  : 757 269-6213   |
>> | e-mail : cuffe at jlab.org |
>> +-------------------------+
>>
>> On 05/22/2017 12:20 PM, Brian Bevins wrote:
>>> A hung console server is actually a pretty frequent event. Usually 
>>> we work around it by power cycling iocs instead of soft rebooting 
>>> them, by using telnet, etc.
>>>
>>> When I run tools that go out and touch all the iocs, there are 
>>> always hung consoles.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Ace mailing list
>>> Ace at jlab.org
>>> https://mailman.jlab.org/mailman/listinfo/ace
>>>
>
>


-- 
Brian S. Bevins, PE
Computer Scientist / Mechanical Engineer
Thomas Jefferson National Accelerator Facility


