[Lerftest-ctrls] RF CPU reboot & iocConsole problems

Sonya Hoobler sonya at slac.stanford.edu
Fri Sep 21 13:41:59 EDT 2018


Hi Wesley,

(Re-sending. Rejected by mailing list due to size.)

Well, according to my emails, the previous times this happened it seemed to 
be caused by:

-multiple DHCP servers on the network

-tftp issue due to a puppet error

Do you have any diagnostics on the tftp server side? Can you see if it's 
receiving a request?
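
In case it helps on the client side, here is a minimal sketch of a TFTP read-request probe that could be run from another host on the same subnet. The packet layout follows RFC 1350; the server address and the boot file name are not known from this thread, so both are left as parameters, and the function names are made up for illustration.

```python
import socket
import struct

TFTP_PORT = 69   # standard TFTP port
OP_RRQ = 1       # read request opcode (RFC 1350)


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def reply_opcode(data):
    """Return the opcode of a TFTP reply (3=DATA, 5=ERROR, 6=OACK)."""
    if len(data) < 2:
        raise ValueError("reply too short to be a TFTP packet")
    return struct.unpack("!H", data[:2])[0]


def probe_tftp(server, filename, timeout=3.0):
    """Send one RRQ and return the reply opcode, or None on timeout.

    Both 'server' and 'filename' are placeholders supplied by the caller;
    neither value is taken from the boot messages in this thread.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, TFTP_PORT))
        try:
            data, _addr = sock.recvfrom(516)  # 4-byte header + 512-byte block
        except socket.timeout:
            return None
    return reply_opcode(data)
```

Any reply at all (even an ERROR packet for a missing file) would suggest the server is reachable and listening; a None result would be consistent with the PXE-E32 timeout, though it cannot say whether the request was dropped on the way or ignored by the server.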

Boot failure messages:

CLIENT MAC ADDR: 74 FE 48 28 BC 12  GUID: E0C325E5 39CB E511 A2F4 0108A6DAC621
CLIENT IP: 129.57.244.52  MASK: 255.255.255.0  DHCP IP: 129.57.34.93
GATEWAY IP: 129.57.244.1
PXE-E32: TFTP open timeout
PXE-M0F: Exiting Intel Boot Agent.
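
One thing that can be read off the messages above is which addresses sit on the client's subnet. A small sketch of that check, using only the values shown in the boot output (the function names are made up for illustration):

```python
import ipaddress
import re

# Address lines copied from the boot failure messages above.
PXE_LOG = """\
CLIENT IP: 129.57.244.52  MASK: 255.255.255.0  DHCP IP: 129.57.34.93
GATEWAY IP: 129.57.244.1
PXE-E32: TFTP open timeout
"""


def parse_pxe_fields(text):
    """Return a dict of the address fields printed by the boot agent."""
    pattern = re.compile(
        r"(CLIENT IP|MASK|DHCP IP|GATEWAY IP):\s*(\d+\.\d+\.\d+\.\d+)")
    return {name: value for name, value in pattern.findall(text)}


def on_client_subnet(fields, key):
    """True if the address stored under 'key' lies in the client's subnet."""
    net = ipaddress.ip_network(
        f"{fields['CLIENT IP']}/{fields['MASK']}", strict=False)
    return ipaddress.ip_address(fields[key]) in net


if __name__ == "__main__":
    fields = parse_pxe_fields(PXE_LOG)
    for key in ("DHCP IP", "GATEWAY IP"):
        print(key, fields[key], "on client subnet:",
              on_client_subnet(fields, key))
```

With these values the gateway is on the client's /24 and the DHCP server is not. That is not a fault by itself, since DHCP is often relayed across subnets, but it does mean the TFTP request may have to be routed, which is one more place for it to get lost.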

Thanks,
    Sonya

>
>
> On Fri, 21 Sep 2018, Wesley Moore wrote:
>
>> We had a minor ssh issue.  Just troublesome to sort, since I couldn't ssh 
>> in to look around.
>> 
>> 
>> Sounds like something is still quite wrong with lcls-llrfcpu02.  What do we 
>> need to do to help troubleshoot it?
>> 
>> 
>> Wesley
>> 
>> ________________________________
>> From: Sonya Hoobler <sonya at slac.stanford.edu>
>> Sent: Friday, September 21, 2018 11:54:31 AM
>> To: Wesley Moore
>> Cc: lerftest-ctrls at jlab.org; Curt Hovater; se_c at cox.net
>> Subject: Re: [Lerftest-ctrls] RF CPU reboot & iocConsole problems
>> 
>> Hi again,
>> 
>> I see that lcls-llrfcpu02 still does not boot--tftp timeout.
>> 
>> I'll go back to hands-off.
>> 
>> Sonya
>> 
>> 
>> 
>> On Fri, 21 Sep 2018, Sonya Hoobler wrote:
>> 
>>> Hi Wesley,
>>> 
>>> I just logged in to look around and things seem improved.
>>> 
>>> Was something done to address the problems?
>>> 
>>> Thanks,
>>>  Sonya
>>> 
>>> 
>>> On Thu, 20 Sep 2018, Sonya Hoobler wrote:
>>> 
>>>> Hi Wesley,
>>>> 
>>>> Thank you for the update and for following up.
>>>> 
>>>> I won't do anything until hearing back from you.
>>>> 
>>>> Sonya
>>>> 
>>>> 
>>>> 
>>>> On Thu, 20 Sep 2018, Wesley Moore wrote:
>>>> 
>>>>> Sonya,
>>>>> 
>>>>> Larry rebooted lcls-llrfcpu02.  Said it powered back up, but isn't showing
>>>>> any connectivity.  Looks the same from my end.
>>>>> 
>>>>> lclsfs - can't ssh, but pingable
>>>>> lclsapp1 - seems fine
>>>>> lclsapp2 - can't ssh, but pingable
>>>>> lcls-llrfcpu01 - seems fine
>>>>> lcls-llrfcpu02 - can't ssh, can't ping
>>>>> Control room hosts (lclsl01-03): can't ssh, but pingable
>>>>> 
>>>>> Let me follow up with the guy that set up the fileserver and see if we can
>>>>> get that checked out first.  We may need to reboot stuff after that's
>>>>> sorted out.
>>>>> 
>>>>> Wesley
>>>>> 
>>>>> On 9/20/18, 9:39 AM, "Lerftest-ctrls on behalf of Wesley Moore"
>>>>> <lerftest-ctrls-bounces at jlab.org on behalf of wmoore at jlab.org> wrote:
>>>>>
>>>>>    Looks like at least lcls-llrfcpu02 needs to be rebooted.  Others seem
>>>>> likely as well.  The control room hosts aren't connecting either.  Have
>>>>> you heard anything from Larry?
>>>>>
>>>>>    Wesley
>>>>>
>>>>>    On 9/19/18, 7:20 PM, "Lerftest-ctrls on behalf of Sonya Hoobler"
>>>>> <lerftest-ctrls-bounces at jlab.org on behalf of sonya at slac.stanford.edu>
>>>>> wrote:
>>>>>
>>>>>        Hi Wesley, all,
>>>>>
>>>>>        I just tried a reboot of RF CPU lcls-llrfcpu02 and it never
>>>>>        successfully booted back up.
>>>>>
>>>>>        I can't view the boot-up sequence because iocConsole is also no
>>>>>        longer working for either CPU:
>>>>>
>>>>>        [softegr at lclsapp1 iocCommon]$ iocConsole lcls-llrfcpu01
>>>>>          : ssh -x -t -l laci lclsapp2.acc.jlab.org bash -l -c " pyiocscreen.py -t HIOC lcls-llrfcpu01 lclsts1 2001 "
>>>>>        Read from socket failed: Connection reset by peer
>>>>>        [softegr at lclsapp1 iocCommon]$ iocConsole lcls-llrfcpu02
>>>>>          : ssh -x -t -l laci lclsapp2.acc.jlab.org bash -l -c " pyiocscreen.py -t HIOC lcls-llrfcpu02 lclsts1 2002 "
>>>>>        Read from socket failed: Connection reset by peer
>>>>>
>>>>>        I tried a remote reboot of the terminal server.
>>>>>
>>>>>        I also tried ipmitool (and EPICS ipmi) to remotely restart the CPU.
>>>>>
>>>>>        But still no signs of life.
>>>>>
>>>>>        Perhaps you could take a look at the network and/or locally? We may
>>>>>        need a local power-cycle of the CPU and the terminal server. I'm
>>>>>        cc'ing Larry Farrish, who may also be able to help with that.
>>>>>
>>>>>        This is not super urgent. When you have a chance during your normal
>>>>>        working hours, I'd appreciate any help.
>>>>>
>>>>>        Thanks,
>>>>>           Sonya
>>>>>
>>>>>        _______________________________________________
>>>>>        Mailing List: Lerftest-ctrls at jlab.org
>>>>>        https://mailman.jlab.org/mailman/listinfo/lerftest-ctrls
>>>>>        Wiki: https://wiki.jlab.org/lerf/index.php/Network
>>>>> 
>>>>> 
>>>>>
>>>>> 
>>>>> 
>>> 
>

