[Lerftest-ctrls] RF CPU reboot & iocConsole problems
Sonya Hoobler
sonya at slac.stanford.edu
Fri Sep 21 13:41:59 EDT 2018
Hi Wesley,
(Re-sending. Rejected by mailing list due to size.)
Well, according to my emails, the previous times this happened seemed to be
caused by:
- multiple DHCP servers on the network
- a TFTP issue due to a Puppet error
Do you have any diagnostics on the TFTP server side? Can you see whether it's
receiving a request?
Boot failure messages:
CLIENT MAC ADDR: 74 FE 48 28 BC 12 GUID: E0C325E5 39CB E511 A2F4 0108A6DAC621
CLIENT IP: 129.57.244.52 MASK: 255.255.255.0 DHCP IP: 129.57.34.93
GATEWAY IP: 129.57.244.1
PXE-E32: TFTP open timeout
PXE-M0F: Exiting Intel Boot Agent.
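A client-side counterpart to the question above: this minimal Python sketch sends a single TFTP read request (RRQ, as defined in RFC 1350) and reports whether any reply comes back. If it is run from another host on the 129.57.244.0/24 subnet, a timeout would look like the PXE-E32 error above. An ERROR packet such as "File not found" would instead suggest the network path works and the server's configuration is at fault. The server address and boot filename are assumptions: the boot messages don't show which host serves TFTP (it is not necessarily the DHCP server at 129.57.34.93), and the filename has to come from the DHCP/PXE configuration.

```python
import socket
import struct

# Minimal TFTP RRQ probe (RFC 1350). A sketch for diagnostics only:
# the server address and filename passed to probe() are placeholders
# that must come from the real DHCP/PXE configuration.
OP_RRQ, OP_DATA, OP_ERROR = 1, 3, 5


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Encode an RRQ packet: 2-byte opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def probe(server: str, filename: str, port: int = 69,
          timeout: float = 3.0) -> str:
    """Send one RRQ and classify the first reply, or the lack of one."""
    # Unconnected socket on purpose: a TFTP server answers from a new
    # ephemeral port (its transfer ID), so connect() would filter it out.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, port))
        try:
            data, addr = sock.recvfrom(4 + 512)  # header + max data block
        except socket.timeout:
            # Same symptom the PXE client reports as PXE-E32.
            return "timeout: no reply from TFTP server"
    if len(data) < 4:
        return "short reply"
    opcode = struct.unpack("!H", data[:2])[0]
    if opcode == OP_DATA:
        return f"ok: first DATA block from {addr[0]}:{addr[1]}"
    if opcode == OP_ERROR:
        code = struct.unpack("!H", data[2:4])[0]
        msg = data[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
        return f"error {code}: {msg}"
    return f"unexpected opcode {opcode}"
```

Usage: `print(probe("<tftp-server>", "<boot-file>"))`, with both arguments filled in from the actual setup. The probe never acknowledges a DATA block, so the server just retransmits and then abandons the transfer. A successful probe from a different host would not rule out a problem specific to lcls-llrfcpu02's port or path.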
Thanks,
Sonya
>
>
> On Fri, 21 Sep 2018, Wesley Moore wrote:
>
>> We had a minor ssh issue. Just troublesome to sort, since I couldn't ssh
>> in to look around.
>>
>>
>> Sounds like something is still quite wrong with lcls-llrfcpu02. What do we
>> need to do to help troubleshoot it?
>>
>>
>> Wesley
>>
>> ________________________________
>> From: Sonya Hoobler <sonya at slac.stanford.edu>
>> Sent: Friday, September 21, 2018 11:54:31 AM
>> To: Wesley Moore
>> Cc: lerftest-ctrls at jlab.org; Curt Hovater; se_c at cox.net
>> Subject: Re: [Lerftest-ctrls] RF CPU reboot & iocConsole problems
>>
>> Hi again,
>>
>> I see that lcls-llrfcpu02 still does not boot--tftp timeout.
>>
>> I'll go back to hands-off.
>>
>> Sonya
>>
>>
>>
>> On Fri, 21 Sep 2018, Sonya Hoobler wrote:
>>
>>> Hi Wesley,
>>>
>>> I just logged in to look around and things seem improved.
>>>
>>> Was something done to address the problems?
>>>
>>> Thanks,
>>> Sonya
>>>
>>>
>>> On Thu, 20 Sep 2018, Sonya Hoobler wrote:
>>>
>>>> Hi Wesley,
>>>>
>>>> Thank you for the update and for following up.
>>>>
>>>> I won't do anything until hearing back from you.
>>>>
>>>> Sonya
>>>>
>>>>
>>>>
>>>> On Thu, 20 Sep 2018, Wesley Moore wrote:
>>>>
>>>>> Sonya,
>>>>>
>>>>> Larry rebooted lcls-llrfcpu02. Said it powered back up, but isn't showing
>>>>> any connectivity. Looks the same from my end.
>>>>>
>>>>> lclsfs - can't ssh, but pingable
>>>>> lclsapp1 - seems fine
>>>>> lclsapp2 - can't ssh, but pingable
>>>>> lcls-llrfcpu01 - seems fine
>>>>> lcls-llrfcpu02 - can't ssh, can't ping
>>>>> Control room hosts (lclsl01-03): can't ssh, but pingable
>>>>>
>>>>> Let me follow up with the guy that set up the fileserver and see if we can
>>>>> get that checked out first. We may need to reboot stuff after that's sorted
>>>>> out.
>>>>>
>>>>> Wesley
>>>>>
>>>>> On 9/20/18, 9:39 AM, "Lerftest-ctrls on behalf of Wesley Moore"
>>>>> <lerftest-ctrls-bounces at jlab.org on behalf of wmoore at jlab.org> wrote:
>>>>>
>>>>> Looks like at least lcls-llrfcpu02 needs to be rebooted. Others seem
>>>>> likely as well. The control room hosts aren't connecting either. Have
>>>>> you heard anything from Larry?
>>>>>
>>>>> Wesley
>>>>>
>>>>> On 9/19/18, 7:20 PM, "Lerftest-ctrls on behalf of Sonya Hoobler"
>>>>> <lerftest-ctrls-bounces at jlab.org on behalf of sonya at slac.stanford.edu> wrote:
>>>>>
>>>>> Hi Wesley, all,
>>>>>
>>>>> I just tried rebooting RF CPU lcls-llrfcpu02, and it never came back up
>>>>> successfully.
>>>>>
>>>>> I can't view the boot-up sequence because iocConsole is also no longer
>>>>> working for either CPU:
>>>>>
>>>>> [softegr at lclsapp1 iocCommon]$ iocConsole lcls-llrfcpu01
>>>>> : ssh -x -t -l laci lclsapp2.acc.jlab.org bash -l -c "
>>>>> pyiocscreen.py -t HIOC lcls-llrfcpu01 lclsts1 2001 "
>>>>> Read from socket failed: Connection reset by peer
>>>>> [softegr at lclsapp1 iocCommon]$ iocConsole lcls-llrfcpu02
>>>>> : ssh -x -t -l laci lclsapp2.acc.jlab.org bash -l -c "
>>>>> pyiocscreen.py -t HIOC lcls-llrfcpu02 lclsts1 2002 "
>>>>> Read from socket failed: Connection reset by peer
>>>>>
>>>>> I tried a remote reboot of the terminal server.
>>>>>
>>>>> I also tried ipmitool (and EPICS ipmi) to remotely restart the CPU.
>>>>>
>>>>> But still no signs of life.
>>>>>
>>>>> Perhaps you could take a look at the network and/or locally? We may need a
>>>>> local power-cycle of the CPU and the terminal server. I'm cc'ing Larry
>>>>> Farrish, who may also be able to help with that.
>>>>>
>>>>> This is not super urgent. When you have a chance during your normal working
>>>>> hours, I'd appreciate any help.
>>>>>
>>>>> Thanks,
>>>>> Sonya
>>>>>
>>>>> _______________________________________________
>>>>> Mailing List: Lerftest-ctrls at jlab.org
>>>>> https://mailman.jlab.org/mailman/listinfo/lerftest-ctrls
>>>>> Wiki: https://wiki.jlab.org/lerf/index.php/Network
>>>>>
>>>
>