[Dsg-halla_ecal] WEDM values not working
Marc Mcmullen
mcmullen at jlab.org
Wed Apr 24 15:02:55 EDT 2024
Thanks Brian,
Bill Henry is looking into the IOC on this end. Hopefully that will fix the issue.
Marc
________________________________
From: Brian Bevins <bevins at jlab.org>
Sent: Wednesday, April 24, 2024 2:48 PM
To: Marc Mcmullen <mcmullen at jlab.org>; Nathan Baltzell <baltzell at jlab.org>; Ryan Slominski <ryans at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>; Pamela Kjeldsen <kjeldsen at jlab.org>; Weiwei Lu <weiwei at jlab.org>; Christopher Slominski <cjs at jlab.org>; Anthony Cuffe <cuffe at jlab.org>; Hovanes Egiyan <hovanes at jlab.org>; William Henry <wmhenry at jlab.org>; Ellen Becker <ebecker at jlab.org>; dsg-halla_ecal at jlab.org <dsg-halla_ecal at jlab.org>
Subject: Re: WEDM values not working
Hi Marc,
The CA gateway cagwops seems to be working normally. I restarted it for good measure.
--Brian
________________________________
From: Marc Mcmullen <mcmullen at jlab.org>
Sent: Tuesday, April 23, 2024 6:16 PM
To: Brian Bevins <bevins at jlab.org>; Nathan Baltzell <baltzell at jlab.org>; Ryan Slominski <ryans at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>; Pamela Kjeldsen <kjeldsen at jlab.org>; Weiwei Lu <weiwei at jlab.org>; Christopher Slominski <cjs at jlab.org>; Anthony Cuffe <cuffe at jlab.org>; Hovanes Egiyan <hovanes at jlab.org>; William Henry <wmhenry at jlab.org>; Ellen Becker <ebecker at jlab.org>; dsg-halla_ecal at jlab.org <dsg-halla_ecal at jlab.org>
Subject: Re: WEDM values not working
Hi Brian,
Last evening Hall A had an IOC crash. They say it should have rebooted and come back online, but since then we have had channel access issues with some systems.
I dug up this email from the last time something like this happened. Can you look into it?
Regards,
Marc
________________________________
From: Brian Bevins <bevins at jlab.org>
Sent: Wednesday, April 7, 2021 4:10 PM
To: Nathan Baltzell <baltzell at jlab.org>; Ryan Slominski <ryans at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>; Marc Mcmullen <mcmullen at jlab.org>; Pamela Kjeldsen <kjeldsen at jlab.org>; Weiwei Lu <weiwei at jlab.org>; Christopher Slominski <cjs at jlab.org>; Anthony Cuffe <cuffe at jlab.org>; Hovanes Egiyan <hovanes at jlab.org>
Subject: Re: WEDM values not working
The channel access gateway used by Mya, cagwops, was locked up for the second time in three days. This time the lockup was preceded by 217 "Virtual circuit unresponsive" errors related to various IOCs at 8:48. Nine minutes later there were 45 "Virtual circuit disconnect" messages before the gateway suspended its main thread and became unresponsive at 8:57. Useful operation seems to have ceased at 8:48. This is a particularly annoying failure mode because, had the process crashed altogether, it would have been restarted automatically.
I restarted the gateway process and noted that over 100,000 signals were connected within a few seconds, so this gateway is getting heavy use. It's possible, if unlikely, that the VM needs more resources.
At Ryan's suggestion I plan to build the latest release of the gateway against the latest production version of EPICS it will support to see if that improves the situation.
--Brian
________________________________
From: Nathan Baltzell <baltzell at jlab.org>
Sent: Wednesday, April 7, 2021 3:30 PM
To: Ryan Slominski <ryans at jlab.org>
Cc: Brian Bevins <bevins at jlab.org>; Tyler Lemon <tlemon at jlab.org>; Marc Mcmullen <mcmullen at jlab.org>
Subject: Re: WEDM values not working
For what it’s worth, I checked with network folks, and there aren’t any known network changes in Hall B since the one a few weeks ago. And things seemed ok until around 8:48 yesterday.
-nathan
On Apr 7, 2021, at 10:51 AM, Nathan Baltzell <baltzell at jlab.org> wrote:
Yeah, it’s the broadcast address for the Hall B slow controls subnet. It used to be in the appropriate accelerator gateway(s) for archiving and such.
-nathan
On Apr 7, 2021, at 10:47 AM, Ryan Slominski <ryans at jlab.org> wrote:
I'm pretty rusty with IPv4 networking - is that a broadcast address? On opsl00, setting my EPICS_CA_ADDR_LIST to that value and then attempting a caget doesn't work. I'm assuming that's because broadcast addresses can't be routed across subnets, so I would need to be on a hall host to do this (hence gateways)?
ryans at opsl00 ~ > env | grep EPICS_CA_ADDR
EPICS_CA_ADDR_LIST=129.57.163.255
ryans at opsl00 ~ > caget tedf:press:ch03
tedf:press:ch03 --- Invalid channel name
ryans at opsl00 ~ > caget B_DET_RICH_DRYBOX_1_HUM
B_DET_RICH_DRYBOX_1_HUM --- Invalid channel name
Ryan
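Whether 129.57.163.255 is a broadcast address depends on the subnet mask, which the thread does not state. A minimal sketch of the check, assuming a /24 mask for the Hall B slow controls subnet (the prefix length and the helper name `is_broadcast` are illustrative assumptions, not taken from the thread):

```python
import ipaddress

# ASSUMPTION: the subnet uses a /24 netmask. The thread gives only the
# address 129.57.163.255, not the mask.
ASSUMED_PREFIX_LEN = 24


def is_broadcast(address: str, prefix_len: int = ASSUMED_PREFIX_LEN) -> bool:
    """Return True if `address` is the broadcast address of its subnet."""
    iface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    return iface.ip == iface.network.broadcast_address


print(is_broadcast("129.57.163.255"))  # True under the /24 assumption
print(is_broadcast("129.57.255.21"))   # False under the /24 assumption
```

If the mask really is /24, the address is the subnet's directed broadcast. Routers commonly do not forward directed broadcasts between subnets, which would be consistent with the failed caget from opsl00 and with the need for a gateway.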
________________________________
From: Nathan Baltzell <baltzell at jlab.org>
Sent: Wednesday, April 7, 2021 10:28 AM
To: Ryan Slominski <ryans at jlab.org>
Cc: Brian Bevins <bevins at jlab.org>; Tyler Lemon <tlemon at jlab.org>; Marc Mcmullen <mcmullen at jlab.org>
Subject: Re: WEDM values not working
129.57.163.255 would be the IP for EPICS_CA_ADDR_LIST for Hall B slow controls. I don’t remember which accelerator gateway(s) it used to be in.
On Apr 7, 2021, at 10:16 AM, Ryan Slominski <ryans at jlab.org> wrote:
The screens which have PVs that do not resolve any longer are in the OPS fiefdom WEDM screen server (epicswebops). The EPICS_CA_ADDR_LIST for the server is:
EPICS_CA_ADDR_LIST="129.57.255.4 129.57.255.6 129.57.255.10"
That's the three gateways:
cagw, cagwsite, cagwops
So apparently one or more of those gateways used to resolve the PVs (aside: do duplicate resolution paths pose an issue?).
When testing on opsl00 though, it actually uses an EPICS name server by default:
ryans at opsl00 ~ > env | grep EPICS_CA_ADDR
EPICS_CA_ADDR_LIST=129.57.255.21
(opsns)
So it isn't resolvable there either. What does your EPICS_CA_ADDR_LIST environment variable look like for those of you where the PVs do resolve?
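The address lists quoted above can be compared mechanically. The sketch below parses an EPICS_CA_ADDR_LIST value into (host, port) pairs and reports which entries two hosts share. `parse_ca_addr_list` and `shared_entries` are hypothetical helper names, and 5064 is assumed as the port when an entry carries none (the usual Channel Access server port):

```python
from typing import List, Set, Tuple

DEFAULT_CA_PORT = 5064  # assumed when an entry has no ":port" suffix


def parse_ca_addr_list(value: str) -> List[Tuple[str, int]]:
    """Split a whitespace-separated address list into (host, port) pairs."""
    entries = []
    for item in value.split():
        host, sep, port = item.partition(":")
        entries.append((host, int(port) if sep else DEFAULT_CA_PORT))
    return entries


def shared_entries(a: str, b: str) -> Set[Tuple[str, int]]:
    """Return the entries that appear in both address lists."""
    return set(parse_ca_addr_list(a)) & set(parse_ca_addr_list(b))


# Values quoted in the thread.
WEDM_SERVER = "129.57.255.4 129.57.255.6 129.57.255.10"  # epicswebops
OPSL00 = "129.57.255.21"                                 # opsl00 (opsns)

print(parse_ca_addr_list(WEDM_SERVER))
print(shared_entries(WEDM_SERVER, OPSL00))  # set()
```

An empty result only says that the two hosts search through different servers. It does not show which of those servers ought to resolve the Hall B PVs.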
________________________________
From: Ryan Slominski <ryans at jlab.org>
Sent: Wednesday, April 7, 2021 9:41 AM
To: Brian Bevins <bevins at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>; Nathan Baltzell <baltzell at jlab.org>; Marc Mcmullen <mcmullen at jlab.org>
Subject: Re: WEDM values not working
Brian,
It appears some PVs may not be resolving through some gateways anymore. See discussion below. For example:
tedf:press:ch03
tedf:press:ch04
(though I'm not sure which gateway this used to work through).
________________________________
From: Nathan Baltzell <baltzell at jlab.org>
Sent: Wednesday, April 7, 2021 9:39 AM
To: Marc Mcmullen <mcmullen at jlab.org>
Cc: Ryan Slominski <ryans at jlab.org>; Tyler Lemon <tlemon at jlab.org>
Subject: Re: WEDM values not working
Yes.
I spot checked some random PVs in the archiver, and it looks like the archiver lost channel access to all Hall B PVs yesterday morning around 8:48. Probably need to check the accelerator’s gateways?
On Apr 7, 2021, at 9:26 AM, Marc Mcmullen <mcmullen at jlab.org> wrote:
Nathan,
Just to be sure, the latest database file is what is running on the IOC? It has the additional exhaust PVs (tedf:ex:gas:ch#).
Regards,
Marc McMullen
Sr. Instrumentation Designer
Detector Support Group
(757) 269-7738
From: Nathan Baltzell <baltzell at jlab.org>
Sent: Wednesday, April 7, 2021 9:21 AM
To: Ryan Slominski <ryans at jlab.org>
Cc: Marc Mcmullen <mcmullen at jlab.org>; Tyler Lemon <tlemon at jlab.org>
Subject: Re: WEDM values not working
I’ve never looked at them via wedm. Their IOC hasn’t changed. There was some network work in Hall B a few weeks ago, but I assumed this was a recent change. Marc, when was the last time you saw them on wedm?
Hmm, I just checked some stuff through epicsweb.jlab.org/wave, and some live PVs aren’t resolving there either. I guess I need to check if the archiver lost them too.
On Apr 7, 2021, at 9:09 AM, Ryan Slominski <ryans at jlab.org> wrote:
Are you sure this used to work? Did the network that the PV is on change? Or the gateway that it is accessed through? For example, I cannot do a caget for the PVs listed on the screen as input and output pressure from opsl00:
ryans at opsl00 ~ > caget tedf:press:ch04
tedf:press:ch04 --- Invalid channel name
ryans at opsl00 ~ > caget tedf:press:ch03
tedf:press:ch03 --- Invalid channel name
From: Marc Mcmullen <mcmullen at jlab.org>
Sent: Wednesday, April 7, 2021 9:00 AM
To: Nathan Baltzell <baltzell at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>; Ryan Slominski <ryans at jlab.org>
Subject: RE: WEDM values not working
Yeah,
I checked and made sure the IOC was working yesterday. I was able to restart it remotely…so it must still be on the network.
This is some weird WEDM thing. Tyler’s page and my page are still not updating.
Regards,
Marc McMullen
Sr. Instrumentation Designer
Detector Support Group
(757) 269-7738
From: Nathan Baltzell <baltzell at jlab.org>
Sent: Wednesday, April 7, 2021 8:57 AM
To: Marc Mcmullen <mcmullen at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>
Subject: Re: WEDM values not working
I didn’t hear about it. Maybe it was specific to accelerator networks.
But your PVs are alive and well, maybe they’re still recovering from the outage.
On Apr 6, 2021, at 8:50 PM, Marc Mcmullen <mcmullen at jlab.org> wrote:
tedf:stop
Apparently, there was a big network outage?
From: Nathan Baltzell <baltzell at jlab.org>
Sent: Tuesday, April 6, 2021 5:20 PM
To: Marc Mcmullen <mcmullen at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>
Subject: Re: WEDM values not working
Hi Marc,
Can you give an example of one PV name in question?
-Nathan
On Apr 6, 2021, at 2:51 PM, Marc Mcmullen <mcmullen at jlab.org> wrote:
Hi Nathan,
I just wanted to loop you in on this. The IOC is definitely working, but our WEDM screens are not updating.
Marc
From: Marc Mcmullen <mcmullen at jlab.org>
Sent: Tuesday, April 6, 2021 2:49 PM
To: Ryan Slominski <ryans at jlab.org>; epicsweb at jlab.org <epicsweb at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>
Subject: Re: WEDM values not working
Ryan,
Do you know who might know what is going on with this? I am able to log into opsl00, and my software, which populates the PVs, is rolling along. This seems to be local to just the network associated with WEDM...maybe?
Marc
From: Ryan Slominski <ryans at jlab.org>
Sent: Tuesday, April 6, 2021 11:46 AM
To: Marc Mcmullen <mcmullen at jlab.org>; epicsweb at jlab.org <epicsweb at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>
Subject: Re: WEDM values not working
I restarted the service on epicswebops, which is for the OPS fiefdom (that URL is for ops). However, it looks like the screen still isn't updating. We had a bunch of network outages this morning, which are likely the cause of this issue. Lots of servers went offline, and I'm not sure of the current status.
From: Marc Mcmullen <mcmullen at jlab.org>
Sent: Tuesday, April 6, 2021 11:43 AM
To: Ryan Slominski <ryans at jlab.org>; epicsweb at jlab.org <epicsweb at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>
Subject: Re: WEDM values not working
https://epicsweb.jlab.org/wedm/screen?edl=%2fcs%2fopshome%2fedm%2fdsg%2fGEM_flow_press.edl
It was working yesterday.
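The `edl` query parameter in that URL is the percent-encoded path of the screen file. A small sketch of decoding it with Python's standard library (the helper name `screen_path` is an illustrative assumption):

```python
from urllib.parse import parse_qs, urlparse


def screen_path(url: str) -> str:
    """Return the decoded value of the 'edl' query parameter."""
    query = parse_qs(urlparse(url).query)
    return query["edl"][0]


url = (
    "https://epicsweb.jlab.org/wedm/screen"
    "?edl=%2fcs%2fopshome%2fedm%2fdsg%2fGEM_flow_press.edl"
)
print(screen_path(url))  # /cs/opshome/edm/dsg/GEM_flow_press.edl
```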
From: Ryan Slominski <ryans at jlab.org>
Sent: Tuesday, April 6, 2021 11:22 AM
To: Marc Mcmullen <mcmullen at jlab.org>; epicsweb at jlab.org <epicsweb at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>
Subject: Re: WEDM values not working
Which screens (copy and paste URL to screen)? Was it working yesterday? The servers reboot automatically at midnight I believe. There are actually separate servers for each fiefdom so I'd prefer to know which server is hung up instead of rebooting everything manually. Knowing the screen path would tell me which of the servers to reboot: CHL, LERF, ITF, SRF, OPS
From: Marc Mcmullen <mcmullen at jlab.org>
Sent: Tuesday, April 6, 2021 11:17 AM
To: epicsweb at jlab.org <epicsweb at jlab.org>
Cc: Tyler Lemon <tlemon at jlab.org>
Subject: WEDM values not working
Hi,
We have two separate WEDM screens not updating on opsl00. Is there a way to reboot something?
Thanks,
Marc