[ace] Weird workstation behavior

Adam Carpenter adamc at jlab.org
Mon Feb 12 16:45:01 EST 2024


Hi Brian,

There were a lot of error messages.  I would guess either a GNOME extension or an old TCL GUI went a little crazy.  I don't have a great understanding of memory pressure.  If Xorg got killed by the OOM killer, then it looked like the biggest memory hog based on memory allocated and memory released.  I guess a client doing something weird and tricking Xorg into building memory structures that are never freed would do it.  I think my read on the situation is the same as yours, but it would take a lot of work to be more certain.  I mentioned the TCL process since it's probably in-house software and might show up again later.

I didn't see a message about a user-triggered reboot, so I figured you must have powered it off manually.  I'm not certain whether I should see one.  It's not a big deal either way.

-Adam

Adam Carpenter
Accelerator Operations Software Department
Thomas Jefferson National Accelerator Facility
________________________________
From: Brian Bevins <bevins at jlab.org>
Sent: Monday, February 12, 2024 4:28 PM
To: Adam Carpenter <adamc at jlab.org>; ace at jlab.org <ace at jlab.org>
Subject: Re: Weird workstation behavior

Adam,

Thanks for looking into this. Does the fact that OOM killed Xorg indicate that Xorg was the memory hog? Or at least the apparent memory hog? Presumably some client process(es) were forcing it to leak. The segfault and coredump don't surprise me if the processes could no longer allocate memory for transient objects.

I didn't hit the power button. I did a soft reboot from the login screen, which in retrospect was dumb since I suspected the login screen itself. It did seem like the reboot happened too quickly, but uptime says that it last rebooted at ~14:44 and it worked after that.

--Brian



________________________________
From: Adam Carpenter <adamc at jlab.org>
Sent: Monday, February 12, 2024 4:05 PM
To: ace at jlab.org <ace at jlab.org>; Brian Bevins <bevins at jlab.org>
Subject: Re: Weird workstation behavior

Brian,

I see two main things going on with your workstation.  The first is that your NX server license expired Feb 8 at ~8pm.  Anthony said that shouldn't matter, but I am mentioning it just in case it does.  Were you able to log in on Friday or over the weekend?

The second thing is that gdm was having some trouble and the out-of-memory killer was invoked on Feb 12 at 2pm.  It looks like a slow memory leak had been building for at least a couple of weeks.  All around the same time, 2:00pm today (+/- a few minutes), I see:

  *   Xorg got killed by OOM
  *   tclsh8.6 had a segfault in libtclcdev1.7.6
  *   emacs coredumped

It looks like things tried to recover, but I see a bunch of errors after this.  All of your logins failed with a 'system error' message.  Then at 14:43 there is what looks like the start of a boot sequence.  I guess that was you mashing the power button?  I think this is a plain old system crash and not something nefarious.  Let us know if it happens again.

Adam Carpenter
Accelerator Operations Software Department
Thomas Jefferson National Accelerator Facility
________________________________
From: ace <ace-bounces at jlab.org> on behalf of Brian Bevins via ace <ace at jlab.org>
Sent: Monday, February 12, 2024 3:10 PM
To: ace at jlab.org <ace at jlab.org>
Subject: [ace] Weird workstation behavior

When I came onsite this afternoon my workstation was behaving strangely. It had apparently rebooted and was sitting at the session login prompt, but it wouldn't accept my password. Caps lock was off and I tried it 8 times.

Eventually I rebooted the workstation. When I went to do that, I got a warning that user gdm was logged in. I don't usually get that when I restart. After the restart, it accepted my password on the first try.

Out of paranoia I changed my password. Everything's working fine now, but I wanted to mention it since it felt very much like a fake login screen.

--Brian
