[HydraTeam] Fwd: disk IO issues on epscidb-a

Thomas Britton tbritton at jlab.org
Wed Jan 24 08:54:42 EST 2024


Thomas Britton

Begin forwarded message:

From: Marty Wise <wise at jlab.org>
Date: January 24, 2024 at 8:36:56 AM EST
To: Thomas Britton <tbritton at jlab.org>
Cc: Kelvin Edwards <kelvin at jlab.org>, Paul Letta <letta at jlab.org>, Myung Bang <bangdm at jlab.org>
Subject: disk IO issues on epscidb-a


Thomas,

I may have found the source of the disk IO issues I’ve seen on epscidb-a. The ganglia graphs for it over the last couple of months show a large amount of IOWait (usually colored orange/salmon on the graphs). We generally see little or no IOWait, so this is definitely unusual, and it could produce a variety of symptoms on the system.

Over the past several months, we have begun deploying a cyber security tool (required by the feds) called CrowdStrike/Falcon. For most Linux systems, I believe it is deployed only in “monitoring” mode, i.e. it’s not configured to block anything or take any action, just to alert if something is wrong. I noticed there were a lot of CrowdStrike processes running on the system, so I turned it off and waited overnight to see how the disk IO/IOWait status changed.

[image001.png]

As you can see, the Wait state stats dropped dramatically at about the time I disabled CrowdStrike. Now, maybe something else happened around that time, so this isn’t conclusive yet, but it looks very suspicious. Are you aware of any significant change late yesterday afternoon that might account for the difference?

So, I am continuing to monitor the system. Please let me know if you experience any problems or notice anything unusual.
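
For a spot check outside of ganglia, the IOWait share can also be estimated directly from /proc/stat. The following is only a minimal sketch, not what ganglia itself does: it assumes the standard Linux field order on the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice), and the function names parse_cpu_line and iowait_percent and the 5 second sampling interval are made up for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before_text, after_text):
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    # Sum only the first eight fields (user .. steal); guest and guest_nice
    # are already counted inside user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Field index 4 is iowait (assumed standard Linux layout).
    return 100.0 * deltas[4] / total


if __name__ == "__main__":
    # Hypothetical usage: sample twice, 5 seconds apart, on a Linux host.
    with open("/proc/stat") as handle:
        first = handle.read()
    time.sleep(5)
    with open("/proc/stat") as handle:
        second = handle.read()
    print(f"iowait over interval: {iowait_percent(first, second):.1f}%")
```

Running this before and after disabling a suspect service would give a rough number to set beside the graphs, though a single short interval is much noisier than the ganglia history.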

BTW – we are meeting with the CrowdStrike reps soon and will discuss this apparent problem with them. I should say that it’s entirely possible that this issue is related to some misconfiguration or system peculiarity and not a problem with CrowdStrike itself (I have not noticed similar issues elsewhere), so hopefully we can work out the problem and re-enable CrowdStrike on the system. But I will leave it disabled for now.

Marty Wise
JLab CST/CNI

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 119552 bytes
Desc: image001.png
URL: <https://mailman.jlab.org/pipermail/hydrateam/attachments/20240124/55c997d2/attachment-0001.png>

