[ace] [New Logentry] snapshot file server issues
cuffe at jlab.org
Tue Aug 22 14:00:01 EDT 2023
Logentry Text:
--
Our snapshot backup server (snapfs) began having issues early this morning. The errors indicated file system or disk issues. The system was rebooted and now seems to be functioning normally. The only oddity was that the OS reported a predicted disk failure, but when investigating the disks with the Dell tools and the baseboard management controller, the disk is reported as fine. It took a bit to figure out the correct disk since these tools do not report exactly the same information in the same way. It is Physical Disk 0:0:2, which is a member of the snap3 virtual disk. We will keep an eye on it and swap the disk if issues return.
OpenManage Tools:
###############
Status OK
Name Physical Disk 0:0:2
Device Description Disk 2 in Enclosure 0 on Connector 0 of RAID Controller in Slot 3
State Online
Operational State Not Applicable
Slot Number 2
Size 2794.00 GB
Block Size 512 bytes
Security Status Not Capable
Bus Protocol SAS
Media Type HDD
Hot Spare No
Remaining Rated Write Endurance Not Applicable
Failure Predicted No
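When matching this disk against the output of other tools, it can help to split the OpenManage name into its numeric parts. The following is a minimal sketch, assuming the name follows the connector:enclosure:slot order suggested by the Device Description line above; `PhysicalDiskId` and `parse_physical_disk_name` are hypothetical helper names, not part of any Dell tool.

```python
import re
from typing import NamedTuple


class PhysicalDiskId(NamedTuple):
    """Numeric parts of an OpenManage physical disk name (assumed order)."""
    connector: int
    enclosure: int
    slot: int


def parse_physical_disk_name(name: str) -> PhysicalDiskId:
    """Split a name such as 'Physical Disk 0:0:2' into its three numbers.

    Assumption: the fields are connector:enclosure:slot, which agrees with
    'Disk 2 in Enclosure 0 on Connector 0' in the report above.
    """
    match = re.fullmatch(r"Physical Disk (\d+):(\d+):(\d+)", name.strip())
    if match is None:
        raise ValueError(f"unrecognized physical disk name: {name!r}")
    connector, enclosure, slot = (int(part) for part in match.groups())
    return PhysicalDiskId(connector, enclosure, slot)


disk = parse_physical_disk_name("Physical Disk 0:0:2")
print(disk.slot)  # slot number to compare with the 'Slot Number' field
```

The slot number is only one side of the mapping; the device number passed to other tools may be numbered differently, so it should still be confirmed by serial number or a similar unique field.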
smartd tools:
###############
[root@snapfs ~]# smartctl -a -d megaraid,1 /dev/sdd1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.92.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: DKS2D-H3R0SS
Revision: 6FB0
Compliance: SPC-3
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Logical block size: 512 bytes
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c500418452e7
Serial number: Z292G3WT00009239YA99
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue Aug 22 13:52:36 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
Current Drive Temperature: 29 C
Drive Trip Temperature: 68 C
Manufactured in week 15 of year 2012
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 223
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 223
Elements in grown defect list: 255
Vendor (Seagate Cache) information
Blocks sent to initiator = 3148312464
Blocks received from initiator = 271932906
Blocks read from cache and sent to initiator = 1279949280
Number of read and write commands whose size <= segment size = 265556356
Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 5167.08
number of minutes until next internal SMART test = 32
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2812477162       82         0  2812477244         82      52587.991           0
write:           0        0         0           0          0      11214.903           0
verify: 2888151797       61         0  2888151858         61      20996.633           0
Non-medium error count: 20
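To keep an eye on this disk, the fields that disagree with the Dell tools could be pulled out of the smartctl text automatically. The following is a rough sketch, assuming the SAS-style text layout shown above and that a healthy disk reports a status beginning with "OK"; `summarize_smart` and `SAMPLE` are hypothetical names, and the sample text is copied from the output above.

```python
import re


def summarize_smart(text: str) -> dict:
    """Extract a few fields from `smartctl -a` text output for a SAS disk."""
    patterns = {
        "health": r"^SMART Health Status:[ \t]*(.+?)[ \t]*$",
        "grown_defects": r"^Elements in grown defect list:[ \t]*(\d+)[ \t]*$",
        "non_medium_errors": r"^Non-medium error count:[ \t]*(\d+)[ \t]*$",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match is None:
            summary[key] = None  # field not present in this output
            continue
        value = match.group(1)
        summary[key] = int(value) if value.isdigit() else value
    # Assumption: any status that does not begin with "OK" is a warning.
    health = summary["health"]
    summary["failure_predicted"] = health is not None and not health.startswith("OK")
    return summary


SAMPLE = """\
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]
Elements in grown defect list: 255
Non-medium error count: 20
"""

print(summarize_smart(SAMPLE))
```

In practice the text would come from running smartctl itself, and the resulting values could be logged periodically to see whether the grown defect list keeps increasing.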
########################
---
This is a plain text email for clients that cannot display HTML. The full logentry can be found online at https://logbooks.jlab.org/entry/4169736