[ace] [New Logentry] snapshot file server issues

cuffe at jlab.org
Tue Aug 22 14:00:01 EDT 2023


Logentry Text:
--
Our snapshot backup server (snapfs) began having issues early this morning.  The errors indicated file system or disk issues.  The system was rebooted and now seems to be functioning normally.  The only oddity is that the OS reported a predicted disk failure, but when the disks were investigated with the Dell tools and the baseboard management controller, the disk was reported as fine.  It took a while to figure out the correct disk, since these tools do not report the same information in the same way.  It is Physical Disk 0:0:2, a member of the snap3 virtual disk.  We will keep an eye on it and swap the disk if issues return.

OpenManage Tools:
###############
Status      OK	 
Name	Physical Disk 0:0:2
Device Description	Disk 2 in Enclosure 0 on Connector 0 of RAID Controller in Slot 3
State	Online
Operational State	Not Applicable
Slot Number	2
Size	2794.00 GB
Block Size	512 bytes
Security Status	Not Capable
Bus Protocol	SAS
Media Type	HDD
Hot Spare	No
Remaining Rated Write Endurance	Not Applicable
Failure Predicted	No

smartd tools:
###############
[root@snapfs ~]# smartctl -a -d megaraid,1 /dev/sdd1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.92.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              DKS2D-H3R0SS
Revision:             6FB0
Compliance:           SPC-3
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500418452e7
Serial number:        Z292G3WT00009239YA99
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Aug 22 13:52:36 2023 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d, ascq=0]

Current Drive Temperature:     29 C
Drive Trip Temperature:        68 C

Manufactured in week 15 of year 2012
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  223
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  223
Elements in grown defect list: 255

Vendor (Seagate Cache) information
  Blocks sent to initiator = 3148312464
  Blocks received from initiator = 271932906
  Blocks read from cache and sent to initiator = 1279949280
  Number of read and write commands whose size <= segment size = 265556356
  Number of read and write commands whose size > segment size = 0

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 5167.08
  number of minutes until next internal SMART test = 32

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2812477162       82         0  2812477244         82      52587.991           0
write:         0        0         0         0          0      11214.903           0
verify: 2888151797       61         0  2888151858         61      20996.633           0

Non-medium error count:       20
########################
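
Since the two tools disagree about this disk, it may be handy to compare their saved output automatically the next time this comes up. The following is only a rough sketch, not something that is in place on snapfs: it assumes the OpenManage listing and the smartctl output have been saved as text in the layout shown above, and the function names (parse_openmanage, parse_smartctl, tools_disagree) are made up for illustration.

```python
import re


def parse_openmanage(text):
    """Parse 'Key<TAB>Value' (or 'Key   Value') lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Keys contain single spaces, so split on a tab or a run of 2+ blanks.
        parts = re.split(r"\s{2,}|\t", line.strip(), maxsplit=1)
        if len(parts) == 2 and parts[0]:
            fields[parts[0]] = parts[1].strip()
    return fields


def _field(text, label):
    """Return the text after 'label:' on its own line, or None if absent."""
    m = re.search(rf"^{re.escape(label)}:[ \t]*(.*)$", text, re.MULTILINE)
    return m.group(1).strip() if m else None


def parse_smartctl(text):
    """Pull a few fields of interest out of 'smartctl -a' output for a SAS disk."""
    health = _field(text, "SMART Health Status")
    if health is not None:
        # Drop the trailing '[asc=..., ascq=...]' sense data, if present.
        health = health.split("[")[0].strip()
    defects = _field(text, "Elements in grown defect list")
    non_medium = _field(text, "Non-medium error count")
    return {
        "serial": _field(text, "Serial number"),
        "health": health,
        "grown_defects": int(defects) if defects else None,
        "non_medium_errors": int(non_medium) if non_medium else None,
    }


def tools_disagree(om_fields, smart_info):
    """True when exactly one of the two tools predicts a failure."""
    om_predicts = om_fields.get("Failure Predicted", "").lower() == "yes"
    health = smart_info.get("health")
    # smartctl prints 'OK' for a healthy SCSI/SAS disk; a missing status
    # is treated here as 'no prediction' (an assumption of this sketch).
    smart_predicts = health is not None and health != "OK"
    return om_predicts != smart_predicts
```

Fed the two listings above, parse_openmanage should give "Failure Predicted" as "No" while parse_smartctl gives a health status of "FAILURE PREDICTION THRESHOLD EXCEEDED", so tools_disagree would flag this disk. The serial number is included because it is one value that could be matched against the controller's inventory when mapping a megaraid device ID to a physical slot.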

---

This is a plain text email for clients that cannot display HTML.  The full logentry can be found online at https://logbooks.jlab.org/entry/4169736