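Locating a Failed Disk in TrueNAS Scale with sas2ircu
Scenario
A ZFS disk array that had been in service for a long time developed a failed disk, and SMART tests reported a large number of errors. The disks were not labeled when they were installed, so with 12 drive bays there was a real risk of pulling the wrong disk and killing the array. The array was also busy serving reads and writes and resilvering onto a new disk, so shutting the machine down to pull disks and inspect them was not practical. The failed disk therefore had to be located while the machine was running.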
Environment
Server: RH2288H V2
Drive backplane: SAS2308
Operating system: ESXi 8 with the SAS2308 passed through to TrueNAS-SCALE-22.02.4
Procedure
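1. Log in to TrueNAS Scale over SSH
If `SAS2IRCU: MPTLib2 Error 1` appears during any of the steps, it is usually a permissions problem. Run the command with sudo or as the root account.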
```
admin@truenas[/mnt]$ sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.

SAS2IRCU: MPTLib2 Error 1
```
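One way to avoid running into this error is a small wrapper that adds sudo only when the shell is not already root. This is a minimal sketch: `run_sas2ircu` is a hypothetical helper name, not part of sas2ircu, and it assumes sudo is configured for the current user.

```shell
# Hypothetical helper (not part of sas2ircu): run the tool directly when
# already root, otherwise go through sudo.
run_sas2ircu() {
    if [ "$(id -u)" -eq 0 ]; then
        sas2ircu "$@"
    else
        sudo sas2ircu "$@"
    fi
}

# Usage: run_sas2ircu list
```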
2. Check that sas2ircu can detect the controller card
```
root@truenas[~]# sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.

         Adapter      Vendor  Device                       SubSys  SubSys
 Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
 -----  ------------  ------  ------  -----------------    ------  ------
   0     SAS2308_2     1000h    87h   00h:0bh:00h:00h      1000h   0087h
SAS2IRCU: Utility Completed Successfully.
```
```
root@truenas[~]# sas2ircu 0 display
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS2308_2
  BIOS version                            : 7.25.00.00
  Firmware version                        : 15.00.03.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 0
  Maximum physical devices                : 255
  Concurrent commands supported           : 3072
  Slot                                    : 0
  Segment                                 : 0
  Bus                                     : 11
  Device                                  : 0
  Function                                : 0
  RAID Support                            : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
(remaining output omitted)
```
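On a machine with more than one controller, the index passed to `display` has to be picked from the `list` table. As a rough sketch, the indices can be pulled out with awk. `adapter_indices` is a hypothetical helper name, and it assumes the table layout shown above, where each adapter row starts with a numeric index.

```shell
# Hypothetical helper: print the index of every adapter row found in
# `sas2ircu list` output, which is read from stdin.
# Assumption: adapter rows start with a numeric index and have at least
# six whitespace-separated fields, as in the table above.
adapter_indices() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 6 { print $1 }'
}

# Usage (as root): sas2ircu list | adapter_indices
```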
3. In TrueNAS Scale, find the serial number of the failed disk (Storage -> Disks -> Serial). Use the serial number (Serial No), not the disk model (Model Number).
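The serial number can also be read from the SSH session instead of the web UI. The sketch below assumes `lsblk` from util-linux is available on the system. `serial_for_dev` is a hypothetical helper name and `sdc` is a made-up device name.

```shell
# Hypothetical helper: print the serial number of one disk.
# Reads `lsblk -dn -o NAME,SERIAL` output (one "name serial" pair per line)
# from stdin; $1 is the device name without /dev/.
serial_for_dev() {
    awk -v dev="$1" '$1 == dev { print $2 }'
}

# Usage: lsblk -dn -o NAME,SERIAL | serial_for_dev sdc
```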
4. Find that disk's entry in the physical device information
```
root@truenas[~]# sas2ircu 0 display | grep -B 8 WCC4E3LJFF91
  Enclosure #                             : 2
  Slot #                                  : 5
  SAS Address                             : 500e004-a-aaaa-aa05
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 3815447/7814037167
  Manufacturer                            : ATA
  Model Number                            : WDC WD40PURX-64G
  Firmware Revision                       : 0A80
  Serial No                               : WDWCC4E3LJFF91
```
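5. From the output above, take the Enclosure number and the Slot number. Together they form the drive bay ID in the form Enclosure:Slot, which in this example is 2:5.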
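Steps 4 and 5 can be combined so that the bay ID is printed directly. The following is a minimal sketch, assuming each device block in the `display` output lists `Enclosure #` and `Slot #` before `Serial No`, as in the example above. `bay_for_serial` is a hypothetical helper name. It matches the serial as a substring because the example output reports the serial with a `WD` prefix.

```shell
# Hypothetical helper: print Enclosure:Slot for the disk whose "Serial No"
# line contains the given serial. Reads `sas2ircu <index> display` output
# from stdin; $1 is the serial number (or a unique part of it).
bay_for_serial() {
    awk -v serial="$1" '
        # Remember the most recent enclosure and slot values seen.
        /Enclosure #/ { split($0, a, ":"); gsub(/[ \t]/, "", a[2]); enc = a[2] }
        /Slot #/      { split($0, a, ":"); gsub(/[ \t]/, "", a[2]); slot = a[2] }
        # When the serial matches, the remembered pair belongs to this disk.
        /Serial No/ && index($0, serial) { print enc ":" slot }
    '
}

# Usage (as root): sas2ircu 0 display | bay_for_serial WCC4E3LJFF91
```

If nothing is printed, the serial was not found on that controller.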
6. Use the locate command with the drive bay ID to turn on that bay's indicator light
```
root@truenas[~]# sas2ircu 0 locate 2:5 on
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.

SAS2IRCU: LOCATE command completed successfully.
SAS2IRCU: Command LOCATE Completed Successfully.
SAS2IRCU: Utility Completed Successfully.
```
To turn the light off:
```
root@truenas[~]# sas2ircu 0 locate 2:5 off
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.

SAS2IRCU: LOCATE command completed successfully.
SAS2IRCU: Command LOCATE Completed Successfully.
SAS2IRCU: Utility Completed Successfully.
```
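To avoid leaving a locate light on by accident, the on and off commands can be paired in one helper. This is only a sketch: `locate_for` is a hypothetical name, it assumes the shell is already root, and the 60-second default is an arbitrary choice.

```shell
# Hypothetical helper: light a bay's locate LED for a while, then turn it off.
# $1: controller index, $2: bay as Enclosure:Slot, $3: seconds to keep the
# light on (defaults to 60, an arbitrary value).
locate_for() {
    # Reject anything that is not two numbers separated by a colon.
    if ! printf '%s\n' "$2" | grep -Eq '^[0-9]+:[0-9]+$'; then
        echo "bay must look like Enclosure:Slot, e.g. 2:5" >&2
        return 2
    fi
    sas2ircu "$1" locate "$2" on || return 1
    sleep "${3:-60}"
    sas2ircu "$1" locate "$2" off
}

# Usage (as root): locate_for 0 2:5 120
```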
7. After the locate on command, the indicator light on the chassis is lit (or blinking)