Description
The current version of smartmontools (specifically 7.4) does not work with all HW. Version 7.2, which shipped in the 202311 SONiC image, worked fine with our HW.
The specific drive model that is failing (an NVMe drive):
Model Number: HFS480GEJ8X176N
Steps to reproduce the issue:
smartctl -a /dev/nvme0n1 ;# Returns non-zero (rc 4 in the captured output) instead of 0 and bails out when reading the self-test log.
Describe the results you received:
smartctl prints "Read Self-test Log failed: Invalid Field in Command (0x002)" and exits with rc 4.
Describe the results you expected:
smartctl should return 0. smartmontools should be downgraded back to 7.2 or moved to a current 7.5 build.
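For context, smartctl's exit status is a bitmask, so an rc of 4 means bit 2 is set. The sketch below decodes a status value; the bit descriptions are paraphrased from the EXIT STATUS section of the smartctl man page, and `decode_smartctl_status` is a hypothetical helper name, not part of smartmontools or SONiC.

```python
from typing import List

# Bit meanings paraphrased from the EXIT STATUS section of smartctl(8).
SMARTCTL_STATUS_BITS = {
    0: "command line did not parse",
    1: "device open failed, or the device did not identify itself / is in low-power mode",
    2: "a SMART or other command to the disk failed, or a SMART data structure had a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_status(status: int) -> List[str]:
    """Return the description of every bit set in a smartctl exit status."""
    return [desc for bit, desc in SMARTCTL_STATUS_BITS.items() if status & (1 << bit)]


if __name__ == "__main__":
    # rc 4 is the value reported for the failing drive in this issue.
    for reason in decode_smartctl_status(4):
        print(reason)
```

Read this way, rc 4 says that a command sent to the drive failed, which is consistent with the failed self-test log read rather than with a failing drive (that would be bit 3).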
Output of show version:
root@sonic:/opt/cisco/etc/sonic# show version
SONiC Software Version: SONiC.mckenzie-dev_202405.0-dirty-20241206.102347
SONiC OS Version: 12
Distribution: Debian 12.8
Kernel: 6.1.0-22-2-amd64
Build commit: 31089c683
Build date: Fri Dec 6 18:47:19 UTC 2024
Built by: scott@vxr-slurm-255
Platform: x86_64-85_rp_o-r0
HwSKU: Cisco-85-RP-O
ASIC: cisco-8000
ASIC Count: 1
Serial Number: FLM282802LK
Model Number: 85-RP-O
Hardware Revision: 0.3
Uptime: 18:21:57 up 1:12, 1 user, load average: 3.48, 3.28, 3.19
Date: Mon 04 Nov 2024 18:21:57
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-platform-monitor latest 9e3c6075cb3d 460MB
docker-platform-monitor mckenzie-dev_202405.0-dirty-20241206.102347 9e3c6075cb3d 460MB
docker-snmp latest 5ea91caa41de 375MB
docker-snmp mckenzie-dev_202405.0-dirty-20241206.102347 5ea91caa41de 375MB
docker-dhcp-relay latest 6fff90024fb5 340MB
docker-macsec latest cfc4248c3ed7 362MB
docker-eventd latest 9bd4289f7d18 331MB
docker-eventd mckenzie-dev_202405.0-dirty-20241206.102347 9bd4289f7d18 331MB
docker-gbsyncd-cisco latest 1b8e755098fc 391MB
docker-gbsyncd-cisco mckenzie-dev_202405.0-dirty-20241206.102347 1b8e755098fc 391MB
docker-fpm-frr latest 869549655343 391MB
docker-fpm-frr mckenzie-dev_202405.0-dirty-20241206.102347 869549655343 391MB
docker-nat latest 52a617932035 362MB
docker-nat mckenzie-dev_202405.0-dirty-20241206.102347 52a617932035 362MB
docker-sflow latest 10f9c43c13a2 360MB
docker-sflow mckenzie-dev_202405.0-dirty-20241206.102347 10f9c43c13a2 360MB
docker-orchagent latest dcc8958c9627 372MB
docker-orchagent mckenzie-dev_202405.0-dirty-20241206.102347 dcc8958c9627 372MB
docker-sonic-mgmt-framework latest 3b9d0ef54431 418MB
docker-sonic-mgmt-framework mckenzie-dev_202405.0-dirty-20241206.102347 3b9d0ef54431 418MB
docker-teamd latest 85bb0d0538d7 359MB
docker-teamd mckenzie-dev_202405.0-dirty-20241206.102347 85bb0d0538d7 359MB
docker-router-advertiser latest a16b1ff34dfb 331MB
docker-router-advertiser mckenzie-dev_202405.0-dirty-20241206.102347 a16b1ff34dfb 331MB
docker-lldp latest 17ea44604be6 377MB
docker-lldp mckenzie-dev_202405.0-dirty-20241206.102347 17ea44604be6 377MB
docker-database latest d814df62760d 339MB
docker-database mckenzie-dev_202405.0-dirty-20241206.102347 d814df62760d 339MB
docker-sonic-gnmi latest df9565c0a9eb 415MB
docker-sonic-gnmi mckenzie-dev_202405.0-dirty-20241206.102347 df9565c0a9eb 415MB
docker-mux latest ac37bd65a4d7 383MB
docker-mux mckenzie-dev_202405.0-dirty-20241206.102347 ac37bd65a4d7 383MB
docker-ipxeserver-cisco latest 621a752c8ee8 353MB
docker-ipxeserver-cisco mckenzie-dev_202405.0-dirty-20241206.102347 621a752c8ee8 353MB
docker-syncd-cisco latest 13b58b3604ca 1.1GB
docker-syncd-cisco mckenzie-dev_202405.0-dirty-20241206.102347 13b58b3604ca 1.1GB
root@sonic:/opt/cisco/etc/sonic#
Output of show techsupport:
Problem output:
vxr-slurm-255:~> cat /nobackup/scott/scott-smart
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.0-22-2-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: HFS480GEJ8X176N
Serial Number: ****
Firmware Version: 51090A30
PCI Vendor/Subsystem ID: 0x1c5c
IEEE OUI Identifier: 0xace42e
Total NVM Capacity: 480,103,981,056 [480 GB]
Unallocated NVM Capacity: 0
Controller ID: 0
NVMe Version: 1.4
Number of Namespaces: 16
Namespace 1 Size/Capacity: 480,103,981,056 [480 GB]
Namespace 1 Utilization: 11,587,231,744 [11.5 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: ace42e 00452d0b0e
Local Time is: Mon Nov 4 18:36:30 2024 UTC
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x065f): Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec Get_LBA_Sts Lockdown
Optional NVM Commands (0x00df): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x7e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg Log0_FISE_MI Telmtry_Ar_4
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 74 Celsius
Critical Comp. Temp. Threshold: 82 Celsius
Namespace 1 Features (0x12): NA_Fields NP_Fields
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 8.25W 0.00W - 0 0 0 0 30000 30000
1 + 7.00W 0.00W - 1 1 1 1 30000 30000
2 + 6.00W 0.00W - 2 2 2 2 30000 30000
3 + 5.00W 0.00W - 3 3 3 3 30000 30000
4 - 5.00W - - 3 3 3 3 30000 30000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 34 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 13,967,376 [7.15 TB]
Data Units Written: 843,112 [431 GB]
Host Read Commands: 56,028,524
Host Write Commands: 10,870,631
Controller Busy Time: 43
Power Cycles: 67
Power On Hours: 1,762
Unsafe Shutdowns: 65
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 29 Celsius
Temperature Sensor 2: 39 Celsius
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
Read Self-test Log failed: Invalid Field in Command (0x002)
smartctl rc 4
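One detail in the output above: the Optional Admin Commands field (0x065f) lists Self_Test, yet reading the self-test log is the step that fails. A minimal sketch of decoding that bitmask follows. It assumes the OACS bit order from the NVMe base specification and reuses smartctl's abbreviations where they appear in the output above; the labels for bits that are not set here (`Directives`, `Virt_Mngmt`, `Doorbell_Buf`) are my own shorthand, and `decode_oacs` is a hypothetical helper name.

```python
from typing import List

# Index in this list == bit position in the OACS field (assumed from the
# NVMe base specification). Labels for bits 5, 7 and 8 are illustrative
# shorthand; the others are copied from the smartctl output in this issue.
OACS_BITS = [
    "Security",      # bit 0
    "Format",        # bit 1
    "Frmw_DL",       # bit 2
    "NS_Mngmt",      # bit 3
    "Self_Test",     # bit 4
    "Directives",    # bit 5
    "MI_Snd/Rec",    # bit 6
    "Virt_Mngmt",    # bit 7
    "Doorbell_Buf",  # bit 8
    "Get_LBA_Sts",   # bit 9
    "Lockdown",      # bit 10
]


def decode_oacs(oacs: int) -> List[str]:
    """Return the labels of all optional admin commands flagged in `oacs`."""
    return [name for bit, name in enumerate(OACS_BITS) if oacs & (1 << bit)]


if __name__ == "__main__":
    print(" ".join(decode_oacs(0x065F)))
```

For 0x065f this yields the same list smartctl printed, including Self_Test, so the drive does advertise the self-test command even though the log read returns Invalid Field in Command.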
Additional information you deem important (e.g. issue happens only occasionally):
With our HW it fails 100% of the time; with the 202311 image loaded it passes.
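When comparing images, it can help to confirm which smartmontools release each one actually runs. The sketch below pulls the version from the first line of smartctl's output, assuming the banner keeps the `smartctl <major>.<minor> ...` shape seen in the output above; `parse_smartctl_banner` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Matches the start of a banner such as:
#   smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.0-22-2-amd64] (local build)
BANNER_RE = re.compile(r"^smartctl (\d+)\.(\d+)\b")


def parse_smartctl_banner(line: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a smartctl banner line, or None if it does not match."""
    match = BANNER_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    banner = "smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.0-22-2-amd64] (local build)"
    print(parse_smartctl_banner(banner))
```

Returning a tuple of integers keeps comparisons simple, for example `parse_smartctl_banner(banner) > (7, 2)` to flag images newer than the release that worked.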