Add a diagnostic kstat for obtaining pool status #17076
base: master
Conversation
I don't mean to oppose this PR, but just wanted to highlight for anybody passing by that there's already a method to get just a pool's status without locks via …

For clarity, that returns: … Not something analogous to …
This kstat output does not require taking the spa_namespace lock, as is the case for `zpool status`. It can be used for investigations when pools are in a hung state while holding global locks required for a traditional `zpool status` to proceed.

The newly introduced `zfs_lockless_read_enabled` module parameter enables traversal of related kernel structures in cases where the required read locks cannot be taken. When `zfs_lockless_read_enabled` is set, this kstat is not safe to use in conditions where pools are in the process of configuration changes (i.e., adding/removing devices).

Co-authored-by: Don Brady <[email protected]>
Co-authored-by: Umer Saleem <[email protected]>
Signed-off-by: Igor Ostapenko <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
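As a sketch of the operator workflow the commit message implies (enable the tunable only for the duration of the investigation, then switch it back off), here is a small shell helper. The `ROOT` argument is purely an assumption of this sketch so the same function can be exercised against a fake sysfs tree; on a live system it would be empty, and `tank` is a hypothetical pool name:

```shell
# Minimal sketch, not part of this PR: briefly enable
# zfs_lockless_read_enabled around a single read, then restore it.
with_lockless_read() {
    root="$1"; shift
    param="$root/sys/module/zfs/parameters/zfs_lockless_read_enabled"
    echo 1 > "$param"        # enable lockless traversal
    "$@"                     # e.g. cat /proc/spl/kstat/zfs/tank/status_json
    rc=$?
    echo 0 > "$param"        # switch it back off promptly
    return $rc
}
```

On a live system this might be invoked as `with_lockless_read "" cat /proc/spl/kstat/zfs/tank/status_json`, keeping the unsafe window as short as possible.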
Force-pushed from 938438b to 7e13585
I think having a module param to disable locks for these rare debug situations is a valid use case. If we go down that path though, I think there's potentially more benefit in disabling locks for the userspace commands rather than using a kstat. That would let us use the familiar … I have a rough prototype here that seems to work (lightly tested only!):

```shell
# Disable locks on 'tank'
echo -n tank > /sys/module/zfs/parameters/zfs_debug_skip_locks_on_pool

# Disable locks on all pools
echo -n '*' > /sys/module/zfs/parameters/zfs_debug_skip_locks_on_pool

zpool status
zpool status -j
...
```

The prototype is slightly more dangerous than the kstat version since it does traverse the spa_namespace avl tree (locklessly). But the odds that the tree changes out from under you while running … are low. Thoughts?
So the lock we are overriding with a tunable here is the pool-specific config lock. The lock we avoid by using the kstat is the all-pools spa namespace lock. I am not sure it is very practical or safe to do …
The …

For normal operation, … With locks disabled, the safety would be pretty much the same for both the lockless kstat and …
Motivation and Context
A hung pool process can be left holding the spa config lock or the spa namespace lock. If an admin wants to observe the status of a pool using the traditional zpool status, it could hang waiting for one of the locks held by the stuck process. It would be nice to observe pool status in this scenario without the risk of the inquiry hanging.
This PR is an aggregated and updated version of #16026 and #16484.
Description
This change adds `/proc/spl/kstat/zfs/<poolname>/status_json`. This kstat output does not require taking the spa_namespace lock, as is the case for `zpool status`. It can be used for investigations when pools are in a hung state while holding global locks required for a traditional `zpool status` to proceed.
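As a sketch of how the kstat might be consumed from a shell: since the output follows `zpool status -j`, a crude field extraction works even without `jq`. The field name `"state"` is assumed from `zpool status -j` output, and `tank` is a hypothetical pool:

```shell
# pool_state FILE: crude, jq-free extraction of a pool state string
# from a status_json kstat dump.  The "state" field name is an
# assumption based on 'zpool status -j' output.
pool_state() {
    sed -n 's/.*"state": *"\([A-Z]*\)".*/\1/p' "$1" | head -n 1
}

# On a live system (hypothetical pool "tank"):
#   pool_state /proc/spl/kstat/zfs/tank/status_json
```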
The newly introduced `zfs_lockless_read_enabled` module parameter enables traversal of related kernel structures in cases where the required config read locks cannot be taken. When `zfs_lockless_read_enabled` is set, this kstat is not safe to use in conditions where pools are in the process of configuration changes (i.e., adding/removing devices).

The idea is to follow `zpool status -jp` output as much as possible.

How Has This Been Tested?
Added a new `pool_status_json.ksh` automated test which compares the output with `zpool status -jp`.
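The comparison idea behind such a test can be sketched by normalizing both JSON documents before diffing them. This is a sketch, not the actual test: `json.tool --sort-keys` is just one way to normalize, `tank` is a hypothetical pool, and a real test would likely filter volatile fields first:

```shell
# norm FILE: pretty-print JSON with sorted keys so two documents with
# the same content compare equal regardless of key order.
norm() { python3 -m json.tool --sort-keys "$1"; }

# On a live system (hypothetical pool "tank"):
#   norm /proc/spl/kstat/zfs/tank/status_json > /tmp/kstat.json
#   zpool status -jp tank | python3 -m json.tool --sort-keys > /tmp/cmd.json
#   diff -u /tmp/kstat.json /tmp/cmd.json
```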
Types of changes

Checklist:

All commit messages are properly formatted and contain `Signed-off-by`.