Skip to content

Refactor the way we handle Hook Creation #1090

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ArangoGutierrez
Copy link
Collaborator

This pull request refactors the codebase to introduce a new HookCreator interface, replacing the previous direct use of nvidiaCDIHookPath for creating hooks. This change improves modularity and makes the code more extensible by centralizing hook creation logic. The changes span multiple files and functions, focusing on updating constructors, methods, and tests to use the new HookCreator abstraction.

Refactoring to Use HookCreator:

  • Centralized Hook Creation:

    • Introduced HookCreator interface and its implementation (CDIHook) to encapsulate hook creation logic. This replaces the direct use of nvidiaCDIHookPath for creating hooks. (internal/discover/hooks.go, internal/discover/hooks.goL28-R102)
    • Updated various hook creation methods (e.g., createLDCacheUpdateHook, Create("create-symlinks")) to use HookCreator. (internal/discover/hooks.go, internal/discover/hooks.goL28-R102)
  • Function and Constructor Updates:

    • Replaced nvidiaCDIHookPath parameter with HookCreator in constructors and functions such as NewCUDACompatHookDiscoverer, NewGraphicsMountsDiscoverer, and NewLDCacheUpdateHook. (internal/discover/compat_libs.go, [1]; internal/discover/graphics.go, [2] [3]
    • Updated related methods and structs to use hookCreator instead of nvidiaCDIHookPath. (internal/discover/graphics.go, [1] [2]

Code Simplification and Cleanup:

  • Removed Redundant Hook Creation Logic:

    • Eliminated repetitive hook creation functions like CreateNvidiaCDIHook and CreateCreateSymlinkHook, consolidating their behavior into the HookCreator implementation. (internal/discover/hooks.go, internal/discover/hooks.goL28-R102)
  • Simplified Hook Handling:

    • Updated Hooks() methods across various discoverers to delegate hook creation to HookCreator. (internal/discover/graphics.go, [1]; internal/discover/ldconfig.go, [2]

Tests and Validation:

  • Test Updates:
    • Modified tests to use the new HookCreator abstraction, ensuring compatibility with the refactored code. (internal/discover/graphics_test.go, [1]; internal/discover/ldconfig_test.go, [2]

These changes enhance the maintainability and scalability of the codebase by abstracting hook creation logic, reducing redundancy, and improving testability.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the hook creation logic to use a centralized HookCreator interface in place of the direct use of nvidiaCDIHookPath.

  • Centralizes hook creation by introducing the HookCreator interface and updating its implementation.
  • Updates multiple discoverers, constructors, and tests across the codebase to use the new abstraction.
  • Simplifies and consolidates redundant hook creation functions.

Reviewed Changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/nvcdi/driver-nvml.go Replaces nvidiaCDIHookPath with hookCreator for hook creation.
pkg/nvcdi/common-nvml.go Updates NewGraphicsMountsDiscoverer to use hookCreator.
internal/platform-support/tegra/*.go Adjusts Tegra options and symlinks to leverage hookCreator.
internal/platform-support/dgpu/*.go Replaces direct hook path usage with hookCreator in DGPU discoverers.
internal/modifier/*.go Updates modifiers to instantiate hookCreator from configuration.
internal/discover/*.go Centralizes hook creation in discoverers and adds error-checking.
internal/discover/hooks.go Refactors Hook methods with pointer receivers and defines HookCreator.
internal/discover/graphics_test.go Updates tests to follow the new HookCreator interface.
Comments suppressed due to low confidence (1)

internal/discover/hooks.go:39

  • [nitpick] Ensure that consistent nil-checks are applied in all pointer receiver methods (Devices, Mounts, Hooks) to avoid unexpected behavior when a nil Hook is encountered.
func (h *Hook) Hooks() ([]Hook, error) { if h == nil { return nil, nil } }

.gitignore Outdated
@@ -12,3 +12,6 @@
/shared-*
/release-*
/bin
/nvidia-cdi-hook
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we replace the nvidia-* binaries with a /nvidia-* rule?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this line as well.

@@ -35,12 +35,12 @@ func NewGraphicsModifier(logger logger.Interface, cfg *config.Config, containerI
return nil, nil
}

nvidiaCDIHookPath := cfg.NVIDIACTKConfig.Path
hookCreator := discover.NewHookCreator(cfg.NVIDIAContainerCLIConfig.Path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to also pass this in as an argument to the function instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is ok to leave it as is, as the value of NVIDIACTKConfig.Path of o.nvidiaCDIHookPath it's taken when the pkg's are initialized from the CMD side

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about how this would look from a point of view of a unified config for disabling hooks. At present we contruct this hook creator in multiple places and we would want to ensure that we do that ONCE per code path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a unified config for disabling hooks could be passed as an optional argument when creating the HookCreator, meaning we add a new entry to the internal/config/config to allow users defining disabled hooks even on the config file. and then we have a WithDisabledHooks option that takes the config.DisabledHooks what ever type we decide there, as input.

That's what I think would be a nice long-term goal for this discussion. But not in scope for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I mean in the context of this PR. Here, we are constructing a HookCreator in NewFeatureGatedModifier as well as in NewGraphicsModifier. (We also separately construct this in the CDI and CSV modifiers and the latter definitely needs to be udpated for consistency, but that is definitely out of scope).

My suggestion is to create a HookModifier once in the newSpecModifier function and the pass it down as required. When we DO add config options, we could then make the change in one place and add the relevant args to the constructor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it now, will work on the change

cudaCompatLibHookDiscoverer := discover.NewCUDACompatHookDiscoverer(l.logger, l.nvidiaCDIHookPath, l.driver)
discoverers = append(discoverers, cudaCompatLibHookDiscoverer)
}
// TODO: The following should use the version directly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: if we're just refactoring the hook creation, should we not still check HookIsSupported here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, since Create will check if the hook is disabled as first thing, I think we could deprecate, HookIsSupported func?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm not saying we should keep the function. I'm saying that for the first refactoring step, we ONLY add a HookCreator and then as a follow-up we pull in the ability to disable hooks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this diff so that the only modification is the chane from l.nvidiaCDIHookPath -> l.hookCreator.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return nil
}

links := make([]string, 0, len(args))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of defining the capacity here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

)
type HookCreator interface {
Create(string, ...string) *Hook
DisableHook(string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like having DisableHook as a function here. I know we discussed it, but I think constructing the proper may beforehand and not having this change state is more easy to reason about.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean that the list of disabled hooks will be passed down during NewHookCreator and not having a method to allow adding more hooks after that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That is what I mean. We may need to move the contrstructor around in lib.go a bit, but I think this is a better long-term solution.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can have that as a follow-up, os as part of this PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

but I think constructing the proper may beforehand and not having this change state is more easy to reason about.

Will be addressed in a followup PR

pkg/nvcdi/lib.go Outdated
@@ -145,7 +150,7 @@ func New(opts ...Option) (Interface, error) {
l.vendor = "management.nvidia.com"
}
// Management containers in general do not require CUDA Forward compatibility.
l.disabledHooks[HookEnableCudaCompat] = true
l.hookCreator.DisableHook(string(HookEnableCudaCompat))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We either need to keep this as is, or properly move the handling of disabled hooks to the discover package. I would recommend handling skipping here in the first commit / PR and then adding proper disabled functionality in a follow-up commit or PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"enable-cuda-compat",
args...,
)
return hookCreator.Create("enable-cuda-compat", args...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for a follow-up: Create should be modified to accept a HookName instead of arbitrary strings.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but we would need to define the HookName in a new Package, as pkg/nvcdi calls internal/discover , so we can not reference pkg/nvcdi from internal/discover without running into an import loop.
I think...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now have a better idea for this after our conversation, this will be addressed in a followup PR

@@ -73,8 +73,12 @@ func (d *additionalSymlinks) Hooks() ([]Hook, error) {
return hooks, nil
}

hook := CreateCreateSymlinkHook(d.nvidiaCDIHookPath, links).(Hook)
return append(hooks, hook), nil
hook, err := d.hookCreator.Create("create-symlinks", links...).Hooks()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the return value is a slice, so we may want to rename the variable createSymlinkHooks:

Suggested change
hook, err := d.hookCreator.Create("create-symlinks", links...).Hooks()
createSymlinkHooks, err := d.hookCreator.Create("create-symlinks", links...).Hooks()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -308,10 +308,12 @@ func TestWithWithDriverDotSoSymlinks(t *testing.T) {

for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
hookCreator := NewHookCreator("/path/to/nvidia-cdi-hook")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be defined outside of the for loop.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// TODO: The following should use the version directly.

// replace l.nvidiaCDIHookPath for the l.cdiHook to use the inferface
if !l.disabledHooks[HookEnableCudaCompat] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !l.disabledHooks[HookEnableCudaCompat] {
if l.HookIsSupported(HookEnableCudaCompat) {

It should be clear from the change set that this behaviour is NOT affected by this change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

cudaCompatLibHookDiscoverer := discover.NewCUDACompatHookDiscoverer(l.logger, l.nvidiaCDIHookPath, l.driver)
// TODO: The following should use the version directly.

// replace l.nvidiaCDIHookPath for the l.cdiHook to use the inferface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// replace l.nvidiaCDIHookPath for the l.cdiHook to use the inferface

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pkg/nvcdi/lib.go Outdated
}

// New creates a new nvcdi library
func New(opts ...Option) (Interface, error) {
l := &nvcdilib{
disabledHooks: make(disabledHooks),
disabledHooks: make(map[HookName]bool),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
disabledHooks: make(map[HookName]bool),
disabledHooks: make(disabledHooks),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants