Skip to content

raw_exec windows: add support for setting the task user #25641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

tomqwpl
Copy link

@tomqwpl tomqwpl commented Apr 10, 2025

Description

This expands the scope of what was done in #25496 to make it more generally applicable.

Note that this is currently a draft. I deliberately haven't done anything about docs or changelog, the purpose
is to show what's possible and to discuss inclusion.

Right now, it would conflict with the #25496 changes. With that code you would use. e.g "NT AUTHORITY\LocalSystem" as the username. That doesn't work here. I could easily detect explicit use of "NT AUTHORITY\something" and revert to the original code. Discussion would be needed with @x-dvr on that and what use cases need to work there.

The use here is to either use "domain\username" or "username@domain". The latter is the more modern form of Windows username. It is also possible just to use "username" for a local user, or ".\username".

I'd like to include some logging, but so far I can see any logging actually come out in the nomad log output. Not sure why.

Testing & Reproduction steps

In order to test this it's necessary to run Nomad as the LocalSystem account. Easiest method of doing that for development purposes is to use "sysinternals" psexec : https://learn.microsoft.com/en-us/sysinternals/downloads/psexec. With that, run "psexec -s cmd.exe". That gets you a command prompt running as LocalSystem from where you can run nomad. Alternatively just run "psexec -s bin\nomad agent -dev".

Because this has to be run as local system, not sure how you would run automated tests on it. Equally there's not much you can do to unit test the individual parts I don't think.

I've separated the code that does the specific "s4u" stuff into a separate package, but happy to do whatever is thought best there. I didn't want it cluttering up the executor_windows.go file.

Links

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

@tomqwpl tomqwpl requested review from a team as code owners April 10, 2025 11:32
Copy link

CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes

Have you signed the CLA already but the status is still pending? Recheck it.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tomqwpl!

I haven't reviewed very closely yet, but just a couple of high-level notes...

First, are you able/willing to sign the CLA? We can't accept a non-trivial PR like this without it, but I want to save everyone some time if that's going to be an issue.

Second, what kind of testing can be reasonably done with this on GitHub Actions? We sort of let the original PR slide on expanding test coverage and hit this silly java-driver-on-Windows bug #25648 so I'd really like to have some way of making sure this is going to be a maintainable change.

Right now, it would conflict with the #25496 changes. With that code you would use. e.g "NT AUTHORITY\LocalSystem" as the username. That doesn't work here. I could easily detect explicit use of "NT AUTHORITY\something" and revert to the original code. Discussion would be needed with @x-dvr on that and what use cases need to work there.

Unfortunately we've already shipped #25496, so we can't break the behavior that allows those names. Are those names improper in some way, or do they just not work with the behavior demonstrated here.

I'd like to include some logging, but so far I can see any logging actually come out in the nomad log output. Not sure why.

Right. The executor doesn't log to the client agent, it logs to the executor log which you can find at $datadir/alloc/$alloc_id/$task_name/executor.out (you can find this path in the agent logs that start with "client.driver_mgr.raw_exec.executor: starting plugin". That's designed to expose the logs of the task executor to the job author, as opposed to only exposing them in the agent logs which many job authors won't have access to.

@tomqwpl
Copy link
Author

tomqwpl commented Apr 10, 2025

First, are you able/willing to sign the CLA? We can't accept a non-trivial PR like this without it, but I want to save everyone some time if that's going to be an issue.

The CLA I'm trying to clear now. I'm not expecting it to be a problem, but can't say for sure yet. I'd forgotten I would need to do that, I should have done it before I created the PR. It'll probably be a week before I have any clarity on that.

Right now, it would conflict with the #25496 changes. With that code you would use. e.g "NT AUTHORITY\LocalSystem" as the username. That doesn't work here. I could easily detect explicit use of "NT AUTHORITY\something" and revert to the original code. Discussion would be needed with @x-dvr on that and what use cases need to work there.

Unfortunately we've already shipped #25496, so we can't break the behavior that allows those names. Are those names improper in some way, or do they just not work with the behavior demonstrated here.

I deliberately didn't include handling of NT_AUTHORITY so as to have a discussion on the best way to maintain compatibility there (hence my comment). The code was also meant to show what I'd have done had the original not existed and then leave how to deal with the merge as a separate discussion. "NT AUTHORITY" is a weird thing. LSALogonUser doesn't accept it, but "LogonUser" does. It's kind of a placeholder, though I don't know why ".\LocalSystem" wouldn't have been sufficient (it doesn't work with this code though, I tried it). The only way I know you could deal with it is just to recognise it manually. I didn't do that as it felt fragile. I'm not actually sure what the range of users the original code would accept is, whether recognising "NT AUTHORITY" is enough. I was hoping perhaps the original PR author could help out there.

Second, what kind of testing can be reasonably done with this on GitHub Actions?
Really don't know at this point. I will try and find out what's possible with GitHub actions. You'd need to be able to run Nomad as "LocalSystem" and you'd need there to be a local machine account and possibly even a domain account as well that you can then use to test against. My guess is you're not going to have that kind of ability. From the comment referenced in the other PR, it might be possible for self hosted runners, but that's going to be way beyond what I'm going to be able to investigate. Self hosted runners with specific requirements on user accounts would presumably severely limit the ability of anyone to fork and test a PR wouldn't it, since they would also need to have the same runners set up?

Can you point at where the testing for this on Linux would be? I've looked around and can't find anything obvious. It would be useful to know what level would be appropriate to do this at, assuming it can be done.

With specific reference to that issue though, I guess this is where someone would need to explain the potential impact. The name of the PR relates to "raw-exec", as did the original, but clearly the code change actually has a wider impact than that, so clearly tests would need to cover more than just "raw-exec".

@tgross
Copy link
Member

tgross commented Apr 10, 2025

The CLA I'm trying to clear now. I'm not expecting it to be a problem, but can't say for sure yet. I'd forgotten I would need to do that, I should have done it before I created the PR. It'll probably be a week before I have any clarity on that.

Cool, thanks. I know it's a pain, so I appreciate it.

I'm not actually sure what the range of users the original code would accept is, whether recognising "NT AUTHORITY" is enough. I was hoping perhaps the original PR author could help out there.

Ok, understood. I suspect we're going to want to support the other NT_* groups of SIDs like NT_SERVICE here, so yeah I'd definitely be interested in hearing @x-dvr's thoughts.

Can you point at where the testing for this on Linux would be? I've looked around and can't find anything obvious. It would be useful to know what level would be appropriate to do this at, assuming it can be done.

There are test files in https://github.com/hashicorp/nomad/tree/main/drivers/shared/executor but I guess I was thinking more in terms of what we might need to do in the test environment setup in test-windows.yml. Because the reality is that Unix users are really simple and easy for us to test the behaviors, whereas Windows users are not (as evidenced by all the code you've just contributed!). So all this new code is making syscalls and the only way we'd have to test that at all right now is running jobs in our E2E suite on Windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

3 participants