Skip to content

[TIKI-104] feat: attribute removal #52

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 28, 2023
Merged

[TIKI-104] feat: attribute removal #52

merged 1 commit into from
Apr 28, 2023

Conversation

craigfurman
Copy link

Users can optionally configure a list of paths alongside scan type config, representing fields to remove from manifests before sending them to Snyk.

Copy link
Author

@craigfurman craigfurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srlk @tommyknows, I'm leaving this in draft while we talk through the threads (and any more you raise). @mmols-snyk might also be interested in the proposed interface here.

README.md Outdated
Comment on lines 80 to 84
These paths are dot-separated address for nested values, in the same format as
arguments to `kubectl explain`. For example, the expression
"spec.containers.env" will cause Kubernetes Pod container environment variables
to be redacted. "containers" is an array, and implicitly each element of this
array is redacted.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before hand-rolling an implementation, I explored jsonpath, jmespath, and jsonpatch libraries. The first two do not have spec (or implementation) support for subtree removals, and in practice any implementation that made use of jmespath syntax would have needed to implement the subset we choose to support - since we'll have to locate the node in the tree ourselves. I think this effectively rules them out, in favour of a more explicitly restricted, and therefore simple, syntax.

jsonpatch was looking good, but unfortunately does not support our first use case: removing subtrees under all elements of al array, e.g. "redact all pod environment variables, from all containers" (pod.spec.containers.env, in the kubectl explain-ish syntax I ended up choosing). This is proposed here: json-patch/json-patch2#18.

We could possibly use jsonpatch if we require users to specify exact json addresses to redact, e.g. individual environment variables (staying on that use-case).

WDYT?

Copy link
Author

@craigfurman craigfurman Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: grammar error on line 80. Will fix if we decide to keep this syntax.

Copy link
Contributor

@srlk srlk Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsonpath, jmespath, and jsonpatch libraries

Yeah, I have also done some look around long ago, they all have shortcomings

It seems to me simple handrolled implementation will be good enough
If we want to extend more, maybe we can take a look at jq in future: https://github.com/itchyny/gojq

return redact(obj, exprParts)
}

func redact(obj interface{}, exprParts []string) interface{} {
Copy link
Author

@craigfurman craigfurman Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too attached to this hand-rolled implementation. I think the tests do give us some confidence, but the real reason I wrote it is the failure to find 3rd party libraries that work for us (see above). After I'd written it, @srlk pointed me to https://github.com/itchyny/gojq. It does look reasonably battle-tested, and I'm not against replacing this implementation (and redaction expression syntax) with it.

As well as the option in the other thread, to use jsonpatch but require users to be precise about exact fields to clip out, without being able to express concepts like "all pod spec env vars", I think gojq is option 3 (option 1 being "keep the code as written").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a much simpler way to do this - unstructured.Unstructured already supports this:
https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured#RemoveNestedField

(Haven't tested it, but as far as I can tell, you could just do:

unstructured.RemoveNestedField(obj.Object, strings.Split(expr, ".")...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - no offense - but I much prefer the implementation of RemoveNestedField:

func RemoveNestedField(obj map[string]interface{}, fields ...string) {
	m := obj
	for _, field := range fields[:len(fields)-1] {
		if x, ok := m[field].(map[string]interface{}); ok {
			m = x
		} else {
			return
		}
	}
	delete(m, fields[len(fields)-1])
}

Copy link
Author

@craigfurman craigfurman Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, I really wish I had seen that! Does it support removing object fields nested inside array elements? Looks like it doesn't...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, not sure...I guess not. But maybe give it a shot 😄
But you could still hack something together with the NestedSlice and SetNestedSlice functions in that case? 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing how those functions help, not quickly at least... if this is something already in your head, could you push a branch on top of this one showing me what you mean? Take a look at the test suite first, if we agree on that, then keeping it passing should prove if it works.

AFAIK the 2 major differences between my impl and unstructured.RemoveNestedField are nested array operations (what we're discussing), and recursion vs loops (which I don't have a strong opinion on - I think I just tend to reach for recursion for things that traverse structures). Holler if there is an aspect I'm missing.

I think the bigger picture is actually in the other thread I started - should we use this kubectl explain syntax and allow ops on "all" array elements at all? If that one tips to "no", this thread becomes moot. But I really am not sure about that one yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use this kubectl explain syntax and allow ops on "all" array elements at all?

I like this for its simplicity

I doubt anyone wants to do something like spec.containers[1].blabla

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry for the noise - I just had a look at your full implementation and it's nice! :)

Regarding all array elements: yes I'd say so. Explicit indexes would be a bit weird, as Serol noted.

I don't see why we shouldn't keep this 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srlk I haven't fully validated this but with jmespath specifically it looks like we might be able to index by field, e.g. spec.containers[name=my-app].env. But again, not "all", which might be our modal use case (a bit of a guess)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with jmespath remains though, that we'd effectively have to write our own parser for it to "prune" trees in the way that we want to, since it's not part of the spec or the go impl.

Copy link
Contributor

@tommyknows tommyknows left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Publishing this review already because of the potential implication of my comment about the unstructured package - I'm definitely not done yet :)

return obj
}

logger.Info(fmt.Sprintf("redacting resource %s", r.gvk.Kind))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we're adding the kind to that log-statement - it should already be in the context of that logger (added at the beginning of the Reconcile function.

return redact(obj, exprParts)
}

func redact(obj interface{}, exprParts []string) interface{} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a much simpler way to do this - unstructured.Unstructured already supports this:
https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured#RemoveNestedField

(Haven't tested it, but as far as I can tell, you could just do:

unstructured.RemoveNestedField(obj.Object, strings.Split(expr, ".")...)

README.md Outdated
- daemonsets
- deployments
- statefulsets
redactions:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone back and forth about whether the "redaction" name fits our approach here, or if it implies we are only obscuring a key's value, but leaving the key (and structure) in place so that the system still understands how a resource is configured. It seems like the term could encompasses both ideas, just strictly looking at the dictionary definition.

I propose a more explicit attributeRedactions or configRedactions naming so that it's clear that we aren't excluding entire resources here (only at most large swaths of the config themselves).

Opening this thread to collect opinions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or if it implies

tbh, as a non-native speaker, I had to check the dictionary what redaction means 😄
Probably attributeFilters is easier to digest

but leaving the key (and structure) in place

This could get complex by time, imagine this manifest and our filter/reduction was a.b:

{
    "a": {
        "b": [{
                "c": {"d": 1}
            },
            {
                "c": {"d": 2}
            }
        ],
        "e": "3"
    }
}

and if we are erasing the fields that match the filter, then we would have

{
    "a": {
        "e": "3"
    }
}

but if we are redacting those fields, what are we going to return for children objects of the redacted field? And how useful would that be?

{
    "a": {
       "b": ?????
        "e": "3"
    }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmols-snyk attributeFilters sounds good to me.

@srlk we are deleting the whole subtree addressed by that field (as in your first example), so "redaction" is arguably a bad name anyway!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should go for attributeRemovals now that I look again, "filters" almost sounds like we're selecting based on its presence...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to this in another commit, will squash when we're all happy.

I've kept the word "redact" in a few instances in comments and variable names where it actually does make sense: we redact an object by removing some of its attributes, but we remove the attributes themselves.

Let me know what you think.

README.md Outdated
- daemonsets
- deployments
- statefulsets
redactions:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no specific use case to highlight here just yet, so feel free to discard this feedback, but I could envision more options here in the future that live outside the path syntax itself. Should we start with this as an array of objects - potentially with a path key for each, rather than just strings?

The alternative is assuming we'd support a mix of strings and object configurations here in the future (there was some objection to this shared in another thread), or I guess we'd introduce some other configuration that superseded this one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, what specifically do you envision? I hadn't thought that we might want to scope by jsonpath-ish address as opposed to scoping by GVK.

I'm open to ideas but if we're not sure, I'd say probably YAGNI for now rather than introduce complexity to the config.

logger.Info(fmt.Sprintf("redacting resource %s", r.gvk.Kind))
objJson, err := json.Marshal(obj)
if err != nil {
logger.Error(err, "marshalling json")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we continue if there's an error?
I would expect to return return obj, err in case of an error, maybe also log it in the calling function.

Copy link
Contributor

@tommyknows tommyknows Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to note that we should probably use k8s.io/apimachinery/pkg/runtime.DefaultUnstructuredConverter.ToUnstructured(obj), instead of doing the conversion ourselves.

However, we already create an unstructured object. If you change the return type of (*reconciler).newObject from client.Object to *unstructured.Unstructured, you can just get that 😄

The underlying map[string]interface{} is in unstructured.Unstructured.Object :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(you'll probably need to adjust the Redact function to also take and return a map[string]interface{}, but that should be fairly simple to achieve :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srlk yes you are right, we've gone back and forth on this issue quite a bit and I had forgotten that we did decide to bail on error rather than send data we couldn't redact. Will change.

Will address Ramon's note too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srlk I addressed your noted in a commit that I just pushed.

@tommyknows I explored yours, and unfortunately ToUnstructured only operates on struct pointers. It returns an error when passed anything other than that - even maps, which are themselves pointers. I'm not sure what the issue is with the current code, that unmarshals into an unstructured?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we don't even need ToUnstructured - we already have an unstructured element!

We can pair on this a bit later if you'd like, probably easier :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Narrator: @tommyknows contributed the commit, it was good, we squashed it.

@craigfurman craigfurman marked this pull request as ready for review April 27, 2023 10:41
@craigfurman craigfurman requested a review from a team as a code owner April 27, 2023 10:41
@craigfurman craigfurman requested a review from tommyknows April 27, 2023 10:41
@@ -220,6 +224,7 @@ func (r *reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Resu
reqLogger := logger.WithValues("organization_id", orgID, "request_id", requestID)
ctx = log.IntoContext(ctx, reqLogger)

r.removeConfiguredAttributes(reqLogger, obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary to pass logger to this function, it just logs an info message

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that maybe we should just log above it to avoid passing the param? The trouble with that is we'd have to duplicate the guard clause that returns early if no removals are configured. I think passing the logger as a parameter is a small price to pay for that! I guess it'd be more idiomatic to pass the context instead and retrieve it from there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think passing the logger in is fine... :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nit (I guess the only thing I miss in java is not passing loggers around)

logger.Info("removing configured resource attributes")
r.removeConfiguredAttributes(obj)

Copy link
Author

@craigfurman craigfurman Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But see my first reply for why I don't want to do that - it'd remove the distinction between redacted and unredacted types in the logs.

I know what you mean about passing a lot of cross-cutting parameters. The idiomatic go equivalent is to put and get values in a context, as we do in some other places IIRC... I'll actually change it to that just to set an example 😂

Copy link
Contributor

@srlk srlk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@craigfurman
Copy link
Author

@mmols-snyk holler if you want to contribute to the threads on naming and config format in general, otherwise I'll merge this tomorrow. We can always change it after that anyway, but this might be a good time.

@craigfurman craigfurman changed the title [TIKI-104] feat: attribute redaction [TIKI-104] feat: attribute removal Apr 27, 2023
Users can optionally configure a list of paths alongside scan type
config, representing fields to remove from manifests before sending them
to Snyk.
@craigfurman craigfurman enabled auto-merge April 28, 2023 08:17
@craigfurman craigfurman merged commit dc6c303 into main Apr 28, 2023
@craigfurman craigfurman deleted the attr-redaction branch April 28, 2023 08:19
@snyk-deployer
Copy link
Collaborator

🎉 This PR is included in version 0.20.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants