-
Notifications
You must be signed in to change notification settings - Fork 3
[TIKI-104] feat: attribute removal #52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srlk @tommyknows, I'm leaving this in draft while we talk through the threads (and any more you raise). @mmols-snyk might also be interested in the proposed interface here.
README.md
Outdated
These paths are dot-separated address for nested values, in the same format as | ||
arguments to `kubectl explain`. For example, the expression | ||
"spec.containers.env" will cause Kubernetes Pod container environment variables | ||
to be redacted. "containers" is an array, and implicitly each element of this | ||
array is redacted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before hand-rolling an implementation, I explored jsonpath, jmespath, and jsonpatch libraries. The first two do not have spec (or implementation) support for subtree removals, and in practice any implementation that made use of jmespath syntax would have needed to implement the subset we choose to support - since we'll have to locate the node in the tree ourselves. I think this effectively rules them out, in favour of a more explicitly restricted, and therefore simple, syntax.
jsonpatch was looking good, but unfortunately does not support our first use case: removing subtrees under all elements of al array, e.g. "redact all pod environment variables, from all containers" (pod.spec.containers.env, in the kubectl explain
-ish syntax I ended up choosing). This is proposed here: json-patch/json-patch2#18.
We could possibly use jsonpatch if we require users to specify exact json addresses to redact, e.g. individual environment variables (staying on that use-case).
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self: grammar error on line 80. Will fix if we decide to keep this syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jsonpath, jmespath, and jsonpatch libraries
Yeah, I have also done some look around long ago, they all have shortcomings
It seems to me simple handrolled implementation will be good enough
If we want to extend more, maybe we can take a look at jq
in future: https://github.com/itchyny/gojq
internal/kubeobjects/redaction.go
Outdated
return redact(obj, exprParts) | ||
} | ||
|
||
func redact(obj interface{}, exprParts []string) interface{} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too attached to this hand-rolled implementation. I think the tests do give us some confidence, but the real reason I wrote it is the failure to find 3rd party libraries that work for us (see above). After I'd written it, @srlk pointed me to https://github.com/itchyny/gojq. It does look reasonably battle-tested, and I'm not against replacing this implementation (and redaction expression syntax) with it.
As well as the option in the other thread, to use jsonpatch but require users to be precise about exact fields to clip out, without being able to express concepts like "all pod spec env vars", I think gojq is option 3 (option 1 being "keep the code as written").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a much simpler way to do this - unstructured.Unstructured
already supports this:
https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured#RemoveNestedField
(Haven't tested it, but as far as I can tell, you could just do:
unstructured.RemoveNestedField(obj.Object, strings.Split(expr, ".")...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - no offense - but I much prefer the implementation of RemoveNestedField
:
func RemoveNestedField(obj map[string]interface{}, fields ...string) {
m := obj
for _, field := range fields[:len(fields)-1] {
if x, ok := m[field].(map[string]interface{}); ok {
m = x
} else {
return
}
}
delete(m, fields[len(fields)-1])
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, I really wish I had seen that! Does it support removing object fields nested inside array elements? Looks like it doesn't...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, not sure...I guess not. But maybe give it a shot 😄
But you could still hack something together with the NestedSlice
and SetNestedSlice
functions in that case? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing how those functions help, not quickly at least... if this is something already in your head, could you push a branch on top of this one showing me what you mean? Take a look at the test suite first, if we agree on that, then keeping it passing should prove if it works.
AFAIK the 2 major differences between my impl and unstructured.RemoveNestedField are nested array operations (what we're discussing), and recursion vs loops (which I don't have a strong opinion on - I think I just tend to reach for recursion for things that traverse structures). Holler if there is an aspect I'm missing.
I think the bigger picture is actually in the other thread I started - should we use this kubectl explain
syntax and allow ops on "all" array elements at all? If that one tips to "no", this thread becomes moot. But I really am not sure about that one yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use this kubectl explain syntax and allow ops on "all" array elements at all?
I like this for its simplicity
I doubt anyone wants to do something like spec.containers[1].blabla
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry for the noise - I just had a look at your full implementation and it's nice! :)
Regarding all array elements: yes I'd say so. Explicit indexes would be a bit weird, as Serol noted.
I don't see why we shouldn't keep this 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srlk I haven't fully validated this but with jmespath specifically it looks like we might be able to index by field, e.g. spec.containers[name=my-app].env
. But again, not "all", which might be our modal use case (a bit of a guess)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with jmespath remains though, that we'd effectively have to write our own parser for it to "prune" trees in the way that we want to, since it's not part of the spec or the go impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Publishing this review already because of the potential implication of my comment about the unstructured
package - I'm definitely not done yet :)
internal/controller/controller.go
Outdated
return obj | ||
} | ||
|
||
logger.Info(fmt.Sprintf("redacting resource %s", r.gvk.Kind)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why we're adding the kind to that log-statement - it should already be in the context of that logger (added at the beginning of the Reconcile
function.
internal/kubeobjects/redaction.go
Outdated
return redact(obj, exprParts) | ||
} | ||
|
||
func redact(obj interface{}, exprParts []string) interface{} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a much simpler way to do this - unstructured.Unstructured
already supports this:
https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured#RemoveNestedField
(Haven't tested it, but as far as I can tell, you could just do:
unstructured.RemoveNestedField(obj.Object, strings.Split(expr, ".")...)
c7cb5c9
to
922265c
Compare
README.md
Outdated
- daemonsets | ||
- deployments | ||
- statefulsets | ||
redactions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone back and forth about whether the "redaction" name fits our approach here, or if it implies we are only obscuring a key's value, but leaving the key (and structure) in place so that the system still understands how a resource is configured. It seems like the term could encompasses both ideas, just strictly looking at the dictionary definition.
I propose a more explicit attributeRedactions
or configRedactions
naming so that it's clear that we aren't excluding entire resources here (only at most large swaths of the config themselves).
Opening this thread to collect opinions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or if it implies
tbh, as a non-native speaker, I had to check the dictionary what redaction
means 😄
Probably attributeFilters
is easier to digest
but leaving the key (and structure) in place
This could get complex by time, imagine this manifest and our filter/reduction was a.b
:
{
"a": {
"b": [{
"c": {"d": 1}
},
{
"c": {"d": 2}
}
],
"e": "3"
}
}
and if we are erasing the fields that match the filter, then we would have
{
"a": {
"e": "3"
}
}
but if we are redacting those fields, what are we going to return for children objects of the redacted field? And how useful would that be?
{
"a": {
"b": ?????
"e": "3"
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mmols-snyk attributeFilters
sounds good to me.
@srlk we are deleting the whole subtree addressed by that field (as in your first example), so "redaction" is arguably a bad name anyway!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should go for attributeRemovals
now that I look again, "filters" almost sounds like we're selecting based on its presence...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to this in another commit, will squash when we're all happy.
I've kept the word "redact" in a few instances in comments and variable names where it actually does make sense: we redact an object by removing some of its attributes, but we remove the attributes themselves.
Let me know what you think.
README.md
Outdated
- daemonsets | ||
- deployments | ||
- statefulsets | ||
redactions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no specific use case to highlight here just yet, so feel free to discard this feedback, but I could envision more options here in the future that live outside the path syntax itself. Should we start with this as an array of objects - potentially with a path
key for each, rather than just strings?
The alternative is assuming we'd support a mix of strings and object configurations here in the future (there was some objection to this shared in another thread), or I guess we'd introduce some other configuration that superseded this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, what specifically do you envision? I hadn't thought that we might want to scope by jsonpath-ish address as opposed to scoping by GVK.
I'm open to ideas but if we're not sure, I'd say probably YAGNI for now rather than introduce complexity to the config.
internal/controller/controller.go
Outdated
logger.Info(fmt.Sprintf("redacting resource %s", r.gvk.Kind)) | ||
objJson, err := json.Marshal(obj) | ||
if err != nil { | ||
logger.Error(err, "marshalling json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we continue if there's an error?
I would expect to return return obj, err
in case of an error, maybe also log it in the calling function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to note that we should probably use k8s.io/apimachinery/pkg/runtime.DefaultUnstructuredConverter.ToUnstructured(obj)
, instead of doing the conversion ourselves.
However, we already create an unstructured object. If you change the return type of (*reconciler).newObject
from client.Object
to *unstructured.Unstructured
, you can just get that 😄
The underlying map[string]interface{}
is in unstructured.Unstructured.Object
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(you'll probably need to adjust the Redact
function to also take and return a map[string]interface{}
, but that should be fairly simple to achieve :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srlk yes you are right, we've gone back and forth on this issue quite a bit and I had forgotten that we did decide to bail on error rather than send data we couldn't redact. Will change.
Will address Ramon's note too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srlk I addressed your noted in a commit that I just pushed.
@tommyknows I explored yours, and unfortunately ToUnstructured only operates on struct pointers. It returns an error when passed anything other than that - even maps, which are themselves pointers. I'm not sure what the issue is with the current code, that unmarshals into an unstructured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, we don't even need ToUnstructured
- we already have an unstructured element!
We can pair on this a bit later if you'd like, probably easier :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Narrator: @tommyknows contributed the commit, it was good, we squashed it.
91a43a5
to
0cbf97c
Compare
internal/controller/controller.go
Outdated
@@ -220,6 +224,7 @@ func (r *reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Resu | |||
reqLogger := logger.WithValues("organization_id", orgID, "request_id", requestID) | |||
ctx = log.IntoContext(ctx, reqLogger) | |||
|
|||
r.removeConfiguredAttributes(reqLogger, obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not necessary to pass logger to this function, it just logs an info message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that maybe we should just log above it to avoid passing the param? The trouble with that is we'd have to duplicate the guard clause that returns early if no removals are configured. I think passing the logger as a parameter is a small price to pay for that! I guess it'd be more idiomatic to pass the context instead and retrieve it from there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing the logger in is fine... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nit (I guess the only thing I miss in java is not passing loggers around)
logger.Info("removing configured resource attributes")
r.removeConfiguredAttributes(obj)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But see my first reply for why I don't want to do that - it'd remove the distinction between redacted and unredacted types in the logs.
I know what you mean about passing a lot of cross-cutting parameters. The idiomatic go equivalent is to put and get values in a context, as we do in some other places IIRC... I'll actually change it to that just to set an example 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
756d6d2
to
f0afb5a
Compare
@mmols-snyk holler if you want to contribute to the threads on naming and config format in general, otherwise I'll merge this tomorrow. We can always change it after that anyway, but this might be a good time. |
Users can optionally configure a list of paths alongside scan type config, representing fields to remove from manifests before sending them to Snyk.
f0afb5a
to
cf3bb96
Compare
🎉 This PR is included in version 0.20.0 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
Users can optionally configure a list of paths alongside scan type config, representing fields to remove from manifests before sending them to Snyk.