[TIKI-104] feat: attribute removal #52

craigfurman · 2023-04-26T15:03:10Z

Users can optionally configure a list of paths alongside scan type config, representing fields to remove from manifests before sending them to Snyk.

craigfurman

@srlk @tommyknows, I'm leaving this in draft while we talk through the threads (and any more you raise). @mmols-snyk might also be interested in the proposed interface here.

craigfurman · 2023-04-26T15:08:00Z

README.md

+These paths are dot-separated address for nested values, in the same format as
+arguments to `kubectl explain`. For example, the expression
+"spec.containers.env" will cause Kubernetes Pod container environment variables
+to be redacted. "containers" is an array, and implicitly each element of this
+array is redacted.


Before hand-rolling an implementation, I explored jsonpath, jmespath, and jsonpatch libraries. The first two do not have spec (or implementation) support for subtree removals, and in practice any implementation that made use of jmespath syntax would have needed to implement the subset we choose to support - since we'll have to locate the node in the tree ourselves. I think this effectively rules them out, in favour of a more explicitly restricted, and therefore simple, syntax.

jsonpatch was looking good, but unfortunately does not support our first use case: removing subtrees under all elements of an array, e.g. "redact all pod environment variables, from all containers" (pod.spec.containers.env, in the kubectl explain-ish syntax I ended up choosing). This is proposed here: json-patch/json-patch2#18.

We could possibly use jsonpatch if we require users to specify exact json addresses to redact, e.g. individual environment variables (staying on that use-case).

WDYT?

note to self: grammar error on line 80. Will fix if we decide to keep this syntax.

jsonpath, jmespath, and jsonpatch libraries

Yeah, I also looked around a while ago; they all have shortcomings.

It seems to me a simple hand-rolled implementation will be good enough.
If we want to extend it further, maybe we can take a look at jq in the future: https://github.com/itchyny/gojq

craigfurman · 2023-04-26T15:10:30Z

internal/kubeobjects/redaction.go

+	return redact(obj, exprParts)
+}
+
+func redact(obj interface{}, exprParts []string) interface{} {


I am not too attached to this hand-rolled implementation. I think the tests do give us some confidence, but the real reason I wrote it is the failure to find 3rd party libraries that work for us (see above). After I'd written it, @srlk pointed me to https://github.com/itchyny/gojq. It does look reasonably battle-tested, and I'm not against replacing this implementation (and redaction expression syntax) with it.

As well as the option in the other thread, to use jsonpatch but require users to be precise about exact fields to clip out, without being able to express concepts like "all pod spec env vars", I think gojq is option 3 (option 1 being "keep the code as written").

I think there's a much simpler way to do this - unstructured.Unstructured already supports this:
https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured#RemoveNestedField

Haven't tested it, but as far as I can tell, you could just do:

unstructured.RemoveNestedField(obj.Object, strings.Split(expr, ".")...)

Also - no offense - but I much prefer the implementation of RemoveNestedField:

func RemoveNestedField(obj map[string]interface{}, fields ...string) {
	m := obj
	for _, field := range fields[:len(fields)-1] {
		if x, ok := m[field].(map[string]interface{}); ok {
			m = x
		} else {
			return
		}
	}
	delete(m, fields[len(fields)-1])
}

Oh wow, I really wish I had seen that! Does it support removing object fields nested inside array elements? Looks like it doesn't...

Oh yeah, not sure...I guess not. But maybe give it a shot 😄
But you could still hack something together with the NestedSlice and SetNestedSlice functions in that case? 🤔

I'm not seeing how those functions help, not quickly at least... if this is something already in your head, could you push a branch on top of this one showing me what you mean? Take a look at the test suite first, if we agree on that, then keeping it passing should prove if it works.

AFAIK the 2 major differences between my impl and unstructured.RemoveNestedField are nested array operations (what we're discussing), and recursion vs loops (which I don't have a strong opinion on - I think I just tend to reach for recursion for things that traverse structures). Holler if there is an aspect I'm missing.

I think the bigger picture is actually in the other thread I started - should we use this kubectl explain syntax and allow ops on "all" array elements at all? If that one tips to "no", this thread becomes moot. But I really am not sure about that one yet.

should we use this kubectl explain syntax and allow ops on "all" array elements at all?

I like this for its simplicity

I doubt anyone wants to do something like spec.containers[1].blabla

Yeah sorry for the noise - I just had a look at your full implementation and it's nice! :)

Regarding all array elements: yes I'd say so. Explicit indexes would be a bit weird, as Serol noted.

I don't see why we shouldn't keep this 👍

@srlk I haven't fully validated this but with jmespath specifically it looks like we might be able to index by field, e.g. spec.containers[name=my-app].env. But again, not "all", which might be our modal use case (a bit of a guess)

The issue with jmespath remains though, that we'd effectively have to write our own parser for it to "prune" trees in the way that we want to, since it's not part of the spec or the go impl.

tommyknows

Publishing this review already because of the potential implication of my comment about the unstructured package - I'm definitely not done yet :)

tommyknows · 2023-04-26T15:17:22Z

internal/controller/controller.go

+		return obj
+	}
+
+	logger.Info(fmt.Sprintf("redacting resource %s", r.gvk.Kind))


Not sure why we're adding the kind to that log statement - it should already be in the context of that logger (added at the beginning of the Reconcile function).

tommyknows · 2023-04-26T15:19:54Z

internal/kubeobjects/redaction.go

+	return redact(obj, exprParts)
+}
+
+func redact(obj interface{}, exprParts []string) interface{} {


I think there's a much simpler way to do this - unstructured.Unstructured already supports this:
https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured#RemoveNestedField

Haven't tested it, but as far as I can tell, you could just do:

unstructured.RemoveNestedField(obj.Object, strings.Split(expr, ".")...)

mmols-snyk · 2023-04-27T04:10:21Z

README.md

+      - daemonsets
+      - deployments
+      - statefulsets
+    redactions:


I've gone back and forth about whether the "redaction" name fits our approach here, or if it implies we are only obscuring a key's value, but leaving the key (and structure) in place so that the system still understands how a resource is configured. It seems like the term could encompass both ideas, just strictly looking at the dictionary definition.

I propose a more explicit attributeRedactions or configRedactions naming so that it's clear that we aren't excluding entire resources here (only at most large swaths of the config themselves).

Opening this thread to collect opinions.

or if it implies

tbh, as a non-native speaker, I had to look up what redaction means in the dictionary 😄
Probably attributeFilters is easier to digest

but leaving the key (and structure) in place

This could get complex over time. Imagine this manifest, with our filter/redaction set to a.b:

{ "a": { "b": [{ "c": {"d": 1} }, { "c": {"d": 2} } ], "e": "3" } }

and if we are erasing the fields that match the filter, then we would have

{ "a": { "e": "3" } }

but if we are redacting those fields, what are we going to return for children objects of the redacted field? And how useful would that be?

{ "a": { "b": ????? "e": "3" } }

@mmols-snyk attributeFilters sounds good to me.

@srlk we are deleting the whole subtree addressed by that field (as in your first example), so "redaction" is arguably a bad name anyway!

Maybe we should go for attributeRemovals now that I look again, "filters" almost sounds like we're selecting based on its presence...

Changed it to this in another commit, will squash when we're all happy.

I've kept the word "redact" in a few instances in comments and variable names where it actually does make sense: we redact an object by removing some of its attributes, but we remove the attributes themselves.

Let me know what you think.
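
To make the naming distinction in this thread concrete, here is a small sketch using the `a.b` manifest from the example above. It contrasts removing the subtree (what the PR does, per the reply above) with masking the value in place. The `"REDACTED"` placeholder and the helper names `parse` and `render` are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifest is the example document from this thread.
const manifest = `{"a":{"b":[{"c":{"d":1}},{"c":{"d":2}}],"e":"3"}}`

// parse decodes the example manifest into a generic tree.
func parse() map[string]interface{} {
	var tree map[string]interface{}
	if err := json.Unmarshal([]byte(manifest), &tree); err != nil {
		panic(err)
	}
	return tree
}

// render encodes a tree back to compact JSON.
func render(tree map[string]interface{}) string {
	out, err := json.Marshal(tree)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// Removal: the key and its whole subtree disappear.
	removed := parse()
	delete(removed["a"].(map[string]interface{}), "b")
	fmt.Println(render(removed)) // {"a":{"e":"3"}}

	// Masking (hypothetical alternative): the key stays, the structure
	// beneath it is replaced by a placeholder.
	masked := parse()
	masked["a"].(map[string]interface{})["b"] = "REDACTED"
	fmt.Println(render(masked)) // {"a":{"b":"REDACTED","e":"3"}}
}
```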

mmols-snyk · 2023-04-27T04:20:11Z

README.md

+      - daemonsets
+      - deployments
+      - statefulsets
+    redactions:


There's no specific use case to highlight here just yet, so feel free to discard this feedback, but I could envision more options here in the future that live outside the path syntax itself. Should we start with this as an array of objects - potentially with a path key for each, rather than just strings?

The alternative is assuming we'd support a mix of strings and object configurations here in the future (there was some objection to this shared in another thread), or I guess we'd introduce some other configuration that superseded this one.

Interesting, what specifically do you envision? I hadn't thought that we might want to scope by jsonpath-ish address as opposed to scoping by GVK.

I'm open to ideas but if we're not sure, I'd say probably YAGNI for now rather than introduce complexity to the config.
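
For reference, the "mix of strings and objects" alternative mentioned above could be decoded roughly as follows. This is a sketch under assumptions: the type name `AttributeRemoval`, the `path` key, and the helper `parseRemovals` are hypothetical, and JSON stands in for the project's real config format so the example needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AttributeRemoval is a hypothetical config entry. Only a path is discussed
// in this thread; further options would become extra fields here.
type AttributeRemoval struct {
	Path string `json:"path"`
}

// UnmarshalJSON accepts either a bare string ("spec.containers.env") or an
// object ({"path": "spec.containers.env"}).
func (a *AttributeRemoval) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		a.Path = s
		return nil
	}
	// plain has the same fields but no methods, which avoids recursing
	// back into this UnmarshalJSON.
	type plain AttributeRemoval
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	*a = AttributeRemoval(p)
	return nil
}

// parseRemovals decodes a list that may mix both entry shapes.
func parseRemovals(raw string) ([]AttributeRemoval, error) {
	var out []AttributeRemoval
	err := json.Unmarshal([]byte(raw), &out)
	return out, err
}

func main() {
	removals, err := parseRemovals(`["spec.containers.env", {"path": "metadata.annotations"}]`)
	if err != nil {
		panic(err)
	}
	for _, r := range removals {
		fmt.Println(r.Path)
	}
}
```

The extra decoding logic is the complexity the YAGNI reply is weighing against; starting with plain strings keeps the config type a simple `[]string`.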

srlk · 2023-04-27T06:29:39Z

internal/controller/controller.go

+	logger.Info(fmt.Sprintf("redacting resource %s", r.gvk.Kind))
+	objJson, err := json.Marshal(obj)
+	if err != nil {
+		logger.Error(err, "marshalling json")


Why do we continue if there's an error?
I would expect us to return obj, err in case of an error, and maybe also log it in the calling function.

I wanted to note that we should probably use k8s.io/apimachinery/pkg/runtime.DefaultUnstructuredConverter.ToUnstructured(obj), instead of doing the conversion ourselves.

However, we already create an unstructured object. If you change the return type of (*reconciler).newObject from client.Object to *unstructured.Unstructured, you can just get that 😄

The underlying map[string]interface{} is in unstructured.Unstructured.Object :)

(you'll probably need to adjust the Redact function to also take and return a map[string]interface{}, but that should be fairly simple to achieve :)

@srlk yes you are right, we've gone back and forth on this issue quite a bit and I had forgotten that we did decide to bail on error rather than send data we couldn't redact. Will change.

Will address Ramon's note too.
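
A minimal sketch of the "bail on error" behaviour agreed here, assuming a JSON round-trip like the one in the diff. The function names `toTree` and `prepare` are hypothetical, and removal is reduced to a single top-level key to keep the example short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toTree round-trips obj through JSON to get a generic tree. Any failure is
// returned to the caller instead of being logged and ignored.
func toTree(obj interface{}) (map[string]interface{}, error) {
	raw, err := json.Marshal(obj)
	if err != nil {
		return nil, fmt.Errorf("marshalling json: %w", err)
	}
	var tree map[string]interface{}
	if err := json.Unmarshal(raw, &tree); err != nil {
		return nil, fmt.Errorf("unmarshalling json: %w", err)
	}
	return tree, nil
}

// prepare returns the payload to send, or an error if removal could not run.
// On error there is deliberately no fallback to the original, unredacted obj.
func prepare(obj interface{}, field string) ([]byte, error) {
	tree, err := toTree(obj)
	if err != nil {
		return nil, err
	}
	delete(tree, field)
	return json.Marshal(tree)
}

func main() {
	payload, err := prepare(map[string]interface{}{"kind": "Pod", "secret": "x"}, "secret")
	fmt.Println(string(payload), err)

	// A channel cannot be marshalled to JSON, so this call fails and the
	// caller receives no payload to send.
	_, err = prepare(map[string]interface{}{"bad": make(chan int)}, "secret")
	fmt.Println("failed:", err != nil)
}
```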

@srlk I addressed your note in a commit that I just pushed.

@tommyknows I explored yours, and unfortunately ToUnstructured only operates on struct pointers. It returns an error when passed anything other than that - even maps, which are themselves pointers. I'm not sure what the issue is with the current code, that unmarshals into an unstructured?

Again, we don't even need ToUnstructured - we already have an unstructured element!

We can pair on this a bit later if you'd like, probably easier :)

Narrator: @tommyknows contributed the commit, it was good, we squashed it.

srlk · 2023-04-27T14:43:34Z

internal/controller/controller.go

@@ -220,6 +224,7 @@ func (r *reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Resu
 		reqLogger := logger.WithValues("organization_id", orgID, "request_id", requestID)
 		ctx = log.IntoContext(ctx, reqLogger)

+		r.removeConfiguredAttributes(reqLogger, obj)


not necessary to pass logger to this function, it just logs an info message

Do you mean that maybe we should just log above it to avoid passing the param? The trouble with that is we'd have to duplicate the guard clause that returns early if no removals are configured. I think passing the logger as a parameter is a small price to pay for that! I guess it'd be more idiomatic to pass the context instead and retrieve it from there?

I think passing the logger in is fine... :)

just a nit (I guess the only thing I miss in java is not passing loggers around)

logger.Info("removing configured resource attributes")
r.removeConfiguredAttributes(obj)

But see my first reply for why I don't want to do that - it'd remove the distinction between redacted and unredacted types in the logs.

I know what you mean about passing a lot of cross-cutting parameters. The idiomatic go equivalent is to put and get values in a context, as we do in some other places IIRC... I'll actually change it to that just to set an example 😂

srlk

thanks!

craigfurman · 2023-04-27T15:24:15Z

@mmols-snyk holler if you want to contribute to the threads on naming and config format in general, otherwise I'll merge this tomorrow. We can always change it after that anyway, but this might be a good time.

snyk-deployer · 2023-04-28T08:21:32Z

🎉 This PR is included in version 0.20.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

craigfurman commented Apr 26, 2023

tommyknows reviewed Apr 26, 2023

craigfurman force-pushed the attr-redaction branch from c7cb5c9 to 922265c Compare April 26, 2023 15:44

mmols-snyk reviewed Apr 27, 2023

srlk reviewed Apr 27, 2023

craigfurman force-pushed the attr-redaction branch from 91a43a5 to 0cbf97c Compare April 27, 2023 10:40

craigfurman marked this pull request as ready for review April 27, 2023 10:41

craigfurman requested a review from a team as a code owner April 27, 2023 10:41

craigfurman requested a review from tommyknows April 27, 2023 10:41

srlk reviewed Apr 27, 2023

srlk approved these changes Apr 27, 2023

craigfurman force-pushed the attr-redaction branch from 756d6d2 to f0afb5a Compare April 27, 2023 15:22

tommyknows approved these changes Apr 27, 2023

craigfurman changed the title ~~[TIKI-104] feat: attribute redaction~~ [TIKI-104] feat: attribute removal Apr 27, 2023

feat: attribute redaction

cf3bb96

Users can optionally configure a list of paths alongside scan type config, representing fields to remove from manifests before sending them to Snyk.

craigfurman force-pushed the attr-redaction branch from f0afb5a to cf3bb96 Compare April 28, 2023 08:17

craigfurman enabled auto-merge April 28, 2023 08:17

craigfurman merged commit dc6c303 into main Apr 28, 2023

craigfurman deleted the attr-redaction branch April 28, 2023 08:19

snyk-deployer added the released label Apr 28, 2023
