
[POA-2928] Fixes discovered during testing #80

Merged: mudit-postman merged 16 commits into mudit/poa-2609 from mudit/poa-2609-testing-changes on Feb 19, 2025

Conversation

mudit-postman (Contributor)

No description provided.

Boomtokn previously approved these changes Feb 17, 2025

- telemetry api erroring out due to nil resp var
- already deleted pod is not getting removed during pod health check
…bs/postman-insights-agent into mudit/poa-2609-testing-changes
mudit-postman marked this pull request as ready for review February 18, 2025 14:03
mgritter (Contributor) left a comment:

Approved, but I am uncertain of the resourceVersion changes.

Comment on lines 146 to 156
-for {
-	select {
-	case received := <-sig:
-		printer.Stderr.Infof("Received %v, stopping daemonset...\n", received.String())
-		break DoneWaitingForSignal
-	}
+for received := range sig {
+	printer.Stderr.Infof("Received %v, stopping daemonset...\n", received.String())
+	break DoneWaitingForSignal
 }
Contributor:

Not sure what's going on here, but if you don't have the nested for/select then you don't need the named break

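For illustration, here is a minimal, self-contained sketch of the simplified loop the comment describes. It uses the standard library's log and os/signal instead of the agent's printer package (an assumption, since only the fragment above is shown); the point is that ranging over the signal channel removes the nested for/select, so a plain unlabeled break suffices.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Channel that receives termination signals.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)

	// No surrounding for/select, so an unlabeled break exits the loop
	// after the first signal; no DoneWaitingForSignal label is needed.
	for received := range sig {
		log.Printf("Received %v, stopping daemonset...", received)
		break
	}
}
```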

if podArgs.PodTrafficMonitorState == PodDetected || podArgs.PodTrafficMonitorState == PodInitialized {
	printer.Debugf("Apidump process not started for pod %s during it's initialization, starting now\n", podArgs.PodName)
Contributor:

Suggested change:
-	printer.Debugf("Apidump process not started for pod %s during it's initialization, starting now\n", podArgs.PodName)
+	printer.Debugf("Apidump process not started for pod %s during its initialization, starting now\n", podArgs.PodName)

Comment on lines 89 to 92
 watcher, err := kc.Clientset.CoreV1().Events("").Watch(context.Background(), metaV1.ListOptions{
-	Watch:         true,
-	FieldSelector: "involvedObject.kind=Pod",
+	Watch:           true,
+	FieldSelector:   "involvedObject.kind=Pod",
+	ResourceVersion: pod.ResourceVersion,
Contributor:

From the Kubernetes docs it sounds like the intended use is to perform a List and then watch with the resourceVersion returned from that List. So you did that with the pod itself; but isn't the more natural thing to use the resourceVersion associated with the list operation we do over all pods, on startup? Then we wouldn't have to worry about the semantics being different in some unexpected way.

I am uncomfortable with the assumption that this pod's resourceVersion represents a point in time that serves as an anchor. I'm not convinced that is supported by the documentation, but I admit I don't understand how the Pod and list versions differ.

Should we be using a ListMeta field? Can we even get that via the ClientSet?

> v1.meta/ListMeta - The metadata.resourceVersion of a resource collection (the response to a list) identifies the resource version at which the collection was constructed.

Another thing that came up is that it doesn't look like we have any facilities for restart. Is that all handled internally by the Watch method?

> If a client watch is disconnected then that client can start a new watch from the last returned resourceVersion; the client could also perform a fresh get / list request and begin again.

mudit-postman (Author):

> I am uncomfortable with the assumption that this pod's resourceVersion represents a point in time that serves as an anchor.

I guess the variable name pod is confusing (I will change it to pods). I am using the resourceVersion returned from the list object; the function I am using is .List(), not .Get().

> Another thing that came up is that it doesn't look like we have any facilities for restart. Is that all handled internally by the Watch method?

They have some retry mechanism internally (ref), but I'm not sure how reliable it is.
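To make the list-then-watch pattern being discussed concrete, here is a rough sketch, not the agent's actual code (the clientset wiring, namespace choice, and error handling are assumptions): list pods once, take the ResourceVersion from the returned collection's ListMeta, and start the event watch from that version so nothing that happens between the List and the Watch is missed.

```go
package podwatch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// watchPodEvents lists pods once, then watches pod events starting from the
// resourceVersion of that list. The version comes from the collection's
// ListMeta (promoted to pods.ResourceVersion), not from any individual pod.
func watchPodEvents(ctx context.Context, clientset kubernetes.Interface) error {
	pods, err := clientset.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		return fmt.Errorf("listing pods: %w", err)
	}

	watcher, err := clientset.CoreV1().Events("").Watch(ctx, metav1.ListOptions{
		Watch:           true,
		FieldSelector:   "involvedObject.kind=Pod",
		ResourceVersion: pods.ResourceVersion, // ListMeta version of the pod list
	})
	if err != nil {
		return fmt.Errorf("starting event watch: %w", err)
	}
	defer watcher.Stop()

	for event := range watcher.ResultChan() {
		fmt.Printf("event %s: %T\n", event.Type, event.Object)
	}
	return nil
}
```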

mudit-postman (Author):

Okay, we have this for retry; I can add it as a follow-up.

mudit-postman (Author):

We are using retryWatcher now with this commit 1eb9960
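For reference, a rough sketch of what using client-go's RetryWatcher (k8s.io/client-go/tools/watch) can look like; the function and variable names here are illustrative, not the ones in commit 1eb9960. RetryWatcher re-establishes a dropped watch from the last resourceVersion it delivered, which addresses the restart concern raised above.

```go
package podwatch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	watchtools "k8s.io/client-go/tools/watch"
)

// watchPodEventsWithRetry wraps the event watch in a RetryWatcher that starts
// from initialResourceVersion (e.g. the ResourceVersion of an initial List)
// and transparently resumes from the last delivered version if the underlying
// watch connection drops.
func watchPodEventsWithRetry(ctx context.Context, clientset kubernetes.Interface, initialResourceVersion string) error {
	lw := &cache.ListWatch{
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			options.FieldSelector = "involvedObject.kind=Pod"
			return clientset.CoreV1().Events("").Watch(ctx, options)
		},
	}

	rw, err := watchtools.NewRetryWatcher(initialResourceVersion, lw)
	if err != nil {
		return fmt.Errorf("creating retry watcher: %w", err)
	}
	defer rw.Stop()

	for event := range rw.ResultChan() {
		fmt.Printf("event %s: %T\n", event.Type, event.Object)
	}
	return nil
}
```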

printer.Infof("Event data: %s\n", string(jsonData))
}
default:
printer.Infof("Unhandled event type: %v\n", event.Type)
Contributor:

This seems like it could be very high-rate; should it be Debug? Or throttled in some other way? Or should we eliminate common cases that we know occur?

mudit-postman (Author):

This is a test file, so Infof should be okay.

mudit-postman merged commit dcb0bab into mudit/poa-2609 on Feb 19, 2025
2 checks passed
mudit-postman deleted the mudit/poa-2609-testing-changes branch on February 19, 2025 at 20:23