fix: kapinger use service IP instead of name #1283

Open
wants to merge 4 commits into
base: main

Conversation

matmerr
Member

@matmerr matmerr commented Jan 29, 2025

Description

CoreDNS wasn't playing nice with massive bursts of service name FQDN lookups, so this moves to using the IP of the service(s) directly.
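
A minimal sketch of the idea, not the PR's actual code: once a Service's cluster IP is known, the request URL can be built from the IP literal so the HTTP client never asks CoreDNS to resolve a name. The function name `serviceURL` and the sample IP and port are hypothetical; in a cluster the values would come from the Service object.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// serviceURL is a hypothetical helper: it builds an HTTP URL from a Service's
// cluster IP and port. Because the host is an IP literal, no DNS lookup is needed.
func serviceURL(clusterIP string, port int) (string, error) {
	// Headless Services report "None" as their cluster IP, which is rejected here.
	if net.ParseIP(clusterIP) == nil {
		return "", fmt.Errorf("invalid cluster IP %q", clusterIP)
	}
	// JoinHostPort adds the brackets that IPv6 literals require.
	return "http://" + net.JoinHostPort(clusterIP, strconv.Itoa(port)), nil
}

func main() {
	// Assumed sample values, for illustration only.
	u, err := serviceURL("10.0.12.34", 8080)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

The trade-off is that the client no longer exercises DNS at all, so this only fits tests where name resolution is not the thing being measured.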

Please provide a brief description of the changes made in this pull request.

Related Issue

If this pull request is related to any issue, please mention it here. Additionally, make sure that the issue is assigned to you before submitting this pull request.

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Please add any relevant screenshots or GIFs to showcase the changes made.

Additional Notes

Add any additional notes or context about the pull request here.


Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@matmerr matmerr requested a review from a team as a code owner January 29, 2025 18:58
@MikeZappa87

What was the issue with CoreDNS? Have we informed the correct teams of a possible issue with CoreDNS?

@@ -5,6 +5,7 @@ import (
"fmt"
"log"
"net/http"
_ "net/http/pprof"
Contributor

@huntergregory huntergregory Feb 5, 2025

why import this? Comment for future users?

Member

@timraymond timraymond Feb 6, 2025

@huntergregory that's fairly common with pprof, but it's also not my favorite thing. The import uses init() to attach pprof handlers to the default net/http handler (http.DefaultServeMux). That's probably fine here, where it's for testing, but for production pprof handlers I usually like to explicitly attach those handlers to my routes. This is documented in the package docs here: https://pkg.go.dev/runtime/pprof .
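
For reference, a rough sketch of the explicit approach, assuming a dedicated debug mux; `newDebugMux` and `statusFor` are hypothetical names, not part of this PR. Note that importing net/http/pprof by name still runs its init() and registers on the default mux; the explicit mux helps because the server below never serves the default mux.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a dedicated mux with the pprof handlers attached
// explicitly, instead of relying on the blank import's side effect.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusFor sends an in-memory GET request to h and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	fmt.Println("cmdline status:", statusFor(mux, "/debug/pprof/cmdline"))
	fmt.Println("unregistered path status:", statusFor(mux, "/metrics"))
	// A real program would serve it on a separate, non-public address, e.g.
	// http.ListenAndServe("localhost:6060", mux)
}
```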

@@ -3,10 +3,12 @@ module github.com/microsoft/retina/hack/tools/kapinger
go 1.22.5

require (
golang.org/x/exp v0.0.0-20250128182459-e0ece0dbea4c
Contributor

non-blocking: I'm not a fan of the experimental library because it makes breaking changes all the time

From that repo:
Warning: Packages here are experimental and unreliable. Some may one day be promoted to the main repository or other subrepository, or they may be modified arbitrarily or even disappear altogether.

We probably should make attempts to just move away from this entirely.

@@ -41,7 +41,7 @@ func NewKapingerHTTPClient(clientset *kubernetes.Clientset, labelselector string
Transport: &http.Transport{
DisableKeepAlives: true,
},
-Timeout: 3 * time.Second,
+Timeout: 15 * time.Second,
Contributor

@huntergregory huntergregory Feb 5, 2025

What if there's a NetworkPolicy dropping traffic in our testing? Raising the timeout from 3s to 15s means each dropped request blocks five times longer, so this change would reduce that amount of traffic by 80%?

Contributor

perhaps make this timeout configurable?

for i := 0; i < k.volume; i++ {
for _, url := range k.urls {
Contributor

Are we killing the TARGET_TYPE=pod functionality? I'd remove any dead code or add comments to it to fix later

switch k.targettype {
case Service:
	k.urls, err = k.getServiceURLs()
	if err != nil {
		return nil, fmt.Errorf("error getting service URLs: %w", err)
	}
case Pod:
	k.urls, err = k.getPodURLs()
default:
	return nil, fmt.Errorf("env TARGET_TYPE must be \"service\" or \"pod\"")
}

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants