internal/delegatingresolver: avoid proxy if networktype of target address is not tcp #8215

Open
wants to merge 33 commits into
base: master
from

Conversation

eshitachandwani
Member

@eshitachandwani eshitachandwani commented Apr 2, 2025

Fixes: #8207

As mentioned in gRFC A1, we support TCP-level proxying through the HTTPS_PROXY or HTTP_PROXY environment variables. So if the resolved addresses for the target include addresses whose network type is not tcp, the proxy must be avoided for those addresses.
Changes made in the delegating resolver:

  • Check whether any address in the target's resolver state has network type tcp. If none does, do not wait for the proxy resolver update, since we do not want to connect to the proxy for any address.
  • If even one address has network type tcp, wait for the proxy resolver update and do the check on a per-address basis: addresses with network type tcp are modified to indicate use of the proxy, and the other addresses are left as is in the new state update.
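
The two changes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `Address`, its `NetworkType` and `ConnectAddr` fields, and `applyProxy` are hypothetical stand-ins for `resolver.Address`, the network type attribute, and the delegating resolver's state-combining logic.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for resolver.Address. The real type
// stores the network type as an attribute, not as a struct field.
type Address struct {
	Addr        string
	NetworkType string
	ConnectAddr string // non-empty when Addr is a proxy that tunnels to ConnectAddr
}

// applyProxy returns the addresses to report and whether the proxy is needed.
func applyProxy(targets []Address, proxy Address) ([]Address, bool) {
	needProxy := false
	for _, t := range targets {
		if t.NetworkType == "tcp" {
			needProxy = true
			break
		}
	}
	if !needProxy {
		// No TCP address: report the target state unchanged.
		return targets, false
	}
	out := make([]Address, 0, len(targets))
	for _, t := range targets {
		if t.NetworkType != "tcp" {
			out = append(out, t) // leave non-TCP addresses as is
			continue
		}
		out = append(out, Address{Addr: proxy.Addr, NetworkType: "tcp", ConnectAddr: t.Addr})
	}
	return out, true
}

func main() {
	targets := []Address{
		{Addr: "10.0.0.1:443", NetworkType: "tcp"},
		{Addr: "/tmp/app.sock", NetworkType: "unix"},
	}
	out, used := applyProxy(targets, Address{Addr: "proxy.example:3128"})
	fmt.Println(used, out)
}
```

In the sketch, a state with only a unix address is returned untouched, which corresponds to not waiting for the proxy resolver at all.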

RELEASE NOTES:

  • Proxy connections are no longer attempted for targets with non-TCP network types.

@eshitachandwani eshitachandwani added this to the 1.72 Release milestone Apr 2, 2025
@eshitachandwani eshitachandwani added the Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. label Apr 2, 2025

codecov bot commented Apr 2, 2025

Codecov Report

Attention: Patch coverage is 90.32258% with 6 lines in your changes missing coverage. Please review.

Project coverage is 82.12%. Comparing base (5edab9e) to head (fa78c83).
Report is 36 commits behind head on master.

Files with missing lines Patch % Lines
.../resolver/delegatingresolver/delegatingresolver.go 90.00% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8215      +/-   ##
==========================================
- Coverage   82.17%   82.12%   -0.05%     
==========================================
  Files         410      417       +7     
  Lines       40236    41412    +1176     
==========================================
+ Hits        33065    34011     +946     
- Misses       5822     5970     +148     
- Partials     1349     1431      +82     
Files with missing lines Coverage Δ
internal/transport/http2_client.go 92.44% <100.00%> (+0.36%) ⬆️
internal/transport/http_util.go 92.06% <100.00%> (ø)
.../resolver/delegatingresolver/delegatingresolver.go 85.50% <90.00%> (+3.28%) ⬆️

... and 98 files with indirect coverage changes

Comment on lines 208 to 212
if len(curState.Endpoints) != 0 {
	networkType, ok = networktype.Get(curState.Endpoints[0].Addresses[0])
} else {
	networkType, ok = networktype.Get(curState.Addresses[0])
}
Contributor

Here we're assuming that the network type of the first address of the endpoint is the same as the network type of all remaining addresses. This may be true for most real-world use cases, but it isn't a requirement specified in the resolver API. It should not be too difficult to avoid proxying on a per-address basis. There are two places below where we do the following, once for resolver.State.Addresses and once for resolver.State.Endpoints:

proxyattributes.Set(proxyAddr, proxyattributes.Options{
	User:        r.proxyURL.User,
	ConnectAddr: targetAddr.Addr,
})

We can refactor this into a method that takes in the target address and proxy address and returns the combined address.

func (r *delegatingResolver) combineAddresses(targetAddr, proxyAddr resolver.Address) resolver.Address {
	// networktype.Get returns (string, bool); the bool is ignored in this sketch.
	if nt, _ := networktype.Get(targetAddr); nt != "tcp" {
		return targetAddr
	}
	return proxyattributes.Set(proxyAddr, proxyattributes.Options{
		User:        r.proxyURL.User,
		ConnectAddr: targetAddr.Addr,
	})
}

Contributor

The suggestion above ends up adding duplicate addresses in the loop over targetResolverState.Endpoints. We will probably need to invert the loops over r.proxyAddrs and endpt.Addresses and skip non-TCP addresses early:

for _, endpt := range (*r.targetResolverState).Endpoints {
	var addrs []resolver.Address
	for _, targetAddr := range endpt.Addresses {
		// Non-TCP target addresses are passed through without a proxy.
		if nt, _ := networktype.Get(targetAddr); nt != "tcp" {
			addrs = append(addrs, targetAddr)
			continue
		}
		for _, proxyAddr := range r.proxyAddrs {
			addrs = append(addrs, proxyattributes.Set(proxyAddr, proxyattributes.Options{
				User:        r.proxyURL.User,
				ConnectAddr: targetAddr.Addr,
			}))
		}
	}
	endpoints = append(endpoints, resolver.Endpoint{Addresses: addrs})
}

Member Author

Right, done! I also added a check at the beginning: if there is no address with network type TCP, we do not wait for the proxy update and just update the cc state.

@arjan-bal
Contributor

Please also fix the format of the release notes.

Comment on lines 208 to 211
if networkType, ok := networktype.Get(addr); !ok || networkType == "tcp" {
	isTCP = true
	break
}
Contributor

Here we're assuming that !ok is the same as tcp. However, the old code in the dialer parsed the address string when !ok to determine the network type; we should probably maintain the same behaviour.

if !ok {
	networkType, address = parseDialTarget(address)
}
if networkType == "tcp" && useProxy {
	return proxyDial(ctx, address, grpcUA)
}
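
To illustrate the fallback being asked for, here is a deliberately simplified sketch. `networkTypeOf` is a hypothetical helper and is not the dialer's `parseDialTarget`; its prefix check is an assumption made for brevity and covers far fewer cases than real target parsing would.

```go
package main

import (
	"fmt"
	"strings"
)

// networkTypeOf is a hypothetical helper. attr and ok model the two results
// of networktype.Get; addr is the raw address string.
func networkTypeOf(attr string, ok bool, addr string) string {
	if ok {
		// The resolver set the attribute explicitly: trust it.
		return attr
	}
	// No attribute: inspect the address string instead of assuming tcp.
	// Simplification: only a "unix:" prefix is recognized here.
	if strings.HasPrefix(addr, "unix:") {
		return "unix"
	}
	return "tcp"
}

func main() {
	fmt.Println(networkTypeOf("", false, "unix:///tmp/app.sock"))
	fmt.Println(networkTypeOf("", false, "localhost:50051"))
	fmt.Println(networkTypeOf("unix", true, "/tmp/app.sock"))
}
```

The point of the sketch is only the ordering: the attribute wins when present, and the string is parsed otherwise, so an unset attribute is not silently treated as tcp.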

Member Author

Done.

Comment on lines 216 to 219
if networkType, ok := networktype.Get(addr); !ok || networkType == "tcp" || isTCP {
	isTCP = true
	break
}
Contributor

I think the code will be simpler if we move the search for a TCP address into a helper function. The helper can return early instead of keeping track of an isTCP boolean and breaking out of nested loops.
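
Something along these lines, as a rough sketch. `Address`, `Endpoint`, `State`, and `hasTCPAddress` are hypothetical stand-ins for the `resolver` package types and for whatever the helper ends up being called; the network type is modeled as a plain field for brevity.

```go
package main

import "fmt"

// Hypothetical stand-ins for resolver.Address, resolver.Endpoint and
// resolver.State.
type Address struct{ Addr, NetworkType string }
type Endpoint struct{ Addresses []Address }
type State struct {
	Addresses []Address
	Endpoints []Endpoint
}

// hasTCPAddress returns as soon as it finds a TCP address, so no flag
// variable or labeled break is needed for the nested loops.
func hasTCPAddress(s State) bool {
	for _, ep := range s.Endpoints {
		for _, a := range ep.Addresses {
			if a.NetworkType == "tcp" {
				return true
			}
		}
	}
	for _, a := range s.Addresses {
		if a.NetworkType == "tcp" {
			return true
		}
	}
	return false
}

func main() {
	s := State{Endpoints: []Endpoint{{Addresses: []Address{
		{Addr: "/tmp/app.sock", NetworkType: "unix"},
		{Addr: "10.0.0.1:443", NetworkType: "tcp"},
	}}}}
	fmt.Println(hasTCPAddress(s))
}
```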

Member Author

Right! Done!

Comment on lines 252 to 257
} else {
	addresses = append(addresses, proxyattributes.Set(proxyAddr, proxyattributes.Options{
		User:        r.proxyURL.User,
		ConnectAddr: targetAddr.Addr,
	}))
}
Contributor

nit: You can use continue in the if block to avoid indenting the else block.

Member Author

done.

Comment on lines 274 to 281
} else {
	for _, proxyAddr := range r.proxyAddrs {
		addrs = append(addrs, proxyattributes.Set(proxyAddr, proxyattributes.Options{
			User:        r.proxyURL.User,
			ConnectAddr: targetAddr.Addr,
		}))
	}
}
Contributor

nit: You can use continue in the if block to avoid indenting the else block.

Member Author

done.

Contributor

@arjan-bal arjan-bal left a comment

LGTM with minor comments.

go func() {
r.childMu.Lock()
defer r.childMu.Unlock()
if _, ok := r.proxyResolver.(nopResolver); ok {
Contributor

nit: We can invert the check to return early, reducing one level of indentation.

Member Author

Right! Done!

r.childMu.Lock()
defer r.childMu.Unlock()
if _, ok := r.proxyResolver.(nopResolver); ok {
proxyResolver, err := r.proxyURIResolver(resolver.BuildOptions{})
Contributor

In case of failures, we will still be retrying creation of the proxy resolver every time the target resolver produces an update. Not sure if this is a problem, maybe other reviewers can chime in.

Comment on lines 750 to 753
// ResolveNow of manual proxy resolver will not be called since proxy
// resolver is only built when we get the first update from target resolver
// and so , in the first resolveNow, proxy resolver will be a no-op resolver
// and only target resolver's ResolveNow will be called.
Contributor

nit: Can you please break this long sentence up? There's also an extra space before the ,.

Member Author

Done.

// Wait for proxy resolver to be built.
select {
case <-proxyResolverBuilt:
case <-time.After(defaultTestTimeout):
Contributor

We should use the existing context here to ensure the test runs for a max of defaultTestTimeout and not defaultTestTimeout per assertion.

Member Author

Done.

// resolver, since we want to avoid proxy for any network type apart from
// tcp.
if diff := cmp.Diff(gotState, wantState); diff != "" {
t.Fatalf("Unexpected state from delegating resolver. Diff (-got +want):\n%v", diff)
Contributor

nit: Since diff is a string, we should use %s to format it. This allows linters to catch data type mismatches.

Member Author

Done.

}

// Tests the scenario where a proxy is configured, and the resolver returns
// addresses with varied network type. The test verifies that the delegating
Contributor

nit: I think you should be more precise and say "tcp and non-tcp addresses" instead of "varied".

Member Author

Done.

select {
case <-stateCh:
t.Fatalf("Delegating resolver invoked UpdateState before both the proxy and target resolvers had updated their states.")
case <-time.After(defaultTestShortTimeout):
Contributor

Same here, create a single context and use it to wait for the remaining assertions also.

Member Author

defaultTestShortTimeout is used here because we do not want a state update, so I don't think waiting for ctx.Done() makes sense here. We wait for defaultTestTimeout in just one other place; do let me know if we should use ctx.Done() there.

Comment on lines 252 to 258
// Create a list of combined endpoints by pairing all proxy endpoints with
// every target endpoint. Each time, it constructs a new [resolver.Endpoint]
// using the all addresses from all the proxy endpoint and the target
// addresses from one endpoint. The target address and user information from
// the proxy URL are added as attributes to the proxy address.The resulting
// list of addresses is then grouped into endpoints, covering all
// combinations of proxy and target endpoints.
Contributor

There is a lot of repetition between this comment and the one at the top of this method. Can you make this one more concise?

Member Author

Done.

// triggering a state update if both resolvers are ready. If the ClientConn
// returns a non-nil error, it calls `ResolveNow()` on the proxy resolver. It
// is a StateListener function of wrappingClientConn passed to the target resolver.
// updateTargetResolverState is the StateListener function provided to the
Contributor

I see this StateListener being mentioned in a couple of places. There is no such method on the resolver.ClientConn interface.

Member Author

It's a method on the wrappingClientConn, used to call different update-state functions for the proxy and target resolvers. Should that not be mentioned in the comments? Or is the use not clear?
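
To make the intent concrete, here is a pared-down model of the idea. All names and shapes below are hypothetical stand-ins, and modeling the listener as a function-valued field is an assumption of this sketch rather than a description of the actual type.

```go
package main

import "fmt"

// State is a hypothetical stand-in for resolver.State.
type State struct{ Addrs []string }

// wrappingClientConn models a ClientConn wrapper whose UpdateState forwards
// to a per-resolver callback.
type wrappingClientConn struct {
	stateListener func(State) error
}

func (w *wrappingClientConn) UpdateState(s State) error { return w.stateListener(s) }

// delegating records the latest state seen from each child resolver.
type delegating struct {
	target, proxy *State
}

// newWrappers returns one wrapper per child resolver, each bound to its own
// listener, so the two kinds of update take different code paths.
func newWrappers(d *delegating) (targetCC, proxyCC *wrappingClientConn) {
	targetCC = &wrappingClientConn{stateListener: func(s State) error {
		d.target = &s
		return nil
	}}
	proxyCC = &wrappingClientConn{stateListener: func(s State) error {
		d.proxy = &s
		return nil
	}}
	return targetCC, proxyCC
}

func main() {
	d := &delegating{}
	targetCC, proxyCC := newWrappers(d)
	_ = targetCC.UpdateState(State{Addrs: []string{"10.0.0.1:443"}})
	fmt.Println(d.target != nil, d.proxy != nil)
	_ = proxyCC.UpdateState(State{Addrs: []string{"proxy.example:3128"}})
	fmt.Println(d.target != nil, d.proxy != nil)
}
```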

@easwars easwars removed their assignment May 1, 2025
Labels
Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. Type: Bug
Development

Successfully merging this pull request may close these issues.

grpc.Dial fails to connect to unix domain socket when https_proxy is set
4 participants