Added required call to allocate VIPs when endpoints are restored #2468
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #2468       +/-   ##
==========================================
- Coverage    62.07%   61.72%    -0.35%
==========================================
  Files           49      128       +79
  Lines         6842    21120    +14278
==========================================
+ Hits          4247    13037     +8790
- Misses        2162     6677     +4515
- Partials       433     1406      +973
One thing I wanted to mention, which came up on moby/moby#35675, is that our situation includes a current service (A) that overlaps with a deleted service (K). We're not sure if the overlap happened before or after K was deleted (I can try to find out, although I'm not sure where to look), but I wanted to mention it in case it's relevant to this PR. We definitely have restarted Docker and/or rebooted nodes at some point in here. Thanks for tracking this down!
Thanks for the additional information @jbscare - appreciated 👍
Thanks @jbscare. I can't immediately see a sequence of operations that would end up in the state you see, but I'm still tempted to say that this issue is the cause of what you're seeing. The reason is that when the module that keeps track of IP addresses in use goes out of sync with the addresses actually used, it doesn't cause an immediate fatal error. It just limps along and can get itself into very unusual states, including the one you see. Even when it's in a bad way, some of the mesh networking keeps it working, so the actual failure is discovered quite late (unless of course you do a sensitive test like yours where you track connections to individual containers). If it's possible to do the following test, it would help a lot. It's a bit crude but should give a strong indication of whether this is the cause of the problem:
These steps were just to verify that you do see overlaps. Now for a crude way to temporarily avoid overlaps - if this works then you are very likely hitting this issue:
Thanks again!
This is definitely in the wrong place, because we shouldn't be allocating in IsServiceAllocated; we should just be returning. God, this part of the code is such a mess. It desperately needs to be refactored.
Ok, so taking a deeper dig... this doesn't LGTM, because we shouldn't be allocating VIPs in IsServiceAllocated.
I agree with @dperny that the function IsServiceAllocated() doesn't seem like the right place to be doing this. From the sound of it, IsServiceAllocated() should be a 'const' function with no side effects.
@balrajsingh can you also please see if a unit test can be added for the fix?
Fair enough, it should then be done where it makes the most sense per the overall design.
Please sign your commits following these rules:
$ git clone -b "libnetwork_1790" git@github.com:balrajsingh/swarmkit.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354609176
# editor opens: change each 'pick' to 'edit', then save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f
Amending updates the existing PR. You DO NOT need to open a new one.
@@ -404,6 +404,9 @@ func (na *cnmNetworkAllocator) IsServiceAllocated(s *api.Service, flags ...func(
vipLoop:
	for _, vip := range s.Endpoint.VirtualIPs {
		if na.IsVIPOnIngressNetwork(vip) && networkallocator.IsIngressNetworkNeeded(s) {
			if _, ok := na.services[s.ID]; !ok {
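The hunk above is truncated, so the body of the new guard isn't fully visible here. Read together with the review discussion and the final commit message, the guard plausibly reports the service as unallocated whenever this allocator instance has no record of it (as right after a restore), so that the normal allocation path, which informs IPAM, runs again. A hedged sketch of that reading, not a verbatim copy of the merged code:

	// Sketch: if the service needs the ingress network but this allocator
	// has no record of it (e.g. its state was just restored), report it as
	// not allocated so the regular allocation path re-registers its VIPs
	// with IPAM.
	if na.IsVIPOnIngressNetwork(vip) && networkallocator.IsIngressNetworkNeeded(s) {
		if _, ok := na.services[s.ID]; !ok {
			return false
		}
	}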
Please add a comment here explaining the scenario, which is: the ingress network is required but allocation has not been done for the ingress network.
Force-pushed from 05a57a9 to cf08f3e
Thanks @abhi and thanks also for suggesting this fix.
manager/allocator/allocator_test.go (Outdated)

	VirtualIPs: []*api.Endpoint_VirtualIP{
		{
			NetworkID: "ingress-nw-id",
I think the ingress subnet is /16. This should be fine for a unit test.
manager/allocator/allocator_test.go (Outdated)

	s := store.NewMemoryStore(nil)
	assert.NotNil(t, s)
	defer s.Close()
extra empty line
manager/allocator/allocator_test.go (Outdated)

			PublishedPort: uint32(8001 + i),
		},
	},
empty line
manager/allocator/allocator_test.go (Outdated)

			PublishedPort: uint32(8001 + i),
		},
	},
empty line
manager/allocator/allocator_test.go (Outdated)

		DesiredState: api.TaskStateRunning,
	}
	assert.NoError(t, store.CreateTask(tx, tsk))
empty line
manager/allocator/allocator_test.go (Outdated)

		panic("missing task networks")
	}
	if len(task.Networks[0].Addresses) == 0 {
		panic("missing task network address")
Why panic? We should use assert like it's used in all the other tests.
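For reference, a sketch of the assert-style equivalent using testify, which this test file already imports. The helper name checkTaskAddresses is made up for illustration; require is used so a failed length check stops the test before the index expression on the next line can panic:

	import (
		"testing"

		"github.com/docker/swarmkit/api"
		"github.com/stretchr/testify/require"
	)

	// checkTaskAddresses is a hypothetical helper showing the panic-free
	// style: require aborts the test on failure, so the [0] index below
	// can never run against an empty slice.
	func checkTaskAddresses(t *testing.T, task *api.Task) {
		require.NotEmpty(t, task.Networks, "missing task networks")
		require.NotEmpty(t, task.Networks[0].Addresses, "missing task network address")
	}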
manager/allocator/allocator_test.go (Outdated)

	assignedIP := task.Networks[0].Addresses[0]
	if assignedIPs[assignedIP] {
		t.Fatalf("task %s assigned duplicate IP %s", task.ID, assignedIP)
Same as above: use assert.
manager/allocator/allocator_test.go (Outdated)

		t.Fatalf("task %s assigned duplicate IP %s", task.ID, assignedIP)
	}
	assignedIPs[assignedIP] = true
empty line
manager/allocator/allocator_test.go (Outdated)

	assignedIPs[assignedIP] = true

	if assignedVIPs[assignedIP] {
		t.Fatalf("a service and task %s have the same IP %s", task.ID, assignedIP)
use assert
manager/allocator/allocator_test.go (Outdated)

	if assignedVIPs[assignedIP] {
		t.Fatalf("a service and task %s have the same IP %s", task.ID, assignedIP)
	}
empty line
A few comments:
manager/allocator/allocator_test.go (Outdated)

	}
	assignedIP := task.Networks[0].Addresses[0]
	if assignedIPs[assignedIP] {
		t.Fatalf("task %s assigned duplicate IP %s", task.ID, assignedIP)
I don't think this t.Fatalf is right everywhere. We need to return the assert result instead.
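One way to apply that, sketched against the closure style the test adopts below (mirroring hasNoIPOverlapServices): combine the assert results and return them, so the polling wrapper that drives the closure can retry instead of hard-failing. The closure body here is a sketch, not the merged code:

	hasNoIPOverlapTasks := func(fakeT assert.TestingT, task *api.Task) bool {
		// Each assert reports through fakeT and returns false on failure,
		// so the closure's result tells the caller whether to keep polling.
		// The || short-circuits, so the second check never indexes an
		// empty Networks slice.
		if !assert.NotEmpty(fakeT, task.Networks) ||
			!assert.NotEmpty(fakeT, task.Networks[0].Addresses) {
			return false
		}
		assignedIP := task.Networks[0].Addresses[0]
		return assert.False(fakeT, assignedIPs[assignedIP],
			"task %s assigned duplicate IP %s", task.ID, assignedIP)
	}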
On leader change or leader reboot, the restore logic in the allocator was allocating overlapping IP addresses for VIPs and endpoints in the ingress network. The fix added as part of this commit ensures that during restore we allocate the existing VIP and endpoint. Signed-off-by: Balraj Singh <[email protected]>
Force-pushed from a6b7a7b to e5515c3
LGTM
@anshulpundir @dperny, @balrajsingh has found the root cause. This issue happens only when a service is attached to the ingress network (which is handled a little differently). The fix is ready for a final review. PTAL
	}

	assignedVIPs := make(map[string]bool)
	assignedIPs := make(map[string]bool)
You could have used one map; in any case there must be no overlap between VIP and task IPs.
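A sketch of the single-map variant being suggested: since no address may legitimately appear as both a VIP and a task IP, one map catches duplicates of either kind. The recordIP helper name is made up for illustration:

	assigned := make(map[string]bool)
	// recordIP fails the check if ip was already seen, whether it first
	// appeared as a VIP or as a task address, then marks it as used.
	recordIP := func(fakeT assert.TestingT, owner, ip string) bool {
		ok := assert.False(fakeT, assigned[ip], "%s assigned duplicate IP %s", owner, ip)
		assigned[ip] = true
		return ok
	}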
LGTM after fixing the one style nit.
	assignedVIPs := make(map[string]bool)
	assignedIPs := make(map[string]bool)
	hasNoIPOverlapServices := func(fakeT assert.TestingT, service *api.Service) bool {
Strictly speaking, you should either use fakeT instead of t in your calls to assert functions, or you should change the function signature to replace this variable with an underscore _. The former is probably the correct option. Or, you can just change the variable name of fakeT to t and shadow the variable t instead of closing over it.
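The last option is a one-line change, sketched here: naming the parameter t shadows the outer *testing.T, so the assert calls inside the closure naturally report through whatever TestingT the polling wrapper passes in. The body is elided:

	hasNoIPOverlapServices := func(t assert.TestingT, service *api.Service) bool {
		// This t shadows the outer *testing.T for the closure body, so
		// assert.* calls here use the wrapper-supplied TestingT.
		// ... overlap checks as above ...
		return true
	}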
Signed-off-by: Balraj Singh <[email protected]>
Force-pushed from e5515c3 to 2397ddf
LGTM
LGTM
Yeah LGTM.
strictly speaking, abhi and flavio aren't maintainers, but they know this code probably better than most of the maintainers, so i'm merging off their LGTMs.
Tracking down libnetwork/1790#issuecomment-308222053. The report is that if several services are started on a single node, and this node is then rebooted, all the services appear to come back but some of them are no longer reachable.

On probing, the cause turned out to be an invalid assignment of IP addresses to services when they were restored. Specifically, the same IP address was assigned to one service's VIP and also to a different service's endpoint. The result was that packets got delivered to the wrong container and caused symptoms like services or ports being unreachable. This is very likely also the cause of moby/moby#35675 and other duplicate-IP or overlapping-IP issues.

The reason for this problem seems to be that the code path followed when services are restored at no point contacts or informs IPAM about the IP addresses used as the restored service's VIP. So IPAM thinks that those IP addresses are still available and hands them out to endpoints and new services, causing the observed chaos.

To work out the right fix, I compared the code path when a service is created from the CLI to the code path when a service is restored on reboot. To me this fix is the bit that should have always been on the restore path but was omitted. With this fix IPAM gets correctly informed and its state is consistent with what I see on the network.

I have tested this fix on a single node running several services, and with multiple nodes and multiple managers running many services (specifically 2 nodes and 2 managers). In both cases, without the fix a reboot would cause IP address overlaps on the ingress network. With the fix there are no overlaps.

While the fix seems to work, I'm not sure if it is at exactly the right point in this function, or indeed if it is the right or complete fix. Please take a look and let me know.

Signed-off-by: abhi <[email protected]>
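The mechanism described here can be illustrated with a toy allocator (hypothetical, not swarmkit's real IPAM): if the restore step that re-registers in-use addresses is skipped, the allocator hands out an address that a restored service's VIP already holds.

	package main

	import "fmt"

	// toyIPAM hands out addresses from a fixed pool and tracks what is in use.
	type toyIPAM struct {
		pool  []string
		inUse map[string]bool
	}

	func (a *toyIPAM) allocate() (string, error) {
		for _, ip := range a.pool {
			if !a.inUse[ip] {
				a.inUse[ip] = true
				return ip, nil
			}
		}
		return "", fmt.Errorf("pool exhausted")
	}

	// restore is the step the buggy path skipped: without it, addresses that
	// restored services still hold look free and are handed out a second time.
	func (a *toyIPAM) restore(existing []string) {
		for _, ip := range existing {
			a.inUse[ip] = true
		}
	}

	func main() {
		a := &toyIPAM{pool: []string{"10.255.0.5", "10.255.0.6"}, inUse: map[string]bool{}}
		a.restore([]string{"10.255.0.5"}) // the restored service's VIP
		ip, _ := a.allocate()
		fmt.Println("new endpoint gets", ip) // 10.255.0.6 - no overlap
	}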
[Backport 17.06.5] #2468 Added required call to allocate VIPs when endpoints are restored
@dperny @fcrisciani
Signed-off-by: Balraj Singh [email protected]