resolve PMTU discovery issues on IPv6 #752

Open
davidelang opened this issue Mar 22, 2024 · 16 comments

@davidelang
Collaborator

davidelang commented Mar 22, 2024

Description

relates to: #554

Google, LinkedIn posting, and other sites were inaccessible via IPv6 during SCALE 21x

Owen identified this as a PMTU discovery incompatibility between Google and the HE tunnel

Acceptance Criteria

The HE IPv6 tunnel works with PMTU discovery, including Google's implementation; or we find a way to override PMTU; or we disable IPv6

@owendelong
Collaborator

owendelong commented Mar 25, 2024 via email

@MrHamel
Contributor

MrHamel commented Mar 25, 2024 via email

@hriday
Contributor

hriday commented Mar 25, 2024 via email

@irabinovitch
Contributor

If we continue to prioritize broken IPv6 over usable internet, all we are doing is reinforcing attendee perception that IPv6 isn't ready for prime time and that the first thing one should do on noticing network issues is to disable IPv6.

If we want to drive IPv6 adoption and education through SCALE, we have to make sure IPv6 connectivity actually works and offers an experience equivalent to or better than IPv4-only. If we can't, then we just need to disable it. I'd hate to see that be the outcome, but with the current implementation we aren't meeting the needs of our attendees, speakers, or sponsors.

@hriday
Contributor

hriday commented Mar 27, 2024 via email

@owendelong
Collaborator

This will probably solve it. Performance hit to everyone that can do proper PMTU-D, but hey, by all means, let's cater to Google our corporate overlords above all else:
https://supportportal.juniper.net/s/article/Configuring-TCP-MSS-clamping-on-SRX-devices-to-avoid-unnecessary-fragmentation?language=en_US
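
For reference, a rough sketch of the arithmetic behind picking a clamp value. The header sizes are the standard fixed ones (40 octets for IPv6, 20 for TCP without options). The function name and the 1480-octet tunnel MTU (6in4 over a 1500-octet path) are assumptions for illustration, not values taken from the actual SRX config.

```python
IPV6_HEADER = 40  # fixed IPv6 header, octets
TCP_HEADER = 20   # TCP header without options, octets


def mss_clamp(tunnel_mtu: int) -> int:
    """Largest TCP payload that fits in one IPv6 packet of size tunnel_mtu."""
    return tunnel_mtu - IPV6_HEADER - TCP_HEADER


# Assumed tunnel MTU: 1500-octet path minus a 20-octet outer IPv4 header.
print(mss_clamp(1480))  # 1420
```

Clamping rewrites the MSS option in TCP SYNs only, so it does nothing for UDP-based traffic such as QUIC.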

@MrHamel
Contributor

MrHamel commented Mar 29, 2024 via email

@davidelang
Collaborator Author

davidelang commented Mar 29, 2024 via email

@davidelang
Collaborator Author

davidelang commented Mar 29, 2024 via email

@owendelong
Collaborator

Yes, I would, as a matter of fact, but it turns out that Apple does PMTU-D correctly.

Further, Ryan, if your phone wasn't working on the WiFi at the Hilton, this had NOTHING to do with IPv6 or problems on our network. We don't extend our network to the Hilton and the Hilton has ZERO IPv6 capability. Perhaps your phone just suffers from Android.

Another provider won't help because we still won't be able to get a 1500 octet MTU through you, GRE is GRE and 6in4 is 6in4 and both have a certain amount of overhead that you can't get around. The MTU on the ethernet interface facing the convention center is limited to 1500 octets. They won't do jumbo frames (not like I didn't ask, but the response was something between a blank stare and "what's an MTU" or "what's a frame", or "jumbo what?"). This is not a surprise given the level of training I've observed among their on-site people. They're nice, they try to be helpful, but they really have very minimal training and understanding of networking.
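
To put numbers on the overhead point above, here is a minimal sketch. It assumes plain 6in4 (a 20-octet outer IPv4 header, no options) and basic GRE over IPv4 (20-octet IPv4 header plus a 4-octet GRE header with no key, checksum, or sequence fields). The names `ENCAP_OVERHEAD` and `tunnel_mtu` are made up for illustration.

```python
# Octets added in front of the inner IPv6 packet by each encapsulation.
ENCAP_OVERHEAD = {
    "6in4": 20,  # outer IPv4 header only (protocol 41), no options
    "gre": 24,   # outer IPv4 header (20) + basic GRE header (4)
}
IPV6_MIN_MTU = 1280  # minimum link MTU that IPv6 requires


def tunnel_mtu(link_mtu: int, encap: str) -> int:
    """Inner IPv6 MTU left after subtracting the encapsulation overhead."""
    mtu = link_mtu - ENCAP_OVERHEAD[encap]
    if mtu < IPV6_MIN_MTU:
        raise ValueError(f"{encap} over a {link_mtu}-octet link is below the IPv6 minimum MTU")
    return mtu


for encap in ENCAP_OVERHEAD:
    print(encap, tunnel_mtu(1500, encap))
# 6in4 1480
# gre 1476
```

Either way the inner MTU lands below 1500, so hosts sending full 1500-octet IPv6 packets depend on PMTU-D (or MSS clamping) to get through.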

I'm actually less concerned about Android than I am about non-Android users trying to get to Google services from Linux devices, which was the problem we were able to observe and trace in the NOC.

Android would be even harder to troubleshoot since it has a complete lack of troubleshooting tools (e.g. tcpdump or any other libpcap based capture tool) last I heard.

If we want to test it, we'll need to add some equipment behind the tunnel and get a little creative. Doable, but not currently deployed. Right now, the tunnel is just idling on an interface on one of my MX-240s just to keep HE from deleting it. It's not actually moving real traffic or anything and I don't have an easy way to do so without adding hardware. I can probably pull a spare SRX I have here into service rather than needing someone to ship our SRX devices. I have the replacement ex4200-48px from Hula already (same day replacement, no questions asked). It's probably a good idea to deploy that and get it tested anyway.

I don't have anything that pretends to be Android, but I can probably throw a Pi at it and we can at least do some testing with that.
The problem is that the Pi only fails on Google stuff sometimes and mostly works. Making it an IPv6-only subnet will probably help make the Pi fail more consistently.

@owendelong
Collaborator

On another note, I have good paths into Apple for getting bugs this serious resolved. Google, OTOH, is a black hole of uselessness when it comes to this sort of issue.

@MrHamel
Contributor

MrHamel commented Mar 31, 2024 via email

@owendelong
Collaborator

owendelong commented Apr 1, 2024 via email

@MrHamel
Contributor

MrHamel commented Apr 1, 2024 via email

@owendelong
Collaborator

owendelong commented Apr 2, 2024 via email

@owendelong
Collaborator

This is basically a duplicate of #554 at this point, so I'm going to close this and focus on that one.
