[frr/bgpd] unified config missing fpm option breaking VXLAN EVPN #21034

Open
bradh352 opened this issue Dec 5, 2024 · 4 comments · May be fixed by #21053
Labels
routing Issue for Routing WG

Comments

bradh352 commented Dec 5, 2024

Description

While debugging VXLAN EVPN with unified versus split configuration, I found that the resulting BGP configuration is the same with one exception: split mode forcibly sets no fpm use-next-hop-groups for all FRR instances, because it calls:

write_default_zebra_config()
{
    FILE_NAME=${1}
    grep -q '^no fpm use-next-hop-groups' $FILE_NAME || {
        echo "no fpm use-next-hop-groups" >> $FILE_NAME
        echo "fpm address 127.0.0.1" >> $FILE_NAME
    }
}

This setting is needed for VXLAN EVPN VTEPs to come online.
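A minimal sketch of how this helper behaves, assuming the function body quoted above with variable quoting added. The temporary file is a hypothetical stand-in for a real zebra configuration file. Calling the helper twice shows that the grep guard keeps it from appending the lines more than once.

```shell
# Copy of the helper quoted above, with variable quoting added (an assumption,
# not the upstream text).
write_default_zebra_config()
{
    FILE_NAME="${1}"
    grep -q '^no fpm use-next-hop-groups' "$FILE_NAME" || {
        echo "no fpm use-next-hop-groups" >> "$FILE_NAME"
        echo "fpm address 127.0.0.1" >> "$FILE_NAME"
    }
}

# Hypothetical scratch file standing in for a zebra config.
demo_conf=$(mktemp)
write_default_zebra_config "$demo_conf"
write_default_zebra_config "$demo_conf"   # second call appends nothing

# Count how many times the option appears and how many lines were written.
nhg_lines=$(grep -c '^no fpm use-next-hop-groups' "$demo_conf")
total_lines=$(wc -l < "$demo_conf" | tr -d ' ')
echo "no-fpm lines: $nhg_lines, total lines: $total_lines"
rm -f "$demo_conf"
```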

This change was introduced in PR #12852 when switching to the new fpm dataplane plugin.

It appears this PR never updated the corresponding setting for unified mode in templates/frr.conf.j2:

!
{% block banner %}
! =========== Managed by sonic-cfggen DO NOT edit manually! ====================
! generated by templates/frr.conf.j2 with config DB data
! file: frr.conf
!
{% endblock banner %}
!
{% include "common/daemons.common.conf.j2" %}
!
agentx
!
!Add fpm address for zebra
fpm address 127.0.0.1
{% include "zebra/zebra.interfaces.conf.j2" %}
!
{% if MGMT_VRF_CONFIG %}
{% if MGMT_VRF_CONFIG['vrf_global']['mgmtVrfEnabled'] == 'false' %}
{% include "staticd.db.default_route.conf.j2" %}
{% endif %}
{% else %}
{% include "staticd.db.default_route.conf.j2" %}
{% endif %}
!
{% include "staticd.db.conf.j2" %}
!
{% include "bgpd.conf.db.j2" %}
!
{% include "ospfd.conf.j2" %}
!
{% include "bfdd.conf.j2" %}
!

The zebra config template used in separated mode appears to handle this correctly, since it contains this logic:

{% if ( ('localhost' in DEVICE_METADATA) and ('nexthop_group' in DEVICE_METADATA['localhost']) and
(DEVICE_METADATA['localhost']['nexthop_group'] == 'enabled') ) %}
! enable next hop group support
fpm use-next-hop-groups
{% else %}
! Uses the old known FPM behavior of including next hop information in the route (e.g. RTM_NEWROUTE) messages
no fpm use-next-hop-groups
{% endif %}
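The branch above can be mirrored in a small shell sketch, assuming the nexthop_group value has already been read from DEVICE_METADATA['localhost']. The function name fpm_nhg_line is hypothetical and exists only to illustrate which line the template emits for each input.

```shell
# Hypothetical helper: prints the FPM line the template would emit for a given
# DEVICE_METADATA['localhost']['nexthop_group'] value (which may be empty).
fpm_nhg_line()
{
    if [ "${1:-}" = "enabled" ]; then
        # enable next hop group support
        echo "fpm use-next-hop-groups"
    else
        # old FPM behavior: next hop information carried in the route messages
        echo "no fpm use-next-hop-groups"
    fi
}

nhg_enabled=$(fpm_nhg_line enabled)
nhg_unset=$(fpm_nhg_line "")
nhg_disabled=$(fpm_nhg_line disabled)
echo "enabled  -> $nhg_enabled"
echo "unset    -> $nhg_unset"
echo "disabled -> $nhg_disabled"
```

Only the exact value enabled selects next hop groups; a missing key or any other value falls through to the no form, which is the behavior VXLAN EVPN needs here.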

I have not validated separated mode; unified is the default and is likely what most people will use unless they generate their own configurations.

To verify this is the cause of the issue, you can run

vtysh -c "config" -c "no fpm use-next-hop-groups"

and observe that the VTEP comes online.

Steps to reproduce the issue:

  1. Configure a minimal VXLAN setup using a unified config with SONiC master and sonic-swss#3383 ("[orchagent]: VXLAN: Fix oper_status and tunnel encapsulation TTL") applied so that the VTEP state shows up.
  2. Check that the status is down in show vxlan remotevtep.
  3. Run vtysh -c "config" -c "no fpm use-next-hop-groups".
  4. Check that the status is now up in show vxlan remotevtep.

Describe the results you received:

The VTEP stays down.

Describe the results you expected:

The VTEP should come online.

Output of show version:

SONiC Software Version: SONiC.master-broadcom.0-369d470b0
SONiC OS Version: 12
Distribution: Debian 12.8
Kernel: 6.1.0-22-2-amd64
Build commit: 369d470b0
Build date: Wed Dec  4 19:29:25 UTC 2024
Built by: brad@github-runner-ubuntu-2004

Platform: x86_64-dellemc_s5248f_c3538-r0
HwSKU: DellEMC-S5248f-P-25G
ASIC: broadcom
ASIC Count: 1
Serial Number: 638J8K3
Model Number: 04JXCV
Hardware Revision: N/A
Uptime: 00:03:24 up 47 min,  1 user,  load average: 0.58, 0.55, 0.53
Date: Thu 05 Dec 2024 00:03:24

Docker images:
REPOSITORY                    TAG                           IMAGE ID       SIZE
docker-orchagent              latest                        7d8529e558ac   354MB
docker-orchagent              master-broadcom.0-369d470b0   7d8529e558ac   354MB
docker-fpm-frr                latest                        2a5079b6b9a8   375MB
docker-fpm-frr                master-broadcom.0-369d470b0   2a5079b6b9a8   375MB
docker-nat                    latest                        2653522a87ef   344MB
docker-nat                    master-broadcom.0-369d470b0   2653522a87ef   344MB
docker-macsec                 latest                        357e36ee2310   344MB
docker-sflow                  latest                        d1164bf8ba03   342MB
docker-sflow                  master-broadcom.0-369d470b0   d1164bf8ba03   342MB
docker-teamd                  latest                        415984ca7bc7   341MB
docker-teamd                  master-broadcom.0-369d470b0   415984ca7bc7   341MB
docker-syncd-brcm             latest                        3629f9e843af   753MB
docker-syncd-brcm             master-broadcom.0-369d470b0   3629f9e843af   753MB
docker-platform-monitor       latest                        4bca0fae11ac   431MB
docker-platform-monitor       master-broadcom.0-369d470b0   4bca0fae11ac   431MB
docker-dhcp-relay             latest                        969b3600f58b   321MB
docker-snmp                   latest                        d5b355f43560   356MB
docker-snmp                   master-broadcom.0-369d470b0   d5b355f43560   356MB
docker-sonic-mgmt-framework   latest                        aca3d34650a0   399MB
docker-sonic-mgmt-framework   master-broadcom.0-369d470b0   aca3d34650a0   399MB
docker-router-advertiser      latest                        f356005105f4   312MB
docker-router-advertiser      master-broadcom.0-369d470b0   f356005105f4   312MB
docker-mux                    latest                        131c5a2a48cd   363MB
docker-mux                    master-broadcom.0-369d470b0   131c5a2a48cd   363MB
docker-lldp                   latest                        c73bcd484bb1   357MB
docker-lldp                   master-broadcom.0-369d470b0   c73bcd484bb1   357MB
docker-database               latest                        421b12badbea   320MB
docker-database               master-broadcom.0-369d470b0   421b12badbea   320MB
docker-sonic-bmp              latest                        f11c55c250b0   313MB
docker-sonic-bmp              master-broadcom.0-369d470b0   f11c55c250b0   313MB
docker-sonic-gnmi             latest                        0b4adc93a15b   401MB
docker-sonic-gnmi             master-broadcom.0-369d470b0   0b4adc93a15b   401MB
docker-eventd                 latest                        6516930df9c5   312MB
docker-eventd                 master-broadcom.0-369d470b0   6516930df9c5   312MB
docker-gbsyncd-broncos        latest                        aee8c4c5bd59   351MB
docker-gbsyncd-broncos        master-broadcom.0-369d470b0   aee8c4c5bd59   351MB
docker-gbsyncd-credo          latest                        5fe475762db9   324MB
docker-gbsyncd-credo          master-broadcom.0-369d470b0   5fe475762db9   324MB

Output of show techsupport:

Additional information you deem important (e.g. issue happens only occasionally):


bradh352 commented Dec 5, 2024

@stepanblyschak since you wrote this change, can you chime in?


bradh352 commented Dec 5, 2024

Example config_db.json that exhibits the issue:
config_db.json


bradh352 commented Dec 5, 2024

Extracted generated frr.conf, showing that it does not contain no fpm use-next-hop-groups:
frr.conf
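A check like this can be scripted. The sketch below assumes only that the option, when present, appears at the start of a line, as in the split-mode helper's own grep. The function name has_no_fpm_nhg and the two sample files are hypothetical stand-ins, not the attached frr.conf.

```shell
# Hypothetical check: succeeds if the given FRR config explicitly disables
# next hop groups for FPM, fails otherwise.
has_no_fpm_nhg()
{
    grep -q '^no fpm use-next-hop-groups' "$1"
}

# Illustrative stand-ins: one config without the option, one with it.
unified_conf=$(mktemp)
fixed_conf=$(mktemp)
printf 'agentx\nfpm address 127.0.0.1\n' > "$unified_conf"
printf 'agentx\nno fpm use-next-hop-groups\nfpm address 127.0.0.1\n' > "$fixed_conf"

if has_no_fpm_nhg "$unified_conf"; then unified_result=present; else unified_result=missing; fi
if has_no_fpm_nhg "$fixed_conf"; then fixed_result=present; else fixed_result=missing; fi
echo "unified: $unified_result, fixed: $fixed_result"
rm -f "$unified_conf" "$fixed_conf"
```

On a device, the same function could be pointed at the generated frr.conf instead of the sample files.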

@bradh352 changed the title from "[frr/bgpd] unified config missing fpm option breaking VXLAN" to "[frr/bgpd] unified config missing fpm option breaking VXLAN EVPN" on Dec 5, 2024
@bradh352 linked a pull request on Dec 5, 2024 that will close this issue
@tjchadaga added the "routing" (Issue for Routing WG) label on Dec 18, 2024
@tjchadaga

Issue to be triaged in routing subgroup
