graph traverse v2 #8876

iamkafai · 2025-05-02T03:39:00Z

Run the new selftests in different compilers.

Implement rekeying of connections with the RxGK security class. This involves regenerating the keys with a different key number as part of the input data after a certain amount of time or a certain amount of bytes encrypted. Rekeying may be triggered by either end. The LSW of the key number is inserted into the security-specific field in the RX header, and we try and expand it to 32-bits to make it last longer. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: Herbert Xu <[email protected]> cc: Chuck Lever <[email protected]> cc: Simon Horman <[email protected]> cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Provide a way for the application (e.g. the afs filesystem) to store private data on the rxrpc_peer structs for later retrieval via the call object. This will allow afs to store a pointer to the afs_server object on the rxrpc_peer struct, thereby obviating the need for afs to keep lookup tables by which it can associate an incoming call with server that transmitted it. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: Simon Horman <[email protected]> cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Make the afs_cb_call tracepoint display some security parameters to make debugging easier. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: Simon Horman <[email protected]> cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Implement in kafs the hook for adding appdata into a RESPONSE packet generated in response to an RxGK CHALLENGE packet, and include the key for securing the callback channel so that notifications from the fileserver get encrypted. This will be necessary when more complex notifications are used that convey changed data around. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: Simon Horman <[email protected]> cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Add more tracing for CHALLENGE and RESPONSE packets. Currently, rxrpc only has client-relevant tracepoints (rx_challenge and tx_response), but add the server-side ones too. Further, record the service ID in the rx_challenge tracepoint as well. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: Simon Horman <[email protected]> cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Add RxGK server keys of bytes containing { 0, 1, 2, 3, 4, ... } to the server keyring for the rxperf test server. This allows the rxperf test client to connect to it. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: Simon Horman <[email protected]> cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

…-kafs' David Howells says: ==================== rxrpc, afs: Add AFS GSSAPI security class to AF_RXRPC and kafs Here's a set of patches to add basic support for the AFS GSSAPI security class to AF_RXRPC and kafs. It provides transport security for keys that match the security index 6 (YFS) for connections to the AFS fileserver and VL server. Note that security index 4 (OpenAFS) can also be supported using this, but it needs more work as it's slightly different. The patches also provide the ability to secure the callback channel - connections from the fileserver back to the client that are used to pass file change notifications, amongst other things. When challenged by the fileserver, kafs will generate a token specific to that server and include it in the RESPONSE packet as the appdata. The server then extracts this and uses it to send callback RPC calls back to the client. It can also be used to provide transport security on the callback channel, but a further set of patches is required to provide the token and key to set that up when the client responds to the fileserver's challenge. This makes use of the previously added crypto-krb5 library that is now upstream (last commit fc0cf10). This series of patches consist of the following parts: (0) Update kdoc comments to remove some kdoc builder warnings. (1) Push reponding to CHALLENGE packets over to recvmsg() or the kernel equivalent so that the application layer can include user-defined information in the RESPONSE packet. In a follow-up patch set, this will allow the callback channel to be secured by the AFS filesystem. (2) Add the AF_RXRPC RxGK security class that uses a key obtained from the AFS GSS security service to do Kerberos 5-based encryption instead of pcbc(fcrypt) and pcbc(des). (3) Add support for callback channel encryption in kafs. (4) Provide the test rxperf server module with some fixed krb5 keys. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

ethqos->serdes_speed represents the current speed the serdes was configured for, which should be the same as ethqos->speed. Since we wish to remove ethqos->speed to simplify the code, switch to using the serdes_speed instead. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Rather than ethqos_fix_mac_speed() storing the speed in struct qcom_ethqos and then functions that are only called from here reading that speed, pass the speed to the called functions instead. This removes all readers of this struct member, which then allows the removal of the two places that set its value and the struct member. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Phylink will already limit the MAC speed according to the interface, so if 2500BASE-X is selected, the maximum speed will be 2.5G. It is, therefore, not necessary to set a speed limit. Remove setting plat_dat->max_speed from this glue driver. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

qcom-ethqos doesn't need to implement the speed_mode_2500() method as it is only setting priv->plat->phy_interface to 2500BASE-X, which is already a pre-condition for assigning speed_mode_2500 in qcom_ethqos_probe(). So, qcom_ethqos_speed_mode_2500() has no effect. Remove it. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Russell King says: ==================== net: stmmac: qcom-ethqos: simplifications Remove unnecessary code from the qcom-ethqos glue driver. Start by consistently using -> serdes_speed to set the speed of the serdes PHY rather than sometimes using ->serdes_speed and sometimes using ->speed. This then allows the removal of ->speed in the second patch. There is no need to set the maximum speed just because we're using 2500BASE-X - phylink already knows that 2500BASE-X can't support faster speeds. This then makes qcom_ethqos_speed_mode_2500() redundant as it's setting the interface mode to the value that was determined in the switch statement that already determined that the interface mode had this value. Not tested on hardware. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Add a driver for the MDIO controller on the RTL9300 family of Ethernet switches with integrated SoC. There are 4 physical SMI interfaces on the RTL9300 however access is done using the switch ports. The driver takes the MDIO bus hierarchy from the DTS and uses this to configure the switch ports so they are associated with the correct PHY. This mapping is also used when dealing with software requests from phylib. Signed-off-by: Chris Packham <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

This patch adds lock protection for the hardware statistics for fbnic. The hardware statistics access via ndo_get_stats64 is not protected by the rtnl_lock(). Since these stats can be accessed from different places in the code such as service task, ethtool, Q-API, and net_device_ops, a lock-less approach can lead to races. Note that this patch is not a fix rather, just a prep for the subsequent changes in this series. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Mohsin Bashir <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

This patch provides support for hardware queue stats and covers packet errors for RX-DMA engine, RCQ drops and BDQ drops. The packet errors are also aggregated with the `rx_errors` stats in the `rtnl_link_stats` as well as with the `hw_drops` in the queue API. The RCQ and BDQ drops are aggregated with `rx_over_errors` in the `rtnl_link_stats` as well as with the `hw_drop_overruns` in the queue API. ethtool -S eth0 | grep -E 'rde' rde_0_pkt_err: 0 rde_0_pkt_cq_drop: 0 rde_0_pkt_bdq_drop: 0 --- --- rde_127_pkt_err: 0 rde_127_pkt_cq_drop: 0 rde_127_pkt_bdq_drop: 0 Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Mohsin Bashir <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

This patch provides coverage to the RXB (RX Buffer) stats. RXB stats are divided into 3 sections: RXB enqueue, RXB FIFO, and RXB dequeue stats. The RXB enqueue/dequeue stats are indexed from 0-3 and cater for the input/output counters whereas, the RXB fifo stats are indexed from 0-7. The RXB also supports pause frame stats counters which we are leaving for a later patch. ethtool -S eth0 | grep rxb rxb_integrity_err0: 0 rxb_mac_err0: 0 rxb_parser_err0: 0 rxb_frm_err0: 0 rxb_drbo0_frames: 1433543 rxb_drbo0_bytes: 775949081 --- --- rxb_intf3_frames: 1195711 rxb_intf3_bytes: 739650210 rxb_pbuf3_frames: 1195711 rxb_pbuf3_bytes: 765948092 Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Mohsin Bashir <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

This patch add coverage for TMI stats including PTP stats and drop stats. PTP stats include illegal requests, bad timestamp and good timestamps. The bad timestamp and illegal request counters are reported under as `error` via `ethtool -T` Both these counters are individually being reported via `ethtool -S` The good timestamp stats are being reported as `pkts` via `ethtool -T` ethtool -S eth0 | grep "ptp" ptp_illegal_req: 0 ptp_good_ts: 0 ptp_bad_ts: 0 Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Mohsin Bashir <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

Add coverage for the TX Extension (TEI) Interface (TTI) stats. We are tracking packets and control message drops because of credit exhaustion on the TX interface. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Mohsin Bashir <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

Mohsin Bashir says: ==================== eth: fbnic: extend hardware stats coverage This patch series extends the coverage for hardware stats reported via `ethtool -S`, queue API, and rtnl link stats. The patchset is organized as follow: - The first patch adds locking support to protect hardware stats. - The second patch provides coverage to the hardware queue stats. - The third patch covers the RX buffer related stats. - The fourth patch covers the TMI (TX MAC Interface) stats. - The last patch cover the TTI (TX TEI Interface) stats. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

In preparation for migration to use of standard MIB API, generalize the read port stats logic to a dedicated function. This will permit to manually provide the offset and size of the MIB counter to directly access specific counter. Signed-off-by: Christian Marangi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

Drop custom handling of packet size and RX error MIB counter and handle them in the standard .get_rmon_stats API The MIB entry are dropped from the custom MIB table and converted to a define providing only the MIB offset. Signed-off-by: Christian Marangi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

Drop custom handling of TX/RX pause frame MIB counter and handle them in the standard .get_eth_ctrl_stats API The MIB entry are dropped from the custom MIB table and converted to a define providing only the MIB offset. Signed-off-by: Christian Marangi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

… API Drop custom handling of TX/RX packet stats and error MIB counter and handle them in the standard .get_eth_mac_stats API The MIB entry are dropped from the custom MIB table and converted to a define providing only the MIB offset. Signed-off-by: Christian Marangi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

For consistency with the other MIB counter, move also the remaining MIB counter to define and update the custom MIB table. Signed-off-by: Christian Marangi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

It was reported that the internally calculated counter might differ from the real one from the Switch MIB. This can happen if the switch directly forward packets between the ports or offload small packets like ARP request. In such case, the kernel counter will desync compared to the real one transmitted and received by the Switch. To correctly provide the real info to the kernel, implement .get_stats64 that will directly read the current MIB counter from the switch register. Signed-off-by: Christian Marangi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

Christian Marangi says: ==================== net: dsa: mt7530: modernize MIB handling + fix This small series modernize MIB handling for MT7530 and also implement .get_stats64. It was reported that kernel and Switch MIB desync in scenario where a packet is forwarded from a port to another. In such case, the forwarding is offloaded and the kernel is not aware of the transmitted packet. To handle this, read the counter directly from Switch registers. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

This patch suggests the replacement of strncpy with strscpy as per Documentation/process/deprecated. The strncpy() fails to guarantee NULL termination, The function adds zero pads which isn't really convenient for short strings as it may cause performance issues. strscpy() is a preferred replacement because it overcomes the limitations of strncpy mentioned above. Compile Tested Signed-off-by: Kevin Paul Reddy Janagari <[email protected]> Reviewed-by: Tung Nguyen <[email protected]> Tested-by: Tung Nguyen <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

Correct Get Controller Packet Statistics (GCPS) 64-bit wide member variables, as per DSP0222 v1.0.0 and forward specs. The Driver currently collects these stats, but they are yet to be exposed to the user. Therefore, no user impact. Statistics fixes: Total Bytes Received (byte range 28..35) Total Bytes Transmitted (byte range 36..43) Total Unicast Packets Received (byte range 44..51) Total Multicast Packets Received (byte range 52..59) Total Broadcast Packets Received (byte range 60..67) Total Unicast Packets Transmitted (byte range 68..75) Total Multicast Packets Transmitted (byte range 76..83) Total Broadcast Packets Transmitted (byte range 84..91) Valid Bytes Received (byte range 204..11) Signed-off-by: Hari Kalavakunta <[email protected]> Reviewed-by: Paul Fertser <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

Prevent from proceeding if there's nothing to print. Suggested-by: Przemek Kitszel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Tested-by: Bharath R <[email protected]> Signed-off-by: Jedrzej Jagielski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>

Wrap use of netdev_priv() in order to change the allocator of the device private structure from alloc_etherdev_mq() to the devlink in next commit. All but one netdev_priv() calls in the whole driver are replaced, the remaining one is called on MACVLAN (so not ixgbe) device. Signed-off-by: Przemek Kitszel <[email protected]> Tested-by: Bharath R <[email protected]> Signed-off-by: Jedrzej Jagielski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>

Don't populate the read-only array offsets on the stack at run time, instead make it static const. Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Wolfram Sang <[email protected]> Tested-by: Wolfram Sang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Convert visconti to use the set_clk_tx_rate() method. By doing so, the GMAC control register will already have been updated (unlike with the fix_mac_speed() method) so this code can be removed while porting to the set_clk_tx_rate() method. There is also no need for the spinlock, and has never been - neither fix_mac_speed() nor set_clk_tx_rate() can be called by more than one thread at a time, so the lock does nothing useful. Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Acked-by: Nobuhiro Iwamatsu <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

ops_undo_list() first iterates over ops_list for ->pre_exit(). Let's check if any of the ops has ->exit_rtnl() there and drop the hold_rtnl argument. Note that nexthop uses ->exit_rtnl() and is built-in, so hold_rtnl is always true for setup_net() and cleanup_net() for now. Suggested-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

pfcp_net_exit() holds RTNL and cleans up all devices in the netns and other devices tied to sockets in the netns. We can use ->exit_rtnl() to save RTNL dance for all dying netns. Note that we delegate the for_each_netdev() part to default_device_exit_batch() to avoid a list corruption splat like the one reported in commit 4ccacf8 ("gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().") Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

ppp_exit_net() unregisters devices related to the netns under RTNL and destroys lists and IDR. Let's use ->exit_rtnl() for the device unregistration part to save RTNL dances for each netns. Note that we delegate the for_each_netdev_safe() part to default_device_exit_batch() and replace unregister_netdevice_queue() with ppp_nl_dellink() to align with bond, geneve, gtp, and pfcp. Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Kuniyuki Iwashima says: ==================== net: Followup series for ->exit_rtnl(). Patch 1 drops the hold_rtnl arg in ops_undo_list() as suggested by Jakub. Patch 2 & 3 apply ->exit_rtnl() to pfcp and ppp. v1: https://lore.kernel.org/[email protected] ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Tunnel types VXLAN/VXLAN_GPE/GENEVE are supported for txgbe devices. The hardware supports to set only one port for each tunnel type. Signed-off-by: Jiawen Wu <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Implement ndo_features_check to restrict Tx checksum offload flags, since there are some inner layer length and protocols unsupported. Signed-off-by: Jiawen Wu <[email protected]> Reviewed-by: Michal Kubiak <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Jiawen Wu says: ==================== Implement udp tunnel port for txgbe v3: https://lore.kernel.org/[email protected] v2: https://lore.kernel.org/[email protected] v1: https://lore.kernel.org/[email protected] ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Commit 03df156 ("xdp: double protect netdev->xdp_flags with netdev->lock") introduces the netdev lock to xdp_set_features_flag(). The change includes a _locked version of the method, as it is possible for a driver to have already acquired the netdev lock before calling this helper. However, the same applies to xdp_features_(set|clear)_redirect_flags(), which ends up calling the unlocked version of xdp_set_features_flags() leading to deadlocks in GVE, which grabs the netdev lock as part of its suspend, reset, and shutdown processes: [ 833.265543] WARNING: possible recursive locking detected [ 833.270949] 6.15.0-rc1 kernel-patches#6 Tainted: G E [ 833.276271] -------------------------------------------- [ 833.281681] systemd-shutdow/1 is trying to acquire lock: [ 833.287090] ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: xdp_set_features_flag+0x29/0x90 [ 833.295470] [ 833.295470] but task is already holding lock: [ 833.301400] ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: gve_shutdown+0x44/0x90 [gve] [ 833.309508] [ 833.309508] other info that might help us debug this: [ 833.316130] Possible unsafe locking scenario: [ 833.316130] [ 833.322142] CPU0 [ 833.324681] ---- [ 833.327220] lock(&dev->lock); [ 833.330455] lock(&dev->lock); [ 833.333689] [ 833.333689] *** DEADLOCK *** [ 833.333689] [ 833.339701] May be due to missing lock nesting notation [ 833.339701] [ 833.346582] 5 locks held by systemd-shutdow/1: [ 833.351205] #0: ffffffffa9c89130 (system_transition_mutex){+.+.}-{4:4}, at: __se_sys_reboot+0xe6/0x210 [ 833.360695] kernel-patches#1: ffff93b399e5c1b8 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xb4/0x1f0 [ 833.369144] kernel-patches#2: ffff949d19a471b8 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xc2/0x1f0 [ 833.377603] kernel-patches#3: ffffffffa9eca050 (rtnl_mutex){+.+.}-{4:4}, at: gve_shutdown+0x33/0x90 [gve] [ 833.386138] kernel-patches#4: ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: gve_shutdown+0x44/0x90 [gve] Introduce xdp_features_(set|clear)_redirect_target_locked() versions which assume that the netdev lock has already been acquired before setting the XDP feature flag and update GVE to use the locked version. Fixes: 03df156 ("xdp: double protect netdev->xdp_flags with netdev->lock") Tested-by: Mina Almasry <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Signed-off-by: Joshua Washington <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

…functions When a bridge port STP state is changed from BLOCKING/DISABLED to FORWARDING, the port's igmp query timer will NOT re-arm itself if the bridge has been configured as per-VLAN multicast snooping. Solve this by choosing the correct multicast context(s) to enable/disable port multicast based on whether per-VLAN multicast snooping is enabled or not, i.e. using per-{port, VLAN} context in case of per-VLAN multicast snooping by re-implementing br_multicast_enable_port() and br_multicast_disable_port() functions. Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # bridge link set dev swp1 state 0 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge link set dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 After the patch, the IGMP query happens in the last step of the test: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # bridge link set dev swp1 state 0 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge link set dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 3 Signed-off-by: Yong Wang <[email protected]> Reviewed-by: Andy Roulin <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>

When the vlan STP state is changed, which could be manipulated by "bridge vlan" commands, similar to port STP state, this also impacts multicast behaviors such as igmp query. In the scenario of per-VLAN snooping, there's a need to update the corresponding multicast context to re-arm the port query timer when vlan state becomes "forwarding" etc. Update br_vlan_set_state() function to enable vlan multicast context in such scenario. Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # sleep 1 # bridge vlan set vid 1 dev swp1 state 4 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge vlan set vid 1 dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 After the patch, the IGMP query happens in the last step of the test: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # sleep 1 # bridge vlan set vid 1 dev swp1 state 4 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge vlan set vid 1 dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 3 Signed-off-by: Yong Wang <[email protected]> Reviewed-by: Andy Roulin <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>

…e changes Change ALL_TESTS definition to "test-per-line". Add the test case of per vlan snooping with port stp state change to forwarding and also vlan equivalent case in both bridge_igmp.sh and bridge_mld.sh. Signed-off-by: Yong Wang <[email protected]> Reviewed-by: Andy Roulin <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Yong Wang says: ==================== bridge: multicast: per vlan query improvement when port or vlan state changes The current implementation of br_multicast_enable_port() only operates on port's multicast context, which doesn't take into account in case of vlan snooping, one downside is the port's igmp query timer will NOT resume when port state gets changed from BR_STATE_BLOCKING to BR_STATE_FORWARDING etc. Such code flow will briefly look like: 1.vlan snooping --> br_multicast_port_query_expired with per vlan port_mcast_ctx --> port in BR_STATE_BLOCKING state --> then one-shot timer discontinued The port state could be changed by STP daemon or kernel STP, taking mstpd as example: 2.mstpd --> netlink_sendmsg --> br_setlink --> br_set_port_state with non blocking states, i.e. BR_STATE_LEARNING or BR_STATE_FORWARDING --> br_port_state_selection --> br_multicast_enable_port --> enable multicast with port's multicast_ctx Here for per vlan snooping, the vlan context of the port should be used instead of port's multicast_ctx. The first patch corrects such behavior. Similarly, vlan state change also impacts multicast behavior, the 2nd patch adds function to update the corresponding multicast context when vlan state changes. The 3rd patch adds the selftests to confirm that IGMP/MLD query does happen when the STP state becomes forwarding. ==================== Signed-off-by: David S. Miller <[email protected]>

The CONFIG_DEFAULT_HUNG_TASK_TIMEOUT setting is only available when the hung task detection is enabled, otherwise the code now produces a build failure: drivers/net/ethernet/broadcom/bnxt/bnxt.c:10188:21: error: use of undeclared identifier 'CONFIG_DEFAULT_HUNG_TASK_TIMEOUT' 10188 | max_tmo_secs > CONFIG_DEFAULT_HUNG_TASK_TIMEOUT) { Enclose this warning logic in an #ifdef to ensure this builds. Fixes: 0fcad44 ("bnxt_en: Change FW message timeout warning") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Michael Chan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

If the CONFIG_NET_SCH_BPF configuration is not enabled, the BPF test compilation will report the following error: In file included from progs/bpf_qdisc_fq.c:39: progs/bpf_qdisc_common.h:17:51: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility] 17 | void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym; | ^ progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility] 309 | struct bpf_sk_buff_ptr *to_free) | ^ progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility] progs/bpf_qdisc_fq.c:308:5: error: conflicting types for '____bpf_fq_enqueue' Fixes: 11c7016 ("selftests/bpf: Add a basic fifo qdisc test") Signed-off-by: Feng Yang <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://patch.msgid.link/[email protected]

Use bpf_try_module_get()/bpf_module_put() instead of try_module_get()/ module_put() when handling default qdisc since users can assign a bpf qdisc to it. To trigger the bug: $ bpftool struct_ops register bpf_qdisc_fq.bpf.o /sys/fs/bpf $ echo bpf_fq > /proc/sys/net/core/default_qdisc Fixes: c824034 ("bpf: net_sched: Support implementation of Qdisc_ops in bpf") Signed-off-by: Amery Hung <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://patch.msgid.link/[email protected]

In a later patch, two new kfuncs will take the bpf_rb_node pointer arg. struct bpf_rb_node *bpf_rbtree_left(struct bpf_rb_root *root, struct bpf_rb_node *node); struct bpf_rb_node *bpf_rbtree_right(struct bpf_rb_root *root, struct bpf_rb_node *node); In the check_kfunc_call, there is a "case KF_ARG_PTR_TO_RB_NODE" to check if the reg->type should be an allocated pointer or should be a non_owning_ref. The later patch will need to ensure that the bpf_rb_node pointer passing to the new bpf_rbtree_{left,right} must be a non_owning_ref. This should be the same requirement as the existing bpf_rbtree_remove. This patch swaps the current "if else" statement. Instead of checking the bpf_rbtree_remove, it checks the bpf_rbtree_add. Then the new bpf_rbtree_{left,right} will fall into the "else" case to make the later patch simpler. bpf_rbtree_add should be the only one that needs an allocated pointer. This should be a no-op change considering there are only two kfunc(s) taking bpf_rb_node pointer arg, rbtree_add and rbtree_remove. Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]>

…_node pointer The current rbtree kfunc, bpf_rbtree_{first, remove}, returns the bpf_rb_node pointer. The check_kfunc_call currently checks the kfunc btf_id instead of its return pointer type to decide if it needs to do mark_reg_graph_node(reg0) and ref_set_non_owning(reg0). The later patch will add bpf_rbtree_{root,left,right} that will also return a bpf_rb_node pointer. Instead of adding more kfunc btf_id checks to the "if" case, this patch changes the test to check the kfunc's return type. is_rbtree_node_type() function is added to test if a pointer type is a bpf_rb_node. The callers have already skipped the modifiers of the pointer type. A note on the ref_set_non_owning(), although bpf_rbtree_remove() also returns a bpf_rb_node pointer, the bpf_rbtree_remove() has the KF_ACQUIRE flag. Thus, its reg0 will not become non-owning. Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]>

In the kernel fq qdisc implementation, it requires to traverse a rbtree stored with the networking "flows". In the later bpf selftests prog, the much simplified logic that uses the bpf_rbtree_{root,left,right} to traverse the tree is like: struct fq_flow { struct bpf_rb_node fq_node; struct bpf_rb_node rate_node; struct bpf_refcount refcount; unsigned long sk_long; }; struct fq_flow_root { struct bpf_spin_lock lock; struct bpf_rb_root root __contains(fq_flow, fq_node); }; struct fq_flow *fq_classify(...) { struct bpf_rb_node *tofree[FQ_GC_MAX]; struct fq_flow_root *root; struct fq_flow *gc_f, *f; struct bpf_rb_node *p; int i, fcnt = 0; /* ... */ f = NULL; bpf_spin_lock(&root->lock); p = bpf_rbtree_root(&root->root); while (can_loop) { if (!p) break; gc_f = bpf_rb_entry(p, struct fq_flow, fq_node); if (gc_f->sk_long == sk_long) { f = bpf_refcount_acquire(gc_f); break; } /* To be removed from the rbtree */ if (fcnt < FQ_GC_MAX && fq_gc_candidate(gc_f, jiffies_now)) tofree[fcnt++] = p; if (gc_f->sk_long > sk_long) p = bpf_rbtree_left(&root->root, p); else p = bpf_rbtree_right(&root->root, p); } /* remove from the rbtree */ for (i = 0; i < fcnt; i++) { p = tofree[i]; tofree[i] = bpf_rbtree_remove(&root->root, p); } bpf_spin_unlock(&root->lock); /* bpf_obj_drop the fq_flow(s) that have just been removed * from the rbtree. */ for (i = 0; i < fcnt; i++) { p = tofree[i]; if (p) { gc_f = bpf_rb_entry(p, struct fq_flow, fq_node); bpf_obj_drop(gc_f); } } return f; } The above simplified code needs to traverse the rbtree for two purposes, 1) find the flow with the desired sk_long value 2) while searching for the sk_long, collect flows that are the fq_gc_candidate. They will be removed from the rbtree. This patch adds the bpf_rbtree_{root,left,right} kfunc to enable the rbtree traversal. The returned bpf_rb_node pointer will be a non-owning reference which is the same as the returned pointer of the exisiting bpf_rbtree_first kfunc. To avoid bisect failure, Some of the failure messages in the rbtree_fail test are also adjusted together in this patch. The message is now "bpf_rbtree_remove can only take non-owning bpf_rb_node pointer". Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> selftests/bpf: Adjust failure message in the rbtree_fail test Some of the failure messages in the rbtree_fail test. The message is now "bpf_rbtree_remove can only take non-owning bpf_rb_node pointer". Signed-off-by: Martin KaFai Lau <[email protected]>

The bpf_rbtree_{remove,left,right} requires the root's lock to be held. They also check the node_internal->owner is still owned by that root before proceeding, so it is safe to allow refcounted bpf_rb_node pointer to be used in these kfuncs. In the later selftest, a networking flow (allocated by bpf_obj_new) can be added to two different rbtrees. There are cases that the flow is searched from one rbtree, held the refcount of the flow, and then removed from the another rbtree: struct fq_flow { struct bpf_rb_node fq_node; struct bpf_rb_node rate_node; struct bpf_refcount refcount; unsigned long sk_long; }; int bpf_fq_enqueue(...) { /* ... */ bpf_spin_lock(&root->lock); while (can_loop) { /* ... */ if (!p) break; gc_f = bpf_rb_entry(p, struct fq_flow, fq_node); if (gc_f->sk_long == sk_long) { f = bpf_refcount_acquire(gc_f); break; } /* ... */ } bpf_spin_unlock(&root->lock); if (f) { bpf_spin_lock(&q->lock); bpf_rbtree_remove(&q->delayed, &f->rate_node); bpf_spin_unlock(&q->lock); } } bpf_rbtree_{left,right} do not need this change but are relaxed together with bpf_rbtree_remove instead of adding extra verifier logic to exclude these kfuncs. To avoid bi-sect failure, this patch also changes the selftests together: First change, it does not expect a verifier's error now. Second change, the test now expects bpf_rbtree_remove(&groot, &m->node) to return NULL. The test uses __retval(0) to ensure this NULL return value. Some of the "only take non-owning..." failure messages are changed also. Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]>

This patch has a much simplified rbtree usage from the kernel sch_fq qdisc. It has a "struct node_data" which can be added to two different rbtrees which are ordered by different keys. The test first populates both rbtrees. Then search for a lookup_key from the "groot0" rbtree. Once the lookup_key is found, that node refcount is taken. The node is then removed from another "groot1" rbtree. While searching the lookup_key, the test will also try to remove all rbnodes in the path leading to the lookup_key. The test_{root,left,right}* tests ensure that the return value of the bpf_rbtree functions is a non_own_ref node pointer. This is done by forcing an verifier error by calling a helper bpf_jiffies64() while holding the spinlock. The tests then check for the verifier message "call bpf_rbtree...R0=rcu_ptr_or_null_node..." The other test_{root,left,right}* tests ensure that they must be called with spinlock held. Suggested-by: Kumar Kartikeya Dwivedi <[email protected]> # Check non_own_ref marking Signed-off-by: Martin KaFai Lau <[email protected]>

…_node pointer The next patch will add bpf_list_{front,back} kfuncs to peek the head and tail of a list. Both of them will return a 'struct bpf_list_node *'. Follow the earlier change for rbtree, this patch checks the return btf type is a 'struct bpf_list_node' pointer instead of checking each kfuncs individually to decide if mark_reg_graph_node should be called. This will make the bpf_list_{front,back} kfunc addition easier in the later patch. Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]>

In the kernel fq qdisc implementation, it only needs to look at the fields of the first node in a list but does not always need to remove it from the list. It is more convenient to have a peek kfunc for the list. It works similar to the bpf_rbtree_first(). This patch adds bpf_list_{front,back} kfunc. The verifier is changed such that the kfunc returning "struct bpf_list_node *" will be marked as non-owning. The exception is the KF_ACQUIRE kfunc. The net effect is only the new bpf_list_{front,back} kfuncs will have its return pointer marked as non-owning. Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]>

This patch adds the "list_peek" test to use the new bpf_list_{front,back} kfunc. The test_{front,back}* tests ensure that the return value is a non_own_ref node pointer and requires the spinlock to be held. Suggested-by: Kumar Kartikeya Dwivedi <[email protected]> # check non_own_ref marking Signed-off-by: Martin KaFai Lau <[email protected]>

dhowells and others added 30 commits April 14, 2025 17:36

ColinIanKing and others added 26 commits April 22, 2025 19:05

adding ci files

f69da1d

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 4 times, most recently from 786af9a to 309dda5 Compare May 5, 2025 21:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

graph traverse v2 #8876

graph traverse v2 #8876

iamkafai commented May 2, 2025

graph traverse v2 #8876

Are you sure you want to change the base?

graph traverse v2 #8876

Conversation

iamkafai commented May 2, 2025