
KAFKA-19122: updateClusterMetadata receives multiple PartitionInfo #19803


Open · wants to merge 1 commit into trunk

Conversation

@brandboat (Member) commented May 24, 2025

This patch resolves the following issues in MetadataCache#toCluster:

  • Avoids duplicate Node entries when a broker has multiple endpoints (illustrated in the sketch below).
  • Fixes a bug where fenced brokers cause a NullPointerException (NPE).
  • Ensures missing topic IDs are properly populated in the cluster metadata.
  • Deletes an unused test code snippet.
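
For the first bullet, here is a minimal, self-contained sketch of the deduplication idea, using simplified stand-in record types rather than Kafka's real Node and BrokerRegistration classes; keying the map by broker id keeps exactly one Node per broker even when a registration advertises several endpoints:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NodesByIdSketch {
    // Simplified stand-ins for Kafka's Node and BrokerRegistration.
    record Node(int id, String host, int port) {}
    record BrokerRegistration(int id, List<Node> nodes) {}

    public static void main(String[] args) {
        List<BrokerRegistration> brokers = List.of(
                // broker 0 advertises two endpoints (e.g. two listeners)
                new BrokerRegistration(0, List.of(new Node(0, "b0", 9092), new Node(0, "b0", 9093))),
                new BrokerRegistration(1, List.of(new Node(1, "b1", 9092))));

        // Keying by broker id yields one Node per broker, so multiple
        // endpoints no longer produce duplicate Node entries.
        Map<Integer, Node> nodesById = brokers.stream()
                .collect(Collectors.toMap(BrokerRegistration::id, b -> b.nodes().get(0)));

        System.out.println(nodesById.size()); // 2 brokers, not 3 endpoints
    }
}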

@brandboat brandboat requested a review from chia7712 May 24, 2025 16:44
@m1a2st (Collaborator) left a comment


Thanks @brandboat for this patch; I left a question.

Comment on lines +150 to +151
Map<Integer, Node> nodesById = image.cluster().brokers().values().stream()
        .collect(Collectors.toMap(BrokerRegistration::id, broker -> broker.nodes().get(0)));
@m1a2st (Collaborator)

Why shouldn’t we filter out the fenced broker here?

@brandboat (Member, Author) May 25, 2025

"Fixes a bug where fenced brokers cause a NullPointerException (NPE)."

As I mentioned in the PR description, this results in a NullPointerException: a PartitionRegistration can still include a fenced broker in its replicas, so if we filter fenced brokers out of nodesById, the later lookup for that replica finds nothing and the NPE fires.

@brandboat (Member, Author)

I'm not quite sure why we filtered out fenced brokers in the first place—was there a reason for doing so?

@m1a2st (Collaborator) May 25, 2025

I'm not sure, but I'm a bit confused: shouldn't the partition leader not be a fenced broker? KIP-841 has the following invariants (see the sketch after this list):

  • a fenced or in-controlled-shutdown replica is not eligible to be in the ISR; and
  • a fenced or in-controlled-shutdown replica is not eligible to become leader.
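
As a rough illustration only, these two invariants boil down to an eligibility check like the hypothetical helper below; this is a sketch of the rule, not Kafka's actual controller code:

public class Kip841EligibilitySketch {
    // Hypothetical helper mirroring the KIP-841 invariants quoted above.
    static boolean eligibleForIsrOrLeadership(boolean fenced, boolean inControlledShutdown) {
        // A replica that is fenced or in controlled shutdown may neither
        // stay in the ISR nor become the leader.
        return !fenced && !inControlledShutdown;
    }

    public static void main(String[] args) {
        System.out.println(eligibleForIsrOrLeadership(true, false));  // false: fenced replica
        System.out.println(eligibleForIsrOrLeadership(false, false)); // true: healthy replica
    }
}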

Maybe I'm misunderstanding something.

@brandboat (Member, Author)

Pardon me, I meant "replicas", not leader.

Maybe an example will help clarify things:
Let’s say we have 3 brokers — broker0, broker1, and broker2 — and a topic called my-topic. Partition 0 of this topic has 3 replicas, one on each broker. Now suppose broker2 unexpectedly shuts down.

Here’s what happens next:

  1. DynamicTopicClusterQuotaPublisher#onMetadataUpdate gets triggered
  2. That leads to MetadataCache#toCluster being called
  3. In toCluster, nodesById gets constructed without the fenced broker (broker2 is filtered out)
  4. Then MetadataCache#toArray uses this nodesById, but in the TopicsImage the PartitionRegistration for my-topic partition-0 still has replicas [0, 1, 2]. Since broker2 isn't in nodesById, we get a NullPointerException, kaboom! (See the sketch below.)
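
Here is a minimal, self-contained sketch of that failure mode, using stand-in types and a hypothetical toArray that mirrors the lookup described above (not the actual MetadataCache code):

import java.util.Map;

public class FencedBrokerNpeSketch {
    record Node(int id, String host, int port) {}

    // Hypothetical stand-in for MetadataCache#toArray: resolve replica ids to Nodes.
    static Node[] toArray(int[] replicas, Map<Integer, Node> nodesById) {
        Node[] nodes = new Node[replicas.length];
        for (int i = 0; i < replicas.length; i++) {
            nodes[i] = nodesById.get(replicas[i]); // null if the broker was filtered out
        }
        return nodes;
    }

    public static void main(String[] args) {
        // broker2 is fenced and was filtered out of nodesById ...
        Map<Integer, Node> nodesById = Map.of(
                0, new Node(0, "b0", 9092),
                1, new Node(1, "b1", 9092));

        // ... but the PartitionRegistration still lists it as a replica.
        int[] replicas = {0, 1, 2};
        Node[] nodes = toArray(replicas, nodesById);

        // nodes[2] is null, so the first dereference blows up: kaboom!
        System.out.println(nodes[2].host()); // throws NullPointerException
    }
}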
