-
Notifications
You must be signed in to change notification settings - Fork 14.4k
KAFKA-19122: updateClusterMetadata receives multiples PartitionInfo #19803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @brandboat for this patch, left a question.
Map<Integer, Node> nodesById = image.cluster().brokers().values().stream() | ||
.collect(Collectors.toMap(BrokerRegistration::id, broker -> broker.nodes().get(0))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why shouldn’t we filter out the fenced broker here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes a bug where fenced brokers result in NPE.
Like I mentioned in the PR description, this result in a NullPointerException.
Filtering out the fenced broker while Partition itself still includes that fenced broker in replicas can trigger the NPE as we already filter them out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure why we filtered out fenced brokers in the first place—was there a reason for doing so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, but I'm a bit confused — shouldn't the partition leader broker not be a fenced broker? In KIP-841 has following invariants
- a fenced or in-controlled-shutdown replica is not eligible to be in the ISR; and
- a fenced or in-controlled-shutdown replica is not eligible to become leader.
Maybe I misunderstanding something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pardon me, I meant "replicas", not leader.
Maybe an example will help clarify things:
Let’s say we have 3 brokers — broker0, broker1, and broker2 — and a topic called my-topic. Partition 0 of this topic has 3 replicas, one on each broker. Now suppose broker2 unexpectedly shuts down.
Here’s what happens next:
- DynamicTopicClusterQuotaPublisher#onMetadataUpdate gets triggered
- That leads to MetadataCache#toCluster being called
- In toCluster, nodesById gets constructed without the fenced broker (broker2 is filtered out)
- Then MetadataCache#toArray uses this nodesById, but in the TopicsImage the PartitionRegistration (my-topic partition-0) still has replicas [0, 1, 2]. Since broker2 isn’t in nodesById, we get a NullPointerException, kaboom!
This patch resolves the following issues in MetadataCache#toCluster: