[RDP-1913] Reduce metadata refresh interval #76

Merged on Nov 3, 2023 (1 commit)

Conversation

tkornai
Contributor

@tkornai tkornai commented Oct 30, 2023

Context

See the Slack thread at https://wise.slack.com/archives/G01P8RBLGCC/p1693400446185769

The goal of this change is to give https://github.com/transferwise/kafka-health-checker more time to demote unhealthy brokers. We assume that a faulty broker is in a zombie state, so it won't return the PARTITION_MIGRATED exception that would force a metadata update.

Producer metadata is refreshed periodically, or when the broker returns certain exceptions. The default metadata.max.age.ms is 5 min, and the producer's delivery timeout is 7 min. Say trouble starts at 13:01 and the health checker reacts by demoting the broker at 13:04. If the Kafka client's metadata was last refreshed at 13:03, the next refresh happens at 13:08, by which time we would already have hit the delivery timeout, also at 13:08 (13:01 plus 7 min).

When a produce request fails within the delivery timeout, the producer throws an exception. If that producer is writing to a changelog topic, the failure forces the whole Kafka Streams task to migrate to another instance, hence the rebalancing.
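The timeline above can be checked with a little arithmetic. The following is a minimal sketch, not code from this PR: the function names are hypothetical, the date is arbitrary, and it assumes that metadata refreshes happen strictly on the `metadata.max.age.ms` schedule and that the affected record was sent when the trouble started.

```python
from datetime import datetime, timedelta


def first_refresh_after(last_refresh, demoted_at, max_age):
    """Return the first periodic metadata refresh at or after the demotion."""
    refresh = last_refresh
    while refresh < demoted_at:
        refresh += max_age
    return refresh


def margin_before_timeout(trouble_start, demoted_at, last_refresh,
                          max_age, delivery_timeout):
    """Time left between the first useful refresh and the delivery deadline.

    Assumes the record was sent at trouble_start, so its deadline is
    trouble_start + delivery_timeout.
    """
    deadline = trouble_start + delivery_timeout
    return deadline - first_refresh_after(last_refresh, demoted_at, max_age)


# Times from the scenario in the description (the date itself is arbitrary).
day = datetime(2023, 10, 30)
trouble_start = day.replace(hour=13, minute=1)
demoted_at = day.replace(hour=13, minute=4)
last_refresh = day.replace(hour=13, minute=3)
delivery_timeout = timedelta(minutes=7)

# Default 5 min refresh interval: the refresh lands exactly on the deadline.
margin_default = margin_before_timeout(
    trouble_start, demoted_at, last_refresh,
    timedelta(minutes=5), delivery_timeout)

# Reduced 2 min interval: the refresh at 13:05 leaves 3 min to retry.
margin_reduced = margin_before_timeout(
    trouble_start, demoted_at, last_refresh,
    timedelta(minutes=2), delivery_timeout)

print(margin_default)  # 0:00:00
print(margin_reduced)  # 0:03:00
```

Under these assumptions, the default interval leaves no time to retry against the new leader, while a two-minute interval leaves three minutes.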

Checklist

Details from ticket: RDP-1913

Reduce metadata refresh interval for Service Kafka clients

When we implement the `kafka-health-checker` features for the Service Kafka cluster, we should also make sure that clients set `metadata.max.age.ms=120000` (two minutes) to reduce impact time.
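As an illustration of what the ticket asks for, here is a hedged sketch of applying the setting to a client configuration expressed as a plain dictionary. It is not the diff from this PR: the helper name and the bootstrap address are hypothetical, and the 7 min delivery timeout is taken from the description above. Only the property names `metadata.max.age.ms` and `delivery.timeout.ms` are standard Kafka client settings.

```python
METADATA_MAX_AGE_MS = 120_000  # two minutes, as requested in the ticket


def with_reduced_metadata_age(base_config):
    """Return a copy of base_config with the reduced metadata refresh interval.

    Hypothetical helper: it also rejects configs where the refresh interval
    is not shorter than the delivery timeout, since a refresh would then
    come too late to help a pending record.
    """
    config = dict(base_config)  # copy, so the caller's dict is not mutated
    config["metadata.max.age.ms"] = METADATA_MAX_AGE_MS

    delivery_timeout_ms = config.get("delivery.timeout.ms")
    if delivery_timeout_ms is not None and METADATA_MAX_AGE_MS >= delivery_timeout_ms:
        raise ValueError(
            "metadata.max.age.ms should be lower than delivery.timeout.ms")
    return config


# Hypothetical base settings; the address is a placeholder.
base = {
    "bootstrap.servers": "kafka.example.internal:9092",
    "delivery.timeout.ms": 420_000,  # 7 min, per the description
}
client_config = with_reduced_metadata_age(base)
print(client_config["metadata.max.age.ms"])  # 120000
```

Returning a copy keeps shared defaults untouched, so individual clients can opt in without affecting others.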

@tkornai tkornai requested a review from a team as a code owner October 30, 2023 15:32
@tkornai tkornai merged commit 3146f9b into master Nov 3, 2023
17 checks passed
@tkornai tkornai deleted the rdp-1913-reduce-metadata-refresh-interval branch November 3, 2023 09:06