Separate kafka producers for each proxy #78

onukristo · 2023-12-15T08:51:20Z

Context

At Wise, a Kafka producer's flush() hung when a service ran into direct buffer limits. This made it hard to debug what was going on; we would have preferred to get at least some kind of error message.

Removed

Support for Spring Boot 2.6.

Changed

Every proxy now has its own independent Kafka producer.
Previously, one producer was shared by all partitions, and the default shard's producer was also used for topic validation.
A Kafka producer's flush() will now be interrupted from another thread by a separate housekeeping service.
Wise had an incident where the flush() call hung forever, and it was not easy to tell that this was what was happening.
Now we will at least get clear error logs when this happens.
Proxies' Kafka producers will be closed after the poll loop exits.
This allows recovering from unforeseen Kafka client bugs, and it also releases resources when another pod takes over the proxying.
The default linger time (linger.ms) of the Kafka producer was increased from 5 ms to 1000 ms.
This allows potentially larger batches to form. We are not increasing the latency, because our explicit flush() call overrides the
lingering mechanism anyway: flush() makes all buffered records immediately available to send, even when linger.ms is greater than 0.
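A minimal sketch of how the settings for each proxy's own producer could be assembled, assuming plain `java.util.Properties` with the standard Kafka producer keys `linger.ms` and `client.id`. The class `ProxyProducerConfigSketch`, the `tkms-proxy-` client id scheme and the override map are hypothetical illustrations, not the starter's actual API.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: builds the settings for the producer owned by one proxy.
final class ProxyProducerConfigSketch {
  // Default from this change: batches may fill for up to 1000 ms, but the proxy's
  // explicit flush() makes buffered records sendable immediately, so it never waits that long.
  static final String DEFAULT_LINGER_MS = "1000";

  private ProxyProducerConfigSketch() {
  }

  static Properties producerProperties(String shardPartition, Map<String, String> overrides) {
    Properties props = new Properties();
    props.setProperty("linger.ms", DEFAULT_LINGER_MS);
    // Assumption: a distinct client.id per shard-partition keeps the per-proxy
    // producers distinguishable in logs and metrics.
    props.setProperty("client.id", "tkms-proxy-" + shardPartition);
    // Service-specific settings win over the defaults above.
    overrides.forEach(props::setProperty);
    return props;
  }
}
```

A real `KafkaProducer` would additionally need `bootstrap.servers` and key/value serializers, for example via the `KafkaProducer(Properties, Serializer, Serializer)` constructor.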

Checklist

Change meets or does not compromise the Baseline Security Requirements

onukristo · 2023-12-15T08:52:44Z

build.libraries.gradle

            jakartaValidationApi            : 'jakarta.validation:jakarta.validation-api:3.0.2',
            javaxValidationApi              : "javax.validation:validation-api:2.0.1.Final",
-            kafkaStreams                    : 'org.apache.kafka:kafka-streams:3.4.0',
+            kafkaStreams                    : 'org.apache.kafka:kafka-streams:3.2.3',


Syncing with WJP 2.7

tkornai · 2023-12-18T10:16:47Z

CHANGELOG.md

+
+### Removed
+
+- Support for Spring Boot 2.6 .


setup.md states that Spring Boot 2.5 is supported.

tkornai · 2023-12-18T11:53:41Z

CHANGELOG.md

+  This would allow to recover from unforeseen kafka clients' bugs and also release resources when another pod takes over the proxying.
+
+- The default linger time on kafka producer was increased from 5 ms. to 1000 ms.
+  This would allow potentially larger batches to get formed. We are not increasing the latency, because we override the


This sounds contradictory: either we let messages linger a bit longer, in which case the latency increases a bit, or we don't. I think some further explanation would be useful here.

Maybe just add the word "substantially" after latency.

tkornai · 2023-12-18T14:07:37Z

tw-tkms-starter/src/main/java/com/transferwise/kafka/tkms/TkmsStorageToKafkaProxy.java

+              } catch (InterruptException e) {
+                log.error("Kafka producer was interrupted for " + shardPartition + ".", e);
+                // Rethrow and force the recreation of the producer.
+                throw e;


Why do we handle this case differently from the one below?

Yes, we log a trackable message.

And we force the poll loop to exit, and thus a new producer instance to be created.
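The rethrow-and-recreate flow discussed above could look roughly like the following sketch. It uses a hypothetical `SketchProducer` interface in place of Kafka's `Producer` so that it compiles with the JDK alone, and any `RuntimeException` stands in for `org.apache.kafka.common.errors.InterruptException`. The retry loop in `run` is an assumption for illustration; in the starter the poll loop is restarted by its own machinery, and unsent records are simply polled again.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for org.apache.kafka.clients.producer.Producer.
interface SketchProducer extends AutoCloseable {
  void send(String record);

  void flush();

  @Override
  void close();
}

final class ProxyLoopSketch {
  private final Supplier<SketchProducer> producerFactory;
  private int producersCreated;

  ProxyLoopSketch(Supplier<SketchProducer> producerFactory) {
    this.producerFactory = producerFactory;
  }

  // One poll loop: a fresh producer that is always closed when the loop exits, normally or not.
  private void pollLoop(List<List<String>> batches) {
    SketchProducer producer = producerFactory.get();
    producersCreated++;
    try {
      for (List<String> batch : batches) {
        batch.forEach(producer::send);
        producer.flush(); // may throw if a housekeeping service interrupts it
      }
    } finally {
      producer.close();
    }
  }

  // Restarts the poll loop, and therefore the producer, after a failure (hypothetical policy).
  void run(List<List<String>> batches, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        pollLoop(batches);
        return;
      } catch (RuntimeException e) {
        System.err.println("Poll loop failed on attempt " + attempt + ": " + e.getMessage());
        // Kafka's InterruptException re-sets the interrupt flag; clear it so the
        // next producer's flush() is not interrupted straight away.
        Thread.interrupted();
      }
    }
  }

  int producersCreated() {
    return producersCreated;
  }
}
```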

tkornai · 2023-12-18T14:16:57Z

tw-tkms-starter/src/main/java/com/transferwise/kafka/tkms/TransactionalKafkaMessageSender.java

@@ -71,7 +71,7 @@ public void afterPropertiesSet() {
    environmentValidator.validate();

    for (String topic : properties.getTopics()) {
-      validateTopic(properties.getDefaultShard(), topic);
+      validateTopic(topic);


I'm afraid we might have an edge case here when the topic exists, the service has DESCRIBE ACL on it, but it does not have WRITE. Topic validation will pass but produce requests will fail.

I think we should implement topic validation via AdminClient instead.

Currently, if we ask for a non-existent topic, the logs get spammed with metadata messages about it until the producer is closed.

tkornai · 2023-12-18T14:24:53Z

tw-tkms-starter/src/main/java/com/transferwise/kafka/tkms/config/TkmsKafkaProducerProvider.java

I was wondering if we use compression for message sending too or only for DB writes?

Compression is enabled only for DB writes.
We should enable compression for sends as well, but in a separate PR.

tw-peeterkarolin · 2023-12-19T07:40:55Z

tw-tkms-starter/src/main/java/com/transferwise/kafka/tkms/TkmsInterrupterService.java

+  private IExecutorServicesProvider executorServicesProvider;
+  private ScheduledTaskExecutor scheduledTaskExecutor;
+
+  public void afterPropertiesSet() {


Can the ScheduledTaskExecutor already be passed in via the constructor, since we should have access to it in TkmsConfiguration?

Not currently; only the IExecutorServicesProvider bean is available.
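To illustrate the interrupter idea, here is a minimal sketch of a housekeeping service that interrupts a flush() running longer than a timeout. It assumes the JDK's `ScheduledExecutorService` in place of the `ScheduledTaskExecutor` and `IExecutorServicesProvider` used by the starter; `FlushWatchdogSketch`, `watchedFlush` and the timeout values are hypothetical. A `CountDownLatch.await()` that never returns stands in for a hung `KafkaProducer.flush()`, which likewise gives up with an exception when its thread is interrupted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog: interrupts threads whose flush has been running too long.
final class FlushWatchdogSketch implements AutoCloseable {
  private final Map<Thread, Long> flushStartedAtNanos = new ConcurrentHashMap<>();
  private final long timeoutNanos;
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  FlushWatchdogSketch(long timeoutMillis, long checkIntervalMillis) {
    this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    scheduler.scheduleWithFixedDelay(this::interruptHangingFlushes,
        checkIntervalMillis, checkIntervalMillis, TimeUnit.MILLISECONDS);
  }

  // Runs the flush on the calling thread while the watchdog keeps an eye on it.
  void watchedFlush(Runnable flush) {
    Thread current = Thread.currentThread();
    flushStartedAtNanos.put(current, System.nanoTime());
    try {
      flush.run();
    } finally {
      flushStartedAtNanos.remove(current);
    }
  }

  private void interruptHangingFlushes() {
    long now = System.nanoTime();
    flushStartedAtNanos.forEach((thread, startedAt) -> {
      if (now - startedAt > timeoutNanos) {
        // Caveat: a production version must avoid interrupting a thread
        // that finished its flush in the meantime.
        System.err.println("Flush on " + thread.getName() + " exceeded the timeout, interrupting it.");
        thread.interrupt();
      }
    });
  }

  // Simulates a flush that never completes but fails once its thread is interrupted.
  static void hangingFlush() {
    try {
      new CountDownLatch(1).await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the flag, as Kafka's InterruptException does
      throw new IllegalStateException("Flush interrupted.", e);
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

The proxy's catch block from the diff above would then log the failure and rethrow, which ends the poll loop and leads to a fresh producer.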

Storage

1705649

onukristo added the change:standard Not an emergency or impactful change label Dec 15, 2023

onukristo requested a review from a team as a code owner December 15, 2023 08:51

onukristo commented Dec 15, 2023


onukristo added 7 commits December 15, 2023 10:56

Fix.

5c34bd5

Fix.

51c7bcf

Fix.

86f3691

Fix.

82d94a4

Fix.

c208ab3

Fix.

dc041a7

Fix.

f6a7cfc

tkornai previously approved these changes Dec 18, 2023


Fix.

81eb6c6

onukristo dismissed tkornai’s stale review via 81eb6c6 December 18, 2023 14:47

tkornai approved these changes Dec 18, 2023


tw-peeterkarolin reviewed Dec 19, 2023


tw-peeterkarolin approved these changes Dec 19, 2023


onukristo merged commit b6cb80a into master Dec 19, 2023
18 checks passed

onukristo deleted the remove_producer_flush branch December 19, 2023 09:52
