Provide strategy for dealing with undeliverable messages #18

fml2 · 2022-04-28T15:04:39Z

This is a very promising library! What concerns me is this: what happens to messages that are ignored or can't be delivered within the specified amount of time? Shouldn't the application be able to provide code to deal with such cases? E.g. if a message has not been correlated for too long, it could be moved to a "dead letter queue" (or similar). The application should be able to react to this somehow.

Or did I miss something and this is already possible within the lib?

zambrovski (Maintainer) · 2022-04-28T15:27:36Z

It is kind of possible, but let's discuss this topic further...
So there is a SingleMessageErrorHandlingStrategy which decides what to do with a message when an error occurs...

The only one available now is RetryingSingleMessageErrorHandlingStrategy, but we could add more properties to the RetryingErrorHandlingConfig, such as message dropping...

fml2 (Author) · 2022-04-28T16:48:46Z

I mean, the retry strategy could be: "try to deliver the message N times (with some interval strategy) and then, if it still could not be delivered, pass it to a special object". We don't want messages to stay in the buffer forever, but I think we should also be notified about every case that could not be handled properly. A particular application might not need this, but a library should allow for it, IMO.
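A minimal Java sketch of the "N attempts with some interval strategy" idea. Everything here (`RetrySchedule`, `fixed`, `exponential`, `exhausted`, `delayAfter`) is hypothetical and not part of the library; it only illustrates how the interval strategy and the give-up condition could be kept separate.

```java
import java.time.Duration;
import java.util.function.IntFunction;

// Hypothetical sketch, not the library's API: decides how long to wait before
// the next delivery attempt and when to stop trying altogether.
final class RetrySchedule {
    private final int maxAttempts;                        // "N" from the comment above
    private final IntFunction<Duration> intervalStrategy; // failed attempts -> wait time

    RetrySchedule(int maxAttempts, IntFunction<Duration> intervalStrategy) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.intervalStrategy = intervalStrategy;
    }

    /** Same wait time after every failed attempt. */
    static IntFunction<Duration> fixed(Duration interval) {
        return failedAttempts -> interval;
    }

    /** Wait time doubles after each failed attempt, capped at max. */
    static IntFunction<Duration> exponential(Duration initial, Duration max) {
        return failedAttempts -> {
            Duration d = initial;
            for (int i = 1; i < failedAttempts && d.compareTo(max) < 0; i++) {
                d = d.multipliedBy(2);
            }
            return d.compareTo(max) > 0 ? max : d;
        };
    }

    /** True once the attempt budget is used up; the message should then be handed off. */
    boolean exhausted(int failedAttempts) {
        return failedAttempts >= maxAttempts;
    }

    /** Delay before the next attempt, given how many attempts have failed so far. */
    Duration delayAfter(int failedAttempts) {
        return intervalStrategy.apply(failedAttempts);
    }
}
```

Once `exhausted(...)` returns true, the message would be passed to the "special object" instead of being rescheduled.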

zambrovski (Maintainer) · 2022-04-28T16:58:22Z

I believe we could build this... Let's think about what the API should look like...
So, first of all, the strategy currently keeps retrying until the configured maximum number of retries is reached and then stops retrying... This is just one option; we could make it switchable what happens once the maximum number of retries has been reached.

The option stop would stop trying.
The option recover could do something else...
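One way to make this switchable, sketched in Java under the assumption that the number of retries already performed is known when an error occurs. `OnRetriesExhausted`, `NextStep` and `RetryDecision` are hypothetical names, not the library's configuration classes.

```java
// Hypothetical switch for what happens once the retry budget is used up.
enum OnRetriesExhausted {
    STOP,        // the "stop" option: stop retrying (current behaviour described above)
    DEAD_LETTER  // the "recover" option: hand the message to some other component
}

// Hypothetical outcome of a single failed delivery.
enum NextStep { RETRY, STOP_RETRYING, MOVE_TO_DEAD_LETTER }

final class RetryDecision {
    private RetryDecision() {}

    /** Decides the next step after a failed delivery. */
    static NextStep decide(int retriesSoFar, int maxRetries, OnRetriesExhausted mode) {
        if (retriesSoFar < maxRetries) {
            return NextStep.RETRY; // budget left: try again later
        }
        // Budget used up: the configured switch decides.
        return mode == OnRetriesExhausted.DEAD_LETTER
                ? NextStep.MOVE_TO_DEAD_LETTER
                : NextStep.STOP_RETRYING;
    }
}
```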

To make sure we are talking about the same issue: it is not about detecting this situation. There is a metric for it, and there will be more metrics telling you that a particular message has hit the maximum number of retries... It is about "self-healing", right?

3 replies

zambrovski (Maintainer) · Apr 28, 2022

So feel free to sketch the interface of the component we are talking about... "Recover" is maybe the wrong word; "dead letter" might be a nicer term. But what would you expect? A Consumer<CorrelateMessage> provided by the customer? We could supply a default one that just logs the message, and you could provide your own to deal with it.
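A sketch of the `Consumer` idea in Java. `BufferedMessage` is a hypothetical stand-in for `CorrelateMessage` (its real fields are not shown in this thread), and `DeadLetterConsumers` is likewise illustrative. The default only logs via `java.util.logging`; an application could replace it or chain its own handling with `Consumer.andThen`.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical stand-in for the library's message type (CorrelateMessage above).
final class BufferedMessage {
    final String id;
    final String messageName;
    final int retries;

    BufferedMessage(String id, String messageName, int retries) {
        this.id = id;
        this.messageName = messageName;
        this.retries = retries;
    }
}

final class DeadLetterConsumers {
    private static final Logger LOG = Logger.getLogger(DeadLetterConsumers.class.getName());

    private DeadLetterConsumers() {}

    /** Default when the application configures nothing: just log the message. */
    static Consumer<BufferedMessage> logging() {
        return m -> LOG.warning("Giving up on message " + m.id + " (" + m.messageName
                + ") after " + m.retries + " retries");
    }

    /** Example of an application-provided consumer: collect into a "dead letter queue". */
    static Consumer<BufferedMessage> collectingInto(List<BufferedMessage> queue) {
        return queue::add;
    }
}
```

For example, `DeadLetterConsumers.logging().andThen(DeadLetterConsumers.collectingInto(myQueue))` would both log the message and store it.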

The semantics would be to remove this one particular message from the inbox and pass it to the provided component?
Or should we mark the entire "batch" as dead-letter and pass the entire batch to the consumer?

What do you think?

fml2 (Author) · Apr 28, 2022

> The semantics would be to remove this one particular message from the inbox and pass it to the provided component?

Yes, exactly that. The component could be named undeliverableMessageProcessor (or ...Consumer). It would be called when the message is about to be removed from the storage (i.e. when the retry strategy has given up delivering the message). If this component processes a message without throwing an exception, the message is removed from the storage. I'm not quite sure what should happen if an exception occurs here. It should never happen (that should be the contract), but you probably know that things that should not happen still happen.
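The contract described above (remove the message only if the processor returns normally) could look roughly like this in Java. `Inbox` is a hypothetical in-memory stand-in for the message storage, payloads are plain strings for brevity, and keeping the message when the processor throws is just one possible answer to the open question, not a decided behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory inbox, used only to illustrate the proposed contract.
final class Inbox {
    private final Map<String, String> messages = new LinkedHashMap<>(); // id -> payload
    private final Consumer<String> undeliverableMessageProcessor;

    Inbox(Consumer<String> undeliverableMessageProcessor) {
        this.undeliverableMessageProcessor = undeliverableMessageProcessor;
    }

    void store(String id, String payload) {
        messages.put(id, payload);
    }

    boolean contains(String id) {
        return messages.containsKey(id);
    }

    /**
     * Called once the retry strategy has given up on the message.
     * Returns true if the message was handed off and removed.
     */
    boolean giveUp(String id) {
        String payload = messages.get(id);
        if (payload == null) {
            return false; // nothing to do, message is already gone
        }
        try {
            undeliverableMessageProcessor.accept(payload);
        } catch (RuntimeException e) {
            // Violates the contract. One possible reaction: keep the message
            // so nothing is lost and a later run can hand it off again.
            return false;
        }
        messages.remove(id); // removed only after the processor succeeded
        return true;
    }
}
```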

IMO this should be done not for the batch but for every single message that gets that far. I still have to understand the concept of the batch properly. As of now, I don't see its purpose other than improving performance.

There are other things I have not understood yet (I only started looking at the lib today), but I think this would be a sensible addition.

zambrovski (Maintainer) · Apr 28, 2022

Feel free to ask questions; I'll try to answer here first and then document it in more detail in the reference guide...

zambrovski (Maintainer) · 2022-04-28T20:16:04Z

As for the batch, it is not about performance at all; it is about ordering. There are two requirements that compete with each other:

1. You want to process messages as they come and not block on errors if the messages target different workflows.
2. You want to stop at the first error in a sequence of messages targeting the same workflow, since the order of messages might be crucial.

So what we do is select messages from the store and determine the target workflow (referred to as the Correlation Hint in the library). Then we group by correlation hint to get batches of messages targeting the same workflow, and apply a sorter to order the messages inside each batch. The sorter might be based on receiving time (actually a bad choice), but you can supply your own. Then we try to correlate the batch by correlating its messages one by one. If an error occurs, we handle the single message according to the single-message error strategy (drop, ignore, retry) and the whole batch according to the batch correlation mode (all or fail_first)... Is that somewhat clearer?
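The grouping, sorting and correlating flow described above, reduced to a Java sketch. `Msg`, `BatchMode` and `BatchCorrelator` are hypothetical; the `correlate` predicate stands in for the actual correlation plus the single-message error strategy and returns `false` when a message could not be correlated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical message: the correlation hint identifies the target workflow.
final class Msg {
    final String id;
    final String correlationHint;
    final long receivedAt;

    Msg(String id, String correlationHint, long receivedAt) {
        this.id = id;
        this.correlationHint = correlationHint;
        this.receivedAt = receivedAt;
    }
}

enum BatchMode { ALL, FAIL_FIRST }

final class BatchCorrelator {
    private BatchCorrelator() {}

    /** Returns the ids of the messages that were correlated successfully. */
    static List<String> run(List<Msg> inbox, Comparator<Msg> sorter,
                            BatchMode mode, Predicate<Msg> correlate) {
        // 1. Group by correlation hint: one batch per target workflow.
        Map<String, List<Msg>> batches = new LinkedHashMap<>();
        for (Msg m : inbox) {
            batches.computeIfAbsent(m.correlationHint, k -> new ArrayList<>()).add(m);
        }
        List<String> delivered = new ArrayList<>();
        for (List<Msg> batch : batches.values()) {
            // 2. Order the messages inside the batch with the pluggable sorter.
            batch.sort(sorter);
            // 3. Correlate in order; a failure only affects this batch.
            for (Msg m : batch) {
                if (correlate.test(m)) {
                    delivered.add(m.id);
                } else if (mode == BatchMode.FAIL_FIRST) {
                    break; // leave this workflow's remaining messages for a later run
                }
                // BatchMode.ALL: continue with the next message of the batch
            }
        }
        return delivered;
    }
}
```

With `FAIL_FIRST`, a failure for one workflow leaves that workflow's later messages in place while other batches still proceed, which covers requirement 2 without violating requirement 1; with `ALL`, every message of the batch is attempted.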

1 reply

fml2 (Author) · Apr 28, 2022

Yes, the idea is clearer now, but I still don't think it's really needed. As I understand it, the main purpose of the lib is to help cope with situations where a message arrives before the process has created a subscription to receive it. Camunda 8 handles this out of the box (although this lib does it better IMO :-) ). So we have to store the messages somehow and keep trying to deliver them to the target processes in the hope that they will eventually be delivered successfully.

Since we can't, in general, predict or control the order in which messages arrive (they are asynchronous from the target process's point of view), the whole thing should eventually work regardless of the order in which the buffered messages are processed. If it does not, the process design is flawed. That's why I think batching can only improve performance; it should not be responsible for correctness.

fml2 (Author) · 2022-04-29T07:56:19Z

Hmm... After some thought, I probably have to take back the idea of a special entity for dealing with undeliverable messages. This can be integrated into the retry strategy. My first idea was to make it a separate concept in the lib, but if we say that it's the responsibility of the delivery strategy, then this feature is already covered. I.e. a special retry strategy could work like this: try N times and then put the message into the dead letter queue.

Of course, the last step could be factored out and made pluggable. But it's then an implementation detail.
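A Java sketch of that last point: the retry strategy owns the "try N times" logic and receives the give-up step as a pluggable `Consumer`. `ErrorStrategy` and `RetryThenDeadLetter` are hypothetical names and do not reflect the library's actual `SingleMessageErrorHandlingStrategy` interface.

```java
import java.util.function.Consumer;

// Hypothetical strategy interface, not the library's SingleMessageErrorHandlingStrategy.
interface ErrorStrategy<M> {
    /** Returns true if the message should stay in the inbox for another attempt. */
    boolean onError(M message, int failedAttempts);
}

// "Try N times, then put the message into the dead letter queue",
// with the final step factored out into a pluggable Consumer.
final class RetryThenDeadLetter<M> implements ErrorStrategy<M> {
    private final int maxAttempts;
    private final Consumer<M> deadLetter;

    RetryThenDeadLetter(int maxAttempts, Consumer<M> deadLetter) {
        this.maxAttempts = maxAttempts;
        this.deadLetter = deadLetter;
    }

    @Override
    public boolean onError(M message, int failedAttempts) {
        if (failedAttempts < maxAttempts) {
            return true; // budget left: keep the message and retry later
        }
        deadLetter.accept(message); // give up: hand off to the pluggable step
        return false;
    }
}
```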

The statements about the batch (correctness vs. performance) still hold.
