
rfc: optimize and reduce traffic from delivery workers #970

Open · wants to merge 1 commit into base: master

Conversation

epoberezkin (Member)

No description provided.


## Solution

The recent changes in message delivery moved the message delivery queues to the database, so the worker now reads the next message ID to deliver from the database (and not only its data, as before). We could optimize this further by having a single worker per transport session that treats quota-exceeded errors differently from network and timeout errors.

When a queue's quota is exceeded, the sending queue will be marked with a `deliver_after` timestamp, and messages that need to be delivered to this queue will not be read by the worker until that timestamp is reached. When a network or timeout error happens, the message will be retried up to N times (probably 3 to 5, to avoid the risk of message-specific errors being misinterpreted as network errors); after that the whole queue will be marked with a `deliver_after` timestamp, using a smaller interval than in the quota-exceeded case but a larger one than between retries of a single message. Exponential backoff would still be used.
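The policy above could be sketched roughly as follows. This is a minimal illustration, not the simplexmq implementation: the function name, constants, and intervals are all assumptions made for the sketch.

```python
# Hypothetical sketch of the proposed retry policy; names and intervals
# are illustrative assumptions, not taken from the simplexmq codebase.
import time

MAX_MSG_RETRIES = 3    # assumed N retries for network/timeout errors
MSG_RETRY_BASE = 5     # seconds between retries of a single message
QUEUE_NET_PAUSE = 60   # queue-level pause after N failed retries (smaller)
QUOTA_PAUSE = 3600     # queue-level pause on quota exceeded (larger)

def next_action(error, attempt, now=None):
    """Return ('retry', delay) or ('pause_queue', deliver_after timestamp)."""
    now = time.time() if now is None else now
    backoff = 2 ** attempt  # exponential backoff is still used in all cases
    if error == "QUOTA":
        # quota exceeded: pause the whole queue with the largest interval
        return ("pause_queue", now + QUOTA_PAUSE * backoff)
    if error in ("NETWORK", "TIMEOUT"):
        if attempt < MAX_MSG_RETRIES:
            # retry the single message with exponential backoff
            return ("retry", MSG_RETRY_BASE * backoff)
        # too many failures: pause the queue, smaller interval than quota case
        return ("pause_queue", now + QUEUE_NET_PAUSE * backoff)
    raise ValueError(f"unexpected error type: {error}")
```

Note the ordering of intervals this encodes: per-message retry delay < queue pause after network errors < queue pause after quota exceeded.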
Collaborator

How is `deliver_after` to be treated after a client restart? Ignore it and attempt delivery once, preserving the retry interval?

It may be better to have separate fields for quota-exceeded and network-error queues, in case these situations require different approaches.
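The separate-fields suggestion could look like two independent pause timestamps on the sending-queue record. This is a hypothetical sketch with made-up field names, only to illustrate the idea:

```python
# Hypothetical queue state with one pause timestamp per failure mode,
# as suggested in the comment above; field names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SndQueueState:
    quota_deliver_after: Optional[float] = None    # set on quota exceeded
    network_deliver_after: Optional[float] = None  # set after repeated network errors

    def blocked_until(self, now: float) -> Optional[float]:
        """Latest pause still in effect, or None if the queue may be tried."""
        times = [t for t in (self.quota_deliver_after, self.network_deliver_after)
                 if t is not None and t > now]
        return max(times) if times else None
```

Keeping the fields separate would let the client apply a different restart policy to each (e.g. ignore the network pause after restart but keep the quota pause).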


Collaborator

Possibly, as an alternative, we could keep the workers but block them at the session client, e.g. it may process 1 message at a time, managing the session network retry interval; quota-exceeded workers may cooperate and keep their own retry interval.

This may be a smaller change in terms of affecting delivery to good queues (e.g. it getting stuck due to some newly introduced bug).

Also, if all session workers' "work" is to be read by a single worker, the query may need more special filters (e.g. to account for the user in the session). Maybe that's not a problem.
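The alternative above — keeping per-queue workers but serializing them at the session client — might look like the sketch below. It is purely illustrative (class and method names are assumptions): workers block on a per-session lock instead of being merged into one worker, and the session manages the network retry interval.

```python
# Hypothetical sketch: workers are kept, but at most one message per
# transport session is in flight; the session manages the retry interval.
import threading

class SessionClient:
    def __init__(self, transport):
        self.transport = transport
        self._lock = threading.Lock()  # one message at a time per session
        self.retry_interval = 1.0      # session-level network retry interval

    def send(self, msg, deliver):
        """Called by queue workers; they block here rather than in the DB query."""
        with self._lock:
            try:
                deliver(self.transport, msg)
                self.retry_interval = 1.0  # reset backoff on success
                return True
            except ConnectionError:
                # network error: grow the shared session retry interval
                self.retry_interval = min(self.retry_interval * 2, 300.0)
                return False
```

A possible advantage, as the comment notes, is that the per-queue delivery code paths stay unchanged, so a bug in the new serialization is less likely to stall delivery to healthy queues.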

Contributor

Per-transport workers would make using (per-transport) proxies simple.

Member Author

I am less concerned about complicating the database logic than about this change substantially reducing sending concurrency, slowing down actual sending to groups and making the problem of out-of-order delivery worse.

On the other hand, possibly the current concurrency level creates too many timed-out deliveries, which result in duplicate deliveries, increasing the traffic both for senders and for recipients.

So the first steps to reduce traffic and battery usage would be:

  • implement expiration for quota-exceeded messages (possibly with a longer timeout of 1 week); this is now much simpler than when the queue was in memory, and it is a smaller change than this RFC.
  • while making one or several delivery attempts after restart, expire not just one but all messages that need to be expired, without trying to deliver them.
  • add statistics to the SMP server to count duplicate deliveries, by comparing message hashes (or signatures).
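The first two steps above could be sketched as a single pass that partitions pending messages by age, expiring old ones without any delivery attempt. The function name, message representation, and the 1-week TTL are assumptions taken from the discussion, not actual code:

```python
# Hypothetical sketch of steps 1 and 2 above: expire all messages older
# than a TTL in one pass, without attempting to deliver them first.
QUOTA_MSG_TTL = 7 * 24 * 3600  # "possibly 1 week", per the comment above

def expire_pending(messages, now):
    """Split pending messages into (still_pending, expired) by age."""
    pending, expired = [], []
    for msg in messages:
        if now - msg["created_at"] >= QUOTA_MSG_TTL:
            expired.append(msg)   # dropped without a delivery attempt
        else:
            pending.append(msg)
    return pending, expired
```

Doing this in one pass after restart (rather than expiring one message per delivery attempt) is what avoids wasted delivery attempts for messages that would be expired anyway.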
