rfc: smp server cluster #1422

Open
wants to merge 1 commit into
base: master

epoberezkin
Member

No description provided.


## Problem

Currently we can only scale servers on a given address vertically, which has 2 problems:
Collaborator

Suggested change

Original: Currently we can only scale servers on a given address vertically, which has 2 problems:

Suggested: Currently we can only scale servers on a given address vertically, which has 3 problems:


The second approach makes it easy to migrate parts of the state between servers in the cluster, as message queues are already grouped in folders, with the top-level folder names having 2 letters. This would allow up to 4096 servers in the cluster.

The proxy would then choose a random server from the list of servers to create a queue, and the server would have to be configured to use specific 2 letters in the base64 encoding of queue addresses. For existing queues, the server will be chosen based on the queue ID.
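
To make the prefix mapping concrete, here is a minimal sketch of how a proxy could turn the first two base64 letters of a queue ID into one of 64 × 64 = 4096 buckets and look up a server. The URL-safe alphabet, the round-robin `build_assignment` helper, and the server names are assumptions for illustration only; the RFC does not specify them.

```python
import base64

# URL-safe base64 alphabet in value order (assumed encoding for queue IDs).
B64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
BUCKETS = 64 * 64  # 4096 possible 2-letter prefixes


def bucket_of(queue_id: bytes) -> int:
    """Return the bucket (0..4095) given by the first two base64 letters.

    This equals the top 12 bits of the binary queue ID.
    """
    encoded = base64.urlsafe_b64encode(queue_id).decode("ascii")
    return B64URL.index(encoded[0]) * 64 + B64URL.index(encoded[1])


def build_assignment(servers: list[str]) -> dict[int, str]:
    """Hypothetical static assignment: spread buckets round-robin over servers."""
    return {b: servers[b % len(servers)] for b in range(BUCKETS)}


def server_for(queue_id: bytes, assignment: dict[int, str]) -> str:
    """Pick the server that owns the bucket of an existing queue ID."""
    return assignment[bucket_of(queue_id)]
```

With this shape, moving a 2-letter folder to another server only means changing one entry in the assignment table.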
Collaborator

> ... and the server would have to be configured to use specific 2 letters

probably should be configured to select from range of 2 letters, and not specific 2 letters. this way when new server is added, queues transferred from this server can be split more evenly by further splitting ranges. I think Cassandra does something similar to that..
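
A rough sketch of the range idea, assuming buckets are the 12-bit values (0..4095) of the 2-letter prefix. `RangeMap`, `split`, and the server names are hypothetical; this is not how Cassandra or the RFC implements it, only an illustration of splitting ranges when a server is added.

```python
import bisect


class RangeMap:
    """Maps buckets 0..4095 to servers using sorted range start points."""

    TOTAL = 4096

    def __init__(self, first_server: str) -> None:
        self.starts = [0]              # sorted start bucket of each range
        self.servers = [first_server]  # owner of the range at the same index

    def _index(self, bucket: int) -> int:
        # Last range whose start is <= bucket.
        return bisect.bisect_right(self.starts, bucket) - 1

    def lookup(self, bucket: int) -> str:
        return self.servers[self._index(bucket)]

    def split(self, bucket: int, new_server: str) -> tuple[int, int]:
        """Hand the upper half of the range containing `bucket` to new_server.

        Returns the half-open bucket range (start, end) whose queues must move.
        """
        i = self._index(bucket)
        start = self.starts[i]
        end = self.starts[i + 1] if i + 1 < len(self.starts) else self.TOTAL
        if end - start < 2:
            raise ValueError("range too small to split")
        mid = (start + end) // 2
        self.starts.insert(i + 1, mid)
        self.servers.insert(i + 1, new_server)
        return mid, end
```

Only the queues in the returned range need to be transferred, so adding a server touches one existing server rather than all of them.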

Collaborator

Comment on lines +20 to +22
To group servers into a cluster we need to map requests to specific servers. This can be done in one of two ways:
- additional server ID in the cluster added to transmissions.
- map the first two letters in base64 encoding of queue ID to the server ID.
Collaborator

we should consider advantages and implementation of first approach

- additional protocol commands used only by load-balancing proxy to create references to servers that have the actual queue from other IDs.
- use the same 2 letters for all IDs.

The latter approach is simpler, but it cannot be used if some of the IDs are generated client-side and some IDs are generated server-side - we would need to generate all IDs in one place. Alternatively, the client can generate some IDs with the same 2 letters in the ID, and the server in the cluster will be chosen to match this ID.
Collaborator

also, how existing queues will migrate


If we decide on the second approach, and add client-generated IDs, we may already start rejecting IDs whose first 2 letters differ. It would effectively reduce ID entropy from 192 to 180 bits, which could be a better tradeoff than additional protocol commands and requests to find the queue, which would add to the request latency.
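
The entropy figure follows from each base64 letter carrying 6 bits: fixing 2 letters removes 12 of the 192 bits. A small sketch of generating and checking such IDs, assuming 24-byte IDs in URL-safe base64 (32 letters, no padding); `new_id_with_prefix` and `accept_id` are hypothetical names, not protocol functions.

```python
import base64
import secrets

ID_BYTES = 24      # 192-bit IDs
PREFIX_LEN = 2     # letters shared by all IDs of one queue
BITS_PER_LETTER = 6

# Bits left free once the first 2 letters are fixed: 192 - 12 = 180.
FREE_BITS = ID_BYTES * 8 - PREFIX_LEN * BITS_PER_LETTER


def new_id_with_prefix(prefix: str) -> str:
    """Generate a random ID whose first 2 letters are forced to `prefix`."""
    if len(prefix) != PREFIX_LEN:
        raise ValueError("prefix must be exactly 2 base64 letters")
    raw = base64.urlsafe_b64encode(secrets.token_bytes(ID_BYTES)).decode("ascii")
    return prefix + raw[PREFIX_LEN:]


def accept_id(candidate: str, recipient_id: str) -> bool:
    """Server-side check: reject IDs whose first 2 letters differ."""
    return candidate[:PREFIX_LEN] == recipient_id[:PREFIX_LEN]
```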

The advantage of the first approach is that it is more generic and does not impose any restriction on IDs. Making additional requests within the operator's network would add only a small fraction to the latency, compared with the much larger latency to the end user. The balancing proxy could cache the results of dereferencing requests.
Collaborator

also second approach leaks some metadata possibly..? for example if sender and notification server collude.. basically they'd know they're referring to same node in cluster. not sure, seems far fetched.

Collaborator

overall I think approach 2 (for question "A separate question is how to map other queue IDs (sender, notifier, link) to the recipient ID", not necessarily overall) is better. Seems reasonable for cluster to know its state

Comment on lines +43 to +46
- proxy -> server 2 chosen based on sender ID: SET_SREF
- proxy <- server 2: OK
- proxy -> server 3 chosen based on notifier ID: SET_NREF
- proxy <- server 3: OK
Collaborator

why do servers 2 and 3 need to know anything here?

Collaborator

as in - it's proxy that's responsible for routing / load balancing, or does this sequence imply servers aren't yet in the same cluster?.. not sure I understand

Comment on lines +64 to +71
**The sequence of requests to send the message**:

- client -> proxy: SEND
- proxy -> mapped server 1 based on sender ID: GET_SREF
- proxy <- server 1: SREF
- proxy -> mapped server 2 based on recipient ID (or cluster ID) in SREF: SEND
- proxy <- server 2: OK
- client <- proxy: OK
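
The sequence above can be simulated in a few lines. This is only a toy model under stated assumptions: servers are looked up by the first 2 letters of an ID, SREF carries the recipient ID (the RFC also allows a cluster ID), and the proxy caches dereferencing results as the RFC text suggests it could. `Server`, `Proxy`, and `handle_send` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    srefs: dict[str, str] = field(default_factory=dict)           # sender ID -> recipient ID
    queues: dict[str, list[bytes]] = field(default_factory=dict)  # recipient ID -> messages

    def get_sref(self, sender_id: str) -> str:
        # GET_SREF -> SREF: return the reference stored for this sender ID.
        return self.srefs[sender_id]

    def send(self, recipient_id: str, msg: bytes) -> str:
        # SEND -> OK: append to the queue held by this server.
        self.queues[recipient_id].append(msg)
        return "OK"


class Proxy:
    def __init__(self, servers: dict[str, Server]) -> None:
        self.servers = servers  # keyed by 2-letter ID prefix (assumption)
        self.sref_cache: dict[str, str] = {}
        self.lookups = 0        # number of GET_SREF requests actually made

    def _server_for(self, any_id: str) -> Server:
        return self.servers[any_id[:2]]

    def handle_send(self, sender_id: str, msg: bytes) -> str:
        recipient_id = self.sref_cache.get(sender_id)
        if recipient_id is None:
            # proxy -> server mapped by sender ID: GET_SREF
            recipient_id = self._server_for(sender_id).get_sref(sender_id)
            self.sref_cache[sender_id] = recipient_id
            self.lookups += 1
        # proxy -> server mapped by recipient ID from SREF: SEND
        return self._server_for(recipient_id).send(recipient_id, msg)
```

In this model the second SEND for the same sender ID skips the GET_SREF round trip, which is the latency saving the cache is meant to provide.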
Collaborator

also don't get it. aren't these servers in same cluster and behind the same proxy address? why server 1 (and not proxy) has to know sender id for queue that's on server 2?
