Handling dropped Events #202
There are a few ways to handle this, but the best may depend on the cause. Are you seeing this because the receiving client is slow and the transport itself is backed up, preventing the router from sending outgoing messages fast enough, or does this happen due to a burst of messages that fills up the outbound queue before messages are written to the transport? Generally, the way this has been handled so far is to increase the outbound queue size. The current plan for handling the case where one or a few clients are very slow, without having to increase the queue size for everyone, is an overflow queue. The overflow queue could be combined with some application logic triggered when overflow happens. The queue is still necessary so that there is some place to write the message, allowing the router to continue to deliver to other clients without having to wait for app logic to process the message.
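As background for readers: events get dropped when a bounded per-session outbound queue is written with a non-blocking send, a common Go pattern. A minimal illustrative sketch (not nexus's actual code):

```go
package main

import "log"

// trySend models a bounded per-session outbound queue: a buffered
// channel written with a non-blocking send. When the buffer is full
// the message is dropped so the router never blocks on one slow client.
func trySend(out chan []byte, msg []byte, sid int64) bool {
	select {
	case out <- msg:
		return true
	default:
		// Queue full: drop the event rather than stall other sessions.
		log.Printf("!!! Dropped EVENT to session %d: blocked", sid)
		return false
	}
}

func main() {
	out := make(chan []byte, 2) // tiny queue to force a drop
	for i := 0; i < 3; i++ {
		trySend(out, []byte("event"), 1873459742953018)
	}
}
```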
The failure occurs during a burst of messages which causes high CPU load. Both the WAMP router and WAMP client are running on the same host, although the communication is over a websocket. We have already tuned the OutQueueSize. I like the idea of a dynamic queue that would buffer events until the slow client recovered. In our use case, after the burst the subscriber continues to operate, not knowing it has missed a number of events. If we knew the events had been dropped (e.g. the slow client gets disconnected, or an additional meta event is emitted), the subscriber could take some corrective action to re-establish sync with the consumers.
I modified our test to use a local client inside the WAMP router binary and listen for wamp.session.on_join events over a local in-process connection. On a side note: I see fewer dropped packets when increasing the OutQueueSize.
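For reference, wiring a local client into the router binary looks roughly like this (a sketch assuming the nexus v3 embedded-client API; the realm name is a placeholder):

```go
package main

import (
	"log"

	"github.com/gammazero/nexus/v3/client"
	"github.com/gammazero/nexus/v3/router"
	"github.com/gammazero/nexus/v3/wamp"
)

func main() {
	// Embedded router with a single realm (realm name is a placeholder).
	r, err := router.NewRouter(&router.Config{
		RealmConfigs: []*router.RealmConfig{
			{URI: wamp.URI("realm1"), AnonymousAuth: true},
		},
	}, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// Local client attached directly to the router, no network transport.
	cli, err := client.ConnectLocal(r, client.Config{Realm: "realm1"})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Listen for session meta events from inside the router binary.
	err = cli.Subscribe("wamp.session.on_join", func(ev *wamp.Event) {
		log.Println("session joined:", ev.Arguments)
	}, nil)
	if err != nil {
		log.Fatal(err)
	}
	select {} // run until killed
}
```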
Unfortunately, the local transport does not have a configurable queue size; it is set statically in the code. The outbound queue implementation needs to be reexamined, since any configured outbound queue size will always be wrong for some use pattern/client/network. I am looking at using an unlimited-size queue per client, something roughly based on https://github.com/gammazero/bigchan. That can be combined with an optional configurable policy to disconnect clients that have excessive memory usage or queue size.
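A bigchan-style unbounded queue can be sketched as a goroutine shuttling items between an input and an output channel through a growable buffer; this is illustrative only, not the proposed implementation:

```go
package main

import "fmt"

// dynChan returns a channel pair backed by an unbounded buffer, in the
// spirit of github.com/gammazero/bigchan: sends never block; items are
// held in a growable slice until the reader catches up.
func dynChan() (chan<- interface{}, <-chan interface{}) {
	in := make(chan interface{})
	out := make(chan interface{})
	go func() {
		var buf []interface{}
		for in != nil || len(buf) > 0 {
			var outCh chan interface{}
			var next interface{}
			if len(buf) > 0 {
				outCh = out // only enable the send case when buffered
				next = buf[0]
			}
			select {
			case v, ok := <-in:
				if !ok {
					in = nil // sender closed; drain remaining buffer
					continue
				}
				buf = append(buf, v)
			case outCh <- next: // nil channel when buf is empty; never fires
				buf = buf[1:]
			}
		}
		close(out)
	}()
	return in, out
}

func main() {
	in, out := dynChan()
	for i := 0; i < 5; i++ {
		in <- i // never blocks; the buffer grows as needed
	}
	close(in)
	for v := range out {
		fmt.Println(v)
	}
}
```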
Making the queue dynamic sounds like the best option, as not all clients are equal and having a fixed queue size for all is a bit of an overhead, so I'm in favour of your suggestion. I agree the queue size can't be infinite, and so the behaviour when a maximum is reached has to be configured. Would this behaviour be specified in the realm configuration? A configurable maximum queue size would also be needed.
If the maximum queue size is configurable, it should be so on the server, since that value is associated with server resources. I am debating whether the router should drop a client that has exceeded limits, or should emit a meta event that allows a trusted admin client to do that. The former is simpler, as it does not require users to implement a client for that purpose, but the latter allows the user to perform any other cleanup and notification work they may want if a client is dropped. As far as refactoring your clients, the raw socket transport can use unix sockets, which will be more efficient if that is an option for you. You can also change the value in a fork of this project, but as you said, that is a dead end.
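For the meta-event option, a trusted admin client might look something like the following sketch. The overflow topic name is invented here for illustration; wamp.session.kill is the standard WAMP session meta procedure, and the nexus v3 client API is assumed (the exact Call signature may differ between versions):

```go
package main

import (
	"context"
	"log"

	"github.com/gammazero/nexus/v3/client"
	"github.com/gammazero/nexus/v3/wamp"
)

// overflowTopic is a hypothetical meta-event name for illustration; it
// is not something nexus currently emits.
const overflowTopic = "wamp.overflow.on_queue_full"

// watchOverflow subscribes an admin client to the hypothetical overflow
// event and kills the offending session via the wamp.session.kill meta
// procedure.
func watchOverflow(cli *client.Client) error {
	return cli.Subscribe(overflowTopic, func(ev *wamp.Event) {
		if len(ev.Arguments) == 0 {
			return
		}
		sid, ok := wamp.AsID(ev.Arguments[0])
		if !ok {
			return
		}
		// Application-specific cleanup/notification could happen here
		// before deciding to drop the session.
		_, err := cli.Call(context.Background(), "wamp.session.kill",
			nil, wamp.List{sid}, nil, "")
		if err != nil {
			log.Println("kill failed:", err)
		}
	}, nil)
}

func main() {
	// Placeholder address and realm.
	cli, err := client.ConnectNet(context.Background(),
		"ws://localhost:8080/ws", client.Config{Realm: "realm1"})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	if err = watchOverflow(cli); err != nil {
		log.Fatal(err)
	}
	select {} // run until killed
}
```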
For sure the limit on the queue size should be a server configuration option. However, the behaviour when the limit is reached (drop client, drop event, emit meta event) might be something a client could request. Some clients may tolerate dropped events; other clients may not. For my use case I am happy to specify the same behaviour for all clients in the realm.
I am trying to implement reliable event delivery using the nexus Pub/Sub. I can tune the WebsocketServer.OutQueueSize to match my use case (a configuration sketch follows the log excerpt below), but at some point a slow client will cause messages to be dropped:
2019/09/30 15:58:45 !!! Dropped EVENT to session 1873459742953018: blocked
2019/09/30 15:58:45 !!! Dropped EVENT to session 1873459742953018: blocked
2019/09/30 15:58:45 !!! Dropped EVENT to session 1873459742953018: blocked
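For reference, the OutQueueSize tuning mentioned above looks roughly like this (a sketch assuming the nexus v3 router API; address and realm are placeholders):

```go
package main

import (
	"log"

	"github.com/gammazero/nexus/v3/router"
	"github.com/gammazero/nexus/v3/wamp"
)

func main() {
	r, err := router.NewRouter(&router.Config{
		RealmConfigs: []*router.RealmConfig{
			{URI: wamp.URI("realm1"), AnonymousAuth: true},
		},
	}, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	s := router.NewWebsocketServer(r)
	// A larger outbound queue delays, but does not eliminate, drops when
	// a subscriber cannot keep up with a burst of events.
	s.OutQueueSize = 1024

	closer, err := s.ListenAndServe("127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	defer closer.Close()
	select {} // run until killed
}
```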
However, our application has no indication this has happened. From the discussion on #159, it was suggested the behaviour could be configured on the realm, e.g. disconnect slow clients. Are there any plans to pursue this approach? Alternatives include registering callbacks with the realm to implement application logic, or meta-data channels for discarded events.
My use case consumes meta-data OnJoin/OnLeave events to detect clients connected to the router. I have a repo that demonstrates a generic loss of events with OutQueueSize=1.
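A minimal version of such a session-tracking subscriber might look like this (a sketch assuming the nexus v3 client API; address and realm are placeholders):

```go
package main

import (
	"context"
	"log"
	"sync"

	"github.com/gammazero/nexus/v3/client"
	"github.com/gammazero/nexus/v3/wamp"
)

func main() {
	// Placeholder address and realm.
	cli, err := client.ConnectNet(context.Background(),
		"ws://localhost:8080/ws", client.Config{Realm: "realm1"})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	var mu sync.Mutex
	sessions := make(map[wamp.ID]bool)

	// wamp.session.on_join delivers the session details dict.
	onJoin := func(ev *wamp.Event) {
		if len(ev.Arguments) == 0 {
			return
		}
		if details, ok := wamp.AsDict(ev.Arguments[0]); ok {
			if sid, ok := wamp.AsID(details["session"]); ok {
				mu.Lock()
				sessions[sid] = true
				mu.Unlock()
				log.Println("joined:", sid)
			}
		}
	}
	// wamp.session.on_leave delivers the session ID.
	onLeave := func(ev *wamp.Event) {
		if len(ev.Arguments) == 0 {
			return
		}
		if sid, ok := wamp.AsID(ev.Arguments[0]); ok {
			mu.Lock()
			delete(sessions, sid)
			mu.Unlock()
			log.Println("left:", sid)
		}
	}

	if err = cli.Subscribe("wamp.session.on_join", onJoin, nil); err != nil {
		log.Fatal(err)
	}
	if err = cli.Subscribe("wamp.session.on_leave", onLeave, nil); err != nil {
		log.Fatal(err)
	}
	select {} // run until killed
}
```

With OutQueueSize=1, a burst of joining sessions will overflow this subscriber's queue, and the map silently falls out of sync with the router, which is the loss being demonstrated.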