Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution of SessionFsm transition actions can create a deadlock #1300

Closed
persicsb opened this issue Aug 26, 2024 · 5 comments · Fixed by #1348
Closed

Execution of SessionFsm transition actions can create a deadlock #1300

persicsb opened this issue Aug 26, 2024 · 5 comments · Fixed by #1348

Comments

@persicsb
Copy link

Describe the bug
SessionFsm transition actions are not properly executed in order, and it can cause a deadlock.

The main use case is resubscribing subscriptions, that were failed to transfer to a new session.
There are 2 main points on the client side, when we are notified to do it:

  1. SessionActivityListener.onSessionActive - it is assumed that we are safe to work with anything session related, lik subscribing to nodes
  2. ManagedSubscription.StatusListener.onSubscriptionTransferFailed - it is assumed here, that everything can be done with the SubscriptionManager here.

However, this is not the case, we can be in a deadlock easily.

The main problem is, that onSessionActive and onSubscriptionTransferFailed callbacks can run before the session's SessionFuture (with the key KEY_SESSION_FUTURE) is properly resolved.

The reason is, that the SessionFsm is set up in SessionFsmFactory so that the session's future is only resolved when the InitializeSuccess event is called between the Intializing and Active state transition, see https://github.com/eclipse/milo/blob/master/opc-ua-sdk/sdk-client/src/main/java/org/eclipse/milo/opcua/sdk/client/session/SessionFsmFactory.java#L513

It resolves the session's future via the executorservice, and not directly, see line 565.

However, the onSessionActive callbacks are called directly, when the state changes to Active from any other state, in https://github.com/eclipse/milo/blob/master/opc-ua-sdk/sdk-client/src/main/java/org/eclipse/milo/opcua/sdk/client/session/SessionFsmFactory.java#L587

Since these callse are not made through the client's executorservice, they can happen-before the sessionFuture is resolved.

So there is no guarantee, that even after onSessionActive is called, anything can be done, high-level APIs like ManagedSubscription.create() will hang indefinitely, as they are in a deadlock : they are waiting for the future's resolution, but that is scheduled later.

I think, the issue is the same as #614

In that issue, you mentioned that:

I'll start looking at whether I can make these callbacks safer. In the meantime, when you're executing inside a callback, whether from the subscription manager or just a CompletableFuture for a response, just make sure not to block waiting for another response on the same thread.

What is the preferred way to solve this issue? When and how can ManagedSubscription.create() (and any reentrant calls to getSession()) safely called?

Also, it needs to be documented properly in the Javadocs, and perhaps in the milo-examples code.

Expected behavior
SessionFsm transition actions have a well-defined order.

Additional context
Milo version 0.6.14, but affects all others.

@kevinherron
Copy link
Contributor

This is still a problem that I need to figure out how to solve nicely.

I don't remember if I ever looked into moving the onSessionActive and onSessionInactive callbacks to the client executor, that would certainly be an easy fix for that specific scenario if it's safe.

In our client implementation using Milo we don't use any callbacks to determine when to start creating subscriptions. You "know" it's safe to create a Subscription if your call to OpcUaClient::connect or OpcUaClient::getSession succeeds.

@persicsb
Copy link
Author

persicsb commented Sep 3, 2024

But getSession() may behave as it never returns (that is the deadlock).

We are using Siemens S7-1500 PLCs, that do not implement the subscription transfer service, so onSubscriptionTransferFailed is called for every subscription to those PLCs.

However, sometimes (as it really depends on the scheduling of the transition actions, and not deterministic) getSession() remains stuck, and its future will never be resolved.

It can happen, that the connection and reconnection/resubscriptions works for months, and after that, it deadlocks every day. It is really not deterministic.

The real problem is, that a possible workaround might be to create a new client to the PLCs with a fresh connection.
But it would be nice to close the old connection, since the PLCs have a limited amount of resource to serve the OPC UA clients.
But if getSession() deadlock, disconnect() deadlocks as well, because internally it calls sessionFsm.closeSession(), that won't execute properly.

This is a really annoying issue, which makes Milo quite unusable in environments, where PLCs without subscription transfer restart often.

kevinherron added a commit that referenced this issue Nov 28, 2024
This avoids deadlock when the user makes a blocking call that requires the Session from the SessionFsm.

fixes #1300
@persicsb
Copy link
Author

Thank you, Kevin, this is wonderful. Can Milo users expect a new 0.6.15 release or shall we wait until 1.0?

@kevinherron
Copy link
Contributor

@persicsb I will cut a 0.6.15 release within a week, there are a few other small fixes that have gone into 0.6.x in addition to this one.

@kevinherron
Copy link
Contributor

@persicsb 0.6.15 should be available in Maven Central since yesterday.

kalidali256 pushed a commit to kalidali256/milo that referenced this issue Jan 26, 2025
)

This prevents deadlock when the user makes a blocking call that requires the Session from the SessionFsm.

fixes eclipse-milo#1300
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
@kevinherron @persicsb and others