Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-2675: DefaultBinding isRunning enhancements #2838

Closed
wants to merge 1 commit into from

Conversation

sobychacko
Copy link
Contributor

  • Instead of relying on the lifecycle object's isRunning method which could possibly be synchronized and cause a contention for the lock, maintain the running state separately inside DefaultBinding via a field (similar to how it tracks the pausable state).

    Without this change, in the pollable Kafka consumer based applications, there is a possiblity for a longer wait when queyring the binding status in DefaultBinding (for example via the binding actuator endpoint) due to this aforementioned lock contention.

Resolves #2675

 - Instead of relying on the lifecycle object's isRunning method
   which could possibly be synchronized and cause a contention for
   the lock,  maintain the running state separately inside DefaultBinding
   via a field (similar to how it tracks the pausable state).

   Without this change, in the pollable Kafka consumer based applications,
   there is a possiblity for a longer wait when queyring the binding status
   in DefaultBinding (for example via the binding actuator endpoint) due to
   this aforementioned lock contention.

Resolves spring-cloud#2675
@garyrussell
Copy link
Contributor

This needs to be fixed in Spring Integration, not here.

cc/ @artembilan

@sobychacko
Copy link
Contributor Author

@garyrussell @artembilan What do you think about temporarily having this change here? If it's not worth it, we will close this PR here.

@artembilan
Copy link
Member

I'm sorry, I fail to determine what is wrong and what should be fixed in the KafkaMessageSource.
Note: in 6.2 we have moved to an internal Lock instead of synchronized on those methods.
Maybe that can fix it? Then we simply can back-port the fix to previous versions.
But again: where is a dead lock, please?
Thanks

@sobychacko
Copy link
Contributor Author

@artembilan The deadlock happens when the /binding actuator endpoint makes a call to the KafkaMessageSource#isRunning() method. Because of a timeout set there, there is a code path, that a thread may wait for the lock to release, which happens only after the timeout. This only happens for pollable consumers. See this issue for more details. #2675

@sobychacko
Copy link
Contributor Author

Closing this in lieu of spring-projects/spring-integration#8778

@sobychacko sobychacko closed this Oct 26, 2023
@garyrussell
Copy link
Contributor

I think we can just change the lifecycle synchronization to use an AtomicBoolean for running.

I am not sure why it is set to true after creating the consumer (in doReceive()) - looks like that was added by an external contributor.

@artembilan
Copy link
Member

The doReceive() is synchronized in the Spring Integration version and it is blocked waiting for result from the pollRecord() which ends up to be waiting for consumer.poll() where no partitions assigned to this consumer since the only one existing has been assigned to another one.
That's how it is dead-locking.
I think we can just narrow a synchronized/locked block in the doReceive() to exclude that pollRecord() call.

@garyrussell
Copy link
Contributor

Right; we. should also call consumer.wakeup() in stopConsumer() in case the poller is stuck here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kafka binder used with pollables makes fail the bindings actuator endpoint
3 participants