Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-8778: Fix KafkaMessageSource deadlock #8780

Merged
merged 2 commits into from
Oct 26, 2023

Conversation

artembilan
Copy link
Member

Fixes #8778

The KafkaMessageSource.doReceive() have a lock around its whole body. That includes the pollRecord() which can be blocked on the KafkaConsumer.poll(). This way the rest of lifecycle management callbacks can be blocked until KafkaConsumer.poll() returns.

  • Rework lifecycle management flags to AtomicBoolean since there is not too much work in their respective callbacks
  • Decrease a locking block in the doReceive() just to consumer setup part. Leave pollRecord() outside of the lock
  • Add this.consumer.wakeup() into stopConsumer() to break a poll() cycle and return immediately for the next close() call

Cherry-pick to 6.1.x & 6.0.x

Fixes spring-projects#8778

The `KafkaMessageSource.doReceive()` have a lock around its whole body.
That includes the `pollRecord()` which can be blocked on the `KafkaConsumer.poll()`.
This way the rest of lifecycle management callbacks can be blocked until `KafkaConsumer.poll()` returns.

* Rework lifecycle management flags to `AtomicBoolean` since there is not too much work
in their respective callbacks
* Decrease a locking block in the `doReceive()` just to consumer setup part.
Leave `pollRecord()` outside of the lock
* Add `this.consumer.wakeup()` into `stopConsumer()` to break a `poll()` cycle
and return immediately for the next `close()` call

**Cherry-pick to `6.1.x` & `6.0.x`**
@artembilan
Copy link
Member Author

CC @sobychacko

@sobychacko
Copy link
Contributor

LGTM, @artembilan

Copy link
Contributor

@garyrussell garyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't cherry-pick because of the ReentrantLock.

finally {
this.lock.unlock();
}
stopConsumer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (this.running.compareAndSet(true, false)) {...}

finally {
this.lock.unlock();
}
this.running.set(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (this.running.compareAndSet(false, true)) {...}

@artembilan
Copy link
Member Author

This won't cherry-pick because of the ReentrantLock.

I have back-ported today such a fix: 991d29e
Exactly for a simple review and back-port process of this PR.

@garyrussell garyrussell merged commit f0561b6 into spring-projects:main Oct 26, 2023
1 check passed
garyrussell pushed a commit that referenced this pull request Oct 26, 2023
* GH-8778: Fix KafkaMessageSource deadlock

Fixes #8778

The `KafkaMessageSource.doReceive()` have a lock around its whole body.
That includes the `pollRecord()` which can be blocked on the `KafkaConsumer.poll()`.
This way the rest of lifecycle management callbacks can be blocked until `KafkaConsumer.poll()` returns.

* Rework lifecycle management flags to `AtomicBoolean` since there is not too much work
in their respective callbacks
* Decrease a locking block in the `doReceive()` just to consumer setup part.
Leave `pollRecord()` outside of the lock
* Add `this.consumer.wakeup()` into `stopConsumer()` to break a `poll()` cycle
and return immediately for the next `close()` call

**Cherry-pick to `6.1.x` & `6.0.x`**

* * Use `compareAndSet` in `start` & `stop`
garyrussell pushed a commit that referenced this pull request Oct 26, 2023
* GH-8778: Fix KafkaMessageSource deadlock

Fixes #8778

The `KafkaMessageSource.doReceive()` have a lock around its whole body.
That includes the `pollRecord()` which can be blocked on the `KafkaConsumer.poll()`.
This way the rest of lifecycle management callbacks can be blocked until `KafkaConsumer.poll()` returns.

* Rework lifecycle management flags to `AtomicBoolean` since there is not too much work
in their respective callbacks
* Decrease a locking block in the `doReceive()` just to consumer setup part.
Leave `pollRecord()` outside of the lock
* Add `this.consumer.wakeup()` into `stopConsumer()` to break a `poll()` cycle
and return immediately for the next `close()` call

**Cherry-pick to `6.1.x` & `6.0.x`**

* * Use `compareAndSet` in `start` & `stop`
@garyrussell
Copy link
Contributor

...and cherry-picked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock in KafkaMessageSource - between isRunning and doReceive
3 participants