Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-35465 Async replication stops working on Galera async replica no… #3655

Closed
wants to merge 1 commit into from

Conversation

temeo
Copy link
Contributor

@temeo temeo commented Nov 22, 2024

…de when parallel replication is enabled

  • The Jira issue number for this PR is: MDEV-35465

Description

Parallel slave failed to retry in retry_event_group() with error

WSREP: Parallel slave worker failed at wsrep_before_command() hook

Fix wsrep transaction cleanup/restart in retry_event_group() to properly clean up previous transaction by calling wsrep_after_statement(). Also move call to reset error after call to wsrep_after_statement() to make sure that it remains effective.

Add a MTR test galera_as_slave_parallel_retry to reproduce the error when the fix is not present.

Other issues which were detected when testing with sysbench:

Check if parallel slave is killed for retry before waiting for prior commits in THD::wsrep_parallel_slave_wait_for_prior_commit(). This is required with slave-parallel-mode=optimistic to avoid deadlock when a slave later in commit order manages to reach prepare phase before a lock conflict is detected.

Suppress wsrep applier specific warning for slave threads.

How can this PR be tested?

This PR contains a new test galera.galera_as_slave_parallel_retry which was used to reproduce the issue. The test must pass when the fix is present.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

…de when parallel replication is enabled

Parallel slave failed to retry in retry_event_group() with error

    WSREP: Parallel slave worker failed at wsrep_before_command() hook

Fix wsrep transaction cleanup/restart in retry_event_group() to properly
clean up previous transaction by calling wsrep_after_statement().
Also move call to reset error after call to wsrep_after_statement()
to make sure that it remains effective.

Add a MTR test galera_as_slave_parallel_retry to reproduce the error
when the fix is not present.

Other issues which were detected when testing with sysbench:

Check if parallel slave is killed for retry before waiting for prior
commits in THD::wsrep_parallel_slave_wait_for_prior_commit(). This
is required with slave-parallel-mode=optimistic to avoid deadlock
when a slave later in commit order manages to reach prepare phase
before a lock conflict is detected.

Suppress wsrep applier specific warning for slave threads.
@sysprg
Copy link
Contributor

sysprg commented Dec 3, 2024

Thanks, the fix has been merged with the main branch:
a2575a0

@sysprg sysprg closed this Dec 3, 2024
@temeo temeo deleted the 10.5-MDEV-35465 branch December 3, 2024 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Codership Codership Galera
Development

Successfully merging this pull request may close these issues.

4 participants