BUGFIX: Ensure users content stream is never left closed after publication #5342

Merged

Conversation

@mhsdesign mhsdesign (Member) commented Nov 4, 2024

Follow-up to publishing version 3: #5301

There are currently three cases that might leave a closed content stream behind if $something goes wrong during publication.

Case 1: Ensure all events are published BEFORE catchup

Otherwise, a failure in a projection or catchup hook would interrupt the process immediately, leaving a broken state.

For example, a faulty redirect handler hook that only listens to live events would be called during publishing. At that point the events to publish are already committed, but we know we still have work to do: fork the new user content stream and apply the remaining events. A catchup hook that throws while the live events are caught up would interrupt us right there, and we would be left with a CLOSED user content stream that contains the "same" events that went live during the rebase. Reopening would not help at that point. This is why we must ensure that all events are published BEFORE we run the first catchup.

Further implications:

  • running catchup only once should be more performant
  • we cannot refetch the current content stream version for what were previously "sub-commands" (forkContentStream); instead we must pass $expectedVersions around from the outside
  • we should not run constraint checks after the first yield, as they would still operate on the old state; thus all checks are combined above
  • closing the content stream is not caught up directly, meaning another process will not get an error because the content stream is not writable; instead, because the version in the database is behind, it will get a concurrency exception (see the sketch below)
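To make the new ordering concrete, here is a minimal, self-contained PHP sketch. It is NOT the actual Neos implementation: the event store and the catchup hook are reduced to an array and a callable, and all names (including the event names) are illustrative.

```php
<?php
declare(strict_types=1);

// Sketch only: commit ALL events first, then run catchup exactly once.

/** @param list<string> $events */
function publishThenCatchup(array $events, array &$committedLog, callable $catchupHook): void
{
    // 1. Commit every event of the publication up front. Nothing below can
    //    interrupt the write phase anymore.
    foreach ($events as $event) {
        $committedLog[] = $event;
    }

    // 2. Run catchup once, after everything is committed. A faulty hook may
    //    still throw, but the user content stream has already been forked
    //    and re-opened by the events above.
    foreach ($events as $event) {
        $catchupHook($event);
    }
}

$log = [];
$faultyRedirectHandlerHook = function (string $event): void {
    // Stand-in for a faulty hook that only listens to live events.
    throw new RuntimeException('faulty redirect handler hook');
};

try {
    publishThenCatchup(
        ['NodesWerePublished', 'ContentStreamWasForked', 'ContentStreamWasReopened'],
        $log,
        $faultyRedirectHandlerHook
    );
} catch (RuntimeException $e) {
    // The hook failed, but all three events were committed beforehand,
    // so the user content stream is not left closed.
    assert(count($log) === 3);
}
```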

Case 2: Reopen content stream if base workspace was written to during publication

... and a ConcurrencyException is thrown

Introduces a WorkspacePublicationDuringWritingTest parallel test (with its own content repository) to assert that behaviour.
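Roughly, the recovery path looks like this. This is a hedged sketch with made-up class and method names; `UserContentStream`, `close()` and `reopen()` only stand in for the real internals, and `ConcurrencyException` here is a local stub, not the real exception class.

```php
<?php
declare(strict_types=1);

final class ConcurrencyException extends RuntimeException
{
}

final class UserContentStream
{
    private bool $closed = false;

    public function close(): void
    {
        $this->closed = true;
    }

    public function reopen(): void
    {
        $this->closed = false;
    }

    public function isClosed(): bool
    {
        return $this->closed;
    }
}

/**
 * @param callable(): void $commitPublication throws ConcurrencyException
 *        when the base workspace was written to while we were publishing
 */
function publishSafely(UserContentStream $stream, callable $commitPublication): void
{
    $stream->close();
    try {
        $commitPublication();
    } catch (ConcurrencyException $e) {
        // The base workspace moved underneath us: undo the close so the
        // user can keep working and retry the publication later.
        $stream->reopen();
        throw $e;
    }
}

$stream = new UserContentStream();
try {
    publishSafely($stream, function (): void {
        throw new ConcurrencyException('base workspace was written to');
    });
} catch (ConcurrencyException $e) {
    assert($stream->isClosed() === false); // stream was reopened
}
```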

Case 3: Close content stream a bit later instead of having to reopen it on unexpected errors

Previously, an error in extractFromEventStream (because the payload was incorrect and still had to be migrated) would lead to a closed content stream, which is of course persisted even after fixing the events via migration.

This is still safe to do, as closeContentStream will commit the close against the initially fetched expected version.

Same guarantees, different error behaviour in rare cases.
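A sketch of that ordering, again with purely illustrative names: `EventStreamSketch`, `extractRebaseableCommands()` and `commitClose()` are stand-ins, not the real API. The point is that the expected version is fetched FIRST, the step that may throw runs next, and only then is the close committed against that version.

```php
<?php
declare(strict_types=1);

final class EventStreamSketch
{
    public function __construct(private int $version = 7)
    {
    }

    public function currentVersion(): int
    {
        return $this->version;
    }

    /** @return list<string> may throw on un-migrated event metadata */
    public function extractRebaseableCommands(): array
    {
        return ['MoveNode', 'SetProperties'];
    }

    public function commitClose(int $expectedVersion): void
    {
        if ($expectedVersion !== $this->version) {
            throw new RuntimeException('concurrency: stream advanced meanwhile');
        }
        $this->version++; // the "close" event
    }
}

/** @return list<string> */
function closeContentStreamLate(EventStreamSketch $stream): array
{
    $expectedVersion = $stream->currentVersion();

    // This step may throw (e.g. payload not yet migrated). Because we have
    // not closed yet, a failure here leaves the stream untouched.
    $commands = $stream->extractRebaseableCommands();

    // Committing against the initially fetched version keeps the old
    // guarantee: a concurrent writer turns this into a concurrency error
    // instead of a "content stream not writable" error.
    $stream->commitClose($expectedVersion);

    return $commands;
}

$commands = closeContentStreamLate(new EventStreamSketch());
assert($commands === ['MoveNode', 'SetProperties']);
```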

Upgrade instructions

Review instructions

Checklist

  • Code follows the PSR-2 coding style
  • Tests have been created, run and adjusted as needed
  • The PR is created against the lowest maintained branch
  • Reviewer - PR Title is brief but complete and starts with FEATURE|TASK|BUGFIX
  • Reviewer - The first section explains the change briefly for change-logs
  • Reviewer - Breaking Changes are marked with !!! and have upgrade-instructions

@bwaidelich bwaidelich (Member) left a comment


I think it's a great start, and the LoC balance is even negative.

…ring publication

... and a ConcurrencyException is thrown

Introduces a `WorkspacePublicationDuringWritingTest` parallel test (with its own content repository) to assert that behaviour.
That allows us to use the same content repository. Previously, a super slow paratest run could lead to another test case already being started, with its setup then running twice at the end.

paratestphp/paratest#905
… version instead

Also re-adds lost documentation and simplifies `handle`.

The `->throw` logic was initially introduced via neos#5315, but then removed again as we thought it was no longer needed.
…re content stream is never left closed

During the beta phase it can happen that users forget to apply a migration for the stored commands in the event metadata; upon publish this would close the content stream and fail directly afterwards.

Applying the migration afterwards would not be enough, as the content stream is in a closed state and has to be repaired manually.

Even though this is not very likely, it is not unlikely either, and publication is exactly the case where we rely on things that might not hold.

As an alternative we could discuss doing the closing after acquiring the rebaseable commands.
… many edge cases

Alternative fix for d27f83f

@mhsdesign mhsdesign marked this pull request as ready for review November 9, 2024 20:20
@mhsdesign mhsdesign changed the title BUGFIX: Ensure all events are published BEFORE catchup BUGFIX: Ensure users content stream is never left closed after publication Nov 9, 2024
@bwaidelich bwaidelich (Member) left a comment


Some, probably semi-helpful, comments. Looks great otherwise.

@mhsdesign mhsdesign requested a review from bwaidelich November 10, 2024 08:30
@bwaidelich bwaidelich (Member) left a comment


+1 by reading and running the tests

@dlubitz dlubitz (Contributor) left a comment


Looks good by reading. But as a disclaimer ... I'm not deep enough into this topic to fully understand it. 🤷‍♂️

@kitsunet kitsunet merged commit 80d8750 into neos:9.0 Nov 12, 2024
9 checks passed
@mhsdesign mhsdesign deleted the bugfix/publishing-ensure-contentstream-not-closed branch November 12, 2024 20:02