Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v3] Migrations backwards compatibility #1189

Closed
ewjoachim opened this issue Sep 8, 2024 · 8 comments
Closed

[v3] Migrations backwards compatibility #1189

ewjoachim opened this issue Sep 8, 2024 · 8 comments
Milestone

Comments

@ewjoachim
Copy link
Member

ewjoachim commented Sep 8, 2024

According to our own doc, migrations named with version X are the migrations you can apply once you run on version X.
According to our doc, people on v2.13 wishing to upgrade without disruption would:

  • Change the code to v3.0
  • Apply the v3.0 migrations
  • Change the code to v3.1
  • Apply the v3.1 migrations
    All the while their instance is running and processing jobs.

Compatibility matrix:

Code ➡️
⬇️ Schema
v2.13 v3.0 v3.1
v2.13
v3.0
v3.1

❌ means we don't have to make it compatible, but if it happens to be, it's not a problem, of course

We need to decide what we put in v3.0 and v3.1

Note: the change of status = aborting -> abort_requested might need a couple of trigger just for the time of the dual compatibility (adding the `abort_requested column and update its value based on the status, then add a trigger that synchronizes the 2 columns, and in the next release, delete the triggers and the column)

@ewjoachim ewjoachim added this to the Version 3.0 milestone Sep 8, 2024
@medihack
Copy link
Member

I must confess that I get totally confused by naming the migrations now. In the contributing docs we say "xx.yy.zz is the number of the latest released version of Procrastinate" and in the migrations docs "you want to update to Procrastinate 1.15.0 ... migration scripts whose versions are greater than or equal to 1.9.0, and lower than 1.15.0". But then in the above table and #1186 you name the migration 03.00.00_01_cancel_notification.sql which won't be lower when we include this migration in a 3.0 release.
Unfortunately, I also don't fully understand the "Note". Have I missed something in my migration?

@ewjoachim
Copy link
Member Author

It is confusing :(
When we wrote procrastinate, we were in an environment where we couldn't stop the service while applying migrations, so we had to ensure we created compatible migrations, that there would always be a path where you upgrade code, then db, then code, then db etc, without service interruption. As of today this is still an official commitment procrastinate does. We can revisit this but I believe it's worth trying to keep it.

For v3, you're right that the move I make is probably wrong but the migration was named for 2.9.2 which is also wrong. The reality is that:

  • Either when we're ready to make v3, we make one last v2 that adds stuff, and then in the v3 migrations we remove stuff
  • Or we'll do a v3.0 that does almost no change in the code (so almost identical to v2.x) but introduces DB changes and a v3.1 that will introduce everything we've added to the v3 branch. I think that's the preferred solution (even if it means the v3.0 will actually not be the v3, but if we release both at the same time, I guess no one will use the v3.0, everyone will jump straight to the v3.1)

We should definitely run the tests with the DB on the previous schema to enforce this, otherwise it's probably prone to wishful thinking.
I think it would also be interesting to start versioning our procedures, so anytime we change them, we can control easily what version of procrastinate uses what version of the procedures while still having multiple ones in the code. That might be much less 🤯 for us. Though, we can't do the same with all objects (tables, enums, triggers...), of course.

I sometimes wonder if it's worth using procedures at all, compared to doing everything in the queries, but that's a whole other debate.

@medihack
Copy link
Member

Sorry, I am still a bit confused 😞. So with #1186 we will do it the other way around than

there would always be a path where you upgrade code, then db, then code, then db

because

we'll do a v3.0 that does almost no change in the code (so almost identical to v2.x) but introduces DB changes

If the migration will be part of v3.0, wouldn't it be more congruent to check the latest v2.x version (straight before the release) and name it that way?

But maybe I take care of those other open issues for v3 and just watch and learn what you do with this issue here 😉.

We should definitely run the tests with the DB on the previous schema to enforce this, otherwise it's probably prone to wishful thinking. I think it would also be interesting to start versioning our procedures, so anytime we change them, we can control easily what version of procrastinate uses what version of the procedures while still having multiple ones in the code. That might be much less 🤯 for us. Though, we can't do the same with all objects (tables, enums, triggers...), of course.

I sometimes wonder if it's worth using procedures at all, compared to doing everything in the queries, but that's a whole other debate.

And all those triggers that use them?

@ewjoachim
Copy link
Member Author

and just watch and learn what you do with this issue here 😉.

Haha you seem to think I have it all thought through and I have a plan. I'm learning as I go, and what I've learnt so far is that if you say there's something strange, I better get looking, because you've shown an excellent gut feeling for those issues so far :)

And all those triggers that use them?

I believe the set of procedures used by triggers is quite distinct from the set of "business-logic" procedures, the former aren't very complex (save procrastinate_trigger_status_events_procedure_update) nor often updated. The ones that may benefit from not being procedures could be procrastinate_defer_job, procrastinate_fetch_job, procrastinate_finish_job and the likes.

@medihack
Copy link
Member

medihack commented Oct 3, 2024

It is confusing :( When we wrote procrastinate, we were in an environment where we couldn't stop the service while applying migrations, so we had to ensure we created compatible migrations, that there would always be a path where you upgrade code, then db, then code, then db etc, without service interruption. As of today this is still an official commitment procrastinate does. We can revisit this but I believe it's worth trying to keep it.

I'm curious how this is supposed to work with all the changes in the database schema. Can't we justify the service interruption with a major release and the breaking changes (I really regret introducing that aborting state 🙄)?
I also wonder why the migrations don't have the same version as their associated release. That way, deciding what to migrate to would be pretty straightforward (I am happy that we use the Django migrations in our apps), but I'm probably just missing something.

@ewjoachim
Copy link
Member Author

ewjoachim commented Oct 4, 2024

I'm curious how this is supposed to work with all the changes in the database schema.

Usually, it's about making the schema support 2 versions:

  • When you create a field, you create it one version before you start using it, and if you need it filled with an initial value, you use a migration and a trigger so that up until your code actually starts using it, it has the correct value. Alternatively, if the new field has a simple default value, you don't need a trigger, just create the field before the code starts using it. If the field is required, create it without a constraint and after the code has started using it, deal with potential wrong values created in the meantime and add the constraint.
  • When you delete a field, you delete it one version after you stop using it. Usually you don't need to add triggers because you don't care what the value will be at the time of deletion.
  • Many column changes can be expressed with a field addition and a field deletion, potentially with triggers synchronizing the value of multiple fields for the time in between.

This made sense for procrastinate at the time we did it partly because we didn't imagine there would be a lot of migrations, partly because it was the way we did things. I can see that it's really annoying to everyone now.

Can't we justify the service interruption with a major release and the breaking changes

The process was supposed to help users do blue-green deployments even on major releases. If it happens that we can't it means we designed the process badly 😅

(I really regret introducing that aborting state 🙄)

Don't be. Process shouldn't get in the way of improving the lib. This highlights that we need to change the process. But all this discussion is also about how we can change it without it being just annoying for users who can't afford breaking compatibility easily.

There will be (hopefully very few, realistically at least some) places where, upon reading that deployment needs to shut off the service, people will schedule it off-hours (evening or weekend). I know because I've been in those kind of situations. As much as I'd appreciate that this kind of practice doesn't exist, as a lib maintainer, I want to make sure that I don't have other people sacrifice their time because I didn't make the effort to make the lib blue-green deployable. It doesn't mean we have to do the impossible, but at least try and see if it's actually easy or not.

I also wonder why the migrations don't have the same version as their associated release.

In the beginning, they had. and then we got it wrong multiple times because when we did the PR, we didn't know if the next release would be a minor or an incremental.
We changed it to "name the migration like the current release" so at least we'd know what it is, but there's still the risk that we would release while there are open PRs and forget to rename the migrations.
I think the migrations need their own version number independent from the release.

@ewjoachim
Copy link
Member Author

I'm going to make a PR to try and fix this mess. It's long overdue.

@ewjoachim
Copy link
Member Author

Finally slayed by @medihack in #1227, many kudos !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants