Duplicate key value on many-to-many relationship versioning #178

Open
okcomp opened this issue Jan 12, 2018 · 19 comments
@okcomp

okcomp commented Jan 12, 2018

I have a many-to-many relationship; here are the models:

from sqlalchemy import Column, ForeignKey, Integer

class Membership(BaseModel):  # BaseModel is our declarative base
    tenant_id = Column(Integer, ForeignKey('tenant.tenant_id', ondelete='CASCADE'), primary_key=True)
    user_id = Column(Integer, ForeignKey('user.user_id', ondelete='CASCADE'), primary_key=True)

class User(BaseModel):
    user_id = Column(Integer, primary_key=True)

class Tenant(BaseModel):
    tenant_id = Column(Integer, primary_key=True)

When I tried to create the association model, I got a psycopg2.IntegrityError on the versioned membership table.

>> m = Membership(tenant_id=tenant_id, user_id=user_id)
>> db.session.add(m)
>> db.session.commit()
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
psycopg2.IntegrityError: duplicate key value violates unique constraint "version_membership_pkey"
DETAIL:  Key (tenant_id, user_id, transaction_id)=(1, 1, 2) already exists.

However, it seems to work fine if I use the association table instead of the association object to create this relationship:

membership = db.Table('membership',
                      Column('tenant_id', Integer, ForeignKey('tenant.tenant_id', ondelete='CASCADE'), primary_key=True),
                      Column('user_id', Integer, ForeignKey('user.user_id', ondelete='CASCADE'), primary_key=True))

But I do need an association object in my case to support extra columns. Does Continuum actually support versioning an association object for a many-to-many relationship? Or is there any configuration I'm missing here to get this to work?
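
For reference, the versioning side of my setup is the usual one; here's a simplified sketch (the __versioned__ = {} markers and the make_versioned() call are the documented way to opt models in, shown here as illustrative boilerplate rather than my exact code):

from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy_continuum import make_versioned

make_versioned(user_cls=None)  # called before the models are defined

class Membership(BaseModel):
    __versioned__ = {}  # version the association object itself
    tenant_id = Column(Integer, ForeignKey('tenant.tenant_id', ondelete='CASCADE'), primary_key=True)
    user_id = Column(Integer, ForeignKey('user.user_id', ondelete='CASCADE'), primary_key=True)

# User and Tenant are defined as above, each with __versioned__ = {}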

@cofinley

Bump. I have an association table and am having problems with versioning: article.versions[x].tags is the same for all versions. I'm guessing it's because of the default validity strategy and the end_transaction_id not changing from null. I'd like to know more about what can be done with association tables and versioning that kind of relationship.

@cofinley

Update: I ended up dumping the tag relationship (comma-separated) to a string column on the Article model and stopped trying to get versioning to work within the association table. Now I can just parse and dump the relationship to compare versions.
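
Roughly what that looks like, with illustrative names (tags_csv and the helpers below are not my literal code):

from sqlalchemy import Column, Integer, String

class Article(BaseModel):
    __versioned__ = {}
    id = Column(Integer, primary_key=True)
    # Tags stored as a plain comma-separated string instead of a relationship,
    # so they get versioned along with the other columns.
    tags_csv = Column(String, default='')

def set_tags(article, tags):
    article.tags_csv = ','.join(sorted(tags))

def get_tags(article_or_version):
    # Works on Article and on article.versions[x] alike.
    return article_or_version.tags_csv.split(',') if article_or_version.tags_csv else []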

@ErinMorelli

Bumping this issue (@kvesteri) - I'm seeing this same problem with many-to-many relationships.

sqlalchemy.exc.IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.IntegrityError) duplicate key value violates unique constraint "media_tags_version_pkey"
DETAIL:  Key (media_id, tag_id, transaction_id)=(417, 105, 13) already exists.
 [SQL: 'INSERT INTO media_tags_version (media_id, tag_id, transaction_id, operation_type) VALUES (%(media_id)s, %(tag_id)s, %(transaction_id)s, %(operation_type)s)'] [parameters: {'operation_type': 2, 'media_id': 417, 'tag_id': 105, 'transaction_id': 13L}] (Background on this error at: http://sqlalche.me/e/gkpj)

It seems like it's potentially issuing two INSERT statements for the same transaction?

I'm using the basic association table setup and the native_versioning option. Could it be an issue with the triggers? Would love some help debugging. Thanks!

@ErinMorelli

Update:

I ran the sqlalchemy_continuum.dialects.postgresql.drop_trigger function on all of my tables and now things are working, but I'm assuming that means I am no longer using the native_versioning feature, correct?

If I run sqlalchemy_continuum.dialects.postgresql.sync_trigger on all of the tables, the issue returns, so this is clearly where the issue is. Any ideas on a workaround?
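
For reference, the re-creation call follows the documented PostgreSQL dialect helper, roughly like this ('media_tags_version' is assumed from the error above; drop_trigger lives in the same module):

from alembic import op
from sqlalchemy_continuum.dialects.postgresql import sync_trigger

def upgrade():
    # Re-create the native-versioning trigger for one version table.
    sync_trigger(op.get_bind(), 'media_tags_version')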

@Simes

Simes commented May 30, 2019

I'm also having this problem. Does anyone know of a workaround until the issue is fixed, given that it's been open for over a year?

@xg1990

xg1990 commented Jul 16, 2020

The issue is still there...

@Simes

Simes commented Jul 16, 2020

I didn't put the kludge I came up with into this thread originally, because it does seem to be a horrible kludge. But, as this issue is still ongoing, maybe it'll be helpful to someone. It removes the duplicate statements by the hacky method of putting them into a dictionary and using the stringified version of each statement as the key.

Add the replacement UnitOfWork class to your codebase somewhere:

import sqlalchemy_continuum


class UnitOfWork(sqlalchemy_continuum.UnitOfWork):
    """We replace only the function we need to patch; otherwise the superclass
    still does everything."""

    def create_association_versions(self, session):
        """
        Creates association table version records for the given session.

        :param session: SQLAlchemy session object
        """
        # Original implementation: statements = copy(self.pending_statements)

        # Dedupe statements, keyed on their stringified form
        temp = {str(st): st for st in self.pending_statements}
        statements = temp.values()

        for stmt in statements:
            stmt = stmt.values(
                **{
                    self.manager.options['transaction_column_name']:
                    self.current_transaction.id
                }
            )
            session.execute(stmt)
        self.pending_statements = []

Also, right after your call to make_versioned, insert the code to replace the supplied UnitOfWork class with the patched one:

sqlalchemy_continuum.versioning_manager.uow_class = horrible_kludges.UnitOfWork

I am 100% certain that a better solution than this exists and I am by no means advocating for adding this to Continuum. But it does get rid of the issue, at the cost of additional execution time.

@maarten-dp

maarten-dp commented Jul 17, 2020

I've set up a session listener to create a new transaction when needed. A bit of a workaround and somewhat inefficient, but it seems to work on my end.
Any feedback or improvements are welcome of course.

from sqlalchemy_continuum import versioning_manager
from sqlalchemy_continuum.version import VersionClassBase
from sqlalchemy import event
from sqlalchemy.orm import Session

@event.listens_for(Session, 'before_flush')
def transaction(session, flush_context, instances):
    for target in session.new.union(session.dirty):
        if isinstance(target, VersionClassBase):
            uow = versioning_manager.unit_of_work(session)
            transaction_id = uow.current_transaction.id
            res = session.query(target.__class__).filter_by(transaction_id=transaction_id).all()
            if res:
                # A transaction already exists, so create a new one.
                Transaction = uow.manager.transaction_cls
                uow.current_transaction = Transaction()
                target.transaction = uow.current_transaction
                target.transaction_id = None

@killthekitten
Contributor

@Simes @maarten-dp thanks both of you for proposing the workarounds! Could you also post the migrations that create the version tables that break the transaction? I believe the version tables generated for many-to-many associations are lacking a correct primary key.

Let me give a couple of examples. This is a many-to-many versions table:

    op.create_table(
        "foo_bar_versions",
        ...,
        sa.PrimaryKeyConstraint(
            "transaction_id", name=op.f("pk_foo_bar_versions")
        ),
    )

Here's a "normal" one:

    op.create_table(
        "bar_baz_versions",
        ...,
        sa.PrimaryKeyConstraint(
            "id", "transaction_id", name=op.f("pk_bar_baz_versions")
        ),
    )

A primary key cannot be duplicated, so it fails when two or more items are added within the same transaction, since transaction_id alone is the primary key. As the simplest possible workaround, I can see specifying a primary key on the association table.

However, it's not clear how SQLAlchemy-Continuum should deal with this. Would it help if it raised an error earlier, when building the versions table? Or should it create a primary key column with an increment?
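
For example, a corrected migration for the hypothetical foo_bar_versions table above would include the association columns in the primary key, something like this (foo_id, bar_id and the extra Continuum columns are illustrative):

from alembic import op
import sqlalchemy as sa

def upgrade():
    op.create_table(
        "foo_bar_versions",
        sa.Column("foo_id", sa.Integer(), nullable=False),
        sa.Column("bar_id", sa.Integer(), nullable=False),
        sa.Column("transaction_id", sa.BigInteger(), autoincrement=False, nullable=False),
        sa.Column("operation_type", sa.SmallInteger(), nullable=False),
        sa.PrimaryKeyConstraint(
            "foo_id", "bar_id", "transaction_id", name=op.f("pk_foo_bar_versions")
        ),
    )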

@killthekitten
Contributor

Also, it would help if @okcomp could share the code that adds items to the transaction. It looks like their original code should raise an error for both table-based and object-based definitions.

@maarten-dp

maarten-dp commented Jul 17, 2020

Hi @killthekitten,

I indeed have no id on many-to-many tables, nor are my foreign keys flagged as primary keys in the version table, while they are primary keys in the original table.

I suppose it makes sense, but it creates the issue of reusing the same transaction id, as you explained.

My workaround merely checks whether records with this transaction id already exist while flushing/committing, and creates a new transaction if there are. The drawback is that non-many-to-many tables are also impacted and might generate unneeded transactions (I might add some logic that only does the above if transaction_id is the only PK).

I feel like adding a dedicated primary key in the association table version distorts the continuity of the version table with the table it's versioning, as well as the continuity with the other version tables. I think a better approach would be to recycle the primary keys of the original table. But I assume there's a reason for not having done this already?

@killthekitten
Contributor

killthekitten commented Jul 17, 2020

@maarten-dp how do you generate your migrations? Do you have alembic set up? It won't automatically detect primary key changes in the original table, unless they were added when the table was first created. SQLAlchemy-Continuum won't do that either, so it requires some manual work.

In other words: it should all work nicely as long as the primary keys were there from the beginning.
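
A quick way to check is to inspect the primary key Continuum puts on the generated version class, e.g. (using the Membership model from the original report; version_class is the documented helper):

from sqlalchemy.orm import configure_mappers
from sqlalchemy_continuum import version_class

configure_mappers()  # version classes are built when the mappers are configured
MembershipVersion = version_class(Membership)
print([c.name for c in MembershipVersion.__table__.primary_key.columns])
# Expected something like: ['tenant_id', 'user_id', 'transaction_id']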

@maarten-dp

@killthekitten Yes, the migrations are done with alembic and the original tables started out with the correct primary keys.

If I understand correctly, you're telling me SQLAlchemy-Continuum is not able to transfer the original primary keys to the versioning table, and manually adding these primary keys to the versioning table would solve the problem?

If so, is it possible to have this behavior inherently in Continuum?

@killthekitten
Contributor

@maarten-dp my point is it's likely an alembic issue. Are you sure the primary keys were there from the beginning? Could you maybe share the original migration and the association table definition? Other info like SQLAlchemy / alembic versions might help too.

In my case this code generates a correct migration, but only when the migration is creating the table:

join_table = Table(
    "questionnaires_uploads",
    DeclarativeBase.metadata,
    Column(
        "questionnaire_id",
        UUID(as_uuid=True),
        ForeignKey("questionnaires.id"),
        nullable=False,
        primary_key=True,
    ),
    Column(
        "upload_id",
        UUID(as_uuid=True),
        ForeignKey("uploads.id"),
        nullable=False,
        primary_key=True,
    ),
)

Could it be that alembic can't alter primary keys?

@maarten-dp

maarten-dp commented Jul 17, 2020

I just had another look, and I understand better now.

The versioning table was indeed added by an alembic migration.

The Version model created by Continuum correctly flags the primary keys, but as you said, alembic doesn't pick them up when generating the migration script.

So in short, this issue is indeed generated by alembic (at least, on my end). Thanks for your help @killthekitten

edit: as requested, the migration code in question:

    op.create_table(
        'model1_model2_version',
        sa.Column('model1_id', sa.Integer(), nullable=False),
        sa.Column('model2_id', sa.Integer(), nullable=False),
        sa.Column(
            'transaction_id',
            sa.BigInteger(),
            autoincrement=False,
            nullable=False,
        ), sa.Column('end_transaction_id', sa.BigInteger(), nullable=True),
        sa.Column('operation_type', sa.SmallInteger(), nullable=False),
        sa.PrimaryKeyConstraint('transaction_id'))

I guess I'm using an out-of-date version of alembic :)

@maarten-dp

maarten-dp commented Jul 17, 2020

I've made a small alembic migration script that fixes this issue, for anyone in need:

from alembic import op
import sqlalchemy as sa
import sqlalchemy_utils

from sqlalchemy_continuum.version import VersionClassBase
from my_package import db
import inspect


def alter_version_model(klass, conn, metadata):
    table_name = klass.__table__.name
    pk_cols = [c.name for c in klass.__table__.primary_key.columns]

    existing_table = sa.Table(table_name, metadata, autoload_with=conn)
    existing_pk_cols = [c.name for c in existing_table.primary_key.columns]
    pkey = existing_table.primary_key.name

    if sorted(pk_cols) == sorted(existing_pk_cols):
        print('Nothing to do for {}'.format(table_name))
        return

    msg = 'Updating the PK of {} from {} to {}'
    print(msg.format(table_name, existing_pk_cols, pk_cols))

    op.drop_constraint(pkey, table_name)
    op.create_primary_key(pkey, table_name, pk_cols)


def upgrade():
    conn = op.get_bind()
    metadata = sa.schema.MetaData()

    for klass in db.Model._decl_class_registry.values():
        if inspect.isclass(klass) and issubclass(klass, VersionClassBase):
            alter_version_model(klass, conn, metadata)


def downgrade():
    pass

I also confirm that running this migration script fixes my issues.

@pjamessteven

Upgrading Alembic to the latest version (1.7.4) solved this problem for me.

@alexfromapex

This wasted a lot of my time so hopefully this helps someone:

When you generate migrations with Alembic, the auto-generated migration code for your association table probably contains something like this:

sa.PrimaryKeyConstraint('transaction_id')

And you just need to add the other primary key or foreign key columns, e.g.:

sa.PrimaryKeyConstraint('column1', 'column2', 'transaction_id')

Then downgrade and upgrade the migration again (not sure if this is needed).

@marksteward
Collaborator

Is this specific to Continuum in some way?
