Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cockroachDB extremely long query due to Array(UUID()) #1005

Open
Jacky56 opened this issue Jun 2, 2024 · 5 comments
Open

cockroachDB extremely long query due to Array(UUID()) #1005

Jacky56 opened this issue Jun 2, 2024 · 5 comments

Comments

@Jacky56
Copy link

Jacky56 commented Jun 2, 2024

hello all,

Assume we have a simple Post table:

from piccolo.columns import *

class Post(Table):
    uuid = UUID(primary_key=True)
    user = ForeignKey(User, null=False)
    description = Text()
    users_mentioned = Array(UUID())

When I perform a Post.objects().run_sync() or Post.select(Post.all_columns()).run_sync() the query is extremely slow.

But when I leave out querying users_mentioned: Array(UUID()) such as Post.select(Post.uuid).run_sync() the query speeds are as expected (empty table).

From looking at my cluster, it performs a very complex query (10s+) each time I request for the Array(UUID()) datatype:

WITH RECURSIVE typeinfo_tree (oid, ns, name, kind, basetype, elemtype, elemdelim, range_subtype, attrtypoids,
attrnames, depth) AS (SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype, ti.elemdelim,
ti.range_subtype, ti.attrtypoids, ti.attrnames, _
 FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
 FROM pg_type AS t2
   WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
 FROM pg_type AS t2, typebases AS tb
   WHERE (tb.oid = t2.oid)
    AND (t2.typbasetype != _)) SELECT oid
 FROM typebases
 ORDER BY depth DESC
 LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
 FROM pg_catalog.pg_type AS t INNER
   JOIN pg_catalog.pg_namespace AS ns
    ON (ns.oid = t.typnamespace) LEFT
   JOIN pg_type AS elem_t
    ON (((t.typlen = _)
    AND (t.typelem != _))
    AND (t.typelem = elem_t.oid)) LEFT
   JOIN pg_range AS range_t
    ON (t.oid = range_t.rngtypid)) AS ti
   WHERE ti.oid = ANY ($1::OID[]) UNION ALL SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype,
ti.elemdelim, ti.range_subtype, ti.attrtypoids, ti.attrnames, tt.depth + _
 FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
 FROM pg_type AS t2
   WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
 FROM pg_type AS t2, typebases AS tb
   WHERE (tb.oid = t2.oid)
    AND (t2.typbasetype != _)) SELECT oid
 FROM typebases
 ORDER BY depth DESC
 LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
 FROM pg_catalog.pg_type AS t INNER
   JOIN pg_catalog.pg_namespace AS ns
    ON (ns.oid = t.typnamespace) LEFT
   JOIN pg_type AS elem_t
    ON (((t.typlen = _)
    AND (t.typelem != _))
    AND (t.typelem = elem_t.oid)) LEFT
   JOIN pg_range AS range_t
    ON (t.oid = range_t.rngtypid)) AS ti, typeinfo_tree AS tt
   WHERE ((((tt.elemtype IS NOT NULL)
    AND (ti.oid = tt.elemtype)) OR ((tt.attrtypoids IS NOT NULL)
    AND (ti.oid = ANY (tt.attrtypoids)))) OR ((tt.range_subtype IS NOT NULL)
    AND (ti.oid = tt.range_subtype))) OR ((tt.basetype IS NOT NULL)
    AND (ti.oid = tt.basetype))) SELECT DISTINCT *, basetype::REGTYPE::STRING AS basetype_name, elemtype::REGTYPE::STRING AS elemtype_name,
range_subtype::REGTYPE::STRING AS range_subtype_name
 FROM typeinfo_tree
 ORDER BY depth DESC

Performing a query without piccolo such as:

import psycopg2

connection = psycopg2.connect(**con)
cursor = connection.cursor()
cursor.execute("select * from post")

works fine/speed as intended.

please send help if I am doing anything wrong

@Jacky56
Copy link
Author

Jacky56 commented Jun 2, 2024

Another interesting note:

  • swapping Array(UUID()) to Array(Varchar()) would still run the complex query
  • swapping Array(?) to JSONB would still run the complex query

@sinisaos
Copy link
Member

sinisaos commented Jun 3, 2024

@Jacky56 You are doing everything right on the table definition and querying Piccolo. I think the problem is how asynpg works with CockroachDB. There have been several issues online based on this. I have Cockroach v22 on my local machine and I can't even insert data with arrays because the error is unable to decorate subquery (which was resolved in v23). Then I upgrade to the latest version available (v24.1.0) and I get the same as you, a very slow query with a complex query in the cluster log. I don't think we can do anything in Piccolo, Based on this discussion the type introspection is done once and I can confirm that if we use a connection pool only the first query is slow and every other query is fine. psycopg doesn't have those problems but Piccolo is based on asyncpg

@dantownsend
Copy link
Member

dantownsend commented Jun 3, 2024

@sinisaos Thanks for looking into this. Hopefully asyncpg will solve this problem in a future version. It's a good reason for us to consider supporting multiple Postgres client libraries. I'm going to try upgrading our CI to use Cockroach v24.

@Jacky56
Copy link
Author

Jacky56 commented Dec 5, 2024

hello @sinisaos @dantownsend

Is there a way to change psycopg to another sql driver or prerun introspection queries on boot?

im running piccolo on a container and cold starting the service takes criminially long because of what sinisa pointed out

@sinisaos
Copy link
Member

sinisaos commented Dec 5, 2024

@Jacky56 Psycopg has different connection, pooling and cursor handling than asyncpg and building a Psycopg engine for Piccolo (we can't easily subclass PostgresEngine) a would not be an easy task. You can try the PSQLPy engine for Piccolo which is in early development and not all columns are covered.
I tried a combination of PSQLPy and Cockroach and it works. You can try something like this.

import uuid
from piccolo.table import Table
from piccolo.columns import Array, Text, ForeignKey, UUID, Varchar
from psqlpy_piccolo import PSQLPyEngine


DB = PSQLPyEngine(
    config={
        "host": "localhost",
        "database": "piccolo",
        "user": "root",
        "password": "",
        "port": 26257,
    }
)


def generate_uuid():
    return str(uuid.uuid4())


class BaseUser(Table, db=DB):
    uuid = UUID(primary_key=True, default=generate_uuid)
    username = Varchar()


class Post(Table, db=DB):
    uuid = UUID(primary_key=True, default=generate_uuid)
    post_user = ForeignKey(BaseUser, null=False)
    description = Text()
    users_mentioned = Array(base_column=UUID(default=generate_uuid))


if __name__ == "__main__":
    results = Post.objects().run_sync()
    print(results)
    related_results = Post.select(Post.all_columns()).run_sync()
    print(related_results)

Execution is instant. I hope that makes sense and you can use it in your case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants