Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking: parity with Postgres for TPC-H cardinality estimations #127

Open
10 of 22 tasks
wangpatrick57 opened this issue Mar 22, 2024 · 0 comments
Open
10 of 22 tasks

Comments

@wangpatrick57
Copy link
Member

wangpatrick57 commented Mar 22, 2024

Notes

  • Sometimes Postgres does really bad (even worse than our magic numbers!). However, the goal right now is simply to match Postgres, not to match the truecard. When I say fix, I mean match Postgres.
    • This is because we know exactly what we need to do to match Postgres but we don't know what we need to do to match the truecard.
  • Experiments ran with scale factor 1.0, seed 15721
  • If no other PR is mentioned, then the query was run based on feat: caching optd stats, 12x speedup on TPC-H SF1 #132

Queries

@wangpatrick57 wangpatrick57 changed the title Tracking: parity with Postgres for cardinality estimation Tracking: parity with Postgres for TPC-H cardinality estimations Mar 24, 2024
wangpatrick57 added a commit that referenced this issue Mar 30, 2024
**Summary**: Using magic numbers from Postgres in various selectivity
edge cases.

**Demo**:

Different (unfortunately worse) q-error on TPC-H SF1. See #127 for
per-query details on how this PR affects q-error.

![Screenshot 2024-03-30 at 11 27
24](https://github.com/cmu-db/optd/assets/20631215/b0cce5d4-6ac8-4cd5-b0cf-48f86db14d26)


**Details**:
* Fixed the cardinality of Q10!
* `INVALID_SEL` is **no longer used** at all during cardtest. It is
still used during plannertest as some plannertests use the optd
optimizer instead of the datafusion logical optimizer. This can be
checked by replacing all instances of `INVALID_SEL` with a `panic!()`
and seeing that cardtest still runs.
* Using magic number from Postgres for `LIKE`.
* Using magic number from Postgres for equality with various complex
expressions.
* Using magic number from Postgres for range comparison with various
complex expressions.
* Replaced `INVALID_SEL` with `panic!()` and `unreachable!()` statements
in places where it makes sense.
This was referenced Mar 30, 2024
wangpatrick57 added a commit that referenced this issue Mar 31, 2024
**Summary**: Implemented join selectivity formulas for inner joins,
left/right outer joins, and cross joins. Also properly accounts for
filters in the join condition.

**Demo**:
We now match Postgres on our median Q-error. See #127 for more details
on what queries this PR affected.
![Screenshot 2024-03-31 at 13 13
48](https://github.com/cmu-db/optd/assets/20631215/fae590a6-8c55-4016-b924-c697a1c25070)

**Details**:
* We only consider equality checks on columns of different tables to be
"join on conditions".
* Join selectivity formulas are from [Rogov
2022](https://postgrespro.com/blog/pgsql/5969618).
* If there are multiple on conditions, we multiply their selectivities
together.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant