Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST-based SQLite driver #1

Merged
merged 125 commits into from
Feb 12, 2025
Merged

AST-based SQLite driver #1

merged 125 commits into from
Feb 12, 2025

Conversation

JanJakes
Copy link
Contributor

Continuing the work from WordPress/sqlite-database-integration#164.

Additionally, move the current SQLite driver prototype to a temporary class
WP_SQLite_Driver_Prototype to be gradually moved to the new driver.
@JanJakes
Copy link
Contributor Author

JanJakes commented Feb 7, 2025

@adamziel Thanks for the review! I think I addressed most of the issues and created tickets for the remaining ones. Please, let me know if I missed anything.

@JanJakes JanJakes requested a review from adamziel February 7, 2025 15:54

/*
* @TODO: Let's implement a more powerful AST-querying API.
* See: https://github.com/WordPress/sqlite-database-integration/pull/164#discussion_r1855230501
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're refactoring the API, let's annotate each method we expect to change as @private or @internal and add a comment for the API consumers explaining the reasons not to use them yet. Otherwise, we'll introduce breaking changes without a prior notice. Maybe we could @deprecate them right away – that way the IDEs will provide visual hints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel Maybe, while we're behind the feature flag and have a bunch of TODOs, we could still consider all the new APIs to be experimental and subject to change? Given we're under the Automattic org repo, maybe that would be enough? Or do you think it's safer to annotate all public methods for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a separate ticket, we's just have to resolve it before handing the api over for consumption by another team

}

if (
WP_MySQL_Lexer::ROLLBACK_SYMBOL === $token1->id
Copy link
Contributor

@adamziel adamziel Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we bale out if we see ROLLBACK TO SAVEPOINT until we can support that? Same for creating savepoints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we do support them internally so maybe baling out would be as easy as preserving the savepoint semantics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket: #14

$alias_map[ $alias ] = $ref;
}

// 3. Compose the SELECT query to fetch ROWIDs to delete.
Copy link
Contributor

@adamziel adamziel Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we'll also see HAVING, GROUP BY, LEFT JOIN etc.? It's fine if we do, I'd just bale out if we see any unsupported criterion.

More broadly, it would be fantastic to have a catalog of things we do support (a whitelist/allowlist) and compare the incoming query against it rather than assume we know all the relevant parts of the incoming query and we can always cherry-pick the what's relevant. It could be as simple as a foreach loop that must make a bale/ignore/preserve/transform decision about every single node it sees. Thinking out loud, it could have a switch statement with a catch-all default case that bales out on anything we don't expect to see.

The mental model would be a version or "innocent until proven guilty" – "all queries error out unless they're explicitly supported".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we'll also see HAVING, GROUP BY, LEFT JOIN etc.? It's fine if we do, I'd just bale out if we see any unsupported criterion.

Luckily, none of these is possible in a MySQL DELETE statement. That said, we don't have any specific handling of DELETE with LIMIT and SQLite seems to support it only with a specific compile flag.

More broadly, it would be fantastic to have a catalog of things we do support (a whitelist/allowlist)...

Do you mean the DELETE statement in particular, or more like a general rule? I'd like to be more restrictive in general and steer the driver more in the allowlist direction overall, but it can get pretty complicated — a foreach loop may not work very well, since nodes can be deeply nested, have different meanings in different context, order may change the meaning, etc.

That said, I do think we could steer it a bit more towards the "allowlist" approach overall, but it may be a different approach on a case-by-case basis, and maybe it's not absolutely necessary for all cases (e.g., SELECT and other read statements). Most importantly, all DDL statements should be allowlisted, so that we know what we support on the information schema side and so that the information schema data cannot become incorrect.

Considering how large this PR is and that at this stage it's still behind a feature flag, what do you think if I create tickets for these points and describe all of this? I'd like to get this initial PR done so that we can then work in smaller, more focused chunks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, separate issues seem fine for this. And sure, if a strict allowlist isn't an option then let's examine realistic options that move us in the right direction. Thank you for all the great work here!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a ticket: #15

return;
}

// @TODO: Translate DELETE with JOIN to use a subquery.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making an explicit decision about every AST node would a) make it apparent to the code reader b) prevent running a DELETE query with a condition different from what the developer supplied (and potentially leading to data loss)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is discussed in the comment above — I agree that we should go more in the "allowlist" direction, at least for DDL and write statements, but the particular implementation may differ on a case-by-case basis, and I'd create a separate ticket for that, if that makes sense to you.

@adamziel adamziel merged commit 5dfd5fa into develop Feb 12, 2025
8 checks passed
@adamziel
Copy link
Contributor

Great work @JanJakes! There are specific directions to go from here, but regardless - this is such a huge improvement ❤️

@JanJakes JanJakes deleted the ast-sqlite-driver branch February 12, 2025 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants