Add CTE support to select in QueryBuilder #6621

Open
wants to merge 1 commit into
base: 4.3.x
from

Conversation

nio-dtp

@nio-dtp nio-dtp commented Nov 21, 2024

Q A
Type feature

Fixes #5018.

Summary

This pull request introduces support for Common Table Expressions (CTEs) across various database platforms and updates the QueryBuilder to utilize this feature.

@morozov
Member

morozov commented Nov 22, 2024

@nio-dtp, thanks for the PR.

As this is a new feature, please retarget against 4.3.x. We only accept bug fixes and upgrade path improvements in the 3.x series.

Member

@morozov morozov left a comment

Please add a very basic unit test that would demonstrate the expected SQL and an integration test that would run the resulting query on the platforms that support CTE.

I'm curious to see how this will work with query parameters (both positional and named).

src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
@nio-dtp nio-dtp force-pushed the cte-support branch 2 times, most recently from 2936e39 to 686bf02 Compare November 23, 2024 07:58
@nio-dtp nio-dtp changed the base branch from 3.9.x to 4.3.x November 23, 2024 07:58
@nio-dtp nio-dtp force-pushed the cte-support branch 4 times, most recently from 79cf2fb to 1e545fc Compare November 23, 2024 20:35
src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
@nio-dtp
Author

nio-dtp commented Nov 23, 2024

@morozov Thanks for the review. I've added some tests and a supports method for the platforms.

I'm not sure whether the QueryBuilder should check platform support, or whether it is enough to mention in the documentation that CTEs are not supported on the deprecated MySQL 5.7.

@nio-dtp nio-dtp marked this pull request as ready for review November 23, 2024 21:05
@nio-dtp nio-dtp requested a review from morozov November 23, 2024 21:05
src/Query/QueryBuilder.php Outdated Show resolved Hide resolved

private function hasCTEs(): bool
{
return 0 < count($this->with);
Member

Suggested change
- return 0 < count($this->with);
+ return $this->with !== [];

Member

@morozov morozov Nov 24, 2024

I agree that the Yoda-style condition looks odd but the !== [] looks like a PHP-ism.

Generally, in order to compare two arrays, one first needs to compare their lengths and, if they are equal, iterate both arrays until a non-equal element is found.

In this case, checking if count($array) === 0 looks like a more meaningful alternative because we don't really need the full-fledged, generic comparison algorithm.
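To make the two variants under discussion concrete, here is a minimal sketch. The function names are made up for illustration and are not part of the PR; both checks return the same result for any array, so the choice is about readability rather than behavior.

```php
<?php

declare(strict_types=1);

/** @param array<mixed> $with hypothetical list of CTE parts */
function hasCtesByCount(array $with): bool
{
    // Only looks at the number of elements.
    return count($with) > 0;
}

/** @param array<mixed> $with hypothetical list of CTE parts */
function hasCtesByComparison(array $with): bool
{
    // Compares the array against the empty array literal.
    return $with !== [];
}
```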

src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
tests/Functional/Query/QueryBuilderTest.php Outdated Show resolved Hide resolved
@derrabus
Member

I'm not sure whether the QueryBuilder should check platform support, or whether it is enough to mention in the documentation that CTEs are not supported on the deprecated MySQL 5.7.

We try to avoid those supportsX() methods in our platform classes and this one in particular looks like we want to remove it in 5.0. I think, we should take the optimistic approach here and assume that every platform supports CTEs with the standard syntax. This would also make this feature available immediately for 3rd-party drivers.

If we did that, what would be the worst thing that could happen? MySQL 5.7 users will get a syntax error if and only if they try to run a query with a CTE. I could live with that.

If we really want a nicer error message for MySQL 5.7 users, we could move the getSQLForCTEs() method to the abstract platform and for MySQL 5.7 we override that method and raise an exception.

Either way: Please remove the supportsCTEs() methods everywhere. We don't need it. 🙂
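The alternative mentioned above (moving getSQLForCTEs() to the platform and overriding it for MySQL 5.7) could be sketched roughly as follows. SketchPlatform and SketchMySQL57Platform are hypothetical stand-ins, not the real DBAL platform classes, and the method signature is an assumption made only for this illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the abstract platform; not a real DBAL class.
class SketchPlatform
{
    /** @param array<string, string> $ctes CTE name => SQL of its query */
    public function getSQLForCTEs(array $ctes): string
    {
        if (count($ctes) === 0) {
            return '';
        }

        $parts = [];
        foreach ($ctes as $name => $sql) {
            $parts[] = $name . ' AS (' . $sql . ')';
        }

        // Optimistic default: assume the standard WITH syntax works.
        return 'WITH ' . implode(', ', $parts) . ' ';
    }
}

// Hypothetical MySQL 5.7 platform that replaces the syntax error with a clearer exception.
final class SketchMySQL57Platform extends SketchPlatform
{
    public function getSQLForCTEs(array $ctes): string
    {
        if (count($ctes) > 0) {
            throw new LogicException('Common table expressions are not supported on MySQL 5.7.');
        }

        return '';
    }
}
```

With this shape, no supportsCTEs() method is needed: platforms that cannot handle CTEs simply fail when a CTE is actually used.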

@sbuerk
Contributor

sbuerk commented Nov 24, 2024

Thanks @nio-dtp for providing this PR. ❤️ Love to see that others have the same needs and stay on the DBAL way to implement this. Just let me add my five cents to this.

First, I would remove the platform CTE support methods, as @derrabus already asked. Otherwise we would end up with multiple support methods if we wanted to cover every support question up front: even though many vendors support CTEs in general, they have dirty little differences. I would rather let the database report when it does not understand something, and leave handling that to the application. DBAL should only provide a generic way to create and use the WITH (CTE) syntax.

The second point I want to shed light on is the current implementation of the with() method in this PR. It simply adds another part with each call, which in general is fine, but there is no way to reset it. Instead of introducing a resetWith() method, I would rather adopt the logic of select(): a with() method that resets the internal array and adds the first part, and an addWith() method to add further parts.

Regarding the multi-part aspect, I would raise a third point. A WITH part can depend on one or more other parts, which requires that the parts it depends on are defined before it:

WITH
 baseCTE AS (SELECT id, title FROM tableName),
 secondCTE AS (SELECT id, title FROM baseCTE where id < 1000)
SELECT * from secondCTE

If secondCTE were added before baseCTE, the database would report an error. Here we should decide whether we want to help users of the QueryBuilder or not. Personally, I would add automatic sorting of dependent parts. That requires a way to declare dependencies, which further speaks for point 2: two methods, with() and addWith(), where addWith() accepts the names of the CTE parts that the added part depends on.

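public function with(string $name, string|\Stringable|self $part): self
{
  // Reset any previously defined parts, then add the first one.
  $this->with = [];
  $this->with[] = [ /* adding first part */ ];

  return $this;
}

public function addWith(string $name, string|\Stringable|self $part, string ...$dependsOn): self
{
  // Append an additional part; $dependsOn names the parts it relies on.
  $this->with[] = [
    /* adding additional part */
  ];

  return $this;
}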
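For illustration only, the automatic sorting of dependent parts could look roughly like the depth-first sketch below. The function sortCteNames() and its input shape are assumptions made for this example; nothing like it exists in the PR.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: orders CTE names so that dependencies come first.
 *
 * @param array<string, list<string>> $dependsOn CTE name => names of the CTEs it depends on
 *
 * @return list<string>
 */
function sortCteNames(array $dependsOn): array
{
    $sorted = [];
    $state  = []; // name => 'visiting' | 'done'

    $visit = static function (string $name) use (&$visit, &$sorted, &$state, $dependsOn): void {
        if (($state[$name] ?? null) === 'done') {
            return;
        }

        if (($state[$name] ?? null) === 'visiting') {
            throw new LogicException('Cyclic CTE dependency detected at "' . $name . '".');
        }

        $state[$name] = 'visiting';
        foreach ($dependsOn[$name] as $dependency) {
            // Names that are not declared CTEs (e.g. real tables) are skipped.
            if (! array_key_exists($dependency, $dependsOn)) {
                continue;
            }

            $visit($dependency);
        }

        $state[$name] = 'done';
        $sorted[]     = $name;
    };

    foreach (array_keys($dependsOn) as $name) {
        $visit((string) $name);
    }

    return $sorted;
}
```

For the example above, passing secondCTE (depending on baseCTE) before baseCTE would still yield baseCTE first, and a cycle would raise an exception instead of producing invalid SQL.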

The current implementation would already allow adding a recursive CTE part, which itself is a two-part UNION query where the first part is the initial query and the second part is the recursive one. For example, as pseudo SQL:

WITH RECURSIVE
recursiveCTE AS (
    -- initial query
    SELECT id, parentId, title FROM table_a WHERE parentId = 0

  UNION

    -- recursive query
    SELECT b.id, b.parentId, b.title
    FROM table_b AS b
    INNER JOIN recursiveCTE ON b.parentId = recursiveCTE.id
    WHERE b.parentId > 0

)
SELECT * FROM recursiveCTE

With recursive CTEs the whole thing gets a little trickier, as different vendors require or tolerate different things. For example:

  • Some vendors require WITH RECURSIVE as soon as at least one recursive part is in the chain; others allow omitting it or do not care. Setting it for a query with a recursive part does not harm any database (as far as I have investigated so far).
  • For recursive parts, the fields of the initial and the recursive query must have the same order and count, and also the same types (some vendors, for example Postgres, are more sensitive to that). It is therefore possible to define the field names within the CTE part definition, which helps to keep track of order and naming:
WITH RECURSIVE
recursiveCTE(virtual_id, virtual_title, virtual_parentid) AS (
    -- initial query
    SELECT
      id       AS virtual_id,
      title    AS virtual_title,
      parentId AS virtual_parentid
    FROM table_a WHERE parentId = 0

  UNION

    -- recursive query
    SELECT
      b.id        AS virtual_id,
      b.title     AS virtual_title,
      b.parentId  AS virtual_parentid
    FROM table_b AS b
    INNER JOIN recursiveCTE ON b.parentId = recursiveCTE.virtual_id
    WHERE b.parentId > 0

)
SELECT * FROM recursiveCTE

That is not fully possible with the current implementation. So we could consider

  • having additional methods withRecursive() and addWithRecursive(), which track that a recursive part was added to ensure the WITH RECURSIVE syntax is emitted
  • or adding a type enum as an argument to the with() and addWith() methods.

In any case, with() and addWith() should get an additional (optional) fields array to define the field names for the (recursive) CTE part.

From an internal handling perspective, I suggest introducing an internal class for CTE parts, similar to the internal Join part class, carrying the required options (name, fields, depends, isRecursive, etc.). That would make a dedicated flag unnecessary: when building the query we can iterate the array and add the RECURSIVE keyword as soon as at least one part has it set. Alternatively, an additional container around the parts could track internally when a recursive part gets added. I would not make that class public API. With such a collection/part implementation, sorting could also be implemented at query-building time (with an exception if a cycle is detected).
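A minimal sketch of that idea, assuming a hypothetical value object CtePartSketch and a free function buildWithClause() (neither is part of the PR or of DBAL): the RECURSIVE keyword is derived while iterating the parts, so no separate flag on the builder is required.

```php
<?php

declare(strict_types=1);

// Hypothetical internal value object; name and shape are assumptions.
final class CtePartSketch
{
    /** @param list<string> $columns optional column names of the CTE */
    public function __construct(
        public readonly string $name,
        public readonly string $sql,
        public readonly array $columns = [],
        public readonly bool $recursive = false,
    ) {
    }
}

/** @param list<CtePartSketch> $parts */
function buildWithClause(array $parts): string
{
    if (count($parts) === 0) {
        return '';
    }

    $recursive = false;
    $fragments = [];
    foreach ($parts as $part) {
        // One recursive part is enough to require the RECURSIVE keyword.
        $recursive   = $recursive || $part->recursive;
        $columns     = count($part->columns) > 0 ? '(' . implode(', ', $part->columns) . ')' : '';
        $fragments[] = $part->name . $columns . ' AS (' . $part->sql . ')';
    }

    return ($recursive ? 'WITH RECURSIVE ' : 'WITH ') . implode(', ', $fragments);
}
```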

To be honest, I have not yet investigated which DBAL-supported platforms allow recursive CTEs on top of normal CTEs and which do not; this is still outstanding.

I started a custom implementation a couple of months ago [1], but delayed it due to time constraints and implemented it as a custom solution within the TYPO3 decorated QueryBuilder (prefixed) as a test balloon [2], as we needed it for the release, after implementing a first (quite advanced) usage [3].

To summarize: this PR is a good start but in my eyes not finished yet (without wanting to blame it), because in general it is the same as what I started with before adding the additional stuff. To be honest, in my working state I also had the support methods but had already considered dropping them again before making a pull request out of it.

I propose that we first clarify what Doctrine DBAL wants to support out of all these things and what not, and then how to continue: either @nio-dtp is open to adopting it and continuing, or I polish and finish my work (which is already some steps further) and adapt it to the decisions made (dependency sorting or not, internal class usage or not, ...).

Suggested method(s):

/**
 * @param non-empty-string[] $fields
 */
public function with(
  string $identifier,
  string|\Stringable|QueryBuilder $part,
  array $fields = [],
  WithType $type = WithType::SIMPLE, // WithType::RECURSIVE for recursive part
): self {}

/**
 * @param non-empty-string[] $dependsOn
 * @param non-empty-string[] $fields
 */
public function addWith(
  string $identifier,
  string|\Stringable|QueryBuilder $part,
  array $dependsOn = [],
  array $fields = [],
  WithType $type = WithType::SIMPLE, // WithType::RECURSIVE for recursive part
): self {}

Developers implementing a recursive CTE need to use a dedicated QueryBuilder instance to build a union query, using the union support to define it. If they do not, the database should report this as an issue; Doctrine should not try to detect it or throw a custom exception.

Recursive CTEs were the reason why I started and contributed the union support for the QueryBuilder, as a preparation for providing a CTE implementation.

In my POC/WIP I extended src/SQL/Builder/DefaultSelectSQLBuilder.php to add the WITH SQL building support. We could consider whether that is the better way, or whether to do it as an additional pre-rendering step like in this change. I know my personal preference, but this should also be decided here before going on with any change.

I just would not merge this one too quickly before considering the aforementioned points, at least for an overall strategy, whether everything is done directly or in follow-ups.

@@ -342,7 +350,7 @@ public function getSQL(): string
QueryType::INSERT => $this->getSQLForInsert(),
QueryType::DELETE => $this->getSQLForDelete(),
QueryType::UPDATE => $this->getSQLForUpdate(),
- QueryType::SELECT => $this->getSQLForSelect(),
+ QueryType::SELECT => $this->getSQLForCTEs() . $this->getSQLForSelect(),
Member

@morozov morozov Nov 24, 2024

What if the CTE query builders have parameters? It looks like the parameter needs to be declared in the CTE builder but bound to the top-level one. This is non-obvious and potentially unusable. If the CTE builder contains bound parameters, they will be silently ignored.

As an end user, I'd expect that the CTE builder defines both the query and parameters, and they are taken into account by the top-level one. This way, the builders are naturally composable.
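The composable behavior described here can be sketched with a toy class. ComposableBuilderSketch is hypothetical and deliberately not the DBAL QueryBuilder; it only shows the idea that a top-level builder collects the parameters of its CTE builders, and how conflicting names might be handled is an assumption of this sketch.

```php
<?php

declare(strict_types=1);

// Toy builder, NOT the DBAL QueryBuilder: it only carries SQL, named parameters and CTEs.
final class ComposableBuilderSketch
{
    /** @var array<string, mixed> */
    private array $params = [];

    /** @var array<string, ComposableBuilderSketch> */
    private array $ctes = [];

    public function __construct(private readonly string $sql)
    {
    }

    public function setParameter(string $name, mixed $value): self
    {
        $this->params[$name] = $value;

        return $this;
    }

    public function with(string $name, self $cte): self
    {
        $this->ctes[$name] = $cte;

        return $this;
    }

    public function getSQL(): string
    {
        $parts = [];
        foreach ($this->ctes as $name => $cte) {
            $parts[] = $name . ' AS (' . $cte->getSQL() . ')';
        }

        return (count($parts) > 0 ? 'WITH ' . implode(', ', $parts) . ' ' : '') . $this->sql;
    }

    /** @return array<string, mixed> own parameters plus those of all nested CTE builders */
    public function getParameters(): array
    {
        $all = $this->params;
        foreach ($this->ctes as $cte) {
            foreach ($cte->getParameters() as $name => $value) {
                // Assumption of this sketch: the same name with a different value is an error.
                if (array_key_exists($name, $all) && $all[$name] !== $value) {
                    throw new LogicException('Conflicting values for parameter "' . $name . '".');
                }

                $all[$name] = $value;
            }
        }

        return $all;
    }
}
```

In this sketch the parameter is bound where the CTE query is defined, and the top-level builder picks it up when it is executed. Positional parameters would need a different merging strategy and are not covered here.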

Contributor

Correct, parameters have to be defined on the top-level QueryBuilder. In the case of a CTE, that is (in most cases) the one building the WITH chain. If that query is itself used as a sub-query, it needs to be done on the main query that uses the CTE as a sub-query.

That is already the case when using multiple QueryBuilder instances to create queries with sub-selects, and also when using the union support with sub QueryBuilder instances.

It would be the same requirement here, and I would expect it to work the very same way as all the other parts. Developers have to take care to define parameters on the correct instance in all these cases.

@nio-dtp
Author

nio-dtp commented Nov 25, 2024

Thank you to all the reviewers for the detailed feedback. I'll rework my PR and make a more complete proposal.
This is my first contribution and I'm keen to see it through.
I think I'll rework the code this way:

  • an array that will contain With objects. Maybe I will first implement the non-recursive version of CTEs, keeping the recursive one in mind as a future enhancement.
  • I think that end users should declare their CTEs in the correct order (as they do in SQL), and I guess it is not necessary to maintain a property that tracks whether a CTE depends on others. But maybe I've missed something?
  • I'd like to separate the generation of the WithQuery and the SelectQuery, as that would later make it easier to implement the WithQuery for an update or a delete query.
  • Also, as generally requested, I'll not add the supportsCTE method 😆.

@morozov
Member

morozov commented Nov 26, 2024

Suggested method(s):

I'd improve the following aspects:

  1. Naming. It's unclear from the names what the difference between with() and addWith() is. At least PHPDoc should clearly explain that.
  2. Too many optional parameters. Out of 5 parameters in addWith(), 3 are optional. It means that a single method can be used for too many different purposes.
  3. Do not expose enums like WithType::SIMPLE (especially, optional) to the end user. Make the intended behavior clear from the method name (e.g., addWithRecursive() is much clearer).

As for the "reset" method – what is the use case for it? It would basically allow to disregard the logic of a pre-built query. If it wasn't necessary, why declare it in the first place?

@sbuerk
Contributor

sbuerk commented Nov 26, 2024

I talked with @derrabus on Sunday about this and we came up with the following, using 4 methods:

/**
 * @param string[] $fields
 */
public function with(
  string $name,
  string|\Stringable|self $part,
  array $fields = [],
): self {}

/**
 * @param string[] $fields
 */
public function addWith(
  string $name,
  string|\Stringable|self $part,
  array $fields = [],
): self {}

/**
 * @param string[] $fields
 */
public function withRecursive(
  string $name,
  string|\Stringable|self $initialOrUnionPart,
  string|\Stringable|self|null $recursivePart,
  array $fields = [],
): self {}

/**
 * @param string[] $fields
 */
public function addWithRecursive(
  string $name,
  string|\Stringable|self $initialOrUnionPart,
  string|\Stringable|self|null $recursivePart,
  array $fields = [],
): self {}

If a UNION query is passed as the first, required part for the recursive variants, it is simply used. If both parts are passed, a union query should be created internally (using the union QueryBuilder API, without allowing duplicates). This should be explained in the method PHPDoc blocks; by providing a single concrete part, the developer has full control over the recursive union block.

This follows semantics and logic similar to select() / addSelect() or orderBy() / addOrderBy().

with() or withRecursive() will reset the internal array and create a first element, whereas the addWith*() methods add an additional entry to the internal array (a DTO object).

The *Recursive() methods should set a flag within the DTO object that RECURSIVE is needed.

We think that a dedicated resetWith() method is not needed, as it does not make sense to transform a created CTE query into something else at a later point, in contrast to a normal select query that is turned into a count query (resetting group/order parts, etc.).
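The reset/append semantics described above can be illustrated with a small toy class. WithPartsSketch is hypothetical and only models the internal array; it is not code from the PR.

```php
<?php

declare(strict_types=1);

// Toy class illustrating only the reset/append semantics; not DBAL code.
final class WithPartsSketch
{
    /** @var list<array{name: string, sql: string}> */
    private array $parts = [];

    public function with(string $name, string $sql): self
    {
        // Like select(): discards earlier parts and starts a new list.
        $this->parts = [['name' => $name, 'sql' => $sql]];

        return $this;
    }

    public function addWith(string $name, string $sql): self
    {
        // Like addSelect(): appends to whatever is already there.
        $this->parts[] = ['name' => $name, 'sql' => $sql];

        return $this;
    }

    /** @return list<string> */
    public function names(): array
    {
        return array_column($this->parts, 'name');
    }
}
```

Calling with() a second time replaces everything added so far, which is why no separate resetWith() would be required under this proposal.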

It should be possible to define the fields for the CTE part, but that should be optional:

WITH
 customCte(virtual_id, virtual_field)
   AS (SELECT id AS virtual_id, somefield AS virtual_field FROM sometable)
SELECT * from customCte

Regarding my point about depends, especially for recursive CTEs: we discussed it and decided not to provide an API for it in this low-level implementation. Developers or frameworks (for example an ORM or similar) should keep track on their own and add the parts in the required, correct order. There is no custom sorting or cycle detection; problems will be reported by the database when the query is executed.

We had no strong opinion on the internal implementation and the building of the query, so the current form of the switch may be okay. Personally, I think it would make more sense to move that into the DefaultSelectSQLBuilder and pass the with array (of DTOs) within the SelectQuery DTO. Not sure whether that would count as breaking or not.

The with/addWith and withRecursive/addWithRecursive naming would follow the semantics Doctrine already has for the other parts, except that the *Recursive variants make the intent clearer. Sure, the PHPDoc blocks need to make that clear. In the end, it follows the design approach of the QueryBuilder.

It is also worth noting that the top (outermost) QueryBuilder instance (in most cases the instance on which the with/addWith/withRecursive/addWithRecursive methods are called) needs to be used for creating named placeholders (parameters). That matches the requirement and flow already needed when using the QueryBuilder to create sub-queries or for the union support, and can therefore be expected. (Remark to #6621 (comment))

Given this, we could make this PR a first implementation of the with/addWith part and add the recursive support later on, or do both in one go.

My question is whether @nio-dtp wants to update and work on this and has the time for it in the near future. Otherwise I would rebase and finish my work in my fork and provide an additional PR in the next two weeks (it was started after an original pitch and discussion with @derrabus but not yet opened as a (draft) PR due to time constraints).

I'm totally fine with not doing anything and testing/verifying/reviewing this later on, but I'm also fine with finishing mine (because of access) and mentioning @nio-dtp as co-author then.

@nio-dtp
Author

nio-dtp commented Nov 26, 2024

If it is ok for all, I'll make a proposal by the end of the week without the recursive part.

@derrabus
Member

I talked with @derrabus on Sunday about this and we came up with the following, using 4 methods:

A small remark on this: I would remove the Stringable from those signatures. We don't allow stringables anywhere else on the query builder and I don't think we should start allowing them here.

@sbuerk
Contributor

sbuerk commented Nov 27, 2024

If it is ok for all, I'll make a proposal by the end of the week without the recursive part.

Thanks @nio-dtp, and that is pretty fine. I will add the recursive part afterwards in a follow up PR.

@nio-dtp nio-dtp force-pushed the cte-support branch 3 times, most recently from 8c8b185 to 06d6612 Compare November 28, 2024 21:59
src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
src/Platforms/AbstractPlatform.php Outdated Show resolved Hide resolved
tests/Functional/Query/QueryBuilderTest.php Outdated Show resolved Hide resolved
@nio-dtp nio-dtp marked this pull request as ready for review November 29, 2024 14:03
public function addWith(string $name, string|QueryBuilder $part, array $fields = []): self
{
if (count($this->withParts) === 0) {
throw new QueryException('No initial WITH part set, use with() to set one first.');
Member

Why is this an exception? Will anything break if this exception is not thrown?

Author

You're right, it will break nothing. We can accept it and add a with part.

Author

I think we should have only one method with. I don't see the case when we need to reset the with parts.

@@ -1266,7 +1323,14 @@ private function getSQLForSelect(): string
throw new QueryException('No SELECT expressions given. Please use select() or addSelect().');
}

- return $this->connection->getDatabasePlatform()
+ $selectSQL = '';
Member

Let's use an array of query parts and implode them at the end. Otherwise, with every concatenation, the previously built string will be copied to the new one.
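A minimal sketch of that suggestion, using a hypothetical free function buildSelectWithCtes() rather than the actual getSQLForSelect() code: the fragments are collected in an array and joined once at the end.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not the code under review.
 *
 * @param list<string> $withParts already rendered "name AS (...)" fragments
 */
function buildSelectWithCtes(array $withParts, string $selectSQL): string
{
    $parts = [];

    if (count($withParts) > 0) {
        $parts[] = 'WITH ' . implode(', ', $withParts);
    }

    $parts[] = $selectSQL;

    // A single implode() at the end instead of repeated string concatenation.
    return implode(' ', $parts);
}
```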

- return $this->connection->getDatabasePlatform()
+ $selectSQL = '';
+ if (count($this->withParts) > 0) {
+ $selectSQL .= $this->connection->getDatabasePlatform()
Member

Could the result of $this->connection->getDatabasePlatform() be assigned to a variable and then reused?


namespace Doctrine\DBAL\Query;

final class WithQuery
Member

What is the purpose of this class? How is accepting a WithQuery different from accepting an array of With?

Author

For now there is no difference, I will change the parameter type of the buildSQL method

*
* @return $this
*/
public function with(string $name, string|QueryBuilder $part, array $fields = []): self
Member

Please document the meaning of $name and the usage of $fields in PHPDoc.

Member

I think it should be "columns", not "fields". We're not dealing with abstract data structures with fields. We're dealing with relational tables and columns.


use Doctrine\DBAL\Query\WithQuery;

interface WithSQLBuilder
Member

Does this need to be an interface? Could we make it a class?

Author

Right, for now we do not need another implementation of this interface.

}

/** @param string[] $fields */
private static function fields(array $fields): string
Member

Why does this need to be a separate method and why static?

Author

@nio-dtp nio-dtp Nov 30, 2024

The purpose was to make the array_map more readable. It is static because it does not need to be aware of the class context, and the array_map callback is static too.
I can refactor this if it is not acceptable.

$expectedRows = $this->prepareExpectedRows([['id' => 1]]);
$qb = $this->connection->createQueryBuilder();

$cteQueryBuilder1 = $this->connection->createQueryBuilder();
Member

Why $cteQueryBuilder1 if there's no $cteQueryBuilder2?

$cteQueryBuilder1 = $this->connection->createQueryBuilder();
$cteQueryBuilder1->select('id')
->from('for_update')
->where($qb->expr()->eq('id', $qb->createNamedParameter(1, ParameterType::INTEGER)));
Member

It looks like the user still needs to bind parameters defined in the CTE builder to the top-level one. I don't think this behavior is acceptable.

Author

I will add another test that binds the parameters at the top level.

@nio-dtp
Author

nio-dtp commented Nov 30, 2024

SQL Server supports neither ORDER BY in a CTE nor declaring the columns to fetch into in the CTE declaration.

WITH cte_a(virtual_id) AS (SELECT id AS virtual_id FROM table_a ORDER BY id ASC) SELECT * FROM cte_a

Should we handle the error, or assume that the developer knows that this is not supported?

@nio-dtp nio-dtp force-pushed the cte-support branch 5 times, most recently from 42cd7b4 to 3a04999 Compare November 30, 2024 20:55