-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore binary offsets of PDOStatement parameters #5897
base: 3.9.x
Are you sure you want to change the base?
Conversation
Please always add a functional test that reproduces the problem that you want to address. |
Hello again. In order to consider your patch, I need a green CI (including PHPCS and static analysis) and a functional test that covers your change. |
Not sure if you're reading any of my comments, but there's a tool called |
I just saw your comment on what you needed and started with coding standards. Did I miss some inline comments? Up until now I've been testing integrated into another project (changing the file in the vendor directory), not directly in this one. I figured I'd need a couple of pushes, but the errors have come in piecemeal. I think it is clean now from a code standards perspective. I am not even close to being a PHP programmer (>90% of the PHP code I generally use is generated, and over half of the rest is JSON-in-php, so I'm only writing a few non-declarative lines for each project and am only in PHP at all to use Doctrine, which was chosen because a previous client wanted me to plug into a PHP project). However, I'm several decades removed from being a newbie programmer (first production code was with Microsoft in 1989) and have also been a primary code reviewer on multiple projects for decades. I have to say some of the things in this set of standards seem arbitrary. Spaces after a ! operator? Enforcing early exit (I find code with multiple return statements much harder to follow than a single-indent if block, and asking for early exit to avoid an if block with 3 lines in it is absurd)? No non-boolean falsy/truthy statements (a core part of the language instead of trying to emulate C# language rules without compiler support)? Aligned assignment operators (this just makes it hard to search for assignment in your codebase, and routinely increases the size of git diffs)? Seriously, I get some of the spacing checks and type checks (just took some work for me to figure out where to put all of the declarations with some in code and some in annotations), but most of the rest of this is frankly subjective. It is obviously easier with a tool to do it for you, but that doesn't make it less arbitrary. Anyway, it's not my codebase, so do as you will. So as not to blast your CI server more I can try to run out of the codebase. I'm not really seeing how I'm supposed to set all of this up (I'm assuming you're running against real databases at some point instead of mocking the backend), or any guidelines on where you want functional tests. Do I need a test for each PDO driver type or can I cover PDO once for any driver? Do you want to replicate this for the other drivers (which I'm pretty sure will fail and need the same treatment)? Any tips or document pointers would be useful. |
No, it's just that I never got a response from you and I've been asking for a test for a very long time now.
Very well, but that shouldn't be your concern. As I said, there's a tool that fixes the code to match the configured standard. Just run it after you've changed the code.
You don't have to. We run the same test suite against different databases. Just run it against your favorite one. By default, the test suite is run against SQLite which does not require any additional setup.
Have a look at the existing test suite. They should give you good examples of how functional tests look like.
Usually you don't. Most of our functional tests are run with all drivers. |
OK, I added the functional test, which covers bindValue on a resource and is passing with this fix (and fails without it). This fix is currently not implemented for other drivers or for bindParam. The bindParam test (testBindParamProcessesStream) does not actually pass a resource (the form of fopen returns a string, not a resource). However, even if I passed the resettable resource into the bindParam function the bound variable was a string after the call, so could not be reset. I do not know if this is intentional or not or if it is consistent across other drivers, but it definitely complicates resetting the stream. bindParam was no actually used by the original doctrine scenario and is deprecated as far as I can tell. This could also be implemented for bindParam, but would need to change somewhat due to the use of reference variables. The resources array would need to be iterated immediately before the call to execute, with the variable type (is a resource) verified and stream positions recorded at this time. We could also consider doing this for the bindValue case to eliminate the possible modification of the resource position between the bindValue and execute calls. This would also eliminate one of the global variables. I think this is all that you asked me to do. Let's see what CI says. |
There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days. |
@mcurland Do you intend to continue your work? Do you need any more input from me? The CI is still red. |
I'm seeing one static CLI issue now at: As far as I know CI is passing for the original issue flagged here, which is with the PDO driver along with the 3 other similar drivers (PDO, IBMDB2, MySQL and PgSQL). However, the process seems to be aborting testing later drivers if an earlier one fails so I'm not getting CI feedback on the 5 drivers I believe to be fixed. I know I haven't done OCI8, SQLite3 (possibly affected by #5994) and SQLSrv drivers. Honestly I need some core team feedback on the approach I'm taking here before I spend another minute on it. The CI issues is that this is a problem with all of the drivers, not just PDO. I'm swamped on multiple other projects that I'm 100% responsible for (especially after new tasks resulting from a week long June meeting at ESTEC--I'm providing the primary modeling tool and related generative technologies for the https://mb4se.esa.int/OSMOSE_Main.html project and have major tech demonstrations in September and November and 2 little kids full time through the end of July while school is out). I've spent days on this already, including initial trackdown of the issue. I just don't have more days to burn trying to get the testing environments running locally for each driver. As a primary on this project I'm sure you or collaborators have local environments set up to test each of the drivers without waiting overnight for the CI process. This would let you add the 10-20 lines of code (mostly copied) in a few minutes per driver. This is a severe defect from the perspective of the full doctrine super project, so I don't think asking for a little collaboration on specific drivers is breaking PR protocol. This issue isn't blocking me (as much as I hate to replace individual vendor files in principal, fixing a symfony version and updating one file in my standard base container is a trivial build step). I can merge up to the working head as the branch is obviously well behind now (maybe the psalm hallucination will go away), but beyond that it might be a while before I have bandwidth to touch the last 3 drivers. |
Thank you for your response.
That's okay. Psalm is mainly a review tool for us. We are allowed to be smarter than the tool. ✌🏻 If we are certain that it reports a false positive, I can help you with making Psalm understand your code. And if this turns out to be a bug in Psalm, we can ignore the error and open a bug report in the Psalm repository. However, let's sort this out last. Fixing the failing tests should be the priority.
Okay, no worries. My initial problem with your PR was that you did not include a test which (given the little time that I can spend on open source) made it hard for me to understand what problem exactly you wanted to solve. You've delivered that test now and I thank you for that!
Heh, we all are. The Doctrine core team is maintaining this library in their freetime. I (like the other team members) have to prioritize payed work and family over unpaid work. That mainly means that I can help getting a change merged, but I'm not investigating or fixing bugs right away unless they block me personally.
I have a local PHP interpreter and Docker. That's all it takes to spin up an environment for running the tests and this is also what the CI uses to run them. But I realize that this could be documented better.
As far as I understand the current code, the implementation assumes that the stream that is passed to the DBAL has been opened exclusively for the DBAL to consume. And under that assumption, the current implementation works well. This and the fact that this behavior has existed for years without complaints lowers the severity of this issue for me, to be honest. |
Please perform a rebase onto the 3.6.x branch to resolve the conflicts. We've added a new scenario to the |
d523f52
to
68007b2
Compare
Thank you for the feedback. I've rebuilt, replaced and rebased the branch on my fork, so you can now looking at a single commit without the noise (unless github didn't like the cross-fork history rewrite). I'll batch other suggestions after comments. A pointer or two on how to retrieve a Docker image for running all tests locally without trying to set up each db system would be very helpful. Acknowledged that we all have day jobs. Your after hours work is much appreciated. We can agree to disagree on severity. I came of age in this business in Microsoft, which used a severity/priority rating system and I still think in those lines. "Nobody else has hit this yet" can be an argument for lowering priority, but not severity. Crash and data-loss defects have the highest severity. This isn't data loss, but is very close because you can't retrieve the data from the second (or third,...) delay-loaded collection, which is where my severity analysis comes from. As you say, issues that affect you personally get higher priority (as they should when time is volunteered). |
Perfect.
I don't know how fluent you are in GitHub Actions config files, but there's a dbal/.github/workflows/continuous-integration.yml Lines 246 to 256 in 96d5a70
I think, with basic knowledge on Docker, you should be able to turn this into a In the
That's my point. We're not talking about losing data in production under some rare circumstance. We're talking about you using DBAL in a way that in 100% of the cases simply won't work, which you probably already noticed while testing your app locally. We could even argue that this isn't a bug because the state of the stream pointer is not part of the contract (yet). You assumed a post-condition that we never defined. This would turn your "severe defect" into a feature request. The way I've seen streams being used in apps using DBAL so far was, "I have some binary data stored in a file and I want it to be streamed to the database. I
If we go with your change, we would rewind every stream once we're done with it. In the scenario I've outlined, that feels very unnecessary. Question that we need to ask:
Or we could argue that the current behavior is a correct one. If we know, that DBAL is going to mess with our stream pointer, can we compensate for it? In our app, we could |
I'm proposing this as fix for a doctrine issue as a whole library. I'm 100% with you that this is a dbal feature request looking only at dbal, but experiencing the library as a whole it is a defect in the doctrine stack. My calling code for postgres doesn't even deal with resources. I changed an identifier type (from serial int to uuid) and regenerated a bunch of code, none of which even touches resources. The string-to-resource-to-string translation is internal to the Doctrine stack (and resets the resource location after reading it.) So, all I knew going into this is that with a UUID identifier--which is nominally a supported feature--only the first delay-loaded collection populates. No data is returned for any other collection. This is a bad defect. Even if the data made it to the db, if the only way of ever seeing that data again is to use a different data access layer then it might as well be missing.. Fundamentally, this results from a loose contract between the different doctrine layers. The code calling dbal (going back multiple layers) is treating all parameters as immutable and is assuming that that the original values can be used for multiple calls. Obviously, that is not true here. For the input values to be reusable someone needs to reset the immutable state, effectively making these objects immutable. So, the question is where is the best place to fix this in the Doctrine stack to minimize code changes? This would be the place that moves the stream pointer and doesn't reset it. If you look at the reverse call tree, there are a ton of calls that eventually get to $driver->execute, which is the code that actually reads the stream without resetting it (outside the PHP runtime layer, which is not going to change). I just didn't see a better point to do this without large amounts of new code. There are two areas to consider on the legitimate question of using fseek here: side effects, and cost. So, overall, I think fixing the Doctrine issue with this dbal feature change is the least amount of code and the least risky change when viewing the framework as a whole stack (not just dbal). I'll see what I can do with the docker images. I'm reasonably conversant in docker. I haven't used the github actions feature, but thank you for the starting point. |
68007b2
to
969c47f
Compare
Code review implemented on latest commit. New drivers have not been added yet. I need an updated report from the CI. |
All tests are passing now, so you can proceed with review. I ended up with very similar code in each driver. Moving the trackParamResource/getResourceOffsets/restoreResourceOffsets functions into shared code is obviously an option, but I'm not sure where you would want this helper class. The code that calls each of these methods would still need to be called per-driver. |
There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days. |
I take one week of vacation and ... poof. As far as I know, this had been a 'green' status ready to commit awaiting review since December 18, 2023. I don't know what else I can do. Obviously, at this point the active branch has moved on, so this would need to be reset to the other branch. However, I don't know why I would take the time to do this when I reached this point 8 months ago and got no action or feedback in response. I don't see a mechanism for me to reopen this, but it is still needed. |
Sorry for the bot closing the PR. It was me who did not find the time to answer you properly. I'll try to have another look at your changes and your answer to my last message next week. 🤞 |
0d333a7
to
4155891
Compare
I rebased the branch onto 3.9.x to match your pull rebase. It looks like you want to go there instead of 4.x.x directly. It would be really nice to get this (and doctrine/orm#11109) pushed so I can eliminate side processes. Please let me know how you want to proceed or if you want this rebased to a later branch. |
There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days. |
Please, someone review this. The bot is telling me it has been another 90 days. CI had a 'green' run on Dec 18, 2023. It has been almost a year. I still get to explain to my downstream people why UUID identifiers don't work out-of-the-box for some database systems. I will upgrade to whatever branch you want now, just let me know. |
I realize that this must be a most frustrating experience for you. Yes, a lot of time has passed during that time I've had phases where I couldn't invest much time in open source work. And this issue is one where I always need some time to get my head around what where actually trying to fix here. I'm taking this time now. The problem with this PR is not that it needs another review. It needs a decision if we want to include this change or not. This is what I summed up in #5897 (comment). The use-case that justifies the change for me would be that I'd like to bind a stream to a statement that I want to execute twice. And this seems to be what you're trying to do. I'm fine with accepting that use-case but then please as a last change, let's adjust the functional test to reflect exactly that use-case instead of running assertions against But let me take over the PR from here.
This is what still puzzles me. When I bind streams to a statement, I do that because I'm dealing with large data that I don't want to load i to memory first. But a UUID? Even in it's binary representation, that's how much? 16 byte? Why's that a resource and not a string? So, you're basically representing UUIDs as streams (in-memory probably) and that doesn't work because you can only ever bind that UUID once and need to rewind it afterwards. I don't want to make this an endless discussion, but can't rid myself of the feeling that we're fixing the wrong issue here. |
4155891
to
7c305f1
Compare
Entity identifiers that are PHP resource streams need to work for all references, not just the first one. PDOStatement->execute is reading the binary resources but not restoring the original offsets. No other code is restoring these streams to their original position so that they can be reused. Examples of failures include empty collections on read (the first lazy load collection on an entity populates correctly, the second is empty) and foreign-key violations while persisting changes (the first entity join produces the correct SQL, the second has no data to read and the FK is violated with the missing binary data). Making this change as close as possible to the external code that moves the stream pointer eliminates the need to do this in calling code. Resource offsets are retrieved immediately before execute in case they change between the bindValue and execute calls. The request was originally for the PDO driver but IBMDB2, Mysql, and PgSQL drivers are also covered. Other drivers will likely also need work. No attempt has been made to fix the deprecated bindParam code path. I do not believe this is called by the current Doctrine code, and is regardless much harder to patch because the reference variables can be replaced during execute, so the original resources may no longer be available to restore after the call. A functional test was added for bindValue and a resource with a seekable position. doctrine#5895
Adding drivers to completed set.
Adding drivers to completed set.
150fdc7
to
904f4b5
Compare
@mcurland I have updated your test to reflect multiple statement executions with the same resource. It seems like your implementation is incomplete here. While the stream is correctly reset after the first execution, this does not happen after the second execution, so the third execution reads an empty stream again. I've tested on MySQL. Can you look into this? |
904f4b5
to
122c8e2
Compare
122c8e2
to
f62e14d
Compare
Thanks for looking at this. I am well acquainted with the delicate balance between food on the table and open source work. I should be able to find some space over Thanksgiving weekend, so Friday/Saturday. Very odd that it only resets once. The test was written against a generic resource instead of UUID because UUID is handled differently across drivers. A standard UUID representation is 36 characters, so significantly more data as a string that the 16 bytes as a resource. There are also other issues with strings. For example, SQL Server offers both NEWID and NEWSEQUENTIALID, with the latter still unique but less random to offer much better index compaction (than would not be achieved with a string representation). Other identifying resource types (using a small number of bytes as an id instead of a string) will also cause this problem. I had a full rundown of which systems it was affecting a long, long time ago (I believe Postgres uses a byte representation and Oracle uses a string (but can optionally convert to a RAW(16))). Basically, I didn't choose the underlying data type mapping as a string or resource. I just followed the documentation to specify the UUID data type in the relational mapping, some systems mapped it to a resource, and only the first delay-loaded query returned data. I wasn't trying to repeatedly send mutable resources as query parameters to a system that tacitly assumed all incoming parameters were immutable data. |
No. A string can hold any binary data, not just characters. You can store a binary UUID in a string and it's 16 bytes. Storing UUIDs in a stream only complicates the access.
Can you point me to "the documentation" that you're referring to? |
The bottom line is that some supported key data types are mapped to PHP resources. As these types can be used as keys (a random guid is meaningless if it isn't an identifier) they will naturally be sent by the ORM as parameterized values to DBAL execute statements when retrieving delay-loaded collections. If you look at this as an overall Doctrine problem and not a DBAL problem everything happening here is basic, boring boilerplate. All you need for a resource identifier is a 'guid' data type (run on postgres, which has native UUID support with a standard extension enabled). You should be able to do the same with 'binary' data type (add a length value on the Column) on any supported RDBMS. namespace App\Entities;
use Doctrine\ORM\Mapping as ORM;
#[ORM\Entity]
#[ORM\Table(name: 'entity1')]
#[ORM\UniqueConstraint(name: 'entity1_PK', columns: ['uuid'])]
class Entity1 {
#[ORM\Column(name: 'uuid', type: 'guid', nullable: false)]
#[ORM\Id]
protected $uuid;
public function getUuid() {
return $this->uuid;
}
public function setUuid($value) {
$this->uuid = $value;
}
} |
Entity identifiers that are PHP resource streams need to work for all references, not just the first one. PDOStatement->execute is reading the binary resources but not restoring the original offsets. No other code is restoring these streams to their original position so that they can be reused.
Examples of failures include empty collections on read (the first lazy load collection on an entity populates correctly, the second is empty) and foreign-key violations while persisting changes (the first entity join produces the correct SQL, the second has no data to read and the FK is violated with the missing binary data).
Making this change as close as possible to the external code that moves the stream pointer eliminates the need to do this in calling code.
This might also be an issue with other driver's Statement implementations. This request just handles the PDO driver. I have also not attempted to fix the deprecated bindParam code path. I do not believe this is called by the current Doctrine code, and is regardless much harder to patch because of the binding to variables that may or may not be populated when they are bound.
#5895