-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RawState distorted by schema-aware transformations #1667
Comments
Would the same state which is secret in pulumi also be marked secret on tf side? I seem to remember some discussion around TF having weaker support for secrets - is this correct? Would that all be resolvable?
Do you mean the state file? How much is storage space a concern here since this approach seems really simple to implement.
I'm not sure I understand that - isn't the issue that the schema for some property changed, so we might no longer have access to the old schema - would we need to store this special information for all properties? |
TF representation does not have first class secrets, but Pulumi does. This is one of the impedance mismatch issues.
This is another way of saying, we can develop a sophisticated algorithm that stores TF representation cty.Value in Pulumi statefile and recovers it as is, which satisfies the expectations of TF around GetRawState(), in an incremental way, this would be something like storing paths at which flattening occurred or some such delta information. To optimize for space. Serializing the schema could be one way but that's likely too verbose. |
Thanks for digging into this! @t0yv0 Do you have any sense of how many bugs correspond to this problem? Incremental Representation seems like the most viable option, simply by eliminating other options. I don't think we can 2x our state. A New State Representation would require deep collaboration between all of Pulumi's engineering teams, and I think we would need a more compelling argument then improving TF state recovery. |
We have no easy way of telling. We can start cross-correlating bugs here. I suspect this condition is rare and specific to changing upstream resources schemas and upgrades, however when it does hit it can be very impactful like the GCP P1 issue. Yes, IR can be interesting but introduces complexity and some brittleness going forward that needs to be maintained. Perhaps we can at least explore it though to see exactly how this may look like. |
A design document has been written, but we have decided to delay this item of work. |
Hit this again in pulumi/pulumi-aws#4410 (comment) |
One idea here for fixing this is potentially using the cty Type in the StateUpgrader for recovering the old version state in the cases where there is a migration. |
I've explained the problem a couple of times recently, so I though I'd write a quick diagram illustrating the issue. Write TF State to Pulumi StatestateDiagram
TF_Provider --> TFState
TFState --> MakePulumiState
Provider@v1 --> (TFSchema,ProviderInfo)@v1
(TFSchema,ProviderInfo)@v1 --> MakePulumiState
MakePulumiState --> PulumiState
PulumiState --> gRPC(state_file)
Read TF State from Pulumi StatestateDiagram
gRPC(state_file) --> OldPulumiState
OldPulumiState --> MakeTerraformState
Provider@v2 --> (TFSchema,ProviderInfo)@v2
(TFSchema,ProviderInfo)@v2 --> MakeTerraformState: Takes old state and new info
MakeTerraformState --> TF_Provider
The ProblemWhen we read & write TF state, we use the "current" version of the provider. If we write with v1 and read with v2, then any information we use from the v2 state may not be correct for the v1 state. |
@zbuchheit hit this with a user for a |
What happened?
There are incorrect migration or replace plans (example: pulumi/pulumi-gcp#1488) where Pulumi
bridged providers differ from canonical TF behavior in what gets passed as RawState to the
provider's PlanResourceChange method.
Specifically, bridged providers store a Pulumi-oriented representation in the statefile. When
recovering stored state to the TF representation bridged providers call the MakeTerraformState
method that is informed by the provider's schema to not lose information. Schema is a necessary part
of the process as it informs decisions such as MaxItems=1 flattening where Pulumi and TF
representations do not naturally agree. The problem however is that when the state was written with
V1 version of the provider and is being processed by the V2 version of the provider, it is the V2
version of the schema that is informing the process. When V2/V1 schemas do not agree, this may lead
to incorrect results.
Migrations
Normally upstream providers should version the should be considering state migration framework when
taking on changes in the schema:
https://github.com/hashicorp/terraform-plugin-sdk/blob/main/website/docs/plugin/sdkv2/resources/state-migration.mdx
This framework versions the schema by numbers 1, 2, 3, and runs custom code to migrate forward.
Pulumi supports this by storing the resource version under __meta. However, this area is not
sufficiently tested and Pulumi might be deserializing the state under the wrong schema, using the
current provider schema instead of the schema the resource was written with. This is worth
double-checking.
DiffCustomizers
Upstream providers do not always use migrations. In pulumi/pulumi-gcp#1488 upstream is not using the
state migration facility explicitly, but instead just uses a customize diff function that assumes
that previously-shaped data is passed in:
https://github.com/hashicorp/terraform-provider-google-beta/blob/main/google-beta/services/secretmanager/resource_secret_manager_secret.go#L36
Excerpt:
This presents a problem because the original schema is not available, even through the migration
framework. Current Pulumi behavior is to drop "automatic" data during translation since it is not
accounted for by the V2 schema, which prevents the above diff customize function from doing its job,
causing an unexpected replacement plans.
Solutions
Opaque Raw State
Pulumi TF providers could write TF state as-is in an opaque blob under __meta to using the TF
canonical JSON representation, deserialize it accordingly and populate the RawState that way. This
method fully removes the behavior discrepancy between Pulumi and TF.
It is easy to implement and roll out but unfortunately adding an opaque blob costs extra space in
the test file (2x?), and may need masking to account for secret material in the state.
New State Representation
Perhaps a new representation could be designed that replaces what is currently stored in the
statefiles so that space is not wasted, that could serve both the purpose of recovering raw TF state
and the purpose of tracking Pulumi metadata and secrets correctly. In this case the previous schema
of storing state can be retired.
When rolling this out care needs to be taken to around backwards compatibility.
Incremental Representation
Perhaps something can be designed that encodes just enough extra information in the __meta field so
that the bridged provider can recover the TF RawState from the Pulumi representation without
resorting to the schema. This optimizes for minimizing space and migration problems (purely
additive) at the cost of some code complexity that will need to be careful and co-evolve with any
more impedance-mismatch features in the bridge.
Deserialize using the right schema version
Pulumi could consult resources's schema version from the statefile, pull the right schema from the TF migration machinery and deserialize against that. This is an incomplete solution though because:
Example
See above.
Output of
pulumi about
N/A
Additional context
N/A
Contributing
Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).
The text was updated successfully, but these errors were encountered: