In the Qbeast Delta implementation, appending data whose schema does not match an existing Delta table, without the mergeSchema flag set to true, leaves orphaned Parquet files in storage.
The current logic writes the data to storage before performing schema validation. When the schema mismatch is detected, an exception is raised, but the Parquet files already written remain in storage and are not referenced by any transaction log.
Schema validation should run before any data is written. This prevents unreferenced (orphaned) Parquet files on storage and keeps storage consistent with the transaction logs.
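The proposed ordering can be illustrated with a minimal sketch. This is a toy in-memory model, not the Qbeast or Delta Lake code: `new_table`, `append`, `SchemaMismatchError`, the dict-based schema, and the `merge_schema` handling are all hypothetical names and simplifications chosen for illustration. It only shows that validating first means a rejected append never touches storage.

```python
class SchemaMismatchError(Exception):
    """Raised when incoming data does not match the table schema."""


def new_table(schema):
    # Hypothetical in-memory stand-in for a table: a schema,
    # a storage area (file name -> rows), and a transaction log.
    return {"schema": dict(schema), "storage": {}, "log": []}


def append(table, rows, data_schema, merge_schema=False):
    # 1. Validate BEFORE writing anything, so a failure leaves no files behind.
    if data_schema != table["schema"]:
        if not merge_schema:
            raise SchemaMismatchError(
                f"expected {table['schema']}, got {data_schema}"
            )
        # Simplified schema evolution: add any columns the table lacks.
        for column, dtype in data_schema.items():
            table["schema"].setdefault(column, dtype)

    # 2. Write the data file (the expensive step in a real system).
    file_name = f"part-{len(table['storage']):05d}.parquet"
    table["storage"][file_name] = list(rows)

    # 3. Commit: only now does the file become part of the table.
    table["log"].append({"add": file_name})
    return file_name


if __name__ == "__main__":
    table = new_table({"id": "int", "name": "string"})
    try:
        append(table, [(1, "a", 2.0)],
               {"id": "int", "name": "string", "score": "double"})
    except SchemaMismatchError as err:
        print("rejected before write:", err)
    print("files in storage:", len(table["storage"]))
```

In this sketch the rejected append fails at step 1, so steps 2 and 3 never run and storage stays empty.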
I would not categorize this as a bug. It is fine to have files in storage that are not referenced in the DeltaLog. This is how optimistic concurrency works, and it is why the log exists. The same thing happens when deleting or updating data with copy-on-write. Documentation is a separate matter: a user who wants to read the table as plain Parquet should know this in advance.
Nevertheless, I agree that checking that parameter before writing would be a worthwhile enhancement, but because it would skip a compute-intensive process, not because it ensures consistency between storage and the log.
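The point about unreferenced files can be sketched as follows. This is a simplified model that assumes a log made only of `add` and `remove` actions keyed by file name. The function names and file names are hypothetical and this is not the DeltaLog API. A reader that resolves files through the log never sees a file left by a failed write or superseded by a copy-on-write update, while a reader that lists the directory as plain Parquet sees all of them.

```python
def live_files(log):
    # Replay the log in order: the current snapshot is every added file
    # that has not been removed afterwards.
    live = set()
    for action in log:
        if "add" in action:
            live.add(action["add"])
        elif "remove" in action:
            live.discard(action["remove"])
    return live


def unreferenced_files(storage_files, log):
    # Files physically present but absent from the current snapshot.
    return set(storage_files) - live_files(log)


if __name__ == "__main__":
    # Hypothetical directory listing.
    storage = {"a.parquet", "b.parquet", "b2.parquet", "failed.parquet"}
    log = [
        {"add": "a.parquet"},
        {"add": "b.parquet"},
        # Copy-on-write update: b.parquet is rewritten as b2.parquet.
        {"remove": "b.parquet"},
        {"add": "b2.parquet"},
        # failed.parquet was written, but its commit never happened.
    ]
    print("read through the log:", sorted(live_files(log)))
    print("read as raw Parquet: ", sorted(storage))
    print("unreferenced:        ", sorted(unreferenced_files(storage, log)))
```

Both `b.parquet` (superseded by the update) and `failed.parquet` (never committed) sit in storage without affecting what a log-based reader returns.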