-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: new Snapshot::new_from()
API
#549
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -53,7 +53,8 @@ impl Snapshot { | |
/// | ||
/// - `table_root`: url pointing at the table root (where `_delta_log` folder is located) | ||
/// - `engine`: Implementation of [`Engine`] apis. | ||
/// - `version`: target version of the [`Snapshot`] | ||
/// - `version`: target version of the [`Snapshot`]. None will create a snapshot at the latest | ||
/// version of the table. | ||
pub fn try_new( | ||
table_root: Url, | ||
engine: &dyn Engine, | ||
|
@@ -71,6 +72,26 @@ impl Snapshot { | |
Self::try_new_from_log_segment(table_root, log_segment, engine) | ||
} | ||
|
||
/// Create a new [`Snapshot`] instance from an existing [`Snapshot`]. This is useful when you | ||
/// already have a [`Snapshot`] lying around and want to do the minimal work to 'update' the | ||
/// snapshot to a later version. | ||
/// | ||
/// # Parameters | ||
/// | ||
/// - `existing_snapshot`: reference to an existing [`Snapshot`] | ||
/// - `engine`: Implementation of [`Engine`] apis. | ||
/// - `version`: target version of the [`Snapshot`]. None will create a snapshot at the latest | ||
/// version of the table. | ||
pub fn new_from( | ||
existing_snapshot: &Snapshot, | ||
engine: &dyn Engine, | ||
version: Option<Version>, | ||
) -> DeltaResult<Self> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like the method should take+return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe even do pub fn refresh(self: &Arc<Self>, ...) -> DeltaResult<Arc<Self>> (this would have slightly different intuition than |
||
// TODO(zach): for now we just pass through to the old API. We should instead optimize this | ||
// to avoid replaying overlapping LogSegments. | ||
Self::try_new(existing_snapshot.table_root.clone(), engine, version) | ||
} | ||
|
||
/// Create a new [`Snapshot`] instance. | ||
pub(crate) fn try_new_from_log_segment( | ||
location: Url, | ||
|
@@ -250,6 +271,25 @@ mod tests { | |
assert_eq!(snapshot.schema(), &expected); | ||
} | ||
|
||
#[test] | ||
fn test_snapshot_new_from() { | ||
let path = | ||
std::fs::canonicalize(PathBuf::from("./tests/data/table-with-dv-small/")).unwrap(); | ||
let url = url::Url::from_directory_path(path).unwrap(); | ||
|
||
let engine = SyncEngine::new(); | ||
let old_snapshot = Snapshot::try_new(url, &engine, Some(0)).unwrap(); | ||
let snapshot = Snapshot::new_from(&old_snapshot, &engine, Some(0)).unwrap(); | ||
|
||
let expected = | ||
Protocol::try_new(3, 7, Some(["deletionVectors"]), Some(["deletionVectors"])).unwrap(); | ||
assert_eq!(snapshot.protocol(), &expected); | ||
|
||
let schema_string = r#"{"type":"struct","fields":[{"name":"value","type":"integer","nullable":true,"metadata":{}}]}"#; | ||
let expected: StructType = serde_json::from_str(schema_string).unwrap(); | ||
assert_eq!(snapshot.schema(), &expected); | ||
} | ||
|
||
#[test] | ||
fn test_read_table_with_last_checkpoint() { | ||
let path = std::fs::canonicalize(PathBuf::from( | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -90,7 +90,7 @@ impl TableChanges { | |
// supported for every protocol action in the CDF range. | ||
let start_snapshot = | ||
Snapshot::try_new(table_root.as_url().clone(), engine, Some(start_version))?; | ||
let end_snapshot = Snapshot::try_new(table_root.as_url().clone(), engine, end_version)?; | ||
let end_snapshot = Snapshot::new_from(&start_snapshot, engine, end_version)?; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This opens an interesting question... if we knew that (**) Almost, because the start version might have a checkpoint, in which case stripping the checkpoint out of the log segment would also remove the start version. But then again, do we actually want the older snapshot to be the start version? Or the previous version which the start version is making changes to? Or, maybe we should just restrict the checkpoint search to versions before the start version, specifically so that this optimization can work. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It would be sufficient to have the older snapshot be I guess this would look like: After all, the goal of the start_snapshot is just to ensure that CDF is enabled. |
||
|
||
// Verify CDF is enabled at the beginning and end of the interval to fail early. We must | ||
// still check that CDF is enabled for every metadata action in the CDF range. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, is this api only for versions later than the existing snapshot?