-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: expose peek next commit to python #1
base: main
Are you sure you want to change the base?
Conversation
Can we add a description to the PRs so that we also iterate on what description we want to present upstream? Description should contain a similar format as the template we have on internal PRs, such as brief description, testing, whether or not this is a breaking change... |
Could we also have all changes be in one commit? it is slightly confusing to go over commits and see one function added and then removed, we would eliminate the confusion by having one commit with all the changes. |
f6293b2
to
ce6ffbf
Compare
Merged the commits |
dt = DeltaTable(table_path) | ||
actions, current_version = dt.peek_next_commit(version=version) | ||
assert (len(actions), current_version) == expected | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could add a test to cover DeltaLogNotFound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there's no way to test the loop? I'm thinking. dt.peek_next_commit(version)
where version
does not have an object but version+1
does and _latest_version
is currently version
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused here, do you mean use version=-1 as input or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking there'd be a unit test in which the underlying delta table has actual latest version 10
, the DeltaTable
thinks the latest version is 6
, and version 7
doesn't actually exist. So we call dt.peek_next_commit(6)
and expect the next commit to be 8
, which requires
- realizing
7
is bigger than the thought-to-be latest version6
- updating the latest version which is now
10
- checking if version
7
is there, which it isn't - trying again with version
8
and finding version8
to return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added data that missed a commit and a test for it. get_latest_version
function can get the actual latest version when there is a missed commit (e.g., if version 7 doesn't exist and the actual latest version is 10, DeltaTable.get_latest_version
will return 10), so we don't need to update the latest version.
I ran --- Build Python binding --- error: future cannot be sent between threads safelyeltalake-core error[E0624]: associated function error: aborting due to 3 previous errors 413/416: deltalake-core Some errors have detailed explanations: E0432, E0624.talake-core For more information about an error, try error: could not compile They seems not relate to our change, I got same errors when I ran |
90d7153
to
91754aa
Compare
ACTION NEEDED delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
}) => { | ||
return Err(DeltaTableError::DeltaLogNotFound(current_version)); | ||
} | ||
Err(err) => Err(err), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why return Ok(Err(err))
instead of Err(err)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, where is Ok(Err(err))
?
Here we use the same method as peek_next_commit
below to get the commit_log_bytest
, then we return the full commit_log_bytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this line, we assign commit_log_bytes
to Err(err)
. Then on line 470, we return Ok(commit_log_bytes)
. I believe as a result we will return Ok(Err(err))
if we reach line 468.
Am I interpreting this correctly? If so, this seems wrong. I think you'd want to return Err(err)
here and on line 462 you'd want to assign commit_log_bytes
to bytes
, not Ok(bytes)
(which would thus return Ok(Ok(bytes))
which is weird as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes seems weird we need to fix this, also need to add a test for error case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned by ohan, rust ?
seems to tidy the results, https://doc.rust-lang.org/rust-by-example/std/result/question_mark.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flat out missed the question mark at the end, ok great.
python/deltalake/table.py
Outdated
next_version += 1 | ||
else: | ||
raise | ||
logging.info(f"Provided Delta Version is up to date. Version: {version}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this log line, I don't think it's necessary. Also removes the import statement above.
dt = DeltaTable(table_path) | ||
actions, current_version = dt.peek_next_commit(version=version) | ||
assert (len(actions), current_version) == expected | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there's no way to test the loop? I'm thinking. dt.peek_next_commit(version)
where version
does not have an object but version+1
does and _latest_version
is currently version
.
python/tests/test_table_read.py
Outdated
def test_delta_log_not_found(): | ||
table_path = "../crates/deltalake-core/tests/data/simple_table" | ||
dt = DeltaTable(table_path) | ||
latest_version = dt._table.get_latest_version() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistically, this is access to a private member _table
. Is that OK or do we need to promote get_latest_version()
to also be a public member function of DeltaTable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added get_latest_version()
as a public member.
Should we have a GitHub issue in github/delta-io/delta-rs/issues for this feature request? |
@PengLiVectra regarding the errors about |
do we know why some tests are failing? |
f392256
to
167e954
Compare
359306a
to
b402f3b
Compare
b402f3b
to
005d52a
Compare
Description
Expose peek_next_commit to python. Can be used for streaming delta commit changes.
peek_next_commit
will return actions in the commit and commit version of next commit. If current version is the latest version, it will return None (actions will be None) and the current version.An example of usage:
actions, next_version = delta_table.peek_next_commit(current_version)
If current_version is the latest version, the return will be like None, current_version.
Related Issue(s)
delta-io#1886
Testing
Added unit tests that passed.
Breaking-Change
Not a breaking change.
Documentation