Fix unexpected behavior when cursor fields are non-UTC datetime when using lag #2169

Open
hairrrrr opened this issue Dec 20, 2024 · 0 comments

Comments

@hairrrrr
Contributor

hairrrrr commented Dec 20, 2024

dlt version

1.5.0

Describe the problem

Currently, if lag is applied to an incremental, a non-UTC datetime is converted to a UTC datetime. If initial_value is not a UTC datetime, lagged_last_value may be wrong in the string comparison (max or min).

More importantly, if the cursor field value is a non-UTC datetime, it is compared against a UTC lagged_last_value, which is used as start_value and last_value in IncrementalTransform.

Because lag may modify last_value in IncrementalTransform, it is not safe to write the state back when no rows are returned.
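To illustrate why comparing a non-UTC cursor value against a UTC value as strings can go wrong, here is a minimal standalone sketch. It uses only the standard library, the timestamps are made up for illustration, and the variable names merely mirror the ones in this issue; it is not dlt's actual code path.

```python
from datetime import datetime

# Illustrative values only (not taken from dlt or its tests).
cursor_value = "2024-10-20T12:00:00+02:00"       # 10:00 UTC
lagged_last_value = "2024-10-20T11:00:00+00:00"  # 11:00 UTC

# Lexicographic comparison looks only at the characters, so "12" beats "11"
# even though the first timestamp is the earlier instant.
string_max = max(cursor_value, lagged_last_value)

# Parsing into timezone-aware datetimes compares the actual instants.
parsed_max = max(cursor_value, lagged_last_value, key=datetime.fromisoformat)

print(string_max)  # the +02:00 value
print(parsed_max)  # the +00:00 value
```

The two results disagree, which is the kind of mismatch described above when one side has been normalized to UTC and the other has not.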

datetime.strftime (before Python 3.12.0) cannot preserve the colon in the timezone offset when converting datetime objects back to strings. This may also cause unexpected behavior:

print(max("2024-10-20T15:30:00-00:00", "2024-10-20T15:30:00-0030"))
# 2024-10-20T15:30:00-00:00 is printed, although the -0030 value is the later instant

I have added tests that reproduce all of the above cases. Let me know if there is anything to improve in the code.
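The colon loss can be reproduced with the standard library alone. The sketch below is an illustration under assumed values (the -00:30 offset matches the example above); it shows that `%z` drops the colon while `datetime.isoformat()` keeps it. Whether `isoformat()` is the right fix inside dlt is not something this sketch claims.

```python
from datetime import datetime, timedelta, timezone

# A timezone-aware datetime with a -00:30 offset (illustrative value).
dt = datetime(2024, 10, 20, 15, 30, tzinfo=timezone(timedelta(minutes=-30)))

# strftime's %z directive renders the offset without a colon.
no_colon = dt.strftime("%Y-%m-%dT%H:%M:%S%z")

# isoformat() renders the offset with a colon and round-trips cleanly.
with_colon = dt.isoformat()
round_tripped = datetime.fromisoformat(with_colon)

print(no_colon)    # 2024-10-20T15:30:00-0030
print(with_colon)  # 2024-10-20T15:30:00-00:30
```

Mixing the two renderings in a string-based max or min is what produces the surprising result shown above.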

Steps to reproduce

The added tests cover all of the cases described above.

Operating system

macOS

Runtime environment

Local

Python version

3.11
