Allow disabling inlining for MultiZarrToZarr #506
I think this was already possible with inline_threshold=0? |
Nope, if We could change this so that we use |
That sounds like a good plan |
Ok @martindurant, that should take care of it. Now the code should be functionally identical to the previous, except that the file size lookup is skipped when unneeded. In other words, no more |
Oh, I'm sorry, I'm on my laptop at the moment and it looks like I forgot to set up pre-commit 🙈 Please feel free to squash this PR. |
Hmm, this test failure I can reproduce locally, also from |
@Anu-Ra-g, any idea?
|
Interesting, it seems like something's getting mangled. When I run locally, the assert above passes for me, but the subsequent one fails. Note that for me idx_df.iloc[83] is:
varname              wz
typeOfLevel          isobaricInhPa
stepType             instant
name                 Geometric vertical velocity
isobaricInhPa        975.0
step                 0 days 06:00:00
time                 2023-09-28 00:00:00
valid_time           2023-09-28 06:00:00
uri                  ~/repos/kerchunk/kerchunk/tests/gfs....
offset               21234675
length               1035696
inline_value         None
surface              NaN
heightAboveGround    NaN
meanSea              NaN
Name: 83, dtype: object |
The culprit is eccodes v2.38.0. |
@maresb I tried to reproduce the above steps on my local system. The error does not occur consistently.
Even when the error occurs, the remaining steps work fine. |
Thanks @Anu-Ra-g! May I ask which version of eccodes you are using? |
I'm on eccodes version |
Thanks a lot @martindurant for the merge!!! 🙏 |
I am trying to combine some massive S3-based Zarr references. A MultiZarrToZarr operation that would otherwise take a minute takes days because the size of each file on S3 is being queried. Thus I want to be able to pass in inline_threshold=None so that my datasets combine in a reasonable timeframe.
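To illustrate the idea, here is a minimal, self-contained sketch of the behaviour being requested. It is not kerchunk's actual code: `maybe_inline`, `get_size`, and `read_bytes` are hypothetical names, and references are simplified to a plain key-to-URL mapping. The point is only that a falsy threshold (None or 0) should return before any per-file size query is made.

```python
from typing import Callable, Dict, Optional, Union


def maybe_inline(
    refs: Dict[str, str],
    get_size: Callable[[str], int],
    read_bytes: Callable[[str], bytes],
    inline_threshold: Optional[int],
) -> Dict[str, Union[str, bytes]]:
    """Hypothetical helper: inline small targets, or skip the size lookups entirely."""
    # None or 0 disables inlining, so no size query is ever issued.
    if not inline_threshold:
        return dict(refs)
    out: Dict[str, Union[str, bytes]] = {}
    for key, url in refs.items():
        # This lookup is the expensive step when the URL lives on remote storage.
        if get_size(url) < inline_threshold:
            out[key] = read_bytes(url)
        else:
            out[key] = url
    return out


# Stand-in for remote storage that records every size query (made-up URLs).
sizes = {"s3://bucket/a": 10, "s3://bucket/b": 10_000}
calls = []


def get_size(url: str) -> int:
    calls.append(url)
    return sizes[url]


def read_bytes(url: str) -> bytes:
    return b"x" * sizes[url]


refs = {"var/0": "s3://bucket/a", "var/1": "s3://bucket/b"}

untouched = maybe_inline(refs, get_size, read_bytes, inline_threshold=None)
lookups_when_disabled = len(calls)

inlined = maybe_inline(refs, get_size, read_bytes, inline_threshold=500)
lookups_when_enabled = len(calls) - lookups_when_disabled
```

With the threshold disabled, the references come back unchanged and the recorder stays empty; with a threshold of 500, each URL is queried once and only the small target is replaced by its bytes.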